The Hindi-Urdu Treebank: New Frontiers in Hindi and Urdu Natural Language Processing
Dipti Misra Sharma, LTRC, IIIT Hyderabad, India, dipti@iiit.ac.in
Owen Rambow, CCLS, Columbia, New York City, USA, rambow@ccls.columbia.edu
Ashwini Vaidya, Linguistics, University of Colorado, Boulder, USA, Ashwini.Vaidya@colorado.edu
Dec 8, 2012
COLING 2012
Overview
• Introduction to the nature of syntactic representations (Rambow, 15 minutes)
• Introduction to the morphology, syntax and lexical semantics of Hindi and Urdu (Sharma, 40 minutes)
• The morphological representation for Hindi and Urdu, including encoding issues, tokenization, part-of-speech tags and morphological representation (Sharma and Rambow, 20 minutes)
• The dependency representation (DS) for Hindi and Urdu syntax: principles, representation and examples (Sharma, 25 minutes)
• The lexical semantic representation (PB) for Hindi and Urdu: principles, representation and examples (Vaidya, 25 minutes)
• The phrase structure representation (PS) for Hindi and Urdu syntax: principles, representation and examples (Rambow, 25 minutes)
• Sample initial experiments in Hindi and Urdu NLP using the HUTB (Sharma and Rambow, 15 minutes)
The Hindi Treebank
• 3 representations
  – DS: Dependency Structure
  – PB: PropBank (lexical predicate-argument structure)
  – PS: Phrase Structure
• Why have three levels of representation? What does “level of representation” mean, in fact?
What is a Syntactic Representation?
1. Syntactic phenomena (“what”), e.g.
  – Subject of a verb
  – Relative clause
  – Small clause
  Linguists tend to agree on what phenomena exist.
2. Mathematical representation type (“basic how”), e.g.
  – Phrase structure tree
  – Dependency tree
  – Or something more complicated: graph, LFG, TAG, …
3. Formal syntactic description (“detailed how”)
  a. Mapping from phenomena to representations (in a particular type)
  b. The chosen representation for a specific phenomenon is also called an analysis
  c. The phenomena extracted from a representation are its interpretation
  d. A formal description is a syntactic theory if it makes predictions
Representation Types: Dependency and Phrase Structure
• Dependency Tree (DS)
  – One label alphabet: words (= words in a sentence)
  – All nodes are labeled with words or empty strings
• Phrase Structure Tree (PS)
  – Two disjoint label alphabets: terminals (= words in a sentence) and nonterminals
  – All and only interior nodes are labeled with nonterminals
  – Leaves are labeled with terminals or empty strings
• Nothing else is part of the definition
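The two definitions can be written down as small data structures. The following is a minimal sketch, not part of the treebank tooling; the class names `DepNode` and `PSNode` and the checker `is_valid_ps` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DepNode:
    """Dependency-tree node: every node is labeled with a word (or '' for an empty string)."""
    word: str
    children: List["DepNode"] = field(default_factory=list)


@dataclass
class PSNode:
    """Phrase-structure node: the label is a nonterminal (interior) or a terminal (leaf)."""
    label: str
    children: List["PSNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def is_valid_ps(node: PSNode, terminals: Set[str], nonterminals: Set[str]) -> bool:
    """Check that all and only interior nodes carry nonterminal labels."""
    if node.is_leaf():
        # Leaves must be terminals or empty strings.
        return node.label in terminals or node.label == ""
    return node.label in nonterminals and all(
        is_valid_ps(child, terminals, nonterminals) for child in node.children
    )
```

A dependency tree needs no such check beyond "every label is a word", which is why the slide calls it a one-alphabet representation.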
Example: Small Clauses
• Hindi
  – आतिफ़ ने सीमा को बेवक़ूफ़ समझा
  – Atif ne Seema ko bewakuuf samjhaa
  – Atif Erg Seema Acc stupid consider.Pfv
  – ‘Atif considered Seema stupid’
• English
  – Atif considered Seema stupid
  – Atif considered her stupid
What is the Phenomenon?
• Syntactically and semantically, consider takes a clausal complement
  – Atif considered [clause that she is stupid]
  – Atif considered [clause her stupid]
• But two problems
  – No verb
  – her is semantically the subject of stupid but has accusative case, which is unusual (subjects are usually nominative)
• So
  – Atif considered [small clause her stupid]
What is the Representation Type?
• For this example, we will show dependency trees and phrase structure trees
Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not the accusative case marking of her
  considers
    Subj: Atif
    Obj: stupid
      Subj: her
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, and the accusative case marking through the node label
  considers
    Subj: Atif
    Obj-ECM: stupid
      Subj: her
Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not the accusative case marking of her
  DS:
  considers
    Subj: Atif
    Obj: stupid
      Subj: her
  PS:
  [S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, and the accusative case marking through the node label
  DS:
  considers
    Subj: Atif
    Obj-ECM: stupid
      Subj: her
  PS:
  [S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]
Close to the analysis adopted in Chomsky (1981)
Note on DS and PS
• These analyses are intuitively very similar
• Formal notion: “consistency” (Fei Xia; see Bhatt, Rambow & Fei 2011)
  – Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa
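To make the intuition concrete, here is a deliberately naive sketch of such a round trip. It is not the algorithm from the cited work: it assumes every word has a known category, lets each head project exactly one phrase labeled `<cat>P`, and ignores word order and empty categories. The tuple encodings and the function names `ds_to_ps` and `ps_to_ds` are illustrative assumptions.

```python
def ds_to_ps(dep):
    """Dependency node (word, cat, [dependents]) -> phrase (label, [head word, sub-phrases...])."""
    word, cat, dependents = dep
    # The head word is stored as a plain string; dependents become sub-phrases.
    return (cat + "P", [word] + [ds_to_ps(d) for d in dependents])


def ps_to_ds(ps):
    """Inverse mapping: the single string child of a phrase is its head word."""
    label, children = ps
    head = next(c for c in children if isinstance(c, str))
    dependents = [ps_to_ds(c) for c in children if not isinstance(c, str)]
    return (head, label[:-1], dependents)


# Analysis 1a of "Atif considers her stupid", without arc labels.
ds = ("considers", "V", [("Atif", "N", []), ("stupid", "Adj", [("her", "N", [])])])
ps = ds_to_ps(ds)
```

Under these assumptions `ps_to_ds(ds_to_ps(ds))` returns the original dependency tree, which is the sense in which the two structures carry the same information.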
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject
  considers
    Subj: Atif
    Obj: her
    Obj2: stupid
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
  considers
    Subj: Atif
    Obj: her
    ObjPred: stupid
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
  considers
    k1: Atif
    k2: her
    k2s: stupid
Neo-Paninian analysis
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
  समझा
    k1: आतिफ़ ने
    k2: सीमा को
    k2s: बेवक़ूफ़
Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank.
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject
  DS:
  considers
    Subj: Atif
    Obj: her
    Obj2: stupid
  PS:
  [S [NP Atif] [VP considers her [AdjP stupid]]]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
  DS:
  considers
    k1: Atif
    k2: her
    k2s: stupid
  PS nodes: S, NP (Atif), VP (considers, her), AdjP (stupid), SC
Analysis 3 for Small Clauses: Raising to Object
• Structure represents the accusative case marking of her and her as semantic subject, but requires an empty category
  considers
    Subj: Atif
    Obj: her1
    Obj-Pred: stupid
      Subj: e1
Analysis 3 for Small Clauses: Raising to Object
• Structure represents the accusative case marking of her and her as semantic subject, but requires an empty category
  DS:
  considers
    Subj: Atif
    Obj: her1
    Obj-Pred: stupid
      Subj: e1
  PS:
  [S [NP Atif] [VP considers her1 [S e1 [VP [AdjP stupid]]]]]
Analysis used for PS in the Hindi-Urdu Treebank
Comparison of Representations
• Less information
  – Tree 1a: considers (Subj: Atif; Obj: stupid (Subj: her))
  – Tree 2a: considers (Subj: Atif; Obj: her; Obj2: stupid)
• Same information
  – Tree 1b: considers (Subj: Atif; Obj-ECM: stupid (Subj: her))
  – Tree 2b: considers (Subj: Atif; Obj: her; ObjPred: stupid)
  – Tree 3: considers (Subj: Atif; Obj: her1; Obj-Pred: stupid (Subj: e1))
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax, as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren’t DS and PS Representations Complementary? NO!
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows the type of dependency
Aren’t DS and PS Representations Complementary? NO!
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all its descendants
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information
Comparison of DS, PB, PS (Sample)
DS, PB, PS
How: Dependency; Phrase Structure
What:
  – Distinguish unergative/unaccusative
  – Distinguish temporal/locative adjuncts
  – Distinguish unaccusative/transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics
Hindi: Some Facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script
Urdu: Some Facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structure
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
  – In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case.
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique
Case
• Direct: nouns are in the nominative and are not followed by a postposition. They occur as subject and/or object.
  – laRkaa aayaa ‘the boy came’
    <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  – laRkii aayii ‘the girl came’
    <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
  – laRke ne roTii khaayii ‘the boy ate bread’
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  – laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread up from the floor’
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns inflect for number and case.
• Sg, dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
• Sg, obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
• Pl, dir: ye (these), ve (those), jo (who, pl)
• Pl, obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)
Pronouns (Contd…)
• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don’t change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), humko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below

  Case →      Direct            Oblique
  Number →    Sg       Pl       Sg       Pl
  Masc        acchaa   acche    acche    acche
  Fem         acchii   acchii   acchii   acchii
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu
• Thus, only certain moods, aspects and tenses are marked in the verb forms
• Some examples of verb forms from Hindi/Urdu:

  Root         ro       cry
  Infinitive   ronaa    to cry
  Habitual     rotaa    cry.hab
  Perfective   royaa    cried
  Causative    rulaa    cause someone to cry
               rulvaa   make someone cause someone to cry
Auxiliaries
• Auxiliaries mark tense, aspect and modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      ‘The children are having a meal’
  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog abilit be.pl.pres
      ‘The children can continue to have their meal’
• Auxiliaries also carry gender, number and person information
Postpositions
• Postpositions largely mark the case relations
  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
• ke anusaar ‘according to’
• ke alaavaa ‘in addition to’
• ke kaaran ‘because of’
• ke dvaaraa ‘through’
• ke saamne ‘in front of’
• ke liye ‘for’
• kii or/taraf ‘towards’
• kii tarah ‘like’
• kii jagah ‘in place of’
• se baahar ‘out of’
• se pahle ‘before’
Urdu-Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions:
(a) qabl ‘before’
    qabl az_ayn (qabl azeen): before from_this, ‘before this’
    qabl az_waqt: before from_time, ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah): in_this moment
(c) az ‘from/since/for’
    az raahe hamdardi: from way empathy, ‘for the sake of courtesy’
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta ‘to/until/till’
    ta waqt: until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone.
• EZ N+N: daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
• EZ N+Adj: nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
• EZ Adj+N: qabil-e-rahem ‘qualified for sympathy’
• EZ Adj+Adj: qabil-e-qubul ‘qualified-for-acceptance’
Reduplication: A Morphological Process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category that is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X.
• raam-raam ‘Ram-Ram’ (proper noun)
• baccaa-baccaa ‘child-child’ (common noun)
• garam-garam ‘hot-hot’ (adjective)
• dhiire-dhiire ‘slowly-slowly’ (adverb)
• jaa-jaa ‘go-go’ (verb)
• naa-naa ‘not-not’ (negative particle)
• kyaa-kyaa ‘what-what’ (question word)
• jaate-jaate ‘going-going’ (participle)
Partial Reduplication (Echo Words)
• In partial reduplication, an expression X is repeated partially: only a part of the word is repeated, which gives the meaning ‘X etc.’
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa ‘food etc.’
• vaanaa is not a valid word in Hindi
• Some more examples of partial reduplication in Hindi:
  – jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
  – aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
  – aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’
• There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
  – bhaag ‘run’ → bhaagambhaag ‘rush’
  – jhuuTh ‘lie’ → jhuuth-muuTh ‘just like that (without meaning it)’
  – dekh ‘see’ → dekhaa-dekhii ‘in imitation’
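The regular patterns (X-X, and X with its first consonant replaced by v-) are simple enough to sketch in code. The functions below are illustrative assumptions working on the romanized forms used in these slides; they treat an initial consonant plus a following "h" as a single aspirated consonant, prefix v- to vowel-initial words, and make no attempt to cover the irregular cases such as bhaagambhaag.

```python
VOWELS = set("aeiou")


def full_reduplication(x: str) -> str:
    """X -> X-X."""
    return f"{x}-{x}"


def echo_word(x: str) -> str:
    """Replace the first consonant of X with v- (or prefix v- if X starts with a vowel)."""
    if x[0].lower() in VOWELS:
        return "v" + x
    rest = x[1:]
    # Romanized aspirates such as 'kh' or 'bh' count as one consonant.
    if rest.startswith("h"):
        rest = rest[1:]
    return "v" + rest


def partial_reduplication(x: str) -> str:
    """X -> X-echo(X), read as 'X etc.'"""
    return f"{x}-{echo_word(x)}"
```

For instance, `partial_reduplication("khaanaa")` yields `khaanaa-vaanaa` and `partial_reduplication("aaloo")` yields `aaloo-vaaloo`, matching the examples on the slide.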
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’
Relative Clause
rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  ‘My sister who stays in Delhi is coming tomorrow’
Relative Clause
rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  ‘I have read the book which you gave me’
rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  ‘I have read the book which you gave me’
rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
compl-pred-2: राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
• Experiencer verbs: the experiencer argument takes dative case
  – raam ko bukhaar hai (Ram dat fever be.pres)
  – raam ko caand dikhaa (Ram dat moon see-unacc.pst)
  – raam ko dukh hai (Ram dat sorrow be.pres)
• Participatory verbs: the second argument of participatory verbs takes the se postposition
  – raam ravi se carcaa karegaa (Ram Ravi to discussion do.m.sg.fut)
  – siitaa raam se shaadi karegii (Sita Ram with marriage do.f.sg.fut)
  – ravi mohan se milegaa (Ravi Mohan with meet.m.sg.fut)
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues
  – Compounds
  – Punctuation
For example:
  usa    ladake   ne    kelaa    khaayaa    thaa
  that   boy      erg   banana   eat-perf   past
Tokenization
Represented in SSF:

  ADDR   TOKEN
  1      usa
  2      laDake
  3      ne
  4      kelA
  5      khAyA
  6      thA
  7
Tokenization: Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
• Decision
  – Create three tokens
  – Mark the hyphen as JOIN
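The decision above (three tokens per hyphenated compound, with the hyphen marked as JOIN) can be illustrated with a short sketch. This is not the treebank's tokenizer: the function name `tokenize_with_join`, the regular expression and the `(ADDR, TOKEN, mark)` row format are assumptions made for illustration, and how JOIN is actually stored in the SSF files is not specified in these slides.

```python
import re


def tokenize_with_join(sentence: str):
    """Split on whitespace and punctuation; mark a hyphen between two word pieces as JOIN."""
    rows = []
    for chunk in sentence.split():
        # Words stay whole; every punctuation character becomes its own token.
        pieces = re.findall(r"\w+|[^\w\s]", chunk)
        for i, piece in enumerate(pieces):
            inside_compound = (
                piece == "-"
                and 0 < i < len(pieces) - 1
                and pieces[i - 1][0].isalnum()
                and pieces[i + 1][0].isalnum()
            )
            rows.append((len(rows) + 1, piece, "JOIN" if inside_compound else ""))
    return rows


for addr, token, mark in tokenize_with_join("BAI-bahana"):
    print(addr, token, mark)
```

Keeping the members as separate tokens means each one can receive its own morphological analysis, while the JOIN mark preserves the fact that they form one compound.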
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality)/vibhakti (case marker), suffix.

  ADDR_   TKN_      OTHR
  1       usa       <fs af='vaha,pr'>
  2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
  3       ne        <fs af='ne,psp'>
  4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
  5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
  6       thaa      <fs af='kelaa,v,e'>
  7                 <fs as=&STOP,punc>
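Since af is a comma-separated composite, it can be unpacked into named fields. The sketch below assumes the field order given in the definition above; the field names in `AF_FIELDS` and the function `parse_af` are illustrative and not an official SSF API. Missing trailing values are padded with empty strings, because the slide examples abbreviate the attribute.

```python
import re

# Field order as described for the af attribute (names are illustrative).
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix"]


def parse_af(fs: str):
    """Extract af='...' from a feature structure string and map its values to field names."""
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        return None
    values = match.group(1).split(",")
    # Pad abbreviated entries so every field is present.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


analysis = parse_af("<fs af='laDakaa,n,m,sg,3,o'>")
```

Here `analysis["root"]` is `laDakaa` and `analysis["case"]` is `o` (oblique), which is the form required before the postposition ne.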
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags

  ADDR   TKN      CAT    OTHR
  1      usa      PRP    <fs af='vaha,pron,…'>
  2      laDake   NN     <fs af='laDakA,noun,…'>
  3      ne       PSP    <fs af='ne,psp,…'>
  4      kelA     NN     <fs af='kelA,noun,…'>
  5      khAyA    VM     <fs af='KA,verb,…'>
  6      thA      VAUX   <fs af='kelA,verb,…'>
  7               SYM    <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

  ADDR_   TKN_     CAT_   OTHR
  1       ((       NP
  1.1     usa      PRP    <fs af='vaha,pron,…'>
  1.2     laDake   NN     <fs af='laDakA,noun,…'>
  1.3     ne       PSP    <fs af='ne,psp,…'>
          ))
  2       ((       NP
  2.1     kelA     NN     <fs af='kelA,noun,…'>
          ))
  3       ((       VG
  3.1     khAyA    VM     <fs af='KA,verb,…'>
  3.2     thA      VAUX   <fs af='kelA,verb,…'>
          ))
  4       ((       BLK
  4.1              SYM    <fs as=&STOP,punc>
          ))
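The change in ADDR_ values can be seen by generating the chunked layout from a list of chunks. The following is a minimal sketch; the input format (a list of `(chunk_tag, [(token, pos), ...])` pairs) and the function name `to_ssf_rows` are assumptions for illustration, and the feature-structure column is omitted.

```python
def to_ssf_rows(chunks):
    """Return (ADDR, TKN, CAT) rows with hierarchical addresses: chunk i, token i.j."""
    rows = []
    for i, (chunk_tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", chunk_tag))        # chunk opening row
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))     # token row inside the chunk
        rows.append(("", "))", ""))                   # chunk closing row
    return rows


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
for addr, token, cat in to_ssf_rows(chunks):
    print(f"{addr}\t{token}\t{cat}")
```

The token that had address 4 before chunking (kelA) now has address 2.1, which is the restructuring the slide refers to.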
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar? Indian languages have
• Rich morphology
• Relatively flexible word order
For example:
  a) baccaa phala khaataa hai
     ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini’s Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini’s Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
  opened
    k1: Sabina
    k2: lock
      the
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
  opened
    k1: Sabina
    k2: lock
      the
    k3: key
      this
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
  opened
    k7t: Yesterday
    k1: Sabina
    k2: lock
      the
    k3: key
      this
    k7p: home
      my
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
  opened
    k7t: yesterday
    k1: lock
      the
    k3: key
      this
lock becomes the karta
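The two trees above can be compared programmatically once the labeled dependents of the verb are written as data. This is only an illustrative sketch: the `KARAKA` glosses follow the slide text, while the list encoding and the helper names `karta` and `roles` are assumptions.

```python
# Karaka labels used on these slides, with their glosses.
KARAKA = {
    "k1": "karta (doer, locus of activity)",
    "k2": "karma (locus of result)",
    "k3": "karaNa (instrument)",
    "k7t": "time",
    "k7p": "place",
}

# Dependents of 'opened' as (label, word) pairs.
transitive = [("k7t", "Yesterday"), ("k1", "Sabina"), ("k2", "lock"), ("k3", "key"), ("k7p", "home")]
intransitive = [("k7t", "yesterday"), ("k1", "lock"), ("k3", "key")]


def karta(dependents):
    """Return the word labeled k1."""
    return next(word for label, word in dependents if label == "k1")


def roles(dependents):
    """Map each dependent word to the gloss of its karaka label."""
    return {word: KARAKA[label] for label, word in dependents}
```

Here `karta(transitive)` is `Sabina` and `karta(intransitive)` is `lock`: the same participant receives a different karaka label depending on which sentence is uttered.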
Levels of Analysis
• L1 – Semantic relations: karakas, e.g. raama: karta
• L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) khaa
       bhaaii
       phala
(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta
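The step from the POS-tagged string to the chunked string can be approximated with a few rules. The sketch below is a toy rule-based chunker, not the procedure used to build the treebank: it assumes that a noun chunk runs up to a noun (NN) plus any following postpositions (PSP), that a verb group is a main verb (VM) plus following auxiliaries (VAUX), and that anything left over goes into a BLK chunk. The function name `chunk` is hypothetical.

```python
def chunk(tagged):
    """Group (token, pos) pairs into (chunk_tag, [(token, pos), ...]) chunks."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        if tagged[i][1] == "VM":
            # Verb group: main verb followed by its auxiliaries.
            i += 1
            while i < n and tagged[i][1] == "VAUX":
                i += 1
            chunks.append(("VG", tagged[start:i]))
            continue
        # Skip modifiers until a noun head (or a verb, or the end) is reached.
        while i < n and tagged[i][1] not in ("NN", "VM"):
            i += 1
        if i < n and tagged[i][1] == "NN":
            i += 1
            while i < n and tagged[i][1] == "PSP":
                i += 1
            chunks.append(("NP", tagged[start:i]))
        else:
            # No noun head found: leave the material in a miscellaneous chunk.
            chunks.append(("BLK", tagged[start:i]))
    return chunks


tagged = [("meraa", "PRP"), ("baDzaa", "JJ"), ("bhaaii", "NN"), ("bahuta", "QF"),
          ("phala", "NN"), ("khaataa", "VM"), ("hai", "VAUX")]
result = chunk(tagged)
```

On this sentence the sketch produces two NP chunks and one VG chunk with the same boundaries as on the slide; real chunking guidelines cover many more tags and constructions.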
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb’s semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened
Semantics of the Verb
• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma
  Verbal root
    activity
    result
karta-karma
• The boy opened the lock
  – k1: karta
  – k2: karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
  open
    k1: boy
    k2: lock
Sub-actions: Opening of a Lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of a Lock
  open (k1: boy; k2: lock; k3: key)
  open (k1: key; k2: lock)
  open (k1: lock)
k1: karta (doer), k2: karma (affected), k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta, or the other karaka roles, can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on
Speaker’s Intention (vivakshaa)
• Every sentence reflects the speaker’s intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• ‘key’ is assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
Objective The Scheme
$ Morph Analysis $ POS Tagging $ Chunking $ Dependency Relations
Dependency Scheme Relations in Dependency Scheme Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
We are developing treebanks for HindiUrdu
Following Paninian framework as the annotation scheme
We show how the scheme handles some phenomena such as complex verbs causatives relative clauses conjunctions etc in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits.'
An Example (Contd.)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
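A feature structure in this space-separated `name=value` form is easy to read into a dictionary. The sketch below assumes exactly the layout shown on the slide (angle brackets, the leading `fs`, an `af=` attribute with no value); the function name `parse_fs` is made up for the example and is not part of any treebank tool.

```python
import re


def parse_fs(fs):
    """Parse '<fs name=value ...>' into a dict of attribute -> value.

    Attributes with no value (such as the bare 'af=' on the slide) are
    kept with an empty string as their value.
    """
    body = fs.strip().lstrip("<").rstrip(">").strip()
    if body.startswith("fs"):
        body = body[2:]  # drop the 'fs' marker itself
    return dict(re.findall(r"(\w+)=(\S*)", body))


features = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(features["root"], features["cat"], features["TAM"])
```

Keeping the attributes as a plain dictionary makes it straightforward to query, for example, every token whose `cat` is `v`.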
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
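The bracketed chunk notation can be read back into a list of chunks with a short regular expression. This is a sketch under assumptions: the notation is taken exactly as written on the slide, and the head-finding rule (the last token tagged NN, PRP or VM) is a simplification introduced here for illustration, not the treebank's head-computation procedure.

```python
import re

CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")
HEAD_TAGS = {"NN", "PRP", "VM"}  # assumption: tags that can head a chunk


def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos), ...]) pairs."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks


def chunk_head(tokens):
    """Pick the last token with a head-like POS tag, else the last token."""
    heads = [word for word, pos in tokens if pos in HEAD_TAGS]
    return heads[-1] if heads else tokens[-1][0]


SENTENCE = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")

for tag, tokens in parse_chunks(SENTENCE):
    print(tag, chunk_head(tokens), tokens)
```

The chunk heads found this way (meraa, bhaaii, phala, khaataa) are the words between which the inter-chunk dependency relations are marked.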
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation.
• A chunk represents a set of adjacent words which are in dependency relations with each other.
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.
Dependency Scheme (Contd.)
• Here, modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks.
Dependency Scheme (Contd.)
[Dependency tree: khaataa is the root; bhaaii attaches to khaataa as k1 and phala as k2; meraa attaches to bhaaii as r6.]
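The tree for the example sentence can be held as a list of labeled edges between chunk heads. The sketch below is one possible in-memory form, with hypothetical helper names; it is not the storage format of the treebank.

```python
from collections import defaultdict

# (dependent, relation, head) for the example sentence, chunk heads only.
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def build_children(edges):
    """Index the edges by head: head -> [(relation, dependent), ...]."""
    children = defaultdict(list)
    for dep, rel, head in edges:
        children[head].append((rel, dep))
    return children


def find_root(edges):
    """The root is the only head that is never itself a dependent."""
    dependents = {dep for dep, _, _ in edges}
    heads = {head for _, _, head in edges}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()


def render(node, children, rel=None, depth=0):
    """Return the tree as indented lines, one node per line."""
    label = node if rel is None else f"{rel}: {node}"
    lines = ["  " * depth + label]
    for child_rel, child in children.get(node, []):
        lines.extend(render(child, children, child_rel, depth + 1))
    return lines


tree = build_children(EDGES)
print("\n".join(render(find_root(EDGES), tree)))
```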
Relations in Dependency Scheme
There are three types of relations in the dependency scheme:
  - karaka relations,
  - relations other than karakas, and
  - relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly.
• Karaka relations hold for the participants directly involved in the action denoted by the verb.
• Relations other than karakas denote purpose, reason and so on.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta - subject/agent/doer
• karma - object/patient
• karana - instrument
• sampradaan - beneficiary
• apaadaan - source
• adhikarana - location in place/time/other
Relations other than karakas
• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod_relc - Relative clause
• rad - Address
Relations which do not fall under dependency relation
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
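The three groups of labels can be kept in a small lookup table. The sketch below includes only labels that these slides pair with a name (k1 to k4 and the two lists above); labels for apaadaan and adhikarana are not given here, so they are left out rather than guessed. The relative-clause label is written with the double underscore used on the later slides, and the function name is hypothetical.

```python
KARAKA = {
    "k1": "karta",
    "k2": "karma",
    "k3": "karana",
    "k4": "sampradaan",
}
NON_KARAKA = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}


def relation_type(label):
    """Classify a label into one of the three relation types."""
    if label in KARAKA:
        return "karaka"
    if label in NON_KARAKA:
        return "non-karaka"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    raise KeyError(f"label not covered by this sketch: {label}")


for label in ("k1", "r6", "ccof"):
    print(label, relation_type(label))
```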
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child.'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'.
• The Paninian framework provides the relations:
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
Causative Constructions (Contd.)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'.
Causative Constructions (Contd.)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon.'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child.'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
• For causatives, our current decision: follow Possibility-II.
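The difference between the two possibilities for the maid sentence amounts to a relabeling of three arcs. The sketch below shows only that mapping. Whether a se-marked phrase is a mediator-causer or a true instrument (as with the spoon) is a decision made by the annotator, so the code takes it as an input flag; the function and variable names are hypothetical.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arcs, has_mediator_causer):
    """Rewrite (phrase, label) arcs from Possibility-I to Possibility-II.

    `has_mediator_causer` is supplied by the caller; this sketch does not
    try to tell a mediator-causer from an ordinary instrument.
    """
    if not has_mediator_causer:
        return list(arcs)
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in arcs]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_1, has_mediator_causer=True))
```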
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother.'
Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause.
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'.
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'.
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause.
• The modifier of vaha 'he' in the main clause is the entire relative clause.
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref.
Relative Clause: Possibility-II
Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II.
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause.
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation.
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref.
(3) anubhava karta - k4a
Ex-1: mujhko dukh hai
'I-Dat' 'unhappy' 'is'
'I am unhappy.'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta.
• Here dukh 'unhappy' is the karta.
• Here mujhko 'to me' is a subtype of sampradan.
• This sampradan is different from the sampradan (k4 - beneficiary).
• We call it anubhava karta, represented by k4a.
anubhava karta - k4a (Contd.)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon.'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram-Dat' 'moon' 'appeared'
'The moon was visible to Ram.'
anubhava karta - k4a (Contd.)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran - rs
• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow.'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow.'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma.
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs.
Relation samanadhikaran - rs (Contd.)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would definitely have come to the party.'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
[Figure: the abstract node agar-to is the root; agar and to attach to it as paired-ccof; naa hotii attaches to agar and aatii attaches to to, each as ccof.]
Possibility-II
Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause.
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause.
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up.'
Participles (vmod) (Contd.)
The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'.
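Participles (vmod) (Contd.)
[Dependency tree: PaadZataa hai is the root, with vaha (k1), rojZa (k7t), patra (k2) and likhakara (vmod) attached to it; the shared arguments vaha (k1) and patra (k2) are shown again under likhakara.]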
(7) Ellipsis
• How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say.'
• In the above example vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'.
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency.
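Ellipsis (Contd.)
[Dependency tree: maan luungii is the root, with mai (k1) and NULL__NP (vo) (k2) attached to it; kahoge attaches to NULL__NP as nmod__relc; tum (k1) and jo bhi (k2) attach to kahoge.]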
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone.'
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential).
[Dependency tree: NULL_CCP (aur) is the root, with badZe_ho_gaye and nahii_sunate attached to it as ccof.]
Non-dependency Relations
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• In coordination, the conjunction is the head and takes the coordinated elements as its children.
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple.'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow.'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question.'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation): the noun or adjective in a conjunct verb sequence has a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd.)
[Dependency tree: kiyaa is the root, with maine (k1), usase (k2) and prashna (pof) attached to it; ek attaches to prashna as nmod.]
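Because the noun and the verb of a conjunct verb sit in separate chunks, a consumer of the treebank has to follow the pof arc to put them back together. The sketch below does that for the example tree; the arc list mirrors the tree on the slide, and the helper names are hypothetical.

```python
# (dependent, relation, head) arcs for "maine usase ek prashna kiyaa".
ARCS = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]


def conjunct_verbs(arcs):
    """Return each noun-verb pair that is linked by a pof arc."""
    return [(dep, head) for dep, rel, head in arcs if rel == "pof"]


def modifiers_of(arcs, word):
    """Return the words that attach directly to `word`."""
    return [dep for dep, _, head in arcs if head == word]


print(conjunct_verbs(ARCS))
print(modifiers_of(ARCS, "prashna"))
```

The second call shows the point of the design: ek remains a modifier of the noun prashna alone, which would be lost if prashna kiyaa were a single verb chunk.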
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
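The filtering step can be written down directly once each candidate sentence carries role labels. The sketch below hard-codes the two candidates with the agent and theme marked on the slide; the data layout and the function name are assumptions for illustration, and no actual semantic role labeller is involved.

```python
QUESTION = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Role-labelled candidates, as bracketed on the slide.
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(question, candidates):
    """Keep the agents of candidates whose predicate and theme match."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == question["predicate"]
        and c["theme"] == question["theme"]
    ]


print(answer_who(QUESTION, CANDIDATES))
```

Both candidates contain all the question's words, so word matching alone cannot separate them; only the theme label does.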
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
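A frame file can be pictured as a mapping from roleset IDs to role descriptions. The sketch below stores the two rolesets for ring from the slide in a plain dictionary; this layout and the function name are assumptions for illustration, not the format PropBank frame files are distributed in.

```python
FRAMESETS = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset, args):
    """Map each annotated span to the role description in its roleset.

    `args` maps an argument label (e.g. "Arg0") to the text span.
    Labels the roleset does not define are rejected.
    """
    roles = FRAMESETS[roleset]["roles"]
    unknown = [label for label in args if label not in roles]
    if unknown:
        raise KeyError(f"{roleset} has no roles {unknown}")
    return {span: roles[label] for label, span in args.items()}


print(describe("ring.01", {"Arg0": "John", "Arg1": "the bell"}))
print(describe("ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"}))
```

The same label can mean different things across rolesets: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why roles are defined per verb sense.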
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
  ARGA: Causer
  ARG0: Agent, experiencer
  ARG1: Theme, patient
  ARG2: Recipient
  ARG3: Instrument
PropBank Tagset: Numbered Arguments with function tags
  ARGA-MNS: Indirect causer
  ARG0-MNS: Induced causer
  ARG0-GOL: Causee with a 'recipient' role
  ARG2-ATR: Attribute
  ARG2-GOL: Goal
  ARG2-SOU: Source
  ARG2-LOC: Location
  ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
  ARGM-TMP: Temporal
  ARGM-MNR: Manner
  ARGM-LOC: Location
  ARGM-PRP: Purpose
  ARGM-CAU: Cause
  ARGM-DIS: Discourse
  ARGM-ADV: Adverb
  ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
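Once a verb has been classified, the label of its single argument follows mechanically. The sketch below covers only the three verbs named on the slide; the classification itself comes from the diagnostic tests and is simply looked up here, and the table and function name are made up for the example.

```python
# Verb classes for the three example verbs named on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for an intransitive verb's only argument."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```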
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figures: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Causatives
[Figures: annotated trees for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)': trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Figure: PS tree for आतिफ़ किताब पढ़ेगा, with leaves Atif, kitab and paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: PS tree for आतिफ़ ने किताब पढ़ी, with leaves Atif ne, kitab and paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: PS tree for आतिफ़ सोएगा, with leaves Atif and soyegaa]
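Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: PS tree for दरवाज़ा खुलेगा; the NP darvaazaa (index 1) is coindexed with a CASE trace (index 1); the verb is khulegaa]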
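Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
[Figure: PS tree for उस कमरे में चूहे हैं; the NP cuuhe (index 1) is coindexed with a CASE trace (index 1); the verb is hain; us kamre mein is attached as an adjunct]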
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: PS tree for राम ने मोहन को किताब दी, with leaves Ram ne, Mohan ko, kitaab and dii]
Putting It All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: PS tree for the sentence above, with leaves kal raat, baadaloM mein, mujhko, caaMd and dikhaa; caaMd (index 1) is coindexed with a CASE trace, and mujhko (index 2) with a scrambling trace (SCR)]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: PS tree for the sentence above, with leaves Ram, jaantaa, hai, ki, Sita, der se and aayegi; the ki clause is a CP (index 1) coindexed with an extraposition trace (EXTR)]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Figure: PS tree for the sentence above; the relative clause is a CP containing jo kitaab, tumne, dii and thii, and the main clause contains vah, maine and paRhii, with scrambling traces (SCR) and a PRO]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Figure: PS tree for the sentence above, with leaves Raam, Ravi ko, yaad, kar, rahaa and thaa]
Causative
The Hindi Treebank
• 3 Representations
  - DS: Dependency Structure
  - PB: PropBank (lexical predicate-argument structure)
  - PS: Phrase Structure
• Why have three levels of representation? What does "level of representation" mean, in fact?
What is a Syntactic Representation?
1. Syntactic phenomena ("what"), e.g.
  - Subject of a verb
  - Relative clause
  - Small clause
  Linguists tend to agree on what phenomena exist
2. Mathematical representation type ("basic how"), e.g.
  - Phrase structure tree
  - Dependency tree
  - Or something more complicated: graph, LFG, TAG, ...
3. Formal syntactic description ("detailed how")
  a. Mapping from phenomena to representations (in a particular type)
  b. The chosen representation for a specific phenomenon is also called the analysis
  c. The phenomena extracted from a representation are the interpretation
  d. A formal description is a syntactic theory if it makes predictions
Representation Types: Dependency and Phrase Structure
• Dependency Tree (DS)
  - One label alphabet: words (= words in a sentence)
  - All nodes labeled with words or empty strings
• Phrase Structure Tree (PS)
  - Two disjoint label alphabets: terminals (= words in sentence) and nonterminals
  - All and only interior nodes are labeled with nonterminals
  - Leaves are labeled with terminals or empty strings
• Nothing else is part of the definition
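The two definitions are compact enough to check mechanically. The sketch below encodes a tree as a `(label, children)` pair and tests each definition exactly as stated; the encoding, the function names and the two example trees (for "Atif considers her stupid") are assumptions made for illustration.

```python
def is_ds_tree(tree, words):
    """Every node is labeled with a word or the empty string."""
    label, children = tree
    if label not in words and label != "":
        return False
    return all(is_ds_tree(child, words) for child in children)


def is_ps_tree(tree, terminals, nonterminals):
    """Interior nodes carry nonterminals; leaves carry terminals or ''."""
    if terminals & nonterminals:
        raise ValueError("the two label alphabets must be disjoint")
    label, children = tree
    if not children:
        return label in terminals or label == ""
    return label in nonterminals and all(
        is_ps_tree(child, terminals, nonterminals) for child in children
    )


WORDS = {"Atif", "considers", "her", "stupid"}
NONTERMINALS = {"S", "NP", "VP", "AdjP"}

DS = ("considers", [("Atif", []), ("stupid", [("her", [])])])
PS = ("S", [
    ("NP", [("Atif", [])]),
    ("VP", [("considers", []),
            ("S", [("her", []), ("VP", [("AdjP", [("stupid", [])])])])]),
])

print(is_ds_tree(DS, WORDS), is_ps_tree(PS, WORDS, NONTERMINALS))
print(is_ps_tree(DS, WORDS, NONTERMINALS))
```

The last call fails because the dependency tree has a word on an interior node, which is the one thing the phrase structure definition rules out.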
Example: Small Clauses
• Hindi
  - आतिफ़ ने सीमा को बेवकूफ़ समझा
  - Atif ne Seema ko bewakuuf samjhaa
  - Atif Erg Seema Acc stupid consider.Pfv
  - 'Atif considered Seema stupid'
• English
  - Atif considered Seema stupid
  - Atif considered her stupid
What is the Phenomenon?
• Syntactically and semantically, consider takes a clausal complement
  - Atif considered [clause that she is stupid]
  - Atif considered [clause her stupid]
• But two problems:
  - No verb
  - her is semantically the subject of stupid but has accusative case, which is unusual (subjects are usually nominative)
• So:
  - Atif considered [small clause her stupid]
What is the Representation Type?
• For this example, we will show dependency trees and phrase structure trees
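Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not the accusative case marking of her
[DS tree: considers is the root, with Atif (Subj) and stupid (Obj) attached to it; her attaches to stupid as Subj.]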
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, and accusative case marking through the node label
[DS tree: considers is the root, with Atif (Subj) and stupid (Obj-ECM) attached to it; her attaches to stupid as Subj.]
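Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not the accusative case marking of her
[DS tree: considers is the root, with Atif (Subj) and stupid (Obj) attached to it; her attaches to stupid as Subj.]
[PS tree: [S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]]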
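Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, and accusative case marking through the node label
[DS tree: considers is the root, with Atif (Subj) and stupid (Obj-ECM) attached to it; her attaches to stupid as Subj.]
[PS tree: [S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]]
Close to the analysis adopted in Chomsky (1981)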
Note on DS and PS
• These analyses are intuitively very similar
• Formal notion: "consistency" (Fei Xia, see Bhatt, Rambow & Fei 2011)
  - Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa
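To give a feel for what "a very simple and general algorithm" might look like, here is a minimal sketch of one direction only. It lets every word that has dependents project a single phrase containing itself and its dependents' phrases in surface order. This is an assumption-laden illustration, not the algorithm of the cited work: the phrase label "XP", the flat (non-binary) structure and all names are made up for the example.

```python
def ds_to_ps(word, children, position):
    """Project a phrase for `word` from a dependency tree.

    `children` maps a head to its dependents; `position` maps each word
    to its index in the sentence. A word without dependents stays a leaf.
    """
    dependents = children.get(word, [])
    if not dependents:
        return word
    parts = [(position[word], word)]
    for dep in dependents:
        parts.append((position[dep], ds_to_ps(dep, children, position)))
    parts.sort(key=lambda part: part[0])  # restore surface order
    return ("XP", [subtree for _, subtree in parts])


def leaves(tree):
    """Read the words off a phrase structure tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, subtrees = tree
    return [word for sub in subtrees for word in leaves(sub)]


# Analysis 1a: her depends on stupid, stupid and Atif depend on considered.
CHILDREN = {"considered": ["Atif", "stupid"], "stupid": ["her"]}
POSITION = {"Atif": 0, "considered": 1, "her": 2, "stupid": 3}

ps = ds_to_ps("considered", CHILDREN, POSITION)
print(ps)
print(leaves(ps))
```

The result groups her and stupid into one phrase inside the phrase headed by considered, which is the constituency already implicit in the dependency tree.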
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject
[DS tree: considers is the root, with Atif (Subj), her (Obj) and stupid (Obj2) attached to it.]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using the node label
[DS tree: considers is the root, with Atif (Subj), her (Obj) and stupid (ObjPred) attached to it.]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using the node label
[DS tree: considers is the root, with Atif (k1), her (k2) and stupid (k2s) attached to it.]
Neo-Paninian analysis
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using the node label
[DS tree: समझा is the root, with आतिफ़ ने (k1), सीमा को (k2) and बेवकूफ़ (k2s) attached to it.]
Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank.
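Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject
[DS tree: considers is the root, with Atif (Subj), her (Obj) and stupid (Obj2) attached to it.]
[PS tree: node labels S, NP, VP, AdjP; leaves Atif, considers, her, stupid.]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using the node label
[DS tree: considers is the root, with Atif (k1), her (k2) and stupid (k2s) attached to it.]
[PS tree: node labels S, NP, VP, SC, AdjP; leaves Atif, considers, her, stupid.]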
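Analysis 3 for Small Clauses: Raising to Object
• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category
[DS tree: considers is the root, with Atif (Subj), her1 (Obj) and stupid (Obj-Pred) attached to it; the empty category e1 attaches to stupid as Subj.]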
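Analysis 3 for Small Clauses: Raising to Object
• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category
[DS tree: considers is the root, with Atif (Subj), her1 (Obj) and stupid (Obj-Pred) attached to it; the empty category e1 attaches to stupid as Subj.]
[PS tree: node labels S, NP, VP, AdjP; leaves Atif, considers, her1, e1, stupid.]
Analysis used for PS in the Hindi-Urdu Treebank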
Comparison of Representations
• Less information: Tree 1a, Tree 2a
• Same information: Tree 1b, Tree 2b, Tree 3
[Figure: the five DS trees side by side. Tree 1a: her is Subj of stupid, stupid is Obj. Tree 2a: her is Obj, stupid is Obj2. Tree 1b: her is Subj of stupid, stupid is Obj-ECM. Tree 2b: her is Obj, stupid is ObjPred. Tree 3: her1 is Obj, stupid is Obj-Pred, and e1 is Subj of stupid.]
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
  - Can be very similar across languages
• There can be several possible analyses
  - Some have less information
  - But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren't DS and PS Representations Complementary? NO
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in the projection shows the type of dependency
Aren't DS and PS Representations Complementary? NO
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendents
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  - Does not change trees
  - Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  - Contains less information than DS+PB
  - DS and PS contain different information
101212
15
Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How: Dependency; Phrase Structure
What:
  - Distinguish unergative / unaccusative
  - Distinguish temporal / locative adjuncts
  - Distinguish unaccusative / transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties
  - Morphology
  - Some basic syntax
  - Lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script
Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  - Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case.
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  - Singular: pankhaa (fan), lataa (creeper), ghar (house)
  - Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
Direct nouns are in the nominative and are not followed by a postposition. They occur denoting subject and/or object.
  laRkaa aayaa ‘the boy came’
    <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  laRkii aayii ‘the girl came’
    <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
Oblique nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen).
  laRke ne roTii khaayii ‘the boy ate bread’
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread from the floor’
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case.
• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrogative), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrogative), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique:
  - (a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrogative), kinhiiM (who, indefinite)
  - (b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indefinite)
Pronouns (Contd…)
• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) don’t change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
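The pronoun rules above can be sketched as a small lookup, which may help when checking forms by hand. This is only an illustration under stated assumptions: the function name `attach_postposition` and the two tables are hypothetical, the romanization follows the slides, and only the five pronouns named above are covered.

```python
# Illustrative sketch of the pronoun + postposition rules listed above.
# The names OBLIQUE, GENITIVE and attach_postposition are hypothetical.

# maiM and tuu take an oblique stem before postpositions other than 'ne'.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# These pronouns do not combine with 'kaa'; they use irregular genitives.
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}


def attach_postposition(pronoun: str, postposition: str) -> str:
    """Return the romanized pronoun + postposition form."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition
    # ham, tum and aap never change; maiM and tuu stay unchanged before 'ne'.
    return pronoun + postposition


if __name__ == "__main__":
    for pron, post in [("maiM", "ne"), ("maiM", "ko"), ("ham", "ko"), ("tum", "kaa")]:
        print(pron, "+", post, "=", attach_postposition(pron, post))
```

The genitive check comes first so that maiM + kaa yields meraa rather than a form built on the oblique stem.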
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boys erg’
• The transformations that an adjective (e.g. acchaa ‘good’) undergoes with regard to number, gender and case are given below

          Direct            Oblique
          Sg       Pl       Sg       Pl
Masc      acchaa   acche    acche    acche
Fem       acchii   acchii   acchii   acchii
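The table above can be written as a short function. This is a sketch under an assumption that is not stated on the slide: adjectives whose citation form does not end in -aa are treated as the non-declining ones. The function name `inflect_adjective` is hypothetical.

```python
# Sketch of the adjective paradigm in the table above (romanized forms).
# Assumption: only adjectives ending in -aa decline; others are returned as-is.

def inflect_adjective(citation: str, gender: str, number: str, case: str) -> str:
    """gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'."""
    if not citation.endswith("aa"):
        return citation            # non-declining adjective
    stem = citation[:-2]
    if gender == "f":
        return stem + "ii"         # feminine is -ii in all four cells
    if number == "sg" and case == "direct":
        return stem + "aa"         # masculine singular direct
    return stem + "e"              # all other masculine cells


if __name__ == "__main__":
    for g in ("m", "f"):
        for c in ("direct", "oblique"):
            for n in ("sg", "pl"):
                print(g, c, n, inflect_adjective("acchaa", g, n, c))
```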
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu
• Thus only certain moods, aspects and tenses are marked in the verb forms
Given below are some examples of various verb forms from Hindi/Urdu:
  Root         ro       ‘cry’
  Infinitive   ronaa    ‘to cry’
  Habitual     rotaa    ‘cry (hab)’
  Perfective   royaa    ‘cried’
  Causative    rulaa    ‘cause someone to cry’
               rulvaa   ‘make someone cause someone to cry’
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs.
(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    ‘The children are having a meal’
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog ability be.pl.pres
    ‘The children can continue to have their meal’
Auxiliaries also carry the gender, number and person information.
Postpositions
Postpositions largely mark the case relations.
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table abl food take refl.pst
Hindi also has compound postpositions.
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
  ke anusaar ‘according to’
  ke alaavaa ‘in addition to’
  ke kaaran ‘because of’
  ke dvaaraa ‘through’
  ke saamne ‘in front of’
  ke liye ‘for’
  kii or/taraf ‘towards’
  kii tarah ‘like’
  kii jagah ‘in place of’
  se baahar ‘out of’
  se pahle ‘before’
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl ‘before’
    qabl az_ayn (qabl azeen) ‘before from_this’ = ‘before this’
    qabl az_waqt ‘before from_time’ = ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah) ‘in_this moment’
(c) az ‘from/since/for’
    az raahe hamdardi ‘from way empathy’ = ‘for the sake of courtesy’
    az sare nau ‘from beginning new’
    khaarij az bahes ‘beyond from discussion’
(d) ta ‘to/until/till’
    ta waqt ‘until time’
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone.
  EZ N+N:     daur-e-hukumat ‘period-of-rule’; hukumat-e-hind ‘government-of-India’
  EZ N+Adj:   nasl-e-insani ‘race-of-humanity’; lamha-e-aakhar ‘the last moment’
  EZ Adj+N:   qabil-e-rahem ‘qualified for sympathy’
  EZ Adj+Adj: qabil-e-qubul ‘qualified-for-acceptance’
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  - Nouns: it adds the sense of ‘every’
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X.
  raam-raam ‘Ram-Ram’ (proper noun)
  baccaa-baccaa ‘child-child’ (common noun)
  garam-garam ‘hot-hot’ (adjective)
  dhiire-dhiire ‘slowly-slowly’ (adverb)
  jaa-jaa ‘go-go’ (verb)
  naa-naa ‘not-not’ (negative particle)
  kyaa-kyaa ‘what-what’ (question word)
  jaate-jaate ‘going-going’ (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning of ‘X etc’. In Hindi/Urdu the first consonant of X is replaced by v-.
For example: khaanaa-vaanaa ‘food etc’ (vaanaa is not a valid word in Hindi)
Some more examples of partial reduplication in Hindi are:
  jaanaa ‘going’ → jaanaa-vaanaa ‘going etc’
  aaloo ‘potato’ → aaloo-vaaloo ‘potato etc’
  aisaa ‘like this’ → aisaa-vaisaa ‘like this etc’
There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc’:
  bhaag ‘run’ → bhaagambhaag ‘rush’
  jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
  dekh ‘see’ → dekhaa-dekhii ‘in imitation’
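The regular echo-word pattern can be sketched for romanized input as follows. The sketch assumes that the "first consonant" is everything before the first vowel letter (so the digraph kh counts as one consonant) and that vowel-initial words simply receive v-, as the aaloo and aisaa examples suggest. The function names are hypothetical, and irregular cases such as jhuuTh-muuTh are not handled.

```python
# Sketch of regular echo-word formation on romanized Hindi/Urdu words.
# Assumption: the onset is every letter before the first vowel letter.

VOWELS = set("aeiouAEIOU")


def echo_word(word: str) -> str:
    """Replace the initial consonant(s) with 'v'; prefix 'v' if vowel-initial."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return "v" + word[i:]


def echo_compound(word: str) -> str:
    """Return the hyphenated 'X-vX' form with the meaning 'X etc'."""
    return f"{word}-{echo_word(word)}"


if __name__ == "__main__":
    for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
        print(echo_compound(w))
```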
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies
Simple Transitive
trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitaab paRhegaa
          Atif book.f read.m.sg.fut
          ‘Atif will read the book’
trans-2   आतिफ़ ने किताब पढ़ी
          Atif ne kitaab paRhii
          Atif erg book.f read.f.sg.pst
          ‘Atif read the book’
trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif ko kitaab paRhnii paRii
          Atif dat book.f read.f.inf compel.f.pst
          ‘Atif had to read the book’
Intransitive: Unergative
unerg-1   आतिफ़ सोएगा
          Atif soyegaa
          Atif sleep.m.sg.fut
          ‘Atif will sleep’
unerg-2   आतिफ़ ने सोया
          Atif ne soyaa
          Atif erg sleep.m.sg.pst
          ‘Atif slept’
unerg-3   आतिफ़ को सोना पड़ेगा
          Atif ko sonaa paRegaa
          Atif dat sleep.inf compel.fut
          ‘Atif will have to sleep’
Intransitive: Unaccusative
unacc-1   दरवाज़ा खुलेगा
          darvaazaa khulegaa
          door.m.sg.d open.m.sg.fut
          ‘The door will open’
unacc-2   दरवाज़े ने खुला
          darvaaze ne khulaa
          door.m.sg.obl erg open.pst
          ‘The door opened’
unacc-3   दरवाज़े को खुलना पड़ेगा
          darvaaze ko khulnaa paRegaa
          door.m.sg.obl dat open.inf compel.fut
          ‘The door will have to open’
Existential
exist-1   उस कमरे में चूहे हैं
          us kamre meM cuuhe haiM
          that room in rats be.pres.pl
          ‘There are rats in that room’
Dative Subject
unacc-4      कल रात बादलों में चाँद दिखा
             kal raat baadaloM meM caaMd dikhaa
             yesterday night clouds in moon see(unacc).pst
             ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1   कल रात बादलों में मुझको चाँद दिखा
             kal raat baadaloM meM mujhko caaMd dikhaa
             yesterday night clouds in me.dat moon see(unacc).pst
             ‘Yesterday night I saw the moon behind the clouds’
Ditransitive
ditrans-1    राम मोहन को किताब देगा
             raam mohan ko kitaab degaa
             Ram Mohan dat book.f give.m.sg.fut
             ‘Ram will give a book to Mohan’
ditrans-2    राम ने मोहन को किताब दी
             raam ne mohan ko kitaab dii
             Ram erg Mohan dat book.f give.f.sg.pst
             ‘Ram gave a book to Mohan’
Complement Clause
compl-cl-1   राम जानता है कि सीता देर से आएगी
             raam jaantaa hai ki siita der se aayegii
             Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
             ‘Ram knows that Sita will arrive late’
Relative Clause
rel-cl-1     मेरी बहन जो दिल्ली में रहती है कल आ रही है
             merii bahan jo dillii meM rahtii hai kal aa rahii hai
             My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
             ‘My sister who stays in Delhi is coming tomorrow’
Relative Clause
rel-cl-2   मैंने वह किताब जो तुमने दी थी पढ़ ली
           maiMne vah kitaab jo tumne dii thii paRh lii
           I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
           ‘I have read the book which you gave me’
rel-cl-3   मैंने वह किताब पढ़ ली जो तुमने दी थी
           maiMne vah kitaab paRh lii jo tumne dii thii
           I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
           ‘I have read the book which you gave me’
rel-cl-4   जो किताब तुमने दी थी वह मैंने पढ़ ली
           jo kitaab tumne dii thii vah maiMne paRh lii
           which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
           ‘I have read the book which you gave me’
Complex Predicate
compl-pred-1   राम रवि की प्रतीक्षा कर रहा था
               raam ravi kii pratikshaa kar rahaa thaa
               Ram Ravi gen wait do prog.m.sg be.m.sg.pst
               ‘Ram was waiting for Ravi’
compl-pred-2   राम रवि को याद कर रहा था
               raam ravi ko yaad kar rahaa thaa
               Ram Ravi acc remember do prog.m.sg be.m.sg.pst
               ‘Ram was remembering Ravi’
Causatives
unerg-1       आतिफ़ सोएगा
              Atif soyegaa
              Atif sleep.m.sg.fut
              ‘Atif will sleep’
causative-1   आया ने आतिफ़ को सुलाया
              aayaa ne Atif ko sulaayaa
              maid erg Atif acc sleep.caus.pst
              ‘The maid caused Atif to sleep’
causative-2   माँ ने आया से आतिफ़ को सुलवाया
              maaN ne aayaa se Atif ko sulvaayaa
              mother erg maid by Atif acc sleep.caus.pst
              ‘The mother made the maid cause Atif to sleep’
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case.
  raam ko bukhaar hai
  Ram dat fever be.pres
  raam ko caand dikhaa
  Ram dat moon see-unacc.pst
  raam ko dukh hai
  Ram dat sorrow be.pres
Participatory verbs
The second argument of the participatory verbs takes the se postposition.
  raam ravi se carcaa karega
  Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii
  Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa
  Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi-Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues
  - Compounds
  - Punctuations
For example:
  usa  ladake  ne   kelaa   khaayaa   thaa
  that boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF
ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7      (punctuation)
Tokenization Issues
• Punctuations: all punctuations to be tokenized
• Compounds: bAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
• Decision
  - Create three tokens
  - Mark the hyphen as JOIN
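The tokenization decision above can be sketched in a few lines. This is a minimal illustration, not the treebank's tokenizer: it assumes romanized (ASCII letter) input, the function name `tokenize` is hypothetical, and the third tuple element used to carry the JOIN mark is an illustrative choice rather than the SSF encoding.

```python
import re

# Minimal sketch: split on whitespace, make every punctuation mark its own
# token, and mark a hyphen between two word tokens as JOIN.
# Assumption: input is romanized text using ASCII letters.

def tokenize(text: str):
    """Return a list of (address, token, mark) tuples; mark is 'JOIN' or ''."""
    pieces = re.findall(r"[A-Za-z]+|\S", text)
    rows = []
    for i, piece in enumerate(pieces):
        is_join = (
            piece == "-"
            and 0 < i < len(pieces) - 1
            and pieces[i - 1].isalpha()
            and pieces[i + 1].isalpha()
        )
        rows.append((i + 1, piece, "JOIN" if is_join else ""))
    return rows


if __name__ == "__main__":
    for row in tokenize("bAI-bahana"):
        print(row)
    for row in tokenize("usa laDake ne kelA khAyA thA ."):
        print(row)
```

A compound thus becomes three tokens, with its hyphen kept as a separate token carrying the JOIN mark.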
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, and tam (tense/aspect/modality) or vibhakti (case marker) suffix.
ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7                 <fs as=&STOP,punc>
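A reader of such files typically wants the af value as named fields. The sketch below assumes that af is a comma-separated list in the order given above (root, category, gender, number, person, case, tam or vibhakti); the field names in `AF_FIELDS` and the function `parse_af` are hypothetical, and real treebank files may use a different field count or order.

```python
import re

# Sketch of reading the composite 'af' attribute into named fields.
# Assumption: comma-separated values in the order described on the slide.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti")


def parse_af(feature_structure: str) -> dict:
    """Parse a string such as "<fs af='laDakaa,n,m,sg,3,o'>" into a dict."""
    match = re.search(r"af='([^']*)'", feature_structure)
    if match is None:
        raise ValueError("no af attribute found")
    values = match.group(1).split(",")
    # Pad missing trailing fields with empty strings.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


if __name__ == "__main__":
    print(parse_af("<fs af='laDakaa,n,m,sg,3,o'>"))
    print(parse_af("<fs af='khaa,v,m,sg,any,yaa'>"))
```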
POS Tagging
• ILMT POS tagset adopted
• Total 26 tags
ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7               SYM    <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1              SYM    <fs as=&STOP,punc>
        ))
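How the ADDR_ values change under chunking can be sketched as follows. This is an illustration only: the function `chunked_rows` and its input format (a list of tagged tokens plus a list of chunk labels with sizes) are hypothetical, and the feature-structure column is omitted.

```python
# Sketch: renumber flat token addresses (1, 2, 3, ...) into chunk-relative
# addresses (1.1, 1.2, ..., 2.1, ...) with opening and closing chunk rows.
# The input format is an assumption made for illustration.

def chunked_rows(tokens, chunks):
    """tokens: list of (word, pos_tag); chunks: list of (label, size)."""
    rows = []
    position = 0
    for chunk_no, (label, size) in enumerate(chunks, start=1):
        rows.append((str(chunk_no), "((", label))          # chunk opens
        for token_no in range(1, size + 1):
            word, tag = tokens[position]
            rows.append((f"{chunk_no}.{token_no}", word, tag))
            position += 1
        rows.append(("", "))", ""))                        # chunk closes
    if position != len(tokens):
        raise ValueError("chunk sizes do not cover all tokens")
    return rows


if __name__ == "__main__":
    tokens = [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP"),
              ("kelA", "NN"), ("khAyA", "VM"), ("thA", "VAUX")]
    chunks = [("NP", 3), ("NP", 1), ("VG", 2)]
    for addr, token, cat in chunked_rows(tokens, chunks):
        print(f"{addr:<5}{token:<8}{cat}")
```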
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information encoded at different levels such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
• The Corpus
  - News articles: 350k
  - Tourism articles: 25-30k
  - Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
• Indian languages
  - Rich morphology
  - Relatively flexible word order
For example:
  a) baccaa phala khaataa hai
     ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini’s Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication
Panini’s Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
  opened
    k1: Sabina
    k2: lock
      the
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
  opened
    k1: Sabina
    k2: lock
      the
    k3: key
      this
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
  opened
    k7t: Yesterday
    k1: Sabina
    k2: lock
      the
    k3: key
      this
    k7p: home
      my
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
  opened
    k7t: yesterday
    k1: lock
      the
    k3: key
      this
lock becomes the karta
Levels of Analysis
• L1 - Semantic relations: karakas, e.g. raama: karta
• L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example Contd
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Figure: (t1) tree with khaa governing bhaaii and phala; (t2) expanded tree with khaa governing bhaaii and phala, bhaaii governing meraa and baDZaa, and phala governing bahuta]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened
Semantics of the verb
• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
  - k1: karta
  - k2: karma
• karta, karma sometimes correspond to agent, theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma
  open
    k1: boy
    k2: lock
Sub-actions: Opening of lock
Opening of lock
  (action 1) Inserting and turning a key
  (action 2) key pressing and turning the lever
  (action 3) latch moving and lock opening
Sub-actions: Opening of lock
[Figure: three trees for "open": one with boy (k1), lock (k2) and key (k3); one with key and lock as k1 and k2; one with lock as k1]
k1: karta (doer); k2: karma (affected); k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  - The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
  - The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  - Which sub-action the speaker wants to focus on
Speaker’s Intention (vivakshaa)
• Every sentence reflects the speaker’s intention
  - Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
  - ‘key’ gets assigned karta (in b), karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions etc. in Hindi
An Example
Example
  meraa badZaa bhaaii bahuta phala khaataa hai
  ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
  ‘My elder brother eats lots of fruits’
An Example (Contd)
Morph Analysis
  meraa     <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  badZaa    <fs af= root=badZaa cat=adj gend=m>
  bhaaii    <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  bahuta    <fs af= root=bahuta cat=adj gend=any>
  phala     <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  khaataa   <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  hai       <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
  meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
  ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks
  - meraa ‘my’
  - bhaaii ‘brother’
  - phala ‘fruit’, and
  - khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks
Dependency Scheme (Contd)
  khaataa
    k1: bhaaii
      r6: meraa
    k2: phala
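A chunk-level dependency tree of this kind can be held in a very small data structure. The sketch below is illustrative only: the class `DependencyTree` and its methods are hypothetical names, nodes are identified by the head word of each chunk, and the example assumes head words are unique within the sentence.

```python
from collections import defaultdict

# Sketch of a labeled dependency tree over chunk heads.
# Assumption: each chunk is identified by its (unique) head word.

class DependencyTree:
    def __init__(self, root: str):
        self.root = root
        self.children = defaultdict(list)   # head -> [(dependent, relation)]

    def attach(self, dependent: str, relation: str, head: str) -> None:
        """Record that `dependent` modifies `head` with the given label."""
        self.children[head].append((dependent, relation))

    def to_lines(self, node=None, relation="root", depth=0):
        """Return an indented text rendering of the subtree at `node`."""
        node = self.root if node is None else node
        lines = [f"{'  ' * depth}{node} ({relation})"]
        for child, child_relation in self.children.get(node, []):
            lines.extend(self.to_lines(child, child_relation, depth + 1))
        return lines


if __name__ == "__main__":
    tree = DependencyTree("khaataa")
    tree.attach("bhaaii", "k1", "khaataa")   # karta
    tree.attach("phala", "k2", "khaataa")    # karma
    tree.attach("meraa", "r6", "bhaaii")     # genitive
    print("\n".join(tree.to_lines()))
```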
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other
Relations other than karakas
  - r6: Genitive
  - rt: Purpose
  - rh: Reason
  - nmod_relc: Relative clause
  - rad: Address
Relations which do not fall under dependency relation
  - ccof: Conjunction
  - pof: Complex Predicates
  - fragof: Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
  maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa ‘cause to eat’ is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd…)
Causative Constructions (Contd…)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations
  - prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  - prayojya karta ‘causee’ (jk1): the causee in a causative construction
  - madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Causative Constructions (Contd…)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    ‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II
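The difference between the two possibilities amounts to a relabeling of three arcs. The sketch below illustrates that mapping only; the names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical, and the function assumes the caller has already established that the se-marked participant is a mediator-causer (as with aayaa ‘maid’) rather than an instrument (as with cammaca ‘spoon’).

```python
# Sketch of moving from the Possibility-I labels to the Possibility-II
# labels for a causative with a mediator-causer.
# Assumption: the caller has checked that the k3 argument is a mediator.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """arguments: list of (phrase, label); returns the relabeled list."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]


if __name__ == "__main__":
    possibility_one = [("maaz ne", "k1"), ("aayaa se", "k3"),
                       ("bacce ko", "k4"), ("khaanaa", "k2")]
    for phrase, label in relabel_causative(possibility_one):
        print(f"{phrase} ({label})")
```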
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
    ‘The boy who is standing there is my brother’
Issue
• Possibility-I
  - Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  - The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
  - jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause: Alternative-II
Relative Clauses (Contd…)
• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadzaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta - k4a
Ex-1: mujhko dukh hai
      ‘I.Dat’ ‘unhappy’ ‘is’
      ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta - k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa   [Base verb]
      ‘ram’ ‘Erg’ ‘moon’ ‘saw’
      ‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa   [Derived intransitive verb]
      ‘ram.Dat’ ‘moon’ ‘appeared’
      ‘The moon was visible to Ram’
anubhava karta - k4a (Contd…)
[Figure: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran - rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
Relation samanadhikaran - rs (Contd…)
[Figure: dependency tree for Ex-1]
Relation samanadhikaran - rs (Contd)
[Figure: dependency tree for Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
    ‘Had she not been sick, she would have definitely come to the party’
Issue
  - Possibility-I: Abstract node
  - Possibility-II: One clause depends on the other clause
Possibility - I
[Figure: abstract node agar-to governing agar and to via paired-ccof; naa hotii and aatii attached below them via ccof]
Possibility - II
Conditionals (Contd)
• Possibility-I is not possible because agar-to is the head of the tree, which is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb. For the other verb, the argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Having written letters every day, he tears them up.’
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle likhakar ‘having written’.
Participles (vmod) (Contd.)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)
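The shared-argument idea can be sketched in a few lines of Python. This is only an illustration, not the treebank's storage format: the `Arc` class, the `syntactic` flag, and the helper names are assumptions made here, and the spelling patra follows the example sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: str
    label: str
    dep: str
    syntactic: bool = True  # False: realized semantically only (shared argument)


# Arcs for "vaha rojZa patra likhakara PaadZataa hai" (hypothetical encoding).
ARCS = [
    Arc("PaadZataa hai", "k1", "vaha"),
    Arc("PaadZataa hai", "k7t", "rojZa"),
    Arc("PaadZataa hai", "k2", "patra"),
    Arc("PaadZataa hai", "vmod", "likhakar"),
    Arc("likhakar", "k1", "vaha", syntactic=False),
    Arc("likhakar", "k2", "patra", syntactic=False),
]


def arguments(head, arcs, include_shared=True):
    """Map relation label to dependent for one head."""
    return {
        a.label: a.dep
        for a in arcs
        if a.head == head and (include_shared or a.syntactic)
    }


def syntactic_parent(dep, arcs):
    """The single head a word attaches to in the tree, or None for the root."""
    parents = [a.head for a in arcs if a.dep == dep and a.syntactic]
    return parents[0] if parents else None
```

With this encoding each word keeps exactly one syntactic parent, so the structure stays a tree, while the participle can still be queried for its semantic arguments.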
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say.’
In this example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing. We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don't listen to anyone.’
There is no explicit conjunction. Insert a NULL element to show the dependencies (if it is essential):
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• In coordination, the conjunction is the head and takes the coordinated elements as its children.
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple.’
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow.’
[Figures: dependency trees for the coordinate conjunction (ccof) and subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question.’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ inside the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation). That is, the noun or adjective in a conjunct verb sequence has a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, since both are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit. Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd.)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
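A small Python sketch of how a consumer of the treebank might reassemble conjunct verbs from pof arcs. The triple format `(head, label, dependent)` and the function name are assumptions for illustration, not the treebank's actual file format.

```python
# Arcs for "maine usase ek prashna kiyaa" as (head, label, dependent) triples.
ARCS = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
    ("prashna", "nmod", "ek"),
]


def complex_predicates(arcs):
    """Join each verb with its pof dependents into one predicate string."""
    parts_by_verb = {}
    for head, label, dep in arcs:
        if label == "pof":
            parts_by_verb.setdefault(head, []).append(dep)
    return {verb: " ".join(parts + [verb]) for verb, parts in parts_by_verb.items()}


def modifiers(word, arcs):
    """Dependents of a word, e.g. the adjective that modifies the noun."""
    return [dep for head, _label, dep in arcs if head == word]
```

Because prashna remains its own node, `modifiers("prashna", ARCS)` still finds ek, which would be impossible if prashna kiyaa were a single verb chunk.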
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
 – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
 – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
 – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
 – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
 – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
 – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
 – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
 – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
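The filtering step can be sketched as follows, assuming the candidate sentences have already been labelled by some SRL system. The dictionary layout and the `answer` function are hypothetical and only illustrate the reasoning on the slide.

```python
# The question asks for the agent of "create" with a known theme.
QUESTION = {
    "predicate": "create",
    "theme": "the first effective polio vaccine",
    "ask": "agent",
}

# Role-labelled candidates (labels taken from the two example sentences).
CANDIDATES = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(question, candidates):
    """Return fillers of the asked role from candidates whose known roles match."""
    asked = question["ask"]
    known = {r: v for r, v in question.items() if r not in ("predicate", "ask")}
    return [
        c[asked]
        for c in candidates
        if c["predicate"] == question["predicate"]
        and all(c.get(r, "").lower() == v.lower() for r, v in known.items())
    ]
```

Word matching alone would accept both sentences; comparing the theme role rejects the syringe sentence.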
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Framefiles
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
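As a rough sketch, a frame file can be thought of as a lookup table from roleset IDs to role inventories. The dictionary below copies the two rolesets from the slide; the data layout and the `unlicensed_labels` helper are assumptions for illustration and do not reflect PropBank's actual XML frame file format.

```python
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "make sound of bell",
            "roles": {
                "Arg0": "causer of ringing",
                "Arg1": "thing rung",
                "Arg2": "ring for",
            },
        },
        "ring.02": {
            "gloss": "to surround",
            "roles": {
                "Arg1": "surrounding entity",
                "Arg2": "surrounded entity",
            },
        },
    }
}


def unlicensed_labels(lemma, roleset_id, labelled_args):
    """Return the argument labels that the chosen roleset does not define."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    return [label for label in labelled_args if label not in roles]
```

For example, labelling "Tall aspen trees" as Arg0 under ring.02 would be flagged, since that roleset has no Arg0.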
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
 – Creation of frame files
 – Empty argument insertion
 – Semantic role labelling
Framefiles for Hindi
• Two types of frame files
 – Frames for simple verbs [385 frames, 703 predicates]
 – Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
 – pro: dropped null arguments, recoverable from discourse context
 – PRO: empty subjects of non-finite complement and adjunct clauses
 – RELPRO: gaps in participial relative clauses
 – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
  Numbered argument            Numbered argument with function tag
  ARGA  Causer                 ARGA-MNS  Indirect causer
  ARG0  Agent, experiencer     ARG0-MNS  Induced causer
  ARG1  Theme, patient         ARG0-GOL  Causee with a ‘recipient’ role
  ARG2  Recipient              ARG2-ATR  Attribute
  ARG3  Instrument             ARG2-GOL  Goal
                               ARG2-SOU  Source
                               ARG2-LOC  Location
                               ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
  ARGM-TMP  Temporal
  ARGM-MNR  Manner
  ARGM-LOC  Location
  ARGM-PRP  Purpose
  ARGM-CAU  Cause
  ARGM-DIS  Discourse
  ARGM-ADV  Adverb
  ARGM-MNS  Means
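The labels in the tagset share one shape: a core (ARGA, ARG0 to ARG3, or ARGM) plus an optional three-letter function tag. A minimal parsing sketch, assuming the hyphenated spelling reconstructed above; the regular expression and function name are illustrative only.

```python
import re

# Core is "A", a digit, or "M"; the function tag is three capital letters.
LABEL = re.compile(r"^ARG(?P<core>[A0-9M])(?:-(?P<tag>[A-Z]{3}))?$")


def parse_label(label):
    """Split a label into (kind, core label, function tag or None)."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a PropBank-style label: {label!r}")
    core, tag = match.group("core"), match.group("tag")
    kind = "modifier" if core == "M" else "numbered"
    return kind, "ARG" + core, tag
```

This only checks the shape of a label; whether a given combination (e.g. ARG3-GOL) is actually used in the Hindi PropBank is not something the sketch knows.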
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book.’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book.’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book.’
Unaccusative & Unergative
• Distinction between intransitive verbs
 – Unaccusatives, e.g. Kula (open), Puta (explode)
 – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
 – E.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open.’
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened.’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open.’
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep.’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept.’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep.’
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room.’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night, the moon was seen behind the clouds.’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night, I saw the moon behind the clouds.’
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan.’
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan.’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
 – (so → sulaa) sleep → make someone sleep
• Add –vaa
 – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep.’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep.’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep.’
[Figures: PropBank annotation of causative-1 and causative-2]
Causatives: classes
[Table of causative classes not recoverable from the extracted text]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi.’
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi.’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late.’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
 – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
 – Motivated by claims about language acquisition in children
 – Develop a theory of syntax such that the syntax of a language can be explained by
   • Language-universal principles
   • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
 – Words and constituents move and leave traces
   • Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
 – Nouns with postpositions
 – Verbs with auxiliaries and complementizers (ki)
• Binary branching
 – Theoretical reasons
 – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa)
[Figure: PS tree with node labels VP-Pred, VP, NP, V]
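Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii)
[Figure: PS tree with node labels VP-Pred, VP, NP, V; the ergative marker ne is attached inside the subject NP]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative there simply is no object
आतिफ़ सोएगा (Atif soyegaa)
[Figure: PS tree with node labels VP-Pred, VP, NP, V]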
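Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा (darvaazaa khulegaa)
[Figure: PS tree with node labels VP-Pred, VP, NP1, V and the trace *CASE*1 in the lower position]
Existentials
• Existential ho ‘be’ is unaccusative (because it is agent-free) and the location is an adjunct
उस कमरे में चूहे हैं (us kamre mein cuuhe hain)
[Figure: PS tree with node labels VP-Pred, VP, NP1, V, the trace *CASE*1, and the locative us kamre mein adjoined to VP]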
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)
[Figure: PS tree with node labels VP-Pred, VP, NP, V; Mohan ko is adjoined to VP-Pred]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa)
[Figure: PS tree with node labels VP-Pred, VP, NP1, NP2, V and the traces *CASE*1 and *SCR*2]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi)
[Figure: PS tree with node labels VP-Pred, VP, NP, NP1, V, V-Aux, CP1, C and the traces *EXTR*1 and *CASE*1]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me.’
[Figure: PS tree with node labels VP-Pred, VP, NP, NP1, CP, C, V, V-Aux, the empty elements PRO and COMP, and the traces *SCR*1 and *SCR*3]
Complex Predicate
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi.’
[Figure: PS tree with node labels VP-Pred, VP, VP1, NP, N, V, V', V-Aux and the trace *CASE*1]
Causative
[Figure: PS tree for a causative, not recoverable from the extracted text]
What is a Syntactic Representation?
1. Syntactic phenomena (“what”), e.g.
 – Subject of a verb
 – Relative clause
 – Small clause
 Linguists tend to agree on what phenomena exist
2. Mathematical representation type (“basic how”), e.g.
 – Phrase structure tree
 – Dependency tree
 – Or something more complicated: graph, LFG, TAG, …
3. Formal syntactic description (“detailed how”)
 a. Mapping from phenomena to representations (in a particular type)
 b. The chosen representation for a specific phenomenon is also called the analysis
 c. The phenomena extracted from a representation are the interpretation
 d. A formal description is a syntactic theory if it makes predictions
Representation Types: Dependency and Phrase Structure
• Dependency Tree (DS)
 – One label alphabet: words (= words in a sentence)
 – All nodes are labeled with words or empty strings
• Phrase Structure Tree (PS)
 – Two disjoint label alphabets: terminals (= words in the sentence) and nonterminals
 – All and only interior nodes are labeled with nonterminals
 – Leaves are labeled with terminals or empty strings
• Nothing else is part of the definition
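The two definitions are small enough to state as checks on a labeled tree. The sketch below is one possible reading, with a hypothetical `Node` class and function names; an empty string stands for an empty node or leaf.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def is_dependency_tree(root, words):
    """DS: every node is labeled with a word or the empty string."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.label != "" and node.label not in words:
            return False
        stack.extend(node.children)
    return True


def is_phrase_structure_tree(root, terminals, nonterminals):
    """PS: interior nodes carry nonterminals, leaves carry terminals or ''."""
    if terminals & nonterminals:
        return False  # the two alphabets must be disjoint
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            if node.label not in nonterminals:
                return False
        elif node.label != "" and node.label not in terminals:
            return False
        stack.extend(node.children)
    return True


WORDS = {"Atif", "considered", "her", "stupid"}
NONTERMINALS = {"S", "NP", "VP", "AdjP"}

# One possible DS and PS for "Atif considered her stupid".
DS = Node("considered", [Node("Atif"), Node("stupid", [Node("her")])])
PS = Node("S", [
    Node("NP", [Node("Atif")]),
    Node("VP", [Node("considered"),
                Node("S", [Node("her"), Node("VP", [Node("AdjP", [Node("stupid")])])])]),
])
```

Note that the definitions say nothing about arc labels, heads, or traces; those belong to the formal syntactic description, not to the representation type.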
Example: Small Clauses
• Hindi
 – आतिफ़ ने सीमा को बेवकूफ़ समझा
 – Atif ne Seema ko bewakuuf samjhaa
 – Atif Erg Seema Acc stupid consider.Pfv
 – ‘Atif considered Seema stupid’
• English
 – Atif considered Seema stupid
 – Atif considered her stupid
What is the Phenomenon?
• Syntactically and semantically, consider takes a clausal complement
 – Atif considered [clause that she is stupid]
 – Atif considered [clause her stupid]
• But two problems:
 – No verb
 – her is semantically the subject of stupid but has accusative case, which is unusual (subjects are usually nominative)
• So:
 – Atif considered [small clause her stupid]
What is the Representation Type?
• For this example, we will show dependency trees and phrase structure trees
Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not the accusative case marking of her
considers
  Subj: Atif
  Obj: stupid
    Subj: her
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, and the accusative case marking through the node label
considers
  Subj: Atif
  Obj-ECM: stupid
    Subj: her
Analysis 1a for Small Clauses: No Accusative Case Marking (Contd.)
• The corresponding phrase structure tree:
[S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]
Analysis 1b for Small Clauses: Exceptional Case Marking (Contd.)
• The corresponding phrase structure tree marks the small clause with the node label SC:
[S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]
• Close to the analysis adopted in Chomsky (1981)
Note on DS and PS
• These analyses are intuitively very similar
• Formal notion: “consistency” (Fei Xia; see Bhatt, Rambow & Fei 2011)
 – Intuition: a very simple and general algorithm can transform a consistent DS to a PS and vice versa
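To give a feel for what a "very simple and general" DS-to-PS transformation could look like, here is a toy projection procedure: every word projects a phrase that contains the word itself and the phrases of its dependents, in surface order. This is an illustrative sketch only, not the algorithm from the cited work; the part-of-speech table and the flat (non-binary) phrases are simplifying assumptions.

```python
def ds_to_ps(head, deps, pos, order):
    """Bracket the phrase projected by `head` from a dependency tree.

    deps:  maps a head word to the list of its dependents
    pos:   maps a word to a category such as "N", "V", "Adj"
    order: the words of the sentence in surface order
    """
    items = sorted([head] + deps.get(head, []), key=order.index)
    rendered = [
        word if word == head else ds_to_ps(word, deps, pos, order)
        for word in items
    ]
    return "[" + pos[head] + "P " + " ".join(rendered) + "]"


# Analysis 1a for "Atif considered her stupid".
ORDER = ["Atif", "considered", "her", "stupid"]
DEPS = {"considered": ["Atif", "stupid"], "stupid": ["her"]}
POS = {"Atif": "N", "considered": "V", "her": "N", "stupid": "Adj"}
```

Here `ds_to_ps("considered", DEPS, POS, ORDER)` yields a bracketing in which her and stupid form one phrase, mirroring the small-clause constituent of Analysis 1a. The sketch assumes each word occurs once in the sentence.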
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject
considers
  Subj: Atif
  Obj2: stupid
  Obj: her
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
considers
  Subj: Atif
  ObjPred: stupid
  Obj: her
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis (Contd.)
• The same structure with neo-Paninian labels:
considers
  k1: Atif
  k2s: stupid
  k2: her
• And for the Hindi sentence:
समझा
  k1: आतिफ़ ने
  k2s: बेवकूफ़
  k2: सीमा को
Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank.
Analysis 2a for Small Clauses: General Monoclausal Analysis (Contd.)
• The corresponding phrase structure tree:
[S [NP Atif] [VP considers her [AdjP stupid]]]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis (Contd.)
[Figure: the corresponding phrase structure tree, with node labels S, NP, VP, AdjP and an SC node label marking her as semantic subject]
Analysis 3 for Small Clauses: Raising to Object
• Structure represents the accusative case marking of her and her as semantic subject, but requires an empty category
considers
  Subj: Atif
  Obj: her1
  Obj-Pred: stupid
    Subj: e1
[Figure: the corresponding phrase structure tree, with node labels S, NP, VP, AdjP, her1 in the matrix VP and the empty category e1 as subject of the embedded S]
Analysis used for PS in the Hindi-Urdu Treebank
Comparison of Representations
• Less information:
 – Tree 1a (Subj, Obj; her as Subj of stupid)
 – Tree 2a (Subj, Obj2, Obj)
• Same information:
 – Tree 1b (Subj, Obj-ECM; her as Subj of stupid)
 – Tree 2b (Subj, ObjPred, Obj)
 – Tree 3 (Subj, Obj-Pred, Obj her1; e1 as Subj of stupid)
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
 – Can be very similar across languages
• There can be several possible analyses
 – Some have less information
 – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren't DS and PS Representations Complementary? NO!
• Syntactic dependency can be encoded in PS, and typically is
 – Usual convention: attachment in the projection shows the type of dependency
• Syntactic constituency is represented in DS
 – Usual convention: each node is the word and the head of the phrase containing it and all its descendants
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
 – Does not change trees
 – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
 – Contains less information than DS+PB
 – DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS | PB | PS
How: Dependency | Phrase Structure
What: Distinguish unergative/unaccusative | Distinguish temporal/locative adjuncts | Distinguish unaccusative/transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties
 – Morphology
 – Some basic syntax
 – Lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script
Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structure
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are used productively in Hindi and Urdu (in fact in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
 – Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case.
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
 – Singular: pankhaa (fan), lataa (creeper), ghar (house)
 – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique
Case
• Direct nouns are in the nominative and are not followed by a postposition. They occur as subject and/or object.
  laRkaa aayaa ‘the boy came’
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  laRkii aayii ‘the girl came’
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen).
  laRke ne roTii khaayii ‘the boy ate bread’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread from the floor’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case.
• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique:
 a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
 b. inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)
Pronouns (Contd.)
• Before ne, maiM (I) and tuu (you, sg) don't change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
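The rules for personal pronouns can be written down as a small lookup. This sketch covers only the pronouns and forms listed above; the table and function names are chosen here for illustration, and the romanization follows the slide.

```python
# Oblique stems used before postpositions other than ne.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitive forms that replace pronoun + kaa.
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}


def attach(pronoun, postposition):
    """Combine a personal pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition
    return pronoun + postposition
```

The genitive forms returned here are the masculine singular citation forms; agreement of the genitive with the possessed noun is not modelled.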
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below

  Case       Direct            Oblique
  Number     Sg       Pl       Sg       Pl
  Masc       acchaa   acche    acche    acche
  Fem        acchii   acchii   acchii   acchii
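The table can be reproduced by a three-way rule. A minimal sketch, assuming that declinable adjectives are exactly those ending in -aa in this romanization; that is a simplification made here (some -aa adjectives do not decline), and the function name is hypothetical.

```python
def inflect(adjective, gender, number, case):
    """Inflect an -aa adjective; gender 'm'/'f', number 'sg'/'pl',
    case 'direct'/'oblique'. Other adjectives are returned unchanged."""
    if not adjective.endswith("aa"):
        return adjective  # assumed not to decline
    stem = adjective[:-2]
    if gender == "f":
        return stem + "ii"
    if number == "sg" and case == "direct":
        return stem + "aa"
    return stem + "e"
```

For instance, the oblique singular masculine form gives the acche of acche laRke ne.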
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms
• Some examples of verb forms from Hindi/Urdu:
  Root         ro       ‘cry’
  Infinitive   ronaa    ‘to cry’
  Habitual     rotaa    ‘cry.hab’
  Perfective   royaa    ‘cried’
  Causative    rulaa    ‘cause someone to cry’
               rulvaa   ‘make someone cause someone to cry’
Auxiliaries
• Auxiliaries mark tense, aspect and modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      ‘The children are having a meal.’
  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog ablit be.pl.pres
      ‘The children can continue to have their meal.’
• Auxiliaries also carry the gender, number and person information
Postpositions
• Postpositions largely mark the case relations
  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
• ke anusaar ‘according to’
• ke alaavaa ‘in addition to’
• ke kaaran ‘because of’
• ke dvaaraa ‘through’
• ke saamne ‘in front of’
• ke liye ‘for’
• kii or/taraf ‘towards’
• kii tarah ‘like’
• kii jagah ‘in place of’
• se baahar ‘out of’
• se pahle ‘before’
Urdu-Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions:
(a) qabl ‘before’
    qabl az_ayn (qabl azeen), before from_this, ‘before this’
    qabl az_waqt, before from_time, ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah), in_this moment
(c) az ‘from/since/for’
    az raahe hamdardi, from way empathy, ‘for the sake of courtesy’
    az sare nau, from beginning new
    khaarij az bahes, beyond from discussion
(d) ta ‘to/until/till’
    ta waqt, until time
ezafe in Urdu
• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone
  EZ N+N: daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
  EZ N+Adj: nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
  EZ Adj+N: qabil-e-rahem ‘qualified for sympathy’
  EZ Adj+Adj: qabil-e-qubul ‘qualified-for-acceptance’
Reduplication A morphological processes
Words belonging to various categories can be reduplicated These expressions are often hyphenated Reduplication has various morphological functions depending on the
lexical category which is reduplicated For example
Nouns it adds the sense of every Verbs it brings the sense of adverbial participle Adjectives and adverbs it adds intensity
Hindi has three types of reduplication full partial and redundant Reduplication is highly productive in these languages
Full Reduplication
If the word is X then its reduplicated form is X-X
raam-raam lsquoRam-Ramrsquo (proper noun) baccaa-baccaa child-child (common noun) garam-garam lsquohot-hotrsquo (adjective) dhiire-dhiire lsquoslowly-slowlyrsquo (adverb) jaa-jaa lsquogo-gorsquo (verb) naa-naa lsquonot-notrsquo (negative particle) kyaa-kyaa lsquowhat-whatrsquo (question word) jaate-jaate lsquogoing-goingrsquo (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning of ‘X etc.’ In Hindi/Urdu, the first consonant of X is replaced by v- (as the examples show, a vowel-initial X simply takes v-).
For example: khaanaa-vaanaa ‘food etc.’ (vaanaa is not a valid word in Hindi)
Some more examples of partial reduplication in Hindi are:
jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’
There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
bhaag ‘run’ → bhaagambhaag ‘rush’
jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
dekh ‘see’ → dekhaa-dekhii ‘in imitation’
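The regular patterns (X → X-X, and X → X-vX′) can be sketched in a few lines of Python. This is only an illustration on romanized strings: the function names and the vowel set are assumptions made for the example, and the sketch deliberately ignores the irregular forms such as bhaagambhaag.

```python
# Illustrative sketch on romanized forms; names and the vowel set are assumptions.
VOWELS = set("aeiouAEIOU")


def full_reduplication(word):
    """Full reduplication: X -> X-X."""
    return f"{word}-{word}"


def echo_word(word):
    """Partial reduplication: drop the initial consonant(s) of X and prefix v-."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"


for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_word(w))
```

Treating "kh" as a single onset is a consequence of the romanization, which is why the sketch strips every leading non-vowel letter rather than exactly one character.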
Some Basic Syntax
Hindi and Urdu are both relatively free word order SOV languages
For case marking Hindi primarily uses postpositions The verb agrees either with subject or with object Adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
‘Atif read the book’
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
‘Atif slept’
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
‘Atif will have to sleep’
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
‘The door opened’
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
‘The door will have to open’
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Yesterday night the moon was seen behind the clouds’
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Yesterday night I saw the moon behind the clouds’
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
‘Ram knows that Sita will arrive late’
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
‘My sister who stays in Delhi is coming tomorrow’
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
‘I have read the book which you gave me’
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
‘I have read the book which you gave me’
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’
compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused the child to sleep’
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs: the experiencer argument takes dative case.
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
Participatory verbs: the second argument of the participatory verbs takes the se postposition.
raam ravi se carcaa karega
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
usa laDake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
• Decision
  – Create three tokens
  – Mark the hyphen as JOIN
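The tokenization decision (every punctuation mark is its own token; a hyphenated compound becomes three tokens with the hyphen marked JOIN) can be sketched as follows. This is a minimal illustration, not the treebank's actual tokenizer: the function name, the regular expression and the three-column output are assumptions made for the example.

```python
import re

# Words, or single non-space punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Return (ADDR, TOKEN, MARK) rows; MARK is 'JOIN' for a compound-internal hyphen."""
    rows = []
    for addr, match in enumerate(TOKEN_RE.finditer(sentence), start=1):
        token = match.group()
        start, end = match.span()
        inside_compound = (
            token == "-"
            and start > 0 and sentence[start - 1].isalnum()
            and end < len(sentence) and sentence[end].isalnum()
        )
        rows.append((addr, token, "JOIN" if inside_compound else ""))
    return rows


for row in tokenize("BAI-bahana Aye ."):
    print(*row)
```

A hyphen counts as compound-internal here only when it is flanked by alphanumeric characters on both sides, so a free-standing dash is still tokenized but not marked JOIN.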
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOPpunc>
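Because af packs several attributes into one comma-separated value, a small helper that unpacks it into named fields makes the representation easier to inspect. The sketch below assumes the field order given on the slide (root, category, gender, number, person, case, tam/vibhakti, suffix) and assumes that the values are comma-separated; the function name and the field names are illustrative, not part of any official tool.

```python
import re

# Field order as listed on the slide; the names themselves are assumptions.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")


def parse_af(feature_structure):
    """Unpack the af attribute of an <fs ...> string into a dict of named fields."""
    match = re.search(r"af='([^']*)'", feature_structure)
    if match is None:
        return {}
    values = match.group(1).split(",")
    # Pad short analyses so that every field is present in the result.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


analysis = parse_af("<fs af='laDakaa,n,m,sg,3,o,,'>")
print(analysis["root"], analysis["cat"], analysis["case"])
```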
POS Tagging
ILMT POS tagset adopted. Total: 26 tags.
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOPpunc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOPpunc>
       ))
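The bracketed chunk layout can be read back into a simple data structure with a short loop. The sketch below assumes one row per line with whitespace-separated columns, "((" opening a chunk and "))" closing it; the function name and the dictionary keys are hypothetical, and real SSF files carry more detail than this handles.

```python
def read_chunks(ssf_text):
    """Group SSF rows into chunks: [{'addr', 'tag', 'tokens': [(word, pos), ...]}]."""
    chunks = []
    current = None
    for line in ssf_text.strip().splitlines():
        cols = line.split()
        if not cols:
            continue
        if "((" in cols:                      # chunk opener, e.g. "1 (( NP"
            current = {"addr": cols[0], "tag": cols[cols.index("((") + 1], "tokens": []}
        elif cols[0] == "))":                 # chunk closer
            chunks.append(current)
            current = None
        elif current is not None:             # token row, e.g. "1.1 usa PRP <fs ...>"
            current["tokens"].append((cols[1], cols[2]))
    return chunks


SAMPLE = """
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""

for chunk in read_chunks(SAMPLE):
    print(chunk["addr"], chunk["tag"], [w for w, _ in chunk["tokens"]])
```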
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar? Indian languages have
• Rich morphology
• Relatively flexible word order
For example:
a) baccaa phala khaataa hai ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations: reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result
Sabina opened the lock with this key
[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]
k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]
k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]
lock becomes the karta
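The two analyses of "open" can be written down as labelled head-dependent edges, which makes the shift of the karta easy to check mechanically. The sketch is illustrative only: the Edge tuple and the helper function are assumptions, and only the inter-word relations named on the slides (k1, k2, k3, k7t, k7p) are encoded.

```python
from collections import namedtuple

# A labelled dependency edge: the dependent modifies the head with a karaka label.
Edge = namedtuple("Edge", "head dependent label")

# "Yesterday Sabina opened the lock with this key at my home"
with_agent = [
    Edge("opened", "Yesterday", "k7t"),
    Edge("opened", "Sabina", "k1"),
    Edge("opened", "lock", "k2"),
    Edge("opened", "key", "k3"),
    Edge("opened", "home", "k7p"),
]

# "Yesterday the lock opened with this key"
without_agent = [
    Edge("opened", "yesterday", "k7t"),
    Edge("opened", "lock", "k1"),
    Edge("opened", "key", "k3"),
]


def karta(edges, verb):
    """Return the dependent attached to the verb as k1 (karta), or None."""
    for edge in edges:
        if edge.head == verb and edge.label == "k1":
            return edge.dependent
    return None


print(karta(with_agent, "opened"))
print(karta(without_agent, "opened"))
```

The same verb carries k1 on different participants in the two sentences, which is the point of the slide: karaka labels follow what the sentence expresses, not a fixed thematic role.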
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Tree (t1), chunk heads only: khaa → bhaaii, phala]
[Tree (t2), fully expanded: khaa → bhaaii, phala; bhaaii → meraa, baDZaa; phala → bahuta]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened
Semantics of the verb
A verbal root denotes
• The activity
• The result
Locus of activity: karta
Locus of result: karma
[Diagram: Verbal Root → activity, result]
karta-karma
• The boy opened the lock
  – k1: karta
  – k2: karma
• karta and karma sometimes correspond to agent and theme
• Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
[Dependency tree: open → boy (k1), lock (k2)]
Sub-actions: Opening of lock
Opening of lock consists of
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
[Dependency tree 1: open → boy (k1), lock (k2), key (k3)]
[Dependency tree 2: open → key (k1), lock (k2)]
[Dependency tree 3: open → lock (k1)]
k1: karta (doer); k2: karma (affected); k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer, karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer, karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly:
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• ‘key’ gets assigned karta (in b) or karana (in a) based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
We are developing treebanks for HindiUrdu
Following Paninian framework as the annotation scheme
We show how the scheme handles some phenomena such as complex verbs causatives relative clauses conjunctions etc in Hindi
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
‘My elder brother eats lots of fruits’
An Example (Contd.)
Morph Analysis
meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa   <fs af= root=badZaa cat=adj gend=m>
bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta   <fs af= root=bahuta cat=adj gend=any>
phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd.)
POS Tagging
meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
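Since inter-chunk relations are drawn between chunk heads, it helps to see how heads could be read off the bracketed notation. The sketch below parses the ((word_TAG ...))_LABEL form and picks a head with a deliberately simple rule (the VM token in a VG chunk, otherwise the last token). That head rule is an assumption made for illustration, not the treebank's documented procedure.

```python
import re

CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Parse '((w_TAG ...))_LABEL' sequences into (label, [(word, tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, tokens))
    return chunks


def chunk_head(label, tokens):
    """Assumed head rule: main verb (VM) for VG chunks, last token otherwise."""
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]


TEXT = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")

heads = [chunk_head(label, tokens) for label, tokens in parse_chunks(TEXT)]
print(heads)
```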
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd.)
Here modifier-modified relations are marked between the heads of the chunks:
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’ and
  – khaataa ‘eats’
badZaa ‘big’ and bahut ‘much’ are part of the chunks
Dependency Scheme (Contd.)
[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas, and
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations are participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
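The three-way split of relation labels can be captured in a small lookup table, which is handy when filtering a treebank for, say, karaka edges only. The sketch below lists only labels that appear on these slides; the grouping names, the function name and the choice of the double-underscore spelling nmod__relc are assumptions made for the example, and the table is not a complete tagset.

```python
# Label inventory limited to the labels shown on the slides (not exhaustive).
RELATION_TYPES = {
    "karaka": {"k1", "k2", "k3", "k4", "k4a", "k7t", "k7p", "pk1", "jk1", "mk1"},
    "non_karaka": {"r6", "rt", "rh", "nmod__relc", "rad"},
    "non_dependency": {"ccof", "pof", "fragof"},
}


def relation_type(label):
    """Return the group a label belongs to, or 'unknown' if it is not listed."""
    for group, labels in RELATION_TYPES.items():
        if label in labels:
            return group
    return "unknown"


edges = [("khaataa", "bhaaii", "k1"), ("khaataa", "phala", "k2"), ("bhaaii", "meraa", "r6")]
for head, dependent, label in edges:
    print(head, dependent, label, relation_type(label))
```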
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
• Possibility-I: go by syntactic analysis
  – khilvaa ‘cause to eat’ is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
Causative Constructions (Contd.)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations then the root will be khaa ‘eat’
Causative Constructions (Contd.)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
• Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadzaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I.Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd.)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
‘ram.Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
anubhava karta – k4a (Contd.)
[Figure: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd.)
[Figure: dependency tree for Ex-1]
Relation samanadhikaran – rs (Contd.)
[Figure: dependency tree for Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would have definitely come to the party’
Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause
Possibility - I
[Dependency tree: abstract node agar-to → agar (paired-ccof), to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]
Possibility - II
Conditionals (Contd.)
• Possibility-I is not possible because agar-to is the head of the tree, which is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb this argument is semantically realized but not syntactically
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Every day, having written a letter, he tears it up’
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakar (vmod); likhakar → vaha (k1), patra (k2), both shared with the main verb]
(7) Ellipsis
How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
• In the above example vo ‘that’ is missing; it would be the parent node for the relative clause ‘tum jo bhi kahoge’
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency
Ellipsis (Contd.)
[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don't listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, it takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd.)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is being modified by the adjective ek ‘one’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• It means a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation
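The effect of the POF decision is that the noun and the verb stay separate nodes, yet the conjunct verb can still be recovered by following the pof edge. The sketch below encodes the example as (head, dependent, label) triples; the triple layout and the helper names are assumptions made for illustration.

```python
# Edges for: maine usase ek prashna kiyaa  'I asked him a question'
edges = [
    ("kiyaa", "maine", "k1"),
    ("kiyaa", "usase", "k2"),
    ("kiyaa", "prashna", "pof"),   # noun is 'part of' the conjunct verb
    ("prashna", "ek", "nmod"),     # the adjective modifies the noun only
]


def conjunct_verb(edges, verb):
    """Return 'noun verb' if the verb has a pof dependent, otherwise the verb alone."""
    for head, dependent, label in edges:
        if head == verb and label == "pof":
            return f"{dependent} {verb}"
    return verb


def modifiers(edges, word):
    """Return the direct dependents of a word."""
    return [dependent for head, dependent, _ in edges if head == word]


print(conjunct_verb(edges, "kiyaa"))
print(modifiers(edges, "prashna"))
```

Had prashna kiyaa been a single node, the edge from prashna to ek could not be stated at all, which is the problem the chunking decision avoids.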
Conjunct Verbs (Contd.)
[Dependency tree: kiyaa → maine (k1), usase (k2), prashna (pof); prashna → ek (nmod)]
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example (arguments identified)
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
An example (arguments labelled)
• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)
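A frame file can be thought of as a small lexicon keyed by roleset, and a labelled sentence is then interpreted by looking its argument labels up in the chosen roleset. The sketch below encodes the two rolesets for ring from the slide as a Python dictionary. The dictionary layout, the "ring.01"/"ring.02" spelling of the IDs and the function name are assumptions for illustration; actual PropBank frame files are distributed in their own file format.

```python
# Rolesets for 'ring' as given on the slide; the data layout is an assumption.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing", "ARG1": "Thing rung", "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity", "ARG2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument span to the role description in its roleset."""
    roles = FRAMES[roleset_id]["roles"]
    return {span: roles[label] for label, span in labelled_args}


print(describe("ring.01", [("ARG0", "John"), ("ARG1", "the bell")]))
print(describe("ring.02", [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")]))
```

The same label (ARG1) means "thing rung" under one roleset and "surrounding entity" under the other, which is why the roleset has to be chosen before the numbered arguments can be interpreted.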
Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent/experiencer     ARG0-MNS  Induced causer
ARG1  Theme/patient         ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
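The labelling rule for intransitives can be written down in a few lines. This is a minimal sketch: the lookup table `VERB_CLASS` and the function name are hypothetical, and the class of each verb is simply assumed to be known already (in practice it comes from the diagnostic tests mentioned above).

```python
# Verb classes taken from the examples on the slide; the table itself is a
# hypothetical stand-in for the outcome of the diagnostic tests.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def label_single_argument(verb):
    """Return the PropBank label for the sole argument of an intransitive verb."""
    if VERB_CLASS[verb] == "unaccusative":
        return "Arg1"
    return "Arg0"


if __name__ == "__main__":
    for verb in VERB_CLASS:
        print(verb, label_single_argument(verb))
```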
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2 दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2 आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1 उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
Annotated trees for unacc-4 and dat-subj-1 (figures not reproduced)
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1 राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
– (so → sulaa) sleep → make someone sleep
• Add -vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Causatives: annotated trees for causative-1 and causative-2 (figures not reproduced)
Causatives: classes (figure not reproduced)
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2 राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
Tree (not reproduced); node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
Tree (not reproduced); node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
Tree (not reproduced); node labels: VP-Pred, VP, NP (Atif), V (soyegaa)
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
Tree (not reproduced); node labels: VP-Pred, VP, NP1 (darvaazaa), CASE1 (trace), V (khulegaa)
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
Tree (not reproduced); node labels: VP-Pred, VP, NP-P (us kamre mein), NP1 (cuuhe), CASE1 (trace), V (hain)
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
Tree (not reproduced); node labels: VP-Pred, VP, NP-P (Ram-ne), NP-P (Mohan-ko), NP (kitaab), V (dii)
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
Tree (not reproduced); node labels: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko), SCR2 (trace), NP1 (caaMd), CASE1 (trace), V (dikhaa)
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
Tree (not reproduced); node labels: VP-Pred, VP, V-Aux (hai), NP (Ram), V (jaantaa), EXTR1 (trace), CP1, C (ki), NP1 (Sita), CASE1 (trace), NP-P (der se), V (aayegi)
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
Tree (not reproduced); node labels: VP-Pred, VP, CP, C (COMP), V-Aux (thii), NP1 (jo kitaab), NP-P (tumne), SCR1 (trace), PRO, V (dii), NP (vah), NP-P (maine), SCR3 (trace), V (paRhii)
Complex Predicate
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
Tree (not reproduced); node labels: VP-Pred, VP, V-Aux (rahaa), V-Aux (thaa), NP (Raam), NP-P1 (Ravi-ko), V', V (kar), NP, CASE1 (trace), N (yaad)
Causative
Example: Small Clauses
• Hindi
– आतिफ़ ने सीमा को बेवकूफ़ समझा
– Atif ne Seema ko bewakuuf samjhaa
– Atif Erg Seema Acc stupid consider.Pfv
– 'Atif considered Seema stupid'
• English
– Atif considered Seema stupid
– Atif considered her stupid
What is the Phenomenon?
• Syntactically and semantically, consider takes a clausal complement
– Atif considered [clause that she is stupid]
– Atif considered [clause her stupid]
• But two problems
– No verb
– her is semantically the subject of stupid, but has accusative case, which is unusual (subjects are usually nominative)
• So
– Atif considered [small clause her stupid]
What is the Representation Type?
• For this example, we will show dependency trees and phrase structure trees
Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not the accusative case marking of her
considers
  Subj: Atif
  Obj: stupid
    Subj: her
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, and accusative case marking through the node label
considers
  Subj: Atif
  Obj-ECM: stupid
    Subj: her
Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not the accusative case marking of her
DS:
considers
  Subj: Atif
  Obj: stupid
    Subj: her
PS:
[S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, and accusative case marking through the node label
DS:
considers
  Subj: Atif
  Obj-ECM: stupid
    Subj: her
PS:
[S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]
Close to the analysis adopted in Chomsky (1981)
Note on DS and PS
• These analyses are intuitively very similar
• Formal notion: "consistency" (Fei Xia; see Bhatt, Rambow & Fei 2011)
– Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject
considers
  Subj: Atif
  Obj: her
  Obj2: stupid
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
considers
  Subj: Atif
  Obj: her
  ObjPred: stupid
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
considers
  k1: Atif
  k2: her
  k2s: stupid
Neo-Paninian analysis
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
समझा
  k1: आतिफ़ ने
  k2: सीमा को
  k2s: बेवकूफ़
Neo-Paninian analysis from IIIT Hyderabad; used for DS in the Hindi-Urdu Treebank
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), but not her as semantic subject
DS:
considers
  Subj: Atif
  Obj: her
  Obj2: stupid
PS:
[S [NP Atif] [VP considers her [AdjP stupid]]]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents the accusative case marking of her (as object of the matrix verb), and her as semantic subject using the node label
DS:
considers
  k1: Atif
  k2: her
  k2s: stupid
PS tree (not reproduced); node labels: S, NP (Atif), VP (considers, her), AdjP (stupid), SC
Analysis 3 for Small Clauses: Raising to Object
• Structure represents the accusative case marking of her and her as semantic subject, but requires an empty category
considers
  Subj: Atif
  Obj: her1
  Obj-Pred: stupid
    Subj: e1
Analysis 3 for Small Clauses: Raising to Object
• Structure represents the accusative case marking of her and her as semantic subject, but requires an empty category
DS:
considers
  Subj: Atif
  Obj: her1
  Obj-Pred: stupid
    Subj: e1
PS tree (not reproduced); node labels: S, NP (Atif), VP (considers), her1, S, e1, VP, AdjP (stupid)
Analysis used for PS in the Hindi-Urdu Treebank
Comparison of Representations
• Less information: Tree 1a, Tree 2a
• Same information: Tree 1b, Tree 2b, Tree 3
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
– Can be very similar across languages
• There can be several possible analyses
– Some have less information
– But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren't DS and PS Representations Complementary? NO!
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in the projection shows the type of dependency
Aren't DS and PS Representations Complementary? NO!
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all its descendants
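The convention that each DS node heads the phrase made of itself and all its descendants can be made concrete with a short sketch. The function name `constituents` and the index-based encoding of the tree are assumptions made for illustration; the example tree is Analysis 1a for "Atif considers her stupid".

```python
def constituents(heads):
    """Map each word index to the sorted indices of the phrase it heads.

    `heads` maps a word index to the index of its head (None for the root).
    """
    children = {}
    for dependent, head in heads.items():
        children.setdefault(head, []).append(dependent)

    def covered(node):
        # A phrase is the head word plus everything its dependents cover.
        span = [node]
        for child in children.get(node, []):
            span.extend(covered(child))
        return sorted(span)

    return {node: covered(node) for node in heads}


# Analysis 1a: considers -> Atif (Subj), considers -> stupid (Obj), stupid -> her (Subj)
WORDS = ["Atif", "considers", "her", "stupid"]
HEADS_1A = {0: 1, 1: None, 2: 3, 3: 1}

if __name__ == "__main__":
    for head, span in constituents(HEADS_1A).items():
        print(WORDS[head], "->", [WORDS[i] for i in span])
```

Under this analysis the node for stupid heads the phrase "her stupid", which corresponds to the embedded clause node in the matching PS tree.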
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
– Does not change trees
– Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
– Contains less information than DS+PB
– DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How: Dependency; Phrase Structure
What: Distinguish unergative/unaccusative
      Distinguish temporal/locative adjuncts
      Distinguish unaccusative/transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
Introduction: some facts about Hindi and Urdu
Linguistic properties: morphology, some basic syntax, lexical semantics
Hindi: Some facts
A major language of the Indo-Aryan family
Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
Also spoken outside India, in Fiji, Mauritius, Guyana etc.
Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
A large population in India speaks Hindi as a second language
Script: Devanagari, a syllabic script
Urdu: Some facts
An Indo-Aryan language
Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
Significant borrowings from Arabic and Persian
It was also known as rekhta (mixed language)
Official language of Pakistan
Official language of states of India
Also spoken in Fiji, Bangladesh etc.
Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
Script: Perso-Arabic
Hindi-Urdu (Hindustani)
Hindi and Urdu are mutually intelligible
Linguists consider them two registers of the same language
Similar in grammatical structure
Differ in vocabulary, particularly in the formal written varieties
A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
Hindi/Urdu have relatively free word order
The unmarked word order in both languages is subject-object-verb (SOV)
Auxiliary verbs follow the main verb
Nouns are followed by postpositions
Adjectives precede the nouns they modify
In Urdu, adjectives sometimes follow the noun (ezafe constructions)
Extensive use of participles, complex predicates and causatives
Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties
Grammatical gender: masculine and feminine
Number: singular and plural
Person: first, second and third
Case: direct, oblique and vocative
Adjectives inflect for number, gender and case
  Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case
Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
Number
  Singular: pankhaa (fan), lataa (creeper), ghar (house)
  Plural: pankhe (fans), lataeM (creepers), ghar (houses)
Case
The case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique
Case
Direct nouns are in the nominative and are not followed by a postposition
They occur denoting subject and/or object
  laRkaa aayaa 'the boy came'
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  laRkii aayii 'the girl came'
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
Oblique nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
  laRke ne roTii khaayii 'the boy ate bread'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case
Sg, dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
Sg, obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
Pl, dir: ye (these), ve (those), jo (who, pl)
Pl, obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)
Pronouns (Contd.)
Before ne, maiM (I) and tuu (you, sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)
Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form:
  hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), humko (to us), tumko (to you, sg/pl), aapko (to you-hon)
maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
Adjectives
Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boys erg'
The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below

Case →         Direct            Oblique
Number →       Sg       Pl       Sg       Pl
Gender ↓
Masc           acchaa   acche    acche    acche
Fem            acchii   acchii   acchii   acchii
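The adjective table can be read as a three-way rule, sketched below. The function name `inflect_adjective` and its argument values are hypothetical, and treating every adjective that does not end in -aa as non-declining is a simplifying assumption, not a claim from the slides.

```python
def inflect_adjective(stem, gender, number, case):
    """Inflect an -aa adjective such as 'acchaa' following the table above.

    gender: 'masc' or 'fem'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if not stem.endswith("aa"):
        # Assumption: adjectives not ending in -aa are returned unchanged.
        return stem
    base = stem[:-2]
    if gender == "fem":
        return base + "ii"          # feminine: -ii in every cell of the table
    if number == "sg" and case == "direct":
        return base + "aa"          # masculine singular direct keeps -aa
    return base + "e"               # all other masculine cells take -e


if __name__ == "__main__":
    for gender in ("masc", "fem"):
        for case in ("direct", "oblique"):
            for number in ("sg", "pl"):
                print(gender, case, number, inflect_adjective("acchaa", gender, number, case))
```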
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu
Thus only certain moods, aspects and tenses are marked in the verb forms
Given below are some examples of various verb forms from Hindi/Urdu
  Root        ro       cry
  Infinitive  ronaa    to cry
  Habitual    rotaa    cry.hab
  Perfective  royaa    cried
  Causative   rulaa    cause someone to cry
              rulvaa   make someone cause someone to cry
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs
(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog ability be.pl.pres
    'The children can continue to have their meal'
Auxiliaries also carry the gender, number and person information
Postpositions
Postpositions largely mark the case relations
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows
  ke anusaar 'according to'
  ke alaavaa 'in addition to'
  ke kaaran 'because of'
  ke dvaaraa 'through'
  ke saamne 'in front of'
  ke liye 'for'
  kii or/taraf 'towards'
  kii tarah 'like'
  kii jagah 'in place of'
  se baahar 'out of'
  se pahle 'before'
Urdu Specific Features
Prepositions in Urdu
ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are
(a) qabl 'before'
    qabl az_ayn (qabl azeen): before from_this, 'before this'
    qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
    az raahe hamdardi: from way empathy, 'for the sake of courtesy'
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
    ta waqt: until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone
  EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
Words belonging to various categories can be reduplicated
These expressions are often hyphenated
Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  Nouns: it adds the sense of 'every'
  Verbs: it brings the sense of an adverbial participle
  Adjectives and adverbs: it adds intensity
Hindi has three types of reduplication: full, partial and redundant
Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X
  raam-raam 'Ram-Ram' (proper noun)
  baccaa-baccaa 'child-child' (common noun)
  garam-garam 'hot-hot' (adjective)
  dhiire-dhiire 'slowly-slowly' (adverb)
  jaa-jaa 'go-go' (verb)
  naa-naa 'not-not' (negative particle)
  kyaa-kyaa 'what-what' (question word)
  jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
In partial reduplication, an expression X is repeated partially
Only a part of a given word is repeated, which gives the meaning of 'X etc.'
In Hindi/Urdu, the first consonant of X is replaced by v-
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi
Some more examples of partial reduplication in Hindi are
  jaanaa 'going' → jaanaa-vaanaa 'going etc.'
  aaloo 'potato' → aaloo-vaaloo 'potato etc.'
  aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.'
  bhaag 'run' → bhaagambhaag 'rush'
  jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
  dekh 'see' → dekhaa-dekhii 'in imitation'
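The regular echo-word pattern can be sketched on the romanised forms used in the examples. The function names are hypothetical, and treating the whole initial consonant cluster of the transliteration (such as "kh") as the "first consonant" is an assumption made so that khaanaa yields vaanaa; vowel-initial words simply get v- prefixed. The irregular cases (bhaagambhaag, jhuuth-muuTh, dekhaa-dekhii) are not covered.

```python
VOWELS = set("aeiouAEIOU")


def full_reduplication(word):
    """X -> X-X."""
    return f"{word}-{word}"


def echo_word(word):
    """X -> X-vX', where X' is X without its initial consonant(s)."""
    onset_end = 0
    while onset_end < len(word) and word[onset_end] not in VOWELS:
        onset_end += 1
    return f"{word}-v{word[onset_end:]}"


if __name__ == "__main__":
    for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
        print(echo_word(w))
    print(full_reduplication("garam"))
```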
Some Basic Syntax
Hindi and Urdu are both relatively free word order SOV languages
For case marking, Hindi primarily uses postpositions
The verb agrees either with the subject or with the object
An adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2 आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2 दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Existential
exist-1 उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1 राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  'I have read the book which you gave me'
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  'I have read the book which you gave me'
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2 राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
Causatives
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case
  raam ko bukhaar hai
  Ram dat fever be.pres
  raam ko caand dikhaa
  Ram dat moon see-unacc.pst
  raam ko dukh hai
  Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition
  raam ravi se carcaa karega
  Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii
  Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa
  Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
Tokenization
Morphological Representation
POS tagging
Chunking
Inter-chunk dependency annotation
Intra-chunk dependencies
Tokenization
Automatic
Issues: compounds, punctuation
For example:
  usa   ladake   ne    kelaa    khaayaa    thaa
  that  boy      erg   banana   eat-perf   past
Tokenization
Represented in SSF
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
Punctuation: all punctuation marks are to be tokenized
Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  Compounds internally contain a punctuation mark
  Are productive
  Morphological analysis of the members of the compounds
  The issue: whether to create a single token
  Decision: create three tokens; mark the hyphen as JOIN
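The tokenization decisions above (punctuation as separate tokens, hyphenated compounds as three tokens with the hyphen marked JOIN) can be sketched as follows. The function name `tokenize`, the regular expression and the three-column row layout are illustrative assumptions; how the treebank tools actually store the JOIN mark is not stated on the slide.

```python
import re


def tokenize(sentence):
    """Split into word and punctuation tokens and number them from 1.

    Returns rows of (ADDR, TOKEN, MARK); MARK is 'JOIN' for a hyphen,
    otherwise an empty string.
    """
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    pieces = re.findall(r"\w+|[^\w\s]", sentence)
    rows = []
    for addr, piece in enumerate(pieces, start=1):
        mark = "JOIN" if piece == "-" else ""
        rows.append((addr, piece, mark))
    return rows


if __name__ == "__main__":
    for row in tokenize("usa laDake ne kelA khAyA thA ."):
        print(*row)
    for row in tokenize("BAI-bahana"):
        print(*row)
```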
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaaave'>
7               <fs as=&STOP,punc>
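A composite `af` value can be unpacked into named fields with a few lines of code. This sketch assumes the fields are comma-separated and come in the order listed above, with tam and vibhakti sharing one slot; the field names in `AF_FIELDS` and the function `parse_af` are hypothetical, and missing trailing fields are filled with empty strings.

```python
# Field order as read from the slide; the names themselves are assumptions.
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)


def parse_af(af_value):
    """Turn a comma-separated af string into a dict keyed by AF_FIELDS."""
    parts = [part.strip() for part in af_value.split(",")]
    # Pad short values so every field is present in the result.
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))


if __name__ == "__main__":
    print(parse_af("laDakaa,n,m,sg,3,o"))
    print(parse_af("ne,psp"))
```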
POS Tagging
ILMT POS tagset adopted
Total: 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
Chunking is introduced to save effort in manual tagging
Dependency relations are marked between the chunk heads
Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
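The way chunking changes the ADDR_ values can be sketched as a renumbering step. The function name `chunk_addresses` and the tuple layout are hypothetical, and the dotted addresses (1.1, 1.2, ...) are an assumed reading of the address column in the example.

```python
def chunk_addresses(chunks):
    """Renumber tokens after chunking.

    `chunks` is a list of (chunk_label, [(token, pos_tag), ...]).
    Returns rows of (ADDR, TOKEN, CAT): each chunk gets an integer address
    and its tokens get dotted addresses inside it.
    """
    rows = []
    for chunk_no, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(chunk_no), "((", label))        # chunk opening row
        for token_no, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{chunk_no}.{token_no}", token, pos))
        rows.append(("", "))", ""))                      # chunk closing row
    return rows


EXAMPLE_CHUNKS = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]

if __name__ == "__main__":
    for addr, token, cat in chunk_addresses(EXAMPLE_CHUNKS):
        print(f"{addr:<5}{token:<8}{cat}")
```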
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
Some basic concepts
Some Hindi constructions
  Causatives
  Co-ordination
  Unaccusatives
  Relative clauses
Conclusions
Introduction
Treebank: one of the most important linguistic resources
Utility in various NLP tasks such as parsing, natural language understanding etc.
Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
  News articles: 350k
  Tourism articles: 25-30k
  Conversational data: 25-20k
Dependency grammar framework: Paninian grammatical model
Why Paninian Grammar?
Indian languages have
  Rich morphology
  Relatively flexible word order
For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
Dated around 500 BC
Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
Based on the spoken form (Kiparsky 1993)
Focuses on language as a means of communication
Panini's Grammar (contd.)
Treats a sentence as a series of modifier-modified relations
Every sentence has a primary modified (generally a verb)
Relations between verbs and their participants are called 'karaka'
Other relations, such as reason, purpose, genitive etc.
The relations are expressed through explicit markers called vibhakti
101212
5
Sabina opened the lock
[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): the locus of the result
Sabina opened the lock with this key
[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]
k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]
k7t (kaladhikaraNa): time
k7p (deshadhikaraNa): place
Yesterday the lock opened with this key
[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]
Here the lock becomes the karta.
Levels of Analysis
• L1 – Semantic relations: karakas, e.g. raama: karta
• L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Trees: (t1) chunk-level tree, khaa → bhaaii, phala; (t2) expanded tree, khaa → bhaaii, phala; bhaaii → meraa, baDZaa; phala → bahuta]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened
Semantics of the verb
• A verbal root denotes
  – the activity
  – the result
• Locus of activity: karta
• Locus of result: karma
[Diagram: verbal root → activity, result]
karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: "The door opened"
  – The door is the karta
  – The sentence has no explicit karma
[Dependency tree: open → boy (k1), lock (k2)]
Sub-actions: Opening of a lock
[Diagram: opening of a lock = inserting and turning a key (action 1) + key pressing and turning the lever (action 2) + latch moving and lock opening (action 3)]
Sub-actions: Opening of a lock
[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]
k1 – karta (doer); k2 – karma (affected); k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant, so: "Sabina opened the lock"
• However, the speaker may decide not to express the role of the agent. Hence:
  – "The key opened the lock": the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  – "The lock opened": the karma is raised to the role of karta (doer; karma-kartri)
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' is assigned karta in (b) and karana in (a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with the dependency relations within a sentence
• We are developing treebanks for Hindi-Urdu
• The annotation scheme follows the Paninian framework
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, and conjunctions in Hindi
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
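The morph analyses on this slide are flat lists of attribute=value pairs, so they load naturally into a dictionary. The following is a minimal sketch, assuming the feature structure is written as `<fs attr=value ...>` with whitespace between pairs; the helper name `parse_fs` is hypothetical and is not part of any treebank tool.

```python
def parse_fs(fs):
    """Parse '<fs key=value ...>' into a dict; a missing value becomes ''."""
    inner = fs.strip().lstrip("<").rstrip(">").strip()
    if inner.startswith("fs"):
        inner = inner[2:]  # drop the leading 'fs' marker
    feats = {}
    for item in inner.split():
        key, _, value = item.partition("=")
        feats[key] = value
    return feats


# The analysis of khaataa from the slide.
khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(khaataa["root"], khaataa["TAM"])
```

Keeping the empty `af` value as an empty string preserves the attribute without guessing at its content.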
An Example (Contd)
POS Tagging
meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
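The bracketed chunk notation can be read back into a simple data structure. The sketch below assumes the `((word_TAG ...))_LABEL` layout shown on the slide; the head rule (the main verb `VM` if present, otherwise the last token of the chunk) is an assumption made for illustration, and the function names are hypothetical.

```python
import re

# Matches one chunk such as "((khaataa_VM hai_VAUX))_VG".
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def head_word(tokens):
    """Assumed head rule: the VM token if there is one, else the last token."""
    for word, tag in tokens:
        if tag == "VM":
            return word
    return tokens[-1][0]


sentence = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
heads = [head_word(tokens) for _, tokens in parse_chunks(sentence)]
print(heads)
```

The heads returned here are the words between which the inter-chunk dependency relations are marked.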
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations; hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit'
  – khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks
Dependency Scheme (Contd)
[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
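The tree on this slide has khaataa as its root, bhaaii (k1) and phala (k2) as its dependents, and meraa attached to bhaaii by r6. A minimal sketch of one way to store and print such a tree follows; the triple layout and the function names are illustrative assumptions, not the treebank's own file format.

```python
# (child, relation, parent) triples over the chunk heads of the example.
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def children(parent, edges):
    """Dependents of a node, with their relation labels."""
    return [(child, rel) for child, rel, par in edges if par == parent]


def render(node, edges, rel=None, depth=0):
    """Return the subtree under node as indented lines."""
    label = f" ({rel})" if rel else ""
    lines = ["  " * depth + node + label]
    for child, child_rel in children(node, edges):
        lines.extend(render(child, edges, child_rel, depth + 1))
    return lines


print("\n".join(render("khaataa", EDGES)))
```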
Relations in Dependency Scheme
There are three types of relations in the dependency scheme:
  – Karaka relations
  – Relations other than karakas
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold between the verb and the participants directly involved in the action it denotes
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations:
    prayojaka karta 'causer' (pk1): the causer in a causative construction
    prayojya karta 'causee' (jk1): the causee in a causative construction
    madhyastha karta 'mediator causer' (mk1): the mediator-causer in a causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
• As the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat' are morphologically related, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
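The relabeling from Possibility-I to Possibility-II can be written as a small lookup. This is a sketch only: the mapping k1 → pk1, k3 → mk1, k4 → jk1 comes from the slide, while the boolean flag that marks a participant as part of the causal chain is a hypothetical stand-in for the annotator's judgment (a true instrument such as cammaca se 'with the spoon' keeps k3).

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """args: list of (phrase, label, in_causal_chain) triples.

    Only participants flagged as part of the causal chain are relabeled;
    the flag is an assumption of this sketch, not a treebank feature.
    """
    out = []
    for phrase, label, in_causal_chain in args:
        if in_causal_chain and label in CAUSATIVE_RELABEL:
            label = CAUSATIVE_RELABEL[label]
        out.append((phrase, label))
    return out


possibility_1 = [
    ("maaz ne", "k1", True),
    ("aayaa se", "k3", True),
    ("bacce ko", "k4", True),
    ("khaanaa", "k2", False),
]
print(relabel_causative(possibility_1))
```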
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – jo ladZakaa 'the boy' depends on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – The relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause: Possibility-II
Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' in the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
'I.Dat' 'sorrow' 'is'
'I am unhappy'
• The ko vibhakti in mujhko 'to me' shows that it is not a karta
• Here dukh 'sorrow' is the karta
• mujhko 'to me' is a subtype of sampradan
• This sampradan differs from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd.)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would definitely have come to the party'
Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause
Possibility-I
[Dependency tree: abstract node agar-to → agar (paired-ccof), to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]
Possibility-II
Conditionals (Contd.)
• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause depends on the to 'then' clause
• The agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day he writes a letter and then tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'
[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakara (vmod); likhakara → vaha (k1), patra (k2), both shared with the main verb]
(7) Ellipsis
• How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, NULL__NP, for vo 'that' to show the dependency
Ellipsis (Contd.)
[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]
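The effect of the null element can be checked mechanically: without it the relative clause has no parent, so the sentence falls apart into two trees. A minimal sketch, assuming a child → (relation, parent) dictionary over chunk heads; the helper `roots` and the data layout are hypothetical.

```python
def roots(tree, nodes):
    """Nodes with no parent in tree (tree maps child -> (relation, parent))."""
    return [n for n in nodes if n not in tree]


# Without a null node, the relative clause head kahoge has nothing to attach to.
nodes = ["maan luungii", "mai", "kahoge", "tum", "jo bhi"]
tree = {
    "mai": ("k1", "maan luungii"),
    "tum": ("k1", "kahoge"),
    "jo bhi": ("k2", "kahoge"),
}
before = roots(tree, nodes)

# Insert NULL__NP for the elided vo and hang the relative clause on it.
nodes.append("NULL__NP")
tree["NULL__NP"] = ("k2", "maan luungii")
tree["kahoge"] = ("nmod__relc", "NULL__NP")
after = roots(tree, nodes)

print(before, after)
```

After the insertion only the main verb remains as a root, which is the property the null node is there to restore.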
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd.)
Coordinate conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
'Ram ate food and Sita ate an apple'
Subordinate conjunction
Ex: raam ne kahaa ki vo kal aayegaa
'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
'Ram said that he will come tomorrow'
[Figures: dependency trees for the coordinate conjunction (ccof) and subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF ('Part OF' relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb
• The relation between ek and prashna thus becomes an intra-chunk relation, as both are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the sequence is a conjunct verb
Conjunct Verbs (Contd.)
[Dependency tree: kiyaa → maine (k1), usase (k2), prashna (pof); prashna → ek (nmod)]
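Because the noun and the light verb are linked by pof, the conjunct verb can be reassembled from the tree while the noun keeps its own modifier. The sketch below uses the arcs of the tree for maine usase ek prashna kiyaa (k1, k2, pof, nmod); the triple layout and the function names are illustrative assumptions.

```python
# (child, relation, parent) triples from the tree for the example.
EDGES = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]


def complex_predicates(edges):
    """Map each verb to 'noun verb' when a pof dependent is attached to it."""
    return {par: f"{child} {par}" for child, rel, par in edges if rel == "pof"}


def modifiers(word, edges):
    """Dependents of a word, excluding the pof link itself."""
    return [child for child, rel, par in edges if par == word and rel != "pof"]


print(complex_predicates(EDGES))
print(modifiers("prashna", EDGES))
```

The adjective ek stays attached to prashna alone, which is the relation that a single verb chunk could not express.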
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
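The filtering step can be made concrete with hand-entered role labels. This is a toy sketch, not an SRL system: the agent and theme values are copied from the two candidate sentences, and the dictionary layout and the function name `answer` are assumptions made for illustration.

```python
# The question asks for the agent of "create" with this theme.
QUESTION = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Hand-entered role labels for the two candidate sentences.
CANDIDATES = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(question, candidates):
    """Return the agents of candidates whose predicate and theme match."""
    return [
        c["agent"] for c in candidates
        if c["predicate"] == question["predicate"]
        and c["theme"].lower() == question["theme"].lower()
    ]


print(answer(QUESTION, CANDIDATES))
```

Both sentences contain every word of the question, so word matching alone cannot separate them; comparing themes does.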
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
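A frame file with several rolesets can be pictured as a nested dictionary keyed by roleset ID. The sketch below encodes the two rolesets of ring from the slide; the dictionary layout, the `ring.01`-style keys, and the helper `describe` are assumptions for illustration and do not reproduce the actual frame file format.

```python
# The two rolesets of "ring" shown on the slide.
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"Arg0": "causer of ringing",
                  "Arg1": "thing rung",
                  "Arg2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"Arg1": "surrounding entity",
                  "Arg2": "surrounded entity"},
    },
}


def describe(roleset, labelled_args):
    """Spell out what each numbered argument means for the given roleset."""
    roles = FRAMES[roleset]["roles"]
    return {phrase: roles[arg] for phrase, arg in labelled_args.items()}


print(describe("ring.01", {"John": "Arg0", "the bell": "Arg1"}))
print(describe("ring.02", {"Tall aspen trees": "Arg1", "the lake": "Arg2"}))
```

The same label Arg1 means 'thing rung' under one roleset and 'surrounding entity' under the other, which is why roles are defined per usage.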
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
Numbered arguments            Numbered arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent/experiencer       ARG0-MNS  Induced causer
ARG1  Theme/patient           ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Last night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Last night I saw the moon behind the clouds'
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused Atif to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause Atif to sleep'
Causatives
[Figures: PropBank annotation of causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Figure: phrase-structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP-Pred, VP, NP, V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: phrase-structure tree for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: phrase-structure tree for आतिफ़ सोएगा (Atif soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: phrase-structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); the moved NP darvaazaa (index 1) is linked to the empty element CASE1]
Existentials
• Existential ho 'be' is unaccusative (because it is agent-free) and the location is an adjunct
[Figure: phrase-structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: phrase-structure tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)]
Putting It All Together: Dative Subjects
[Figure: phrase-structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa)]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (as in the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Figure: phrase-structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); the ki-clause is labelled CP1 and linked to the empty element EXTR1]
Relative Clause
[Figure: phrase-structure tree for the sentence below]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
[Figure: phrase-structure tree for the sentence below]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causative
What is the Representation Type?
• For this example, we will show dependency trees and phrase structure trees
Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not accusative case marking of her
[Dependency tree: considers → Atif (Subj), stupid (Obj); stupid → her (Subj)]
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, and accusative case marking through node label
[Dependency tree: considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)]
Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not accusative case marking of her
[Dependency tree: considers → Atif (Subj), stupid (Obj); stupid → her (Subj)]
[Phrase-structure tree: [S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]]
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, and accusative case marking through node label
[Dependency tree: considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)]
[Phrase-structure tree: [S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]]
Close to the analysis adopted in Chomsky (1981)
Note on DS and PS
• These analyses are intuitively very similar
• Formal notion: "consistency" (Fei Xia; see Bhatt, Rambow & Fei 2011)
  – Intuition: a very simple and general algorithm can transform consistent DS to PS and vice versa
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of the matrix verb), but not her as semantic subject
[Dependency tree: considers → Atif (Subj), her (Obj), stupid (Obj2)]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of the matrix verb), and her as semantic subject using node label
[Dependency tree: considers → Atif (Subj), her (Obj), stupid (ObjPred)]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of the matrix verb), and her as semantic subject using node label
[Dependency tree: considers → Atif (k1), her (k2), stupid (k2s)]
Neo-Paninian analysis
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of the matrix verb), and her as semantic subject using node label
[Dependency tree: समझा → आतिफ़ ने (k1), सीमा को (k2), बेवकूफ़ (k2s)]
Neo-Paninian analysis from IIIT Hyderabad; used for DS in the Hindi-Urdu Treebank
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of the matrix verb), but not her as semantic subject
[Dependency tree: considers → Atif (Subj), her (Obj), stupid (Obj2)]
[Phrase-structure tree: [S [NP Atif] [VP considers her [AdjP stupid]]]]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of the matrix verb), and her as semantic subject using node label
[Dependency tree: considers → Atif (k1), her (k2), stupid (k2s)]
[Phrase-structure tree over "Atif considers her stupid"; node labels S, NP, VP, AdjP, SC]
Analysis 3 for Small Clauses: Raising to Object
• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category
[Dependency tree: considers → Atif (Subj), her1 (Obj), stupid (Obj-Pred); stupid → e1 (Subj)]
Analysis 3 for Small Clauses: Raising to Object
• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category
[Dependency tree: considers → Atif (Subj), her1 (Obj), stupid (Obj-Pred); stupid → e1 (Subj)]
[Phrase-structure tree over "Atif considers her1 e1 stupid"; node labels S, NP, VP, S, VP, AdjP]
Analysis used for PS in the Hindi-Urdu Treebank
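Comparison of Representations
• Less information: Tree 1a, Tree 2a
• Same information: Tree 1b, Tree 2b, Tree 3
[Figure: the five dependency trees from the preceding slides shown side by side]
  – Tree 1a: considers → Atif (Subj), stupid (Obj); stupid → her (Subj)
  – Tree 2a: considers → Atif (Subj), her (Obj), stupid (Obj2)
  – Tree 1b: considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)
  – Tree 2b: considers → Atif (Subj), her (Obj), stupid (ObjPred)
  – Tree 3: considers → Atif (Subj), her1 (Obj), stupid (Obj-Pred); stupid → e1 (Subj)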
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax, as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren't DS and PS Representations Complementary? NO
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants
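The convention that each DS node heads the phrase made of itself and all its descendants can be sketched in a few lines. This is only an illustration: the arc list encodes the biclausal Subj/Obj tree for "Atif considers her stupid", and the names `ARCS`, `WORD_ORDER` and `constituent` are hypothetical, not part of any treebank tool.

```python
from collections import defaultdict

# (head, dependent, label) arcs for "Atif considers her stupid",
# using the biclausal analysis in which "her" is the subject of "stupid".
ARCS = [
    ("considers", "Atif", "Subj"),
    ("considers", "stupid", "Obj"),
    ("stupid", "her", "Subj"),
]
WORD_ORDER = ["Atif", "considers", "her", "stupid"]


def constituent(head, arcs, order):
    """Return the head and all of its descendants, in surface order."""
    children = defaultdict(list)
    for h, dep, _label in arcs:
        children[h].append(dep)
    seen, stack = set(), [head]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return [w for w in order if w in seen]


print(constituent("stupid", ARCS, WORD_ORDER))
print(constituent("considers", ARCS, WORD_ORDER))
```

Reading off the descendants of "stupid" recovers the small clause "her stupid" as a constituent, even though the dependency tree has no phrasal nodes.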
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
– Does not change trees
– Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
– Contains less information than DS+PB
– DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How: Dependency; Phrase Structure
What:
– Distinguish unergative/unaccusative
– Distinguish temporal/locative adjuncts
– Distinguish unaccusative/transitive with empty agent
Introduction to Morphology, Syntax and
Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties: morphology, some basic syntax, lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script
Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structure
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
– Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
– Singular: pankhaa (fan), lataa (creeper), ghar (house)
– Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique
Case
Direct: nouns are in the nominative and are not followed by a postposition
Occur denoting subject and/or object
• laRkaa aayaa 'the boy came'
  <laRkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
• laRkii aayii 'the girl came'
  <laRkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
• laRke ne roTii khaayii 'the boy ate bread'
  <laRkaa,n,m,sg,3,obl> <roTii,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
• laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
  <laRkaa,n,m,sg,3,obl> <roTii,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, the pronouns also inflect for number and case
• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrogative), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrogative), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique:
– (a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrogative), kinhiiM (who, indefinite)
– (b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indefinite)
Pronouns (Contd…)
• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead, they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
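The pronoun rules above amount to a small lookup: keep the direct form before ne, switch maiM/tuu to the oblique stem before other postpositions, and use the irregular genitive instead of attaching kaa. The sketch below is a minimal illustration under those rules only; the function name `attach` and the two tables are hypothetical, and only the masculine singular genitive forms listed on the slide are covered.

```python
# Oblique stems used before postpositions other than "ne".
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitive forms that replace pronoun + "kaa".
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}


def attach(pronoun, postposition):
    """Combine a direct-form pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne":
        # maiM and tuu take their oblique stem; other pronouns are unchanged.
        pronoun = OBLIQUE.get(pronoun, pronoun)
    return pronoun + postposition


for pron, post in [("maiM", "ne"), ("maiM", "ko"), ("ham", "ko"), ("tum", "kaa")]:
    print(pron, "+", post, "->", attach(pron, post))
```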
Adjectives
Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

Case →      Direct             Oblique
Number →    Sg       Pl        Sg       Pl
Gender ↓
Masc        acchaa   acche     acche    acche
Fem         acchii   acchii    acchii   acchii
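The adjective table collapses to three endings, which a short function can make explicit. This is a sketch for declinable -aa adjectives only (the slide notes that some adjectives do not decline); the function name and the feature values "masc"/"fem", "sg"/"pl", "direct"/"oblique" are assumptions chosen for illustration.

```python
def inflect_adjective(stem, gender, number, case):
    """Inflect a declinable -aa adjective, e.g. stem 'acch' for acchaa."""
    if gender not in ("masc", "fem"):
        raise ValueError(f"unknown gender: {gender}")
    if number not in ("sg", "pl"):
        raise ValueError(f"unknown number: {number}")
    if case not in ("direct", "oblique"):
        raise ValueError(f"unknown case: {case}")
    if gender == "fem":
        return stem + "ii"   # feminine is invariant
    if number == "sg" and case == "direct":
        return stem + "aa"   # masculine singular direct
    return stem + "e"        # every other masculine cell


for gender in ("masc", "fem"):
    row = [inflect_adjective("acch", gender, n, c)
           for c in ("direct", "oblique") for n in ("sg", "pl")]
    print(gender, row)
```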
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and the agreement features of gender, number and person
Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu
Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu
Root: ro 'cry'
Infinitive: ronaa 'to cry'
Habitual: rotaa 'cry (hab)'
Perfective: royaa 'cried'
Causative: rulaa 'cause someone to cry'
           rulvaa 'make someone cause someone to cry'
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs
(a) bacce khaanaa khaa rahe haiM
    children.d meal eat prog be.pl.pres
    'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
    children.d meal eat.nf prog ability be.pl.pres
    'The children can continue to have their meal'
Auxiliaries also carry the gender, number and person information
Postpositions
Postpositions largely mark the case relations
    baccoM ne raat meM mez se khaanaa le liyaa
    children.obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'
Urdu-Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
    qabl az_ayn (qabl azeen): before from_this, 'before this'
    qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
    az raahe hamdardi: from way empathy, 'for the sake of courtesy'
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
    ta waqt: until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive, but is not restricted to the genitive alone
EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
– Nouns: it adds the sense of 'every'
– Verbs: it brings the sense of an adverbial participle
– Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X
raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of the given word is repeated, which gives the meaning 'X etc.'
In Hindi/Urdu, the first consonant of X is replaced by v-
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi
Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
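The regular echo-word pattern can be sketched as a string operation on the romanized form. The sketch assumes the transliteration used on these slides, where a consonant may be written with more than one letter (kh, bh), so it replaces everything before the first vowel letter with v; vowel-initial words simply get v prefixed, as in aaloo-vaaloo. The function names are hypothetical, and the irregular cases (bhaagambhaag, dekhaa-dekhii) are deliberately not covered.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word):
    """Build the echo part: replace the initial consonant(s) with 'v'."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return "v" + word[i:]
    raise ValueError(f"no vowel found in {word!r}")


def reduplicate_partial(word):
    """Return 'X-vX', the regular partial reduplication meaning 'X etc.'."""
    return f"{word}-{echo_word(word)}"


for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(reduplicate_partial(w))
```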
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
    Atif kitaab paRhegaa
    Atif book.f read.m.sg.fut
    'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
    Atif ne kitaab paRhii
    Atif erg book.f read.f.sg.pst
    'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
    Atif ko kitaab paRhnii paRii
    Atif dat book.f read.f.inf compel.f.pst
    'Atif had to read the book'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    'Atif will sleep'
unerg-2: आतिफ़ ने सोया
    Atif ne soyaa
    Atif erg sleep.m.sg.pst
    'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
    Atif ko sonaa paRegaa
    Atif dat sleep.inf compel.fut
    'Atif will have to sleep'
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
    darvaazaa khulegaa
    door.m.sg.d open.m.sg.fut
    'The door will open'
unacc-2: दरवाज़े ने खुला
    darvaaze ne khulaa
    door.m.sg.obl erg open.pst
    'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
    darvaaze ko khulnaa paRegaa
    door.m.sg.obl dat open.inf compel.fut
    'The door will have to open'
Existential
exist-1: उस कमरे में चूहे हैं
    us kamre meM cuuhe haiM
    that room in rats be.pres.pl
    'There are rats in that room'
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
    kal raat baadaloM meM caaMd dikhaa
    yesterday night clouds in moon see(unacc).pst
    'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
    kal raat baadaloM meM mujhko caaMd dikhaa
    yesterday night clouds in me.dat moon see(unacc).pst
    'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1: राम मोहन को किताब देगा
    raam mohan ko kitaab degaa
    Ram Mohan dat book.f give.m.sg.fut
    'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
    raam ne mohan ko kitaab dii
    Ram erg Mohan dat book.f give.f.sg.pst
    'Ram gave a book to Mohan'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
    raam jaantaa hai ki siita der se aayegii
    Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
    'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
    merii bahan jo dillii meM rahtii hai kal aa rahii hai
    my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
    'My sister, who stays in Delhi, is coming tomorrow'
Relative Clause
rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
    maiMne vah kitaab jo tumne dii thii paRh lii
    I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
    'I have read the book which you gave me'
rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
    maiMne vah kitaab paRh lii jo tumne dii thii
    I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
    'I have read the book which you gave me'
rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
    jo kitaab tumne dii thii vah maiMne paRh lii
    which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
    'I have read the book which you gave me'
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
    raam ravi kii pratikshaa kar rahaa thaa
    Ram Ravi gen wait do prog.m.sg be.m.sg.pst
    'Ram was waiting for Ravi'
compl-pred-2: राम रवि को याद कर रहा था
    raam ravi ko yaad kar rahaa thaa
    Ram Ravi acc remember do prog.m.sg be.m.sg.pst
    'Ram was remembering Ravi'
Causatives
unerg-1: आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
    aayaa ne Atif ko sulaayaa
    maid erg Atif acc sleep.caus.pst
    'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
    maaN ne aayaa se Atif ko sulvaayaa
    mother erg maid by Atif acc sleep.caus.pst
    'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs: the experiencer argument takes dative case
    raam ko bukhaar hai        Ram dat fever be.pres
    raam ko caand dikhaa       Ram dat moon see-unacc.pst
    raam ko dukh hai           Ram dat sorrow be.pres
Participatory verbs: the second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karegaa     Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii   Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa           Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis,
POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
    usa ladake ne kelaa khaayaa thaa
    that boy erg banana eat-perf past
Tokenization
Represented in SSF
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
– Compounds internally contain a punctuation mark
– Are productive
– Morphological analysis of the members of the compounds
– The issue: whether to create a single token
• Decision: create three tokens; mark the hyphen as JOIN
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
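A composite af value can be unpacked positionally into named features. The sketch below assumes the values are comma-separated and follow the field order listed on the slide; the field names in `AF_FIELDS` and the function `parse_af` are illustrative choices, not an official SSF API. Abbreviated values such as the verb entry on the slide omit some positions, so positional reading is only reliable for full entries like the noun used here.

```python
# Field order as listed on the slide; the names themselves are assumptions.
AF_FIELDS = (
    "root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix",
)


def parse_af(af):
    """Split a comma-separated af value into a feature dictionary.

    Missing trailing fields are filled with empty strings.
    """
    values = [v.strip() for v in af.split(",")]
    if len(values) > len(AF_FIELDS):
        raise ValueError(f"too many fields in af value: {af!r}")
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


features = parse_af("laDakaa,n,m,sg,3,o")
print(features["root"], features["cat"], features["case"])
```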
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
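To make the chunked layout concrete, here is a minimal reader for a simplified version of the representation above. It assumes whitespace-separated columns, `((` opening a chunk with its tag, `))` closing it, and dotted token addresses such as 1.1; feature structures are left out. `SAMPLE` and `parse_chunks` are hypothetical names for this sketch, not part of an existing SSF library.

```python
SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""


def parse_chunks(text):
    """Parse simplified chunked SSF into a list of chunk dictionaries."""
    chunks, current = [], None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts == ["))"]:
            if current is None:
                raise ValueError("closing bracket without open chunk")
            chunks.append(current)
            current = None
        elif len(parts) >= 3 and parts[1] == "((":
            current = {"addr": parts[0], "tag": parts[2], "tokens": []}
        else:
            if current is None or len(parts) < 3:
                raise ValueError(f"unexpected line: {raw!r}")
            addr, form, pos = parts[:3]
            current["tokens"].append((addr, form, pos))
    return chunks


for chunk in parse_chunks(SAMPLE):
    print(chunk["tag"], [form for _addr, form, _pos in chunk["tokens"]])
```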
Paninian Grammatical Model and
Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
– Causatives
– Co-ordination
– Unaccusatives
– Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar? Indian languages have:
• Rich morphology
• Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky 1993>
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
    Sabina (k1)
    lock (k2)
        the
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
opened
    Sabina (k1)
    lock (k2)
        the
    key (k3)
        this
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
    Yesterday (k7t)
    Sabina (k1)
    lock (k2)
        the
    key (k3)
        this
    home (k7p)
        my
K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place
Yesterday the lock opened with this key
opened
    yesterday (k7t)
    lock (k1)
        the
    key (k3)
        this
lock becomes the karta
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) khaa → bhaaii, phala
(t2) khaa → bhaaii (→ meraa, baDZaa), phala (→ bahuta)
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
– Sabina opened the lock with the key
– The key opened the lock
– The lock opened
Semantics of the verb
A verbal root denotes:
• The activity
• The result
Locus of activity: karta
Locus of result: karma
Verbal Root → activity, result
karta-karma
The boy opened the lock
• k1 – karta
• k2 – karma
karta/karma sometimes correspond to agent/theme
• Not always: The door opened
• The door is karta
• The sentence has no explicit karma
open
    boy (k1)
    lock (k2)
Sub-actions: Opening of lock
Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
    boy (k1)
    lock (k2)
    key (k3)
open
    key (k1)
    lock (k2)
open
    lock (k1)
k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)
Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
– The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
– The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
Example:
    meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
    meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
    badZaa  <fs af= root=badZaa cat=adj gend=m>
    bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
    bahuta  <fs af= root=bahuta cat=adj gend=any>
    phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
    khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
    hai     <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
    meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
    ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree:
– each node is a chunk, and
– the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks:
– meraa 'my'
– bhaaii 'brother'
– phala 'fruit'
– khaataa 'eats'
badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
khaataa
    bhaaii (k1)
        meraa (r6)
    phala (k2)
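The inter-chunk tree for this sentence can be held as a list of labeled head-dependent edges over chunk heads. The sketch below is one possible encoding, assumed for illustration: `EDGES`, `find_root` and `dependents` are hypothetical names, and the labels k1, k2 and r6 are the ones shown on the slide.

```python
# (head, dependent, label) edges between chunk heads.
EDGES = [
    ("khaataa", "bhaaii", "k1"),  # karta
    ("khaataa", "phala", "k2"),   # karma
    ("bhaaii", "meraa", "r6"),    # genitive
]


def find_root(edges):
    """Return the single node that is a head but never a dependent."""
    heads = {h for h, _d, _l in edges}
    deps = {d for _h, d, _l in edges}
    roots = heads - deps
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()


def dependents(head, edges):
    """Map each relation label of `head` to its dependent."""
    return {label: dep for h, dep, label in edges if h == head}


root = find_root(EDGES)
print(root, dependents(root, EDGES))
print("bhaaii", dependents("bhaaii", EDGES))
```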
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas, and
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
    maaz ne aayaa se bacce ko khaanaa khilvaayaa
    'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
    'Mother caused the maid to feed the child'
Issue
Possibility-I: go by syntactic analysis
• khilvaa 'cause to eat' is the verb root
• maaz ne has karta vibhakti, so mark as k1
• aayaa se has karana vibhakti, so mark as k3
• bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations:
– prayojaka karta 'causer' (pk1): the causer in a causative construction
– prayojya karta 'causee' (jk1): the causee in a causative construction
– madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
For causatives, our current decision: follow Possibility-II
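The difference between the two possibilities is a relabeling of three arguments. As a rough sketch, assuming the Possibility-I labels are already assigned from the vibhaktis, the Possibility-II labels follow from a fixed mapping; `CAUSATIVE_LABELS` and `relabel_causative` are hypothetical names, and the mapping is only meant for clauses already known to be causative (a true instrument such as cammaca se keeps k3).

```python
# Possibility-I label -> Possibility-II label for causative clauses.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Relabel (phrase, label) pairs of a causative clause.

    Labels without a causative counterpart (e.g. k2) are kept as they are.
    """
    return [(phrase, CAUSATIVE_LABELS.get(label, label))
            for phrase, label in arguments]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_1))
```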
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'
Issue
Possibility-I
• Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
• The dependency of jo ladZakaa 'the boy' is on vaha 'he'
• jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause: Alternative-II
Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
    'I.Dat' 'unhappy' 'is'
    'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa (base verb)
    'ram' 'Erg' 'moon' 'saw'
    'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa (derived intransitive verb)
    'ram.Dat' 'moon' 'appeared'
    'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
Relation samanadhikaran – rs (Contd.)
[Figure: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
    ‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii
Possibility - II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision is to follow Possibility-II.
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause.
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
    ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
    ‘Having written letters every day, he tears them up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
Participles (vmod) (Contd.)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)
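The sharing of arguments between the main verb and the participle can be sketched as a small data structure. This is only an illustration under assumptions: the class name Node, the attach method and the shared field are hypothetical and are not part of the treebank format. Syntactic attachment is stored in children; arguments that are realized only semantically are stored in shared.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A word in a dependency tree (hypothetical structure, for illustration)."""
    word: str
    children: list = field(default_factory=list)  # (label, Node): syntactic attachment
    shared: list = field(default_factory=list)    # (label, Node): semantic only

    def attach(self, label, child):
        """Attach child syntactically under this node and return the child."""
        self.children.append((label, child))
        return child


def arguments(node):
    """Return {label: word} covering both syntactic and shared arguments."""
    args = {label: child.word for label, child in node.children}
    args.update({label: child.word for label, child in node.shared})
    return args


# vaha rojZa patra likhakara PaadZataa hai
vaha, patra = Node("vaha"), Node("patra")
main = Node("PaadZataa hai")
main.attach("k1", vaha)
main.attach("k7t", Node("rojZa"))
main.attach("k2", patra)
participle = main.attach("vmod", Node("likhakar"))
# The participle has no syntactic children; its arguments are shared.
participle.shared = [("k1", vaha), ("k2", patra)]
```

Here the shared arguments hang syntactically only from the main verb, while arguments(participle) still recovers k1 and k2 for likhakar.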
(7) Ellipsis
How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
    ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
    ‘The children have grown up; they don't listen to anyone’
• There is no explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential).
NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• A coordinating conjunction is the head and takes the coordinated elements as its children.
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      ‘Ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘Sita’ ‘Erg’ ‘apple’ ‘ate’
      ‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
      ‘Ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
      ‘Ram said that he will come tomorrow’
[Figure: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
    ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd.)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
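The filtering step can be sketched in a few lines. This is a toy illustration, not a question answering system: the dictionaries below are hand-written stand-ins for the output of a semantic role labeller, and the names QUESTION, CANDIDATES and answer_who are hypothetical.

```python
# Role-labelled question: the agent is what we are asking for.
QUESTION = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Role-labelled candidate sentences (hand-written, as in the slide).
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(question, candidates):
    """Keep only candidates whose predicate and theme match the question,
    and return their agents."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == question["predicate"]
        and c["theme"] == question["theme"]
    ]
```

Both candidates contain the words of the question, but only the second has the right theme, so answer_who(QUESTION, CANDIDATES) keeps only Jonas Salk.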
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
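As a rough sketch, the two rolesets of ring can be held in a lookup table and used to interpret a labelled sentence. The dictionary layout and the function name describe are assumptions made for illustration; real PropBank frame files are stored in their own format.

```python
# Rolesets for "ring", copied from the example above.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled span to the role description defined by the roleset.

    labelled_args maps an argument label (e.g. "Arg1") to the text span.
    Raises ValueError if a label is not defined for the roleset.
    """
    roles = FRAME_FILE[roleset_id]["roles"]
    unknown = [label for label in labelled_args if label not in roles]
    if unknown:
        raise ValueError(f"{roleset_id} does not define {unknown}")
    return {text: roles[label] for label, text in labelled_args.items()}
```

The same label means different things under different rolesets: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02.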
Hindi PropBank
• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’
unacc-2: दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
         kal raat baadaloM meM caaMd dikhaa
         yesterday night clouds in moon see(unacc).pst
         ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
         kal raat baadaloM meM mujhko caaMd dikhaa
         yesterday night clouds in me.dat moon see(unacc).pst
         ‘Yesterday night I saw the moon behind the clouds’
[Figure: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1: राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative:
• Add -aa
  – (so, sulaa): sleep, make someone sleep
• Add -vaa
  – (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
         aayaa ne Atif ko sulaayaa
         maid erg Atif acc sleep.caus.pst
         ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
         maaN ne aayaa se Atif ko sulvaayaa
         mother erg maid by Atif acc sleep.caus.pst
         ‘The mother made the maid cause the child to sleep’
[Figure: annotated trees for causative-1 and causative-2]
Causatives classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
         raam ravi kii pratiikshaa kar rahaa thaa
         Ram Ravi gen wait do prog.m.sg be.m.sg.pst
         ‘Ram was waiting for Ravi’
compl-pred-2: राम रवि से लड़ बैठा
         raam ravi se ladZa baithaa
         Ram Ravi inst fight sit.perf
         ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
         raam jaantaa hai ki siita der se aayegii
         Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
         ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: PS tree for Atif kitab paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Figure: PS tree for Atif ne kitab paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Figure: PS tree for Atif soyegaa]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Figure: PS tree for darvaazaa khulegaa, with the NP darvaazaa coindexed (1) with a CASE trace]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Figure: PS tree for us kamre mein cuuhe hain, with the NP cuuhe coindexed (1) with a CASE trace]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: PS tree for Ram ne Mohan ko kitaab dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: PS tree for kal raat baadaloM mein mujhko caaMd dikhaa, with a CASE trace for caaMd and a scrambling trace (SCR) for mujhko]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: PS tree for Ram jaantaa hai ki Sita der se aayegi, with a CP headed by ki, an extraposition trace (EXTR) and a CASE trace for Sita]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
[Figure: PS tree for the sentence above, with scrambling traces (SCR), a PRO subject and a CP with an empty complementizer (COMP)]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
[Figure: PS tree for the sentence above]
Causative
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject and accusative case marking through node label
considers
  Subj: Atif
  Obj-ECM: stupid
    Subj: her
Analysis 1a for Small Clauses: No Accusative Case Marking
• Structure represents her as subject, but not accusative case marking of her
considers
  Subj: Atif
  Obj: stupid
    Subj: her
[S [NP Atif] [VP considers [S her [VP [AdjP stupid]]]]]
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject and accusative case marking through node label
considers
  Subj: Atif
  Obj-ECM: stupid
    Subj: her
[S [NP Atif] [VP considers [SC her [VP [AdjP stupid]]]]]
Close to the analysis adopted in Chomsky (1981)
Note on DS and PS
• These analyses are intuitively very similar
• Formal notion: “consistency” (Fei Xia; see Bhatt, Rambow & Fei 2011)
  – Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa
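To make the intuition concrete, here is a toy round trip between a dependency structure and a nested phrase structure. It is not the algorithm from the cited work, only a sketch under simplifying assumptions: words are unique, labels and word order are ignored, and each head projects one flat phrase containing itself and the phrases of its dependents. The function names ds_to_ps and ps_to_ds are hypothetical.

```python
def ds_to_ps(word, deps):
    """Project a phrase for word: the head is a bare string in first
    position, followed by one sub-phrase per dependent."""
    return [word] + [ds_to_ps(child, deps) for child in deps.get(word, [])]


def ps_to_ds(phrase, deps=None):
    """Recover head -> dependents from a phrase built by ds_to_ps."""
    deps = {} if deps is None else deps
    head, *subphrases = phrase
    for sub in subphrases:
        deps.setdefault(head, []).append(sub[0])  # sub[0] is the sub-phrase head
        ps_to_ds(sub, deps)
    return deps


# Analysis 1a/1b above, without labels: "her" depends on "stupid".
DS = {"considers": ["Atif", "stupid"], "stupid": ["her"]}
PS = ds_to_ps("considers", DS)
```

Under these assumptions the conversion loses nothing: ps_to_ds(PS) gives back the original DS.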
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject
considers
  Subj: Atif
  Obj: her
  Obj2: stupid
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using node label
considers
  Subj: Atif
  Obj: her
  ObjPred: stupid
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using node label
considers
  k1: Atif
  k2: her
  k2s: stupid
Neo-Paninian analysis
समझा
  k1: आतिफ़
  k2: सीमा को
  k2s: बेवकूफ़
Neo-Paninian analysis from IIIT Hyderabad; used for DS in the Hindi-Urdu Treebank
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject
considers
  Subj: Atif
  Obj: her
  Obj2: stupid
[S [NP Atif] [VP considers her [AdjP stupid]]]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using node label
considers
  k1: Atif
  k2: her
  k2s: stupid
[Figure: PS tree with nodes S, NP (Atif), VP (considers, her), AdjP (stupid) and an SC node]
Analysis 3 for Small Clauses: Raising to Object
• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category
considers
  Subj: Atif
  Obj: her (index 1)
  Obj-Pred: stupid
    Subj: e (index 1)
[Figure: PS tree with nodes S, NP (Atif), VP (considers), her1, and an embedded S containing e1 and VP with AdjP (stupid)]
Analysis used for PS in the Hindi-Urdu Treebank
Comparison of Representations
[Figure: the five dependency trees from the preceding analyses (Tree 1a, Tree 2a, Tree 1b, Tree 2b, Tree 3), grouped under “Less information” and “Same information”]
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren’t DS and PS Representations Complementary? NO
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants
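The convention that each node stands for the phrase containing it and all its descendants can be sketched directly. The encoding below is an assumption made for illustration (a list of head indices, with -1 for the root), and the function name constituent is hypothetical; the sentence is the conjunct verb example from the DS part of the tutorial.

```python
def constituent(head, heads):
    """Return the sorted token indices of the phrase headed by `head`.

    heads[i] is the index of the head of token i, or -1 for the root.
    The phrase is the head plus everything that transitively depends on it.
    """
    members = {head}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h in members and i not in members:
                members.add(i)
                changed = True
    return sorted(members)


# maine usase ek prashna kiyaa: ek depends on prashna; the rest on kiyaa.
TOKENS = ["maine", "usase", "ek", "prashna", "kiyaa"]
HEADS = [4, 4, 3, 4, -1]

np_phrase = " ".join(TOKENS[i] for i in constituent(3, HEADS))
```

Here the node prashna yields the constituent ek prashna, and the root kiyaa yields the whole sentence.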
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How: Dependency; Phrase Structure
What: Distinguish unergative/unaccusative; Distinguish temporal/locative adjuncts; Distinguish unaccusative / transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script
Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case.
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
• Direct: nouns are in the nominative and are not followed by a postposition. They occur denoting subject and/or object.
  laRkaa aayaa ‘the boy came’
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  laRkii aayii ‘the girl came’
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen).
  laRke ne roTii khaayii ‘the boy ate bread’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread from the floor’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
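The feature bundles in the examples above can be read mechanically. The sketch below assumes a comma-separated, angle-bracketed notation with six fields (root, category, gender, number, person, and a final field holding the case for nouns or the suffix for verbs); this layout is an assumption made for illustration, and the names Morph and parse_morph are hypothetical.

```python
from typing import NamedTuple


class Morph(NamedTuple):
    """One morphological analysis (assumed six-field layout)."""
    root: str
    category: str
    gender: str
    number: str
    person: str
    case_or_suffix: str  # d/obl for nouns, a suffix such as yaa for verbs


def parse_morph(bundle):
    """Parse a bundle such as '<ladkaa,n,m,sg,3,d>' into a Morph."""
    text = bundle.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError(f"not a feature bundle: {bundle!r}")
    fields = text[1:-1].split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return Morph(*fields)


def is_oblique(bundle):
    """True if the bundle describes a noun in the oblique case."""
    morph = parse_morph(bundle)
    return morph.category == "n" and morph.case_or_suffix == "obl"
```

With this, the noun before ne in the third example is recognized as oblique, while the subject of the first example is not.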
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case.
• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrogative), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrogative), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrogative), kinhiiM (who, indefinite)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indefinite)
Pronouns (Contd.)
• Before ne, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) don’t change form: hamne (we.erg), tumne (you.erg), aapne (you.hon.erg), hamko (to us), tumko (to you, sg/pl), aapko (to you, hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun.
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below.
           Direct             Oblique
Gender     Sg       Pl        Sg       Pl
Masc       acchaa   acche     acche    acche
Fem        acchii   acchii    acchii   acchii
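The paradigm above reduces to three endings, which a short function can reproduce. This is a sketch covering only declinable adjectives of the acchaa type; as noted earlier, some adjectives do not decline. The function name inflect_adjective and the feature values are hypothetical labels chosen for illustration.

```python
def inflect_adjective(stem, gender, number, case):
    """Inflect a declinable adjective stem (e.g. 'acch' for acchaa).

    gender: 'masc' or 'fem'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if gender not in {"masc", "fem"}:
        raise ValueError(f"unknown gender: {gender}")
    if number not in {"sg", "pl"}:
        raise ValueError(f"unknown number: {number}")
    if case not in {"direct", "oblique"}:
        raise ValueError(f"unknown case: {case}")
    if gender == "fem":
        return stem + "ii"   # feminine is invariant across number and case
    if number == "sg" and case == "direct":
        return stem + "aa"   # only the masculine singular direct keeps -aa
    return stem + "e"        # all other masculine forms
```

For example, the oblique form in acche laRke ne is inflect_adjective("acch", "masc", "sg", "oblique").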
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and the agreement features of gender, number and person.
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms.
• Given below are some examples of verb forms from Hindi/Urdu:
Root         ro       cry
Infinitive   ronaa    to cry
Habitual     rotaa    cry.hab
Perfective   royaa    cried
Causative    rulaa    cause someone to cry
             rulvaa   make someone cause someone to cry
Auxiliaries
• Auxiliaries mark tense, aspect and modality information on verbs
(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    ‘The children are having a meal’
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog abil be.pl.pres
    ‘The children can continue to have their meal’
• Auxiliaries also carry the gender, number and person information
Postpositions
• Postpositions largely mark the case relations
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
ke anusaar ‘according to’
ke alaavaa ‘in addition to’
ke kaaran ‘because of’
ke dvaaraa ‘through’
ke saamne ‘in front of’
ke liye ‘for’
kii or/taraf ‘towards’
kii tarah ‘like’
kii jagah ‘in place of’
se baahar ‘out of’
se pahle ‘before’
Urdu-Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl ‘before’
    qabl az_ayn (qabl azeen): before from_this, ‘before this’
    qabl az_waqt: before from_time, ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah): in_this moment
(c) az ‘from/since/for’
    az raahe hamdardi: from way empathy, ‘for the sake of courtesy’
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta ‘to/until/till’
    ta waqt: until time
ezafe in Urdu
• Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone.
EZ N+N: daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
EZ N+Adj: nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
EZ Adj+N: qabil-e-rahem ‘qualified for sympathy’
EZ Adj+Adj: qabil-e-qubul ‘qualified-for-acceptance’
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated. These expressions are often hyphenated.
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X:
raam-raam ‘Ram-Ram’ (proper noun)
baccaa-baccaa ‘child-child’ (common noun)
garam-garam ‘hot-hot’ (adjective)
dhiire-dhiire ‘slowly-slowly’ (adverb)
jaa-jaa ‘go-go’ (verb)
naa-naa ‘not-not’ (negative particle)
kyaa-kyaa ‘what-what’ (question word)
jaate-jaate ‘going-going’ (participle)
Partial Reduplication (Echo words)
• In partial reduplication, an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning ‘X etc.’
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa ‘food etc.’
• vaanaa is not a valid word in Hindi
• Some more examples of partial reduplication in Hindi:
  jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
  aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
  aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’
• There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
  bhaag ‘run’ → bhaagambhaag ‘rush’
  jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
  dekh ‘see’ → dekhaa-dekhii ‘in imitation’
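The regular v- pattern can be sketched on romanized forms. This is an illustration under assumptions: it treats all letters before the first vowel letter as the initial consonant (so the digraph kh counts as one consonant), it adds v- when the word begins with a vowel, and it does not cover the irregular cases listed above. The function name echo_word is hypothetical.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word):
    """Form the echo compound X-vX' for a romanized word.

    The leading consonant letters of X are replaced by 'v' in the echo part;
    a vowel-initial word simply gets 'v' prefixed.
    """
    if not word:
        raise ValueError("empty word")
    first_vowel = 0
    while first_vowel < len(word) and word[first_vowel] not in VOWELS:
        first_vowel += 1
    return f"{word}-v{word[first_vowel:]}"
```

On the examples above, echo_word("khaanaa") gives khaanaa-vaanaa and echo_word("aaloo") gives aaloo-vaaloo.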
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
‘Atif read the book’
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
‘Atif slept’
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
‘Atif will have to sleep’
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
‘The door opened’
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
‘The door will have to open’
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Yesterday night the moon was seen behind the clouds’
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Yesterday night I saw the moon behind the clouds’
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
‘Ram knows that Sita will arrive late’
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
‘My sister, who stays in Delhi, is coming tomorrow’
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
‘I have read the book which you gave me’
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
‘I have read the book which you gave me’
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’
compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused Atif to sleep’
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause Atif to sleep’
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs: the experiencer argument takes dative case.
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
Participatory verbs: the second argument of a participatory verb takes the se postposition.
raam ravi se carcaa karegaa
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
usa laDake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR TOKEN
1 usa
2 laDake
3 ne
4 kelA
5 khAyA
6 thA
7 (punctuation)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – They are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN
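The tokenization decision for compounds (three tokens, with the hyphen marked as JOIN) can be sketched as follows. This is a minimal illustration on romanized text, not the treebank's actual tokenizer: the function name, the regular expression and the way the JOIN mark is stored are all assumptions made for the example.

```python
import re

# Mark used for a hyphen inside a compound, following the stated
# decision to "mark the hyphen as JOIN". How the mark is stored in the
# real pipeline is not specified here; a third tuple slot is an assumption.
JOIN = "JOIN"

# A token is either a run of word characters or one punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence: str):
    """Return (address, token, mark) rows with 1-based addresses."""
    tokens = TOKEN_RE.findall(sentence)
    rows = []
    for addr, tok in enumerate(tokens, start=1):
        # A hyphen with a word token on both sides joins a compound.
        is_join = (
            tok == "-"
            and 1 < addr < len(tokens)
            and tokens[addr - 2].isalnum()
            and tokens[addr].isalnum()
        )
        rows.append((addr, tok, JOIN if is_join else None))
    return rows


if __name__ == "__main__":
    for row in tokenize("BAI-bahana"):
        print(row)
```

Keeping the two members as separate tokens is what allows each of them to receive its own morphological analysis later.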
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.
ADDR_ TKN_ OTHR
1 usa <fs af='vaha,pr'>
2 laDake <fs af='laDakaa,n,m,sg,3,o'>
3 ne <fs af='ne,psp'>
4 kelaa <fs af='kelaa,n,m,sg,3,o'>
5 khaayaa <fs af='khaa,v,m,sg,any,yaa'>
6 thaa <fs af='thaa,v,e'>
7 <fs as=&STOP,punc>
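To make the composite af attribute concrete, here is a small sketch that splits an af value into the eight named fields listed above. It assumes the value is a comma-separated string with one slot per field, and that empty slots are written as empty strings; the field names in `AF_FIELDS` and the fully padded example value are illustrative choices, not taken from the slides.

```python
from typing import Dict

# Field order as described for af: root, category, gender, number,
# person, case, tam/vibhakti, suffix. The short names are assumptions.
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case", "tam_vibhakti", "suffix"]


def parse_af(af: str) -> Dict[str, str]:
    """Split a comma-separated af value into named fields.

    Missing trailing slots are filled with empty strings; any slots
    beyond the eighth are ignored.
    """
    values = af.split(",")
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


if __name__ == "__main__":
    # Hypothetical fully padded value for the token laDake.
    print(parse_af("laDakaa,n,m,sg,3,o,,"))
```

Named fields make it easy to check agreement features (gender, number, person) without counting commas by hand.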
POS Tagging
ILMT POS tagsets adopted. Total: 26 tags.
ADDR TKN CAT OTHR
1 usa PRP <fs af='vaha,pron,…'>
2 laDake NN <fs af='laDakA,noun,…'>
3 ne PSP <fs af='ne,psp,…'>
4 kelA NN <fs af='kelA,noun,…'>
5 khAyA VM <fs af='KA,verb,…'>
6 thA VAUX <fs af='thA,verb,…'>
7 SYM <fs as=&STOP,punc>
Chunking
Chunking is introduced to save effort in manual tagging. Dependency relations are marked between the chunk heads.
Chunking restructures the tree (i.e. the value of ADDR_ will change).
ADDR_ TKN_ CAT_ OTHR
1 (( NP
1.1 usa PRP <fs af='vaha,pron,…'>
1.2 laDake NN <fs af='laDakA,noun,…'>
1.3 ne PSP <fs af='ne,psp,…'>
))
2 (( NP
2.1 kelA NN <fs af='kelA,noun,…'>
))
3 (( VG
3.1 khAyA VM <fs af='KA,verb,…'>
3.2 thA VAUX <fs af='thA,verb,…'>
))
4 (( BLK
4.1 SYM <fs as=&STOP,punc>
))
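The way chunking changes the ADDR_ values can be illustrated with a short sketch that renumbers tokens once they are grouped into chunks. It assumes dotted addresses of the form chunk.token (1.1, 1.2, ...) and represents rows as plain tuples; the function name and the row layout are hypothetical and leave out the feature-structure column.

```python
def to_ssf_rows(chunks):
    """Turn [(chunk_tag, [(token, pos), ...]), ...] into address rows.

    Each chunk gets an opening row 'N (( TAG', its tokens get dotted
    addresses 'N.M', and a closing '))' row follows.
    """
    rows = []
    for i, (tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", tag))
        for j, (tok, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", tok, pos))
        rows.append(("", "))", ""))
    return rows


if __name__ == "__main__":
    sentence = [
        ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
        ("NP", [("kelA", "NN")]),
        ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
    ]
    for addr, tok, cat in to_ssf_rows(sentence):
        print(addr, tok, cat)
```

Because the top-level addresses now count chunks rather than words, inter-chunk dependency labels can refer to chunks directly.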
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions: causatives, co-ordination, unaccusatives, relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
• Indian languages: rich morphology, relatively flexible word order
For example:
a) baccaa phala khaataa hai
   ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky 1993>
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]
Here lock becomes the karta.
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
[Tree (t1), chunk heads only: khaa → bhaaii, phala]
[Tree (t2), fully expanded: khaa → bhaaii, phala; bhaaii → meraa, baDZaa; phala → bahuta]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened
Semantics of the verb
• A verbal root denotes:
  – the activity
  – the result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal Root → activity, result]
karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
[Dependency tree: open → boy (k1), lock (k2)]
Sub-actions: Opening of lock
[Diagram: Opening of lock = inserting and turning a key (action 1); key pressing and turning the lever (action 2); latch moving and lock opening (action 3)]
Sub-actions: Opening of lock
[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]
k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)
Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly:
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• ‘key’ gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• The annotation scheme follows the Paninian framework
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
  ‘My elder brother eats lots of fruits’
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
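The bracketed notation used in this example is easy to read back into a data structure. The sketch below parses strings of the form ((word_TAG ...))_CHUNKTAG into (chunk tag, [(word, POS tag), ...]) pairs. It is only an illustration of the notation as it appears on the slide; the function name and the regular expression are assumptions, and nested chunks are not handled.

```python
import re

# One chunk: "((" body "))" "_" chunk tag, e.g. ((bahuta_QF phala_NN))_NP
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text: str):
    """Parse slide-style chunk notation into (tag, [(word, pos), ...])."""
    chunks = []
    for body, tag in CHUNK_RE.findall(text):
        # Split each word_TAG item at its last underscore.
        words = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((tag, words))
    return chunks


if __name__ == "__main__":
    example = (
        "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
    )
    for tag, words in parse_chunks(example):
        print(tag, words)
```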
An Example (Contd)
Dependency Relation
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
In our dependency tree:
• each node is a chunk, and
• each edge represents the relation between the connected nodes, labeled with the karaka or other relation
A chunk represents a set of adjacent words which are in dependency relations with each other.
All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.
Dependency Scheme (Contd)
Here, modifier-modified relations are marked between the heads of the chunks:
• meraa ‘my’
• bhaaii ‘brother’
• phala ‘fruit’
• khaataa ‘eats’
badZaa ‘big’ and bahuta ‘much’ are part of the chunks.
Dependency Scheme (Contd)
[Dependency tree: khaataa → bhaaii (k1), phala (k2); bhaaii → meraa (r6)]
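The tree for this sentence can be written down as a list of labelled head-dependent edges between chunk heads. The following sketch stores the three edges (k1, k2, r6), finds the root, and prints an indented view. The representation and the function names are illustrative assumptions, not the treebank's storage format.

```python
from collections import defaultdict

# (head, dependent, relation) triples between chunk heads.
EDGES = [
    ("khaataa", "bhaaii", "k1"),
    ("khaataa", "phala", "k2"),
    ("bhaaii", "meraa", "r6"),
]


def children_of(edges):
    """Map each head to its list of (relation, dependent) pairs."""
    tree = defaultdict(list)
    for head, dep, rel in edges:
        tree[head].append((rel, dep))
    return tree


def find_root(edges):
    """The root is the only node that is a head but never a dependent."""
    heads = {h for h, _, _ in edges}
    deps = {d for _, d, _ in edges}
    roots = heads - deps
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots.pop()


def render(edges):
    """Return the tree as indented lines, one node per line."""
    tree = children_of(edges)
    lines = []

    def walk(node, rel, depth):
        label = f"{rel}: " if rel else ""
        lines.append("  " * depth + label + node)
        for child_rel, child in tree.get(node, []):
            walk(child, child_rel, depth + 1)

    walk(find_root(edges), None, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render(EDGES)))
```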
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations are participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa ‘cause to eat’ is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd…)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations:
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
• Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I.Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
• The ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4 – beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
‘ram.Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
anubhava karta – k4a (Contd…)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd…)
Ex-1
Relation samanadhikaran – rs (Contd) – Ex-2
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
[Dependency tree: agar-to → agar (paired-ccof), to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]
Possibility-II
Conditionals (Contd)
• Possibility-I is ruled out because the head of its tree, agar-to, is an abstract node, i.e. not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Every day, having written letters, he tears them up’
Participles (vmod) (Contd)
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
Participles (vmod) (Contd)
[Dependency tree: PaadZataa hai → vaha (k1), rojZa (k7t), patra (k2), likhakar (vmod); likhakar → vaha (k1), patra (k2), the shared arguments]
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
• In the above example vo ‘that’ is missing; it would be the parent node of the relative clause ‘tum jo bhi kahoge’
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency
Ellipsis (Contd)
[Dependency tree: maan luungii → mai (k1), NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1), jo bhi (k2)]
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don't listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof), nahii_sunate (ccof)]
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
  ‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
  ‘Ram said that he will come tomorrow’
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
[Dependency tree: kiyaa → maine (k1), usase (k2), prashna (pof); prashna → ek (nmod)]
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
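The filtering step can be sketched as follows, assuming the two candidate sentences have already been labelled with agent and theme. The labelled records below are hand-written from the example, and the function name and dictionary keys are hypothetical; no actual SRL system is invoked.

```python
from typing import Dict, List, Optional


def answer_who(predicate: str, theme: str,
               candidates: List[Dict[str, str]]) -> Optional[str]:
    """Return the agent of the first candidate whose predicate and
    theme both match the question, or None if nothing matches."""
    for cand in candidates:
        if cand["predicate"] == predicate and cand["theme"] == theme:
            return cand["agent"]
    return None


# Hand-labelled candidates corresponding to the two sentences above.
CANDIDATES = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "first effective polio vaccine"},
]

if __name__ == "__main__":
    print(answer_who("create", "first effective polio vaccine", CANDIDATES))
```

Both sentences contain every word of the question, so word matching alone cannot separate them; the theme check is what rejects the first candidate.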
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
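One way to picture a frame file with several rolesets is as a nested lookup table. The sketch below encodes the two senses of ring from the example and uses them to interpret labelled arguments. The dictionary layout, the dotted roleset identifiers and the function name are assumptions made for illustration; actual PropBank frame files are stored in their own file format, which is not reproduced here.

```python
# Hypothetical in-memory view of a frame file for the lemma "ring".
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {
                "Arg0": "Causer of ringing",
                "Arg1": "Thing rung",
                "Arg2": "Ring for",
            },
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {
                "Arg1": "Surrounding entity",
                "Arg2": "Surrounded entity",
            },
        },
    }
}


def describe(lemma: str, roleset_id: str, labelled_args: dict) -> dict:
    """Map each labelled span to its verb-specific role description.

    Raises KeyError if a label is not defined for the chosen roleset.
    """
    roles = FRAMES[lemma][roleset_id]["roles"]
    described = {}
    for label, span in labelled_args.items():
        key = label.capitalize()  # "ARG0" -> "Arg0"
        if key not in roles:
            raise KeyError(f"{label} is not defined for {roleset_id}")
        described[span] = roles[key]
    return described


if __name__ == "__main__":
    print(describe("ring", "ring.01", {"ARG0": "John", "ARG1": "the bell"}))
    print(describe("ring", "ring.02",
                   {"ARG1": "Tall aspen trees", "ARG2": "the lake"}))
```

The same label (ARG1) means "thing rung" under one roleset and "surrounding entity" under the other, which is why the roleset ID has to be chosen before the arguments are labelled.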
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped/null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments
• ARGA: Causer
• ARG0: Agent/experiencer
• ARG1: Theme/patient
• ARG2: Recipient
• ARG3: Instrument
Numbered Arguments with function tags
• ARGA-MNS: Indirect causer
• ARG0-MNS: Induced causer
• ARG0-GOL: Causee with a ‘recipient’ role
• ARG2-ATR: Attribute
• ARG2-GOL: Goal
• ARG2-SOU: Source
• ARG2-LOC: Location
• ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
• ARGM-TMP: Temporal
• ARGM-MNR: Manner
• ARGM-LOC: Location
• ARGM-PRP: Purpose
• ARGM-CAU: Cause
• ARGM-DIS: Discourse
• ARGM-ADV: Adverb
• ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
‘Atif read the book’
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. the animacy test and the cognate object test, among others
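The labelling rule for intransitives can be stated as a tiny lookup. In this sketch the verb classes are simply listed by hand for the three example verbs; the dictionary and the function name are hypothetical, and in practice the class of a verb would come from the diagnostic tests and be recorded in its frame file.

```python
# Hand-written classes for the example verbs (WX-style romanization
# as used on the slide); a real lexicon would be far larger.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb: str) -> str:
    """Label for the single argument of an intransitive verb:
    ARG1 for unaccusatives, ARG0 for unergatives."""
    cls = VERB_CLASS[verb]
    return "ARG1" if cls == "unaccusative" else "ARG0"


if __name__ == "__main__":
    for v in VERB_CLASS:
        print(v, sole_argument_label(v))
```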
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
‘The door opened’
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
‘The door will have to open’
101212
14
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
Atif slept
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
Atif will have to sleep
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
Yesterday night the moon was seen behind the clouds
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
Yesterday night I saw the moon behind the clouds
[Figures: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
Ram will give a book to Mohan
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
Ram gave a book to Mohan
Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused the child to sleep’
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’
Causatives
[Figures: annotated trees for causative-1 and causative-2]
Causative classes [figure]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust(n) do(v)’: ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’
compl-pred-2 राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
Ram knows that Sita will arrive late
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University <rambow@ccls.columbia.edu>
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
[Tree: VP-Pred and VP nodes; NP Atif, NP kitab, V paRhegaa] आतिफ़ किताब पढ़ेगा
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree: VP-Pred and VP nodes; NP Atif-ne, NP kitab, V paRhii] आतिफ़ ने किताब पढ़ी
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Tree: VP-Pred and VP nodes; NP Atif, V soyegaa] आतिफ़ सोएगा
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Tree: VP-Pred and VP nodes; NP1 darvaazaa, CASE1 trace, V khulegaa] दरवाज़ा खुलेगा
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free), and the location is an adjunct
उस कमरे में चूहे हैं
[Tree: NP us kamre mein adjoined to VP; VP-Pred with NP1 cuuhe, CASE1 trace, V hain]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position
[Tree: NP Mohan-ko adjoined to VP-Pred; NP Ram-ne, NP kitaab, V dii] राम ने मोहन को किताब दी
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree: NP kal raat and NP baadaloM mein adjoined to VP; NP2 mujhko with SCR2 trace at VP-Pred; NP1 caaMd with CASE1 trace; V dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree: matrix VP-Pred with NP Ram, V jaantaa, V-Aux hai and EXTR1 trace; CP1 headed by C ki; embedded VP-Pred with NP1 Sita, CASE1 trace, der se, V aayegi]
Relative Clause
[Tree: CP with C COMP containing NP1 jo kitaab, NP tumne, SCR1 trace, V dii, V-Aux thii; main clause with NP vah, NP maine, SCR3 trace, PRO, V paRhii]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
[Tree: VP-Pred with NP Raam, NP Ravi-ko, V’ containing N yaad (CASE1) and V kar, V-Aux rahaa, V-Aux thaa]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
Ram was remembering Ravi
Causative [figure]
Analysis 1b for Small Clauses: Exceptional Case Marking
• Structure represents her as subject, but not accusative case marking of her
[Dependency tree: considers → Atif (Subj), stupid (Obj-ECM); stupid → her (Subj)]
[Phrase structure tree with nodes S, NP Atif, VP, considers, SC, her, VP, AdjP stupid]
Close to the analysis adopted in Chomsky (1981)
Note on DS and PS
• These analyses are intuitively very similar
• Formal notion: “consistency” (Fei Xia; see Bhatt, Rambow & Fei 2011)
  – Intuition: a very simple and general algorithm can transform a consistent DS to PS and vice versa
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject
[Dependency tree: considers → Atif (Subj), her (Obj), stupid (Obj2)]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using node label
[Dependency tree: considers → Atif (Subj), her (Obj), stupid (ObjPred)]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using node label
[Dependency tree: considers → Atif (k1), her (k2), stupid (k2s)]
Neo-Paninian analysis
[Dependency tree: समझा → आतिफ़ (k1), सीमा को (k2), बेवकूफ़ (k2s)]
Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject
[Dependency tree: considers → Atif (Subj), her (Obj), stupid (Obj2)]
[Phrase structure tree with nodes S, NP Atif, VP, considers, her, AdjP stupid]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), and her as semantic subject using node label
[Dependency tree: considers → Atif (k1), her (k2), stupid (k2s)]
[Phrase structure tree with nodes S, NP Atif, VP, considers, her, SC, AdjP stupid]
Analysis 3 for Small Clauses: Raising to Object
• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category
[Dependency tree: considers → Atif (Subj), her1 (Obj), stupid (Obj-Pred); stupid → e1 (Subj)]
[Phrase structure tree with nodes S, NP Atif, VP, considers, her1, S, e1, VP, AdjP stupid]
Analysis used for PS in the Hindi-Urdu Treebank
Comparison of Representations
• Less information
• Same information
[Figure: the five dependency trees side by side. Tree 1a: Subj, Obj, her as Subj. Tree 2a: Subj, Obj2, her as Obj. Tree 1b: Subj, Obj-ECM, her as Subj. Tree 2b: Subj, ObjPred, her as Obj. Tree 3: Subj, Obj-Pred, her1 as Obj, e1 as Subj]
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren’t DS and PS Representations Complementary? NO
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows the type of dependency
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all its descendants
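The convention in the last bullet can be made concrete with a short sketch: given a dependency tree, the constituent headed by a word is that word plus all of its descendants. The function name `constituent` and the encoding of the tree as a list of head indices are assumptions for illustration; they are not part of the treebank tools. The input is assumed to be a well-formed tree (no cycles).

```python
def constituent(heads, node):
    """Return the sorted word indices of `node` and all its descendants.

    heads[i] is the index of the head of word i, or -1 for the root.
    """
    # Build a head -> children map once.
    children = {}
    for child, head in enumerate(heads):
        children.setdefault(head, []).append(child)

    # Collect the node and everything below it.
    result, stack = [], [node]
    while stack:
        current = stack.pop()
        result.append(current)
        stack.extend(children.get(current, []))
    return sorted(result)


# "Sabina opened the lock": "opened" is the root, "the" depends on "lock".
words = ["Sabina", "opened", "the", "lock"]
heads = [1, -1, 3, 1]
print([words[i] for i in constituent(heads, 3)])  # the phrase headed by "lock"
```

Calling `constituent` on the root returns the whole sentence, which is the sense in which a dependency tree already encodes constituency.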
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How: dependency (DS); phrase structure (PS)
What (per-column marks not recoverable):
  – Distinguish unergative / unaccusative
  – Distinguish temporal / locative adjuncts
  – Distinguish unaccusative / transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties: morphology, some basic syntax, lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script
Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structure
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi and Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique
Case
• Direct: nouns are in the nominative and are not followed by a postposition. They occur denoting the subject and/or object
  laRkaa aayaa ‘the boy came’ <laRkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  laRkii aayii ‘the girl came’ <laRkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
  laRke ne roTii khaayii ‘the boy ate bread’ <laRkaa,n,m,sg,3,obl> <roTii,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread up from the floor’ <laRkaa,n,m,sg,3,obl> <roTii,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case
• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrog.), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrog.), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrog.), kinhiiM (who, indef.)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef.)
Pronouns (contd.)
• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) don’t change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t combine with the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
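The rules for first and second person pronouns above can be summarised in a small sketch. The function name `attach_postposition` and the two lookup tables are illustrative assumptions; the forms themselves are the ones listed on the slide, and the sketch covers only those pronouns.

```python
# Oblique stems used before postpositions other than 'ne' (from the slide).
OBLIQUE_STEM = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitive forms replacing pronoun + kaa (from the slide).
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach_postposition(pronoun, postposition):
    """Combine a pronoun with a postposition (hypothetical helper)."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]          # irregular genitive
    if postposition != "ne" and pronoun in OBLIQUE_STEM:
        return OBLIQUE_STEM[pronoun] + postposition  # oblique stem
    return pronoun + postposition         # form unchanged


for pronoun, postposition in [("maiM", "ne"), ("maiM", "ko"), ("tum", "ko"), ("ham", "kaa")]:
    print(pronoun, postposition, "->", attach_postposition(pronoun, postposition))
```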
Adjectives
Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
The transformations that an adjective (e.g. acchaa ‘good’) undergoes with regard to number, gender and case are given below

Case →      Direct            Oblique
Number →    Sg       Pl       Sg       Pl
Gender ↓
Masc        acchaa   acche    acche    acche
Fem         acchii   acchii   acchii   acchii
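The table can be read as a three-way rule, sketched below. The function name `inflect_adjective` is hypothetical, and treating every adjective that does not end in -aa as invariant is a simplifying assumption (the slides only say that some adjectives do not decline).

```python
def inflect_adjective(citation, gender, number, case):
    """Inflect an -aa adjective following the table for acchaa.

    gender: 'masc' or 'fem'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if not citation.endswith("aa"):
        return citation                 # assumed invariant
    stem = citation[:-2]
    if gender == "fem":
        return stem + "ii"              # one feminine form throughout
    if number == "sg" and case == "direct":
        return stem + "aa"              # masculine singular direct
    return stem + "e"                   # all other masculine cells


print(inflect_adjective("acchaa", "masc", "sg", "oblique"))
```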
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person. Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root         ro       cry
Infinitive   ronaa    to cry
Habitual     rotaa    cry.hab
Perfective   royaa    cried
Causative    rulaa    cause someone to cry
             rulvaa   make someone cause someone to cry
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs
(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    The children are having a meal
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog abilit be.pl.pres
    The children can continue to have their meal
Auxiliaries also carry the gender, number and person information
Postpositions
Postpositions largely mark the case relations
baccoM ne raat meM mez se khaanaa le liyaa
children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
ke anusaar ‘according to’
ke alaavaa ‘in addition to’
ke kaaran ‘because of’
ke dvaaraa ‘through’
ke saamne ‘in front of’
ke liye ‘for’
kii or/taraf ‘towards’
kii tarah ‘like’
kii jagah ‘in place of’
se baahar ‘out of’
se pahle ‘before’
Urdu-Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl ‘before’
    qabl az_ayn (qabl azeen)      qabl az_waqt
    before from_this              before from_time
    ‘before this’                 ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah)
    in_this moment
(c) az ‘from/since/for’
    az raahe hamdardi             az sare nau            khaarij az bahes
    from way empathy              from beginning new     beyond from discussion
    ‘for the sake of courtesy’
(d) ta ‘to/until/till’
    ta waqt
    until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive, but is not restricted to the genitive alone
EZ N+N: daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
EZ N+Adj: nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
EZ Adj+N: qabil-e-rahem ‘qualified for sympathy’
EZ Adj+Adj: qabil-e-qubul ‘qualified-for-acceptance’
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X
raam-raam ‘Ram-Ram’ (proper noun)
baccaa-baccaa ‘child-child’ (common noun)
garam-garam ‘hot-hot’ (adjective)
dhiire-dhiire ‘slowly-slowly’ (adverb)
jaa-jaa ‘go-go’ (verb)
naa-naa ‘not-not’ (negative particle)
kyaa-kyaa ‘what-what’ (question word)
jaate-jaate ‘going-going’ (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning of ‘X etc.’ In Hindi/Urdu the first consonant of X is replaced by v-
For example: khaanaa-vaanaa ‘food etc.’
vaanaa is not a valid word in Hindi. Some more examples of partial reduplication in Hindi are:
jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’
There are also examples which do not follow this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
bhaag ‘run’ → bhaagambhaag ‘rush’
jhuuTh ‘lie’ → jhuuth-muuTh ‘just like that (without meaning it)’
dekh ‘see’ → dekhaa-dekhii ‘in imitation’
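The regular patterns of full and partial reduplication can be sketched in a few lines. This is only an illustration on romanized input: the function names, the list of aspirated digraphs treated as single consonants, and the decision to prefix v- to vowel-initial words (as in aaloo-vaaloo) are assumptions, and the irregular cases listed above are not handled.

```python
# Romanized digraphs treated as a single consonant (an assumption of this sketch).
ASPIRATED = ("kh", "gh", "ch", "jh", "Th", "Dh", "th", "dh", "ph", "bh")
VOWELS = "aeiouAEIOU"


def full_reduplication(word):
    """X -> X-X"""
    return f"{word}-{word}"


def echo_word(word):
    """X -> X-vX', where the first consonant of X is replaced by v-."""
    if word[0] in VOWELS:
        rest = word            # vowel-initial: v- is simply prefixed
    elif word.startswith(ASPIRATED):
        rest = word[2:]        # drop a two-letter aspirated consonant
    else:
        rest = word[1:]        # drop a single consonant
    return f"{word}-v{rest}"


for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(echo_word(w))
```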
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
Atif will read the book
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
Atif read the book
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
Atif had to read the book
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
Atif slept
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
Atif will have to sleep
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
The door will open
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
The door opened
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
The door will have to open
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
There are rats in that room
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
Yesterday night the moon was seen behind the clouds
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
Yesterday night I saw the moon behind the clouds
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
Ram will give a book to Mohan
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
Ram gave a book to Mohan
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
Ram knows that Sita will arrive late
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
My sister who stays in Delhi is coming tomorrow
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
I have read the book which you gave me
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
I have read the book which you gave me
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
I have read the book which you gave me
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
Ram was waiting for Ravi
compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
Ram was remembering Ravi
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
Atif will sleep
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused the child to sleep’
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition
raam ravi se carcaa karegaa
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
usa    ladake   ne    kelaa    khaayaa    thaa
that   boy      erg   banana   eat-perf   past
Tokenization
Represented in SSF:
ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7      (sentence-final punctuation)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – They are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN
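The tokenization decisions above can be sketched as follows. This is a hedged illustration, not the treebank's tokenizer: the regular expression, the function name `tokenize`, and the representation of the JOIN mark as a third tuple element are all assumptions.

```python
import re

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Return (ADDR, TOKEN, MARK) rows; a compound-internal hyphen gets MARK 'JOIN'."""
    rows = []
    for addr, match in enumerate(TOKEN_RE.finditer(sentence), start=1):
        token = match.group()
        start, end = match.span()
        inside_compound = (
            token == "-"
            and start > 0
            and end < len(sentence)
            and sentence[start - 1].isalnum()
            and sentence[end].isalnum()
        )
        rows.append((addr, token, "JOIN" if inside_compound else ""))
    return rows


for row in tokenize("BAI-bahana"):
    print(row)
```

A hyphen-joined compound thus yields three tokens, with the hyphen carrying the JOIN mark so that the compound can be reassembled later.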
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix
ADDR   TKN       OTHR
1      usa       <fs af='vaha,pr'>
2      laDake    <fs af='laDakaa,n,m,sg,3,o'>
3      ne        <fs af='ne,psp'>
4      kelaa     <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6      thaa      <fs af='thaa,v,e'>
7                <fs af='&STOP,punc'>
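As a rough sketch of how the composite af attribute could be unpacked, the snippet below splits the value into named fields. The field names follow the description on the slide; the comma as field separator, the eight-field order, the single-quoted value and the function name `parse_af` are assumptions of this sketch.

```python
import re

# Field order as described on the slide; names are chosen for this sketch.
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)
FS_RE = re.compile(r"<fs\s+af='([^']*)'")


def parse_af(feature_structure):
    """Map the comma-separated af value onto the field names (hypothetical helper)."""
    match = FS_RE.search(feature_structure)
    if match is None:
        raise ValueError("no af attribute found")
    values = match.group(1).split(",")
    # zip stops at the shorter sequence, so abbreviated af values still parse.
    return dict(zip(AF_FIELDS, values))


print(parse_af("<fs af='laDakaa,n,m,sg,3,o'>"))
```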
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags
ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='thA,verb,…'>
7               SYM    <fs af='&STOP,punc'>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)
ADDR   TKN      CAT    OTHR
1      ((       NP
1.1    usa      PRP    <fs af='vaha,pron,…'>
1.2    laDake   NN     <fs af='laDakA,noun,…'>
1.3    ne       PSP    <fs af='ne,psp,…'>
       ))
2      ((       NP
2.1    kelA     NN     <fs af='kelA,noun,…'>
       ))
3      ((       VG
3.1    khAyA    VM     <fs af='KA,verb,…'>
3.2    thA      VAUX   <fs af='thA,verb,…'>
       ))
4      ((       BLK
4.1             SYM    <fs af='&STOP,punc'>
       ))
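A chunked sentence in this layout can be read back into a list of chunks with a few lines of code. The sketch below assumes whitespace-separated columns, `((` and `))` as chunk delimiters and a non-empty token column; the function name `parse_ssf_chunks` and the sample string are illustrative and not taken from any SSF library.

```python
SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""


def parse_ssf_chunks(text):
    """Return a list of chunks, each with its address, tag and (token, POS) pairs."""
    chunks, current = [], None
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":                        # chunk close
            if current is None:
                raise ValueError("unmatched chunk close")
            chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":  # chunk open: ADDR (( TAG
            current = {"addr": fields[0], "tag": fields[2], "tokens": []}
        elif current is not None and len(fields) >= 3:
            current["tokens"].append((fields[1], fields[2]))
    return chunks


for chunk in parse_ssf_chunks(SAMPLE):
    print(chunk["tag"], [token for token, _ in chunk["tokens"]])
```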
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad <dipti@iiit.ac.in>
COLING 2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
• The corpus
  – News articles: 350k
  – Tourism articles: 25-30k
  – Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model
• Why Paninian grammar? Indian languages have
  – Rich morphology
  – Relatively flexible word order. For example:
    a) baccaa phala khaataa hai
       ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
    b) phala baccaa khaataa hai
    c) phala khaataa hai baccaa
    d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
[Dependency tree: opened → Sabina (k1), lock (k2); lock → the]
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result
Sabina opened the lock with this key
[Dependency tree: opened → Sabina (k1), lock (k2), key (k3); lock → the; key → this]
k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
[Dependency tree: opened → Yesterday (k7t), Sabina (k1), lock (k2), key (k3), home (k7p); lock → the; key → this; home → my]
k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
[Dependency tree: opened → yesterday (k7t), lock (k1), key (k3); lock → the; key → this]
lock becomes the karta
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Trees: (t1) khaa → bhaaii, phala. (t2) khaa → bhaaii, phala; bhaaii → meraa, baDzaa; phala → bahuta]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened
Semantics of the verb
• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma
[Diagram: verbal root → activity, result]
karta-karma
• The boy opened the lock
  – k1: karta
  – k2: karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
[Dependency tree: open → boy (k1), lock (k2)]
Sub-actions: Opening of a lock
Opening of a lock consists of:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
  open: k1 boy, k2 lock, k3 key
  open: k1 key, k2 lock
  open: k1 lock
k1 - karta (doer); k2 - karma (affected); k3 - karana (instrument)
Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses and conjunctions in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
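The feature structures above are flat attribute=value lists, so they are easy to load into a dictionary. This is a minimal sketch under that assumption; `parse_fs` is a hypothetical helper, not part of any treebank toolkit, and it simply skips attributes whose value is empty (such as `af=` in the extracted slide text).

```python
import re


def parse_fs(fs):
    """Parse '<fs key=value ...>' into a dict; attributes with no value are skipped."""
    inner = fs.strip().lstrip("<").rstrip(">").strip()
    if inner.startswith("fs"):
        inner = inner[2:]
    # \S+ requires at least one non-space character, so 'af= ' yields no pair.
    return dict(re.findall(r"(\w+)=(\S+)", inner))


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
bhaaii = parse_fs("<fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>")

print(khaataa["root"], khaataa["TAM"])
# The verb form and the noun carry the same gender and number values.
print(all(khaataa[k] == bhaaii[k] for k in ("gend", "num")))
```

A dictionary view like this makes simple checks straightforward, for example comparing the gend and num values of a noun and a verb form.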
An Example (Contd)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks
Dependency Scheme (Contd)
  khaataa
    k1: bhaaii
      r6: meraa
    k2: phala
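The chunk-level tree above can be stored as labeled edges between chunk heads. The sketch below is illustrative only: the triple layout and the helper names are assumptions, not the treebank's file format. It finds the root and prints the tree with its relation labels.

```python
from collections import defaultdict

# (dependent, relation, head) triples between chunk heads, from the example.
edges = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def children_of(edges):
    """Map each head to its list of (relation, dependent) pairs."""
    tree = defaultdict(list)
    for dep, rel, head in edges:
        tree[head].append((rel, dep))
    return tree


def find_root(edges):
    """The root is the only head that is never a dependent."""
    heads = {h for _, _, h in edges}
    deps = {d for d, _, _ in edges}
    roots = heads - deps
    assert len(roots) == 1, "a dependency tree has exactly one root"
    return roots.pop()


def show(node, tree, rel="root", depth=0):
    """Return the tree as indented 'relation: word' lines."""
    lines = ["  " * depth + f"{rel}: {node}"]
    for child_rel, child in tree.get(node, []):
        lines.extend(show(child, tree, child_rel, depth + 1))
    return lines


tree = children_of(edges)
lines = show(find_root(edges), tree)
print("\n".join(lines))
```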
Relations in Dependency Scheme
There are three types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark participants directly involved in the action denoted by the verb
• Relations other than karakas denote notions such as purpose and reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
• karta - subject/agent/doer
• karma - object/patient
• karana - instrument
• sampradaan - beneficiary
• apaadaan - source
• adhikarana - location in place/time/other
Relations other than karakas
• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod__relc - Relative clause
• rad - Address
Relations which do not fall under dependency relation
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
    'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
    'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd ...)
• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'
Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
  [Figure: Relative Clause, Possibility-I]
• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
  [Figure: Relative Clause, Possibility-II]
Relative Clauses (Contd ...)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta - k4a
Ex-1: mujhko dukh hai
      'I-Dat' 'unhappy' 'is'
      'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4 - beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta - k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
      'ram-Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran - rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
  agar-to
    paired-ccof: agar
      ccof: naa hotii
    paired-ccof: to
      ccof: aatii
Possibility - II
[Figure: the agar clause attached to the to clause]
Conditionals (Contd)
• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Every day, having written letters, he tears them up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'
Participles (vmod) (Contd)
  PaadZataa hai
    k1: vaha
    k7t: rojZa
    k2: patra
    vmod: likhakara
      k1: vaha (shared)
      k2: patra (shared)
(7) Ellipsis
• How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd)
  maan luungii
    k1: mai
    k2: NULL__NP (vo)
      nmod__relc: kahoge
        k1: tum
        k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
  NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate
Non-dependency Relations
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd ...)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
      'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
      'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
      'Ram said that he will come tomorrow'
[Figures: dependency trees for the coordinate and the subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
  kiyaa
    k1: maine
    k2: usase
    pof: prashna
      nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
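The filtering step described on these slides can be sketched in a few lines. This is only an illustration: the `frames` list is hand-written to mirror the role labels shown above (it is not the output of any real SRL system), and `answer_who` is a hypothetical helper.

```python
# Hand-written role-labeled frames mirroring the two candidate sentences.
frames = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, frames):
    """Return the agents of frames whose predicate and theme match the question."""
    return [
        frame["agent"]
        for frame in frames
        if frame["predicate"] == predicate
        and frame["theme"].lower() == theme.lower()
    ]


# "Who created the first effective polio vaccine?"
answers = answer_who("create", "the first effective polio vaccine", frames)
print(answers)
```

Both sentences contain every word of the question, so word matching cannot separate them; only the frame whose theme is the vaccine survives the role-based filter.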
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Framefiles
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of framefiles
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its framefile
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
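A framefile with several rolesets can be pictured as a small lookup table. The sketch below uses a plain dictionary as a stand-in; real PropBank framefiles are stored in their own file format, and the names `FRAMES` and `describe` are assumptions made for illustration.

```python
# Stand-in for the 'ring' framefile: one entry per roleset.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labeled_args):
    """Expand each (phrase, label) pair with the role description of the roleset."""
    roles = FRAMES[roleset_id]["roles"]
    described = []
    for phrase, label in labeled_args:
        if label not in roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        described.append(f"{phrase} = {label} ({roles[label]})")
    return described


bell = describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
lake = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
print("\n".join(bell + lake))
```

The same label means different things in the two rolesets: Arg1 is the thing rung under ring.01 but the surrounding entity under ring.02, which is why the roleset must be chosen before the arguments are labeled.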
Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  - Creation of framefiles
  - Empty argument insertion
  - Semantic role labelling
Framefiles for Hindi
• Two types of framefiles
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments, recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
  ARGA   Causer
  ARG0   Agent/experiencer
  ARG1   Theme/patient
  ARG2   Recipient
  ARG3   Instrument
PropBank Tagset: Numbered Arguments with function tags
  ARGA-MNS   Indirect causer
  ARG0-MNS   Induced causer
  ARG0-GOL   Causee with a 'recipient' role
  ARG2-ATR   Attribute
  ARG2-GOL   Goal
  ARG2-SOU   Source
  ARG2-LOC   Location
  ARG2-DIR   Direction
PropBank Tagset: Modifier Arguments
  ARGM-TMP   Temporal
  ARGM-MNR   Manner
  ARGM-LOC   Location
  ARGM-PRP   Purpose
  ARGM-CAU   Cause
  ARGM-DIS   Discourse
  ARGM-ADV   Adverb
  ARGM-MNS   Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif.erg book.f read.f.sg.pst
         'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif.dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. the animacy test and the cognate object test, among others
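The labelling rule in the second bullet can be written as a one-line decision once the verb class is known. In this sketch the class of each verb is simply looked up in a hand-made table built from the three verbs named on the slide; in practice the class would come from the diagnostic tests, which are not modeled here. The names `VERB_CLASS` and `sole_argument_label` are hypothetical.

```python
# Verb classes for the three example verbs named on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb.

    Raises KeyError for verbs that are not in the table.
    """
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
print(labels)
```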
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'
unacc-2: दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl.erg open.pst
         'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl.dat open.inf compel.fut
         'The door will have to open'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
unerg-2: आतिफ़ ने सोया
         Atif ne soyaa
         Atif.erg sleep.m.sg.pst
         'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif.dat sleep.inf compel.fut
         'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
         kal raat baadaloM meiM caaMd dikhaa
         yesterday night clouds in moon see(unacc).pst
         'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
         kal raat baadaloM meiM mujhko caaMd dikhaa
         yesterday night clouds in me.dat moon see(unacc).pst
         'Yesterday night I saw the moon behind the clouds'
[Figures: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan.dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram.erg Mohan.dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so > sulaa) 'sleep' > 'make someone sleep'
• Add -vaa
  - (sulaa > sulvaa) 'make someone sleep' > 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
         aayaa ne Atif ko sulaayaa
         maid.erg Atif.acc sleep.caus.pst
         'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
         maaN ne aayaa se Atif ko sulvaayaa
         mother.erg maid.by Atif.acc sleep.caus.pst
         'The mother made the maid cause the child to sleep'
Causatives
[Figures: annotated trees for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi.inst fight sit.perf
              'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siitaa der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree diagram; node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree diagram; node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree diagram; node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree diagram; node labels: VP-Pred, VP, NP-1 (darvaazaa), NP (CASE-1 trace), V (khulegaa)]
Existentials
• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram; node labels: VP-Pred, VP, NP-1 (cuuhe), NP (CASE-1 trace), V (hain), VP, NP-P (us kamre mein)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Tree diagram; node labels: VP-Pred, NP-P (Ram-ne), VP, NP (kitaab), V (dii), VP-Pred, NP-P (Mohan-ko), VP]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram; node labels: VP-Pred, NP-1 (caaMd), NP (CASE-1 trace), V (dikhaa), VP, NP-P (baadaloM mein), NP (kal raat), NP (SCR-2 trace), NP-2 (mujhko)]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram; node labels: VP-Pred, VP, NP (Ram), NP (EXTR-1 trace), V (jaantaa), V-Aux (hai), CP-1, C (ki), VP-Pred, NP-1 (Sita), NP (CASE-1 trace), NP-P (der se), V (aayegi)]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree diagram; node labels: VP-Pred, NP-P (tumne), NP (SCR-1 trace), V (dii), NP (PRO), NP-1 (jo kitaab), CP, C (COMP), V-Aux (thii), NP-P (maine), V (paRhii), NP (SCR-3 trace), NP (vah)]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree diagram; node labels: VP-Pred, VP, NP (Raam), NP-P-1 (Ravi-ko), V (kar), V-Aux (rahaa), V-Aux (thaa), V', NP (CASE-1 trace), N (yaad)]
Causative
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) but not her as semantic subject
  considers
    Subj: Atif
    Obj: her
    Obj2: stupid
  [Phrase structure tree; node labels: S, NP (Atif), VP (considers, her), AdjP (stupid)]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using the node label
  considers
    Subj: Atif
    Obj: her
    ObjPred: stupid
• Neo-Paninian analysis (from IIIT Hyderabad; used for DS in the Hindi-Urdu Treebank)
  considers
    k1: Atif
    k2: her
    k2s: stupid
  समझा
    k1: आतिफ़
    k2: सीमा को
    k2s: बेवकूफ़
  [Phrase structure tree; node labels: S, NP (Atif), VP (considers, her), SC, AdjP (stupid)]
Analysis 3 for Small Clauses: Raising to Object
• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category
  considers
    Subj: Atif
    Obj: her1
    Obj-Pred: stupid
      Subj: e1
  [Phrase structure tree; node labels: S, NP (Atif), VP (considers, her1), S, e1, VP, AdjP (stupid)]
• Analysis used for PS in the Hindi-Urdu Treebank
Comparison of Representations
• Less information
  Tree 1a: considers [Subj: Atif; Obj: stupid [Subj: her]]
  Tree 2a: considers [Subj: Atif; Obj: her; Obj2: stupid]
• Same information
  Tree 1b: considers [Subj: Atif; Obj-ECM: stupid [Subj: her]]
  Tree 2b: considers [Subj: Atif; Obj: her; ObjPred: stupid]
  Tree 3: considers [Subj: Atif; Obj: her1; Obj-Pred: stupid [Subj: e1]]
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
  - Can be very similar across languages
• There can be several possible analyses
  - Some have less information
  - But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren't DS and PS Representations Complementary? NO
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in the projection shows the type of dependency
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all its descendants
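The convention that a DS node stands for the phrase made of the node and all its descendants can be checked on a small tree. The sketch below is an illustration only: it reuses the Hindi example sentence from the dependency part of the tutorial, and the head indices (in particular attaching hai to khaataa) are assumptions made for the example.

```python
words = ["meraa", "baDzaa", "bhaaii", "bahuta", "phala", "khaataa", "hai"]
# heads[i] is the index of the head of words[i]; -1 marks the root.
heads = [2, 2, 5, 4, 5, -1, 5]


def descendants(i, heads):
    """Indices of node i together with everything it dominates."""
    found = {i}
    for j, h in enumerate(heads):
        if h == i:
            found |= descendants(j, heads)
    return found


def phrase(i, words, heads):
    """The constituent headed by word i, read off in surface order."""
    return " ".join(words[j] for j in sorted(descendants(i, heads)))


print(phrase(2, words, heads))  # constituent headed by bhaaii
print(phrase(4, words, heads))  # constituent headed by phala
print(phrase(5, words, heads))  # constituent headed by the root verb
```

Each dependency node thus yields one constituent, which is the sense in which constituency is already represented in DS.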
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  - Does not change trees
  - Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  - Contains less information than DS+PB
  - DS and PS contain different information
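Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How: Dependency; Phrase Structure
What:
  - Distinguish unergative / unaccusative
  - Distinguish temporal / locative adjuncts
  - Distinguish unaccusative / transitive with empty agent
[Table; the per-column marks are not recoverable]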
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties
  - Morphology
  - Some basic syntax
  - Lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script
Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• An official language in several states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structure
• They differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi and Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact, in almost all Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
- Some adjectives do not decline

Nouns
Nouns in Hindi/Urdu are inflected for number and case.
• Gender: all nouns have inherent gender, e.g. pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number:
- Singular: pankhaa (fan), lataa (creeper), ghar (house)
- Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique.
Case
• Direct: nouns are in the nominative and are not followed by a postposition. They occur as subject and/or object.
LaRkaa aayaa 'the boy came'
<ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
laRkii aayii 'the girl came'
<ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen).
laRke ne roTii khaayii 'the boy ate bread'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread up from the floor'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, pronouns inflect for number and case, like nouns.
• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrog.), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrog.), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique:
a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrog.), kinhiiM (who, indef.)
b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef.)

Pronouns (Contd.)
• Before ne, maiM (I) and tuu (you, sg) do not change. For example: maiMne (I, erg) and tuune (you, erg).
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg).
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) do not change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you, hon).
• maiM (I), tuu (you), ham (we) and tum (you) do not attach to the postposition kaa (gen). Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your).
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun.
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'.
• The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below.

Case →      Direct           Oblique
Number →    Sg      Pl       Sg      Pl
Gender ↓
Masc        acchaa  acche    acche   acche
Fem         acchii  acchii   acchii  acchii
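
The adjective table above can be read as a small rule: feminine forms end in -ii, the masculine singular direct form ends in -aa, and every other masculine form ends in -e. The following is a minimal sketch of that rule, not part of the treebank tooling. The function name, the feature values ("m"/"f", "sg"/"pl", "direct"/"oblique") and the treatment of every adjective not ending in -aa as non-declining are assumptions made for illustration.

```python
def inflect_adjective(root, gender, number, case):
    """Inflect an -aa adjective (e.g. 'acchaa') following the table above.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if not root.endswith("aa"):
        # Assumption: adjectives that do not end in -aa do not decline.
        return root
    stem = root[:-2]
    if gender == "f":
        # Feminine forms are identical in every number and case.
        return stem + "ii"
    if number == "sg" and case == "direct":
        return stem + "aa"
    # All remaining masculine cells of the table share the -e form.
    return stem + "e"


if __name__ == "__main__":
    for gender in ("m", "f"):
        for case in ("direct", "oblique"):
            for number in ("sg", "pl"):
                print(gender, case, number,
                      inflect_adjective("acchaa", gender, number, case))
```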
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features gender, number and person.
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked on the verb forms.
Some examples of verb forms from Hindi/Urdu:
Root: ro 'cry'
Infinitive: ronaa 'to cry'
Habitual: rotaa 'cry (hab)'
Perfective: royaa 'cried'
Causative: rulaa 'cause someone to cry'
rulvaa 'make someone cause someone to cry'
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs.
(a) bacce khaanaa khaa rahe haiM
children_d meal eat prog be.pl.pres
'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
children_d meal eat_nf prog ability be.pl.pres
'The children can continue to have their meal'
Auxiliaries also carry gender, number and person information.

Postpositions
Postpositions largely mark the case relations.
baccoM ne raat meM mez se khaanaa le liyaa
children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions.
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'
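
Because each compound postposition listed above is a fixed two-word sequence, it can be spotted with a plain table lookup over adjacent tokens. The sketch below is only an illustration: the table holds exactly the entries listed above, while the function name and the sample phrase are hypothetical.

```python
# Compound postpositions from the list above, keyed by their two tokens.
COMPOUND_POSTPOSITIONS = {
    ("ke", "anusaar"): "according to",
    ("ke", "alaavaa"): "in addition to",
    ("ke", "kaaran"): "because of",
    ("ke", "dvaaraa"): "through",
    ("ke", "saamne"): "in front of",
    ("ke", "liye"): "for",
    ("kii", "or"): "towards",
    ("kii", "taraf"): "towards",
    ("kii", "tarah"): "like",
    ("kii", "jagah"): "in place of",
    ("se", "baahar"): "out of",
    ("se", "pahle"): "before",
}


def find_compound_postpositions(tokens):
    """Return (index, compound, gloss) for every listed compound in tokens."""
    hits = []
    for i in range(len(tokens) - 1):
        pair = (tokens[i], tokens[i + 1])
        if pair in COMPOUND_POSTPOSITIONS:
            hits.append((i, " ".join(pair), COMPOUND_POSTPOSITIONS[pair]))
    return hits


if __name__ == "__main__":
    # Hypothetical phrase used only to exercise the lookup.
    print(find_compound_postpositions("ghar ke saamne".split()))
```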
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
qabl az_ayn (qabl azeen): before from_this, 'before this'
qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
az raahe hamdardi: from way empathy, 'for the sake of courtesy'
az sare nau: from beginning new
khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
ta waqt: until time

ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone.
EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated.
• These expressions are often hyphenated.
• Reduplication has various morphological functions depending on the lexical category that is reduplicated. For example:
- Nouns: it adds the sense of 'every'
- Verbs: it brings the sense of an adverbial participle
- Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant.
• Reduplication is highly productive in these languages.

Full Reduplication
If the word is X, then its reduplicated form is X-X.
raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
• In partial reduplication an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning 'X etc.'
• In Hindi/Urdu the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi.
Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
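
The regular echo-word pattern above (replace the initial consonant with v-, or prefix v- when the word starts with a vowel) can be sketched in a few lines. This is an illustrative approximation over the romanized forms used on the slides: treating every leading non-vowel letter as part of the initial consonant (so that "kh" counts as one consonant) is an assumption, and the irregular cases listed above (bhaagambhaag, jhuuTh-muuTh, dekhaa-dekhii) are deliberately not covered.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word):
    """Build the echo form by replacing the initial consonant(s) with 'v'."""
    i = 0
    # Skip the leading consonant letters; digraphs such as 'kh' are
    # treated as a single initial consonant (an assumption of this sketch).
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return "v" + word[i:]


def full_reduplicate(word):
    """X -> X-X"""
    return f"{word}-{word}"


def partial_reduplicate(word):
    """X -> X-vX', the 'X etc.' reading."""
    return f"{word}-{echo_word(word)}"


if __name__ == "__main__":
    for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
        print(w, "->", partial_reduplicate(w))
```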
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'

Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister, who stays in Delhi, is coming tomorrow'
Relative Clause (Contd.)
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
• Experiencer verbs: the experiencer argument takes dative case
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
• Participatory verbs: the second argument of participatory verbs takes the se postposition
raam ravi se carcaa karegaa
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
- Compounds internally contain a punctuation mark
- They are productive
- Morphological analysis is needed for the members of the compound
- The issue: whether to create a single token
• Decision: create three tokens and mark the hyphen as JOIN
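
The tokenization decision above (every punctuation mark becomes its own token, and a hyphenated compound becomes three tokens with the hyphen marked JOIN) can be sketched with one regular expression. This is not the treebank's actual tokenizer: the function name, the (token, mark) output shape, and the choice to mark every hyphen as JOIN are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split text into word and punctuation tokens.

    Returns a list of (token, mark) pairs, where mark is "JOIN" for a
    hyphen and None otherwise. Marking every hyphen as JOIN is a
    simplification of this sketch.
    """
    tokens = []
    for piece in re.findall(r"\w+|[^\w\s]", text):
        mark = "JOIN" if piece == "-" else None
        tokens.append((piece, mark))
    return tokens


if __name__ == "__main__":
    print(tokenize("BAI-bahana"))
    print(tokenize("usa ladake ne kelaa khaayaa thaa."))
```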
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
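
Since af is a comma-separated composite attribute, it can be unpacked into named fields. The sketch below assumes the field order given in the description above (root, category, gender, number, person, case, tam or vibhakti, suffix). The key names and the function name are hypothetical, and shorter values such as the ones shown on the slide simply yield fewer fields.

```python
import re

# Field order as described above; the key names are chosen for this sketch.
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)


def parse_af(fs):
    """Extract the af attribute from a feature structure string.

    Returns a dict mapping field names to values; fields missing from a
    short af value are omitted. Returns {} if there is no af attribute.
    """
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        return {}
    values = match.group(1).split(",")
    return dict(zip(AF_FIELDS, values))


if __name__ == "__main__":
    print(parse_af("<fs af='laDakaa,n,m,sg,3,o'>"))
```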
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
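
The chunked SSF layout above nests token rows between a `((` row that carries the chunk tag and a closing `))` row. A minimal reader for that layout might look like the sketch below. It assumes whitespace-separated columns and a non-empty token on every token row (the sample uses "." for the final punctuation token, which is an assumption), and it ignores the feature structures. The function name and the output shape are hypothetical.

```python
SAMPLE = """\
1 (( NP
1.1 usa PRP <fs af='vaha,pron'>
1.2 laDake NN <fs af='laDakA,noun'>
1.3 ne PSP <fs af='ne,psp'>
))
2 (( NP
2.1 kelA NN <fs af='kelA,noun'>
))
3 (( VG
3.1 khAyA VM <fs af='KA,verb'>
3.2 thA VAUX <fs af='kelA,verb'>
))
4 (( BLK
4.1 . SYM <fs af='&STOP,punc'>
))
"""


def read_ssf(text):
    """Read chunked SSF text into a list of {'tag', 'tokens'} dicts."""
    chunks = []
    current = None
    for line in text.splitlines():
        fields = line.split(None, 3)
        if not fields:
            continue
        if fields[0] == "))":
            # Closing row: the chunk that was being collected is complete.
            if current is not None:
                chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":
            # Opening row: ADDR, "((", chunk tag.
            current = {"tag": fields[2], "tokens": []}
        elif current is not None and len(fields) >= 3:
            # Token row: ADDR, token, POS tag, optional feature structure.
            current["tokens"].append((fields[1], fields[2]))
    return chunks


if __name__ == "__main__":
    for chunk in read_ssf(SAMPLE):
        print(chunk["tag"], chunk["tokens"])
```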
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
- Some basic concepts
• Some Hindi constructions
- Causatives
- Co-ordination
- Unaccusatives
- Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar
Indian languages have:
• Rich morphology
• Relatively flexible word order
For example:
a) baccaa phala khaataa hai
'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (Contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified element (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
    k1: Sabina
    k2: lock
        the
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key
opened
    k1: Sabina
    k2: lock
        the
    k3: key
        this
k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
    k7t: Yesterday
    k1: Sabina
    k2: lock
        the
    k3: key
        this
    k7p: home
        my
k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key
opened
    k7t: yesterday
    k1: lock
        the
    k3: key
        this
Here lock becomes the karta.
Levels of Analysis
L1: Semantic relations: karakas, e.g. raama: karta
L2: Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3: Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4: Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
(t1) khaa
    bhaaii
    phala
(t2) khaa
    bhaaii
        meraa
        baDZaa
    phala
        bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
Sabina opened the lock with the key
The key opened the lock
The lock opened

Semantics of the verb
• A verbal root denotes
- the activity
- the result
• Locus of activity: karta
• Locus of result: karma
Verbal root: activity + result
karta-karma
The boy opened the lock
- k1: karta
- k2: karma
open
    k1: boy
    k2: lock
karta and karma sometimes correspond to agent and theme, but not always:
The door opened
- The door is the karta
- The sentence has no explicit karma

Sub-actions: Opening of lock
Opening of the lock consists of:
- Inserting and turning a key (action 1)
- Key pressing and turning the lever (action 2)
- Latch moving and lock opening (action 3)
Sub-actions: Opening of lock (Contd.)
open
    k1: boy
    k2: lock
    k3: key
open
    k1: key
    k2: lock
open
    k1: lock
k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly:
(a) I opened the lock with this key
(b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) or karana (in a) based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
- Morph Analysis
- POS Tagging
- Chunking
- Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses and conjunctions
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'

An Example (Contd.)
Morph Analysis
meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa <fs af= root=badZaa cat=adj gend=m>
bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta <fs af= root=bahuta cat=adj gend=any>
phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd.)
POS Tagging
meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

An Example (Contd.)
Figure: Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
- each node is a chunk, and
- each edge represents the relation between the connected nodes, labeled with the karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)
Here modifier-modified relations are marked between the heads of the chunks:
- meraa 'my'
- bhaaii 'brother'
- phala 'fruit'
- khaataa 'eats'
badZaa 'big' and bahut 'much' are part of the chunks.
Dependency Scheme (Contd.)
khaataa
    k1: bhaaii
        r6: meraa
    k2: phala
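
The inter-chunk dependency tree above can be stored as a flat list of (dependent, relation, head) triples over the chunk heads. The sketch below shows one way to query such a list. The triple layout and the function names are assumptions for illustration and do not reflect the treebank's storage format.

```python
# (dependent, relation, head) triples for the example tree above.
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def dependents(head, edges):
    """Return (relation, dependent) pairs attached directly to head."""
    return [(rel, dep) for dep, rel, h in edges if h == head]


def find_root(edges):
    """Return the single node that is a head but never a dependent."""
    heads = {h for _, _, h in edges}
    deps = {d for d, _, _ in edges}
    roots = heads - deps
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()


if __name__ == "__main__":
    root = find_root(EDGES)
    print(root, dependents(root, EDGES))
    print("bhaaii", dependents("bhaaii", EDGES))
```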
Relations in Dependency Scheme
There are 3 types of relations in the dependency scheme:
- Karaka relations
- Relations other than karakas
- Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
- karta: subject/agent/doer
- karma: object/patient
- karana: instrument
- sampradaan: beneficiary
- apaadaan: source
- adhikarana: location in place/time/other

Relations other than karakas
- r6: Genitive
- rt: Purpose
- rh: Reason
- nmod__relc: Relative clause
- rad: Address

Relations which do not fall under dependency relations
- ccof: Conjunction
- pof: Complex Predicates
- fragof: Fragment of
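
The three relation types above can be kept in a small lookup table so that an edge label can be classified. The sketch below is only an illustration. The tags k1 to k4, k7t and k7p appear on the slides, whereas the tags k5 (apaadaan) and k7 (adhikarana) are assumptions taken from the wider annotation scheme rather than from these slides; the type names and the function name are also hypothetical.

```python
# label -> (relation type, gloss)
RELATIONS = {
    # Karaka relations
    "k1": ("karaka", "karta"),
    "k2": ("karaka", "karma"),
    "k3": ("karaka", "karana"),
    "k4": ("karaka", "sampradaan"),
    "k5": ("karaka", "apaadaan"),     # tag assumed, not shown on the slides
    "k7": ("karaka", "adhikarana"),   # tag assumed, not shown on the slides
    "k7t": ("karaka", "adhikarana (time)"),
    "k7p": ("karaka", "adhikarana (place)"),
    # Relations other than karakas
    "r6": ("non-karaka", "genitive"),
    "rt": ("non-karaka", "purpose"),
    "rh": ("non-karaka", "reason"),
    "nmod__relc": ("non-karaka", "relative clause"),
    "rad": ("non-karaka", "address"),
    # Relations that are not dependency relations proper
    "ccof": ("non-dependency", "conjunction"),
    "pof": ("non-dependency", "complex predicate"),
    "fragof": ("non-dependency", "fragment of"),
}


def relation_type(label):
    """Return the relation type for a label, or 'unknown' if not listed."""
    return RELATIONS.get(label, ("unknown", None))[0]


if __name__ == "__main__":
    for label in ("k1", "r6", "ccof", "xyz"):
        print(label, relation_type(label))
```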
Figure: Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: go by syntactic analysis
- khilvaa 'cause to eat' is the verb root
- maaz ne has karta vibhakti, so mark it as k1
- aayaa se has karana vibhakti, so mark it as k3
- bacce ko has sampradan vibhakti, so mark it as k4
Possibility-II
$ The verb khilvaa lsquocause to eatrsquo is a causative verb and it is morphologically related to the base verb khaa lsquoeatrsquo
$ Paninian framework provides the relations
amp prayojaka karta causerlsquo (pk1) The causer in a causative construction amp prayojya karta causeelsquo (jk1) The causee in a causative construction amp madhyastha karta mediator causerlsquo (mk1) The mediator-causer in the causative
construction
Causative Constructions (Contd hellip)
Possibility-II
$ Do we mark the above dependency roles $ If we mark these relations then root will be khaa lsquoeatrsquo
101212
22
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1 and jk1 instead of k1, k3 and k4 respectively
• For causatives, our current decision: follow Possibility-II
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
- Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
- The dependency of jo ladZakaa 'the boy' is on vaha 'he'
- jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Figure: Relative Clause, Possibility-I

Relative Clauses (nmod__relc) (Contd.)
Possibility-II
- The verb khadZaa hai 'is standing' is the root of the relative clause
- The modifier of vaha 'he' in the main clause is the entire relative clause
- Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Figure: Relative Clause, Possibility-II

Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta: k4a
Ex-1: mujhko dukh hai
'I.Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan proper (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta: k4a (Contd.)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
Figure: dependency trees for Ex-2 and Ex-3
(4) Relation samanadhikaran: rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Figure: dependency trees for Ex-1 and Ex-2
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause

Possibility-I
agar-to
    paired-ccof: agar
        ccof: naa hotii
    paired-ccof: to
        ccof: aatii

Figure: Possibility-II

Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (the main verb) is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Having written letters every day, he tears them up'
The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.

Participles (vmod) (Contd.)
PaadZataa hai
    k1: vaha
    k7t: rojZa
    k2: pawra
    vmod: likhakar
        k1: vaha
        k2: pawra
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd.)
maan luungii
    k1: mai
    k2: NULL__NP (vo)
        nmod__relc: kahoge
            k1: tum
            k2: jo bhi
Ellipsis (Contd)
Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo No explicit conjunct Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate
Non-dependency Relations
ccof ndash Conjunction pof ndash Complex Predicates fragof -- Fragment of
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation. It is used for coordinating as well as subordinating conjunctions. The dependency trees show the conjunction as the head.
In coordination, the conjunction is the head and takes the coordinated elements as its children.
In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd...)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'. The adjective does not modify the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree. If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd)
To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa is POF ('Part OF' relation). That is, the noun or adjective in the conjunct verb sequence has a POF relation with the verb.
This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit. Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
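The filtering step can be illustrated with a small sketch. The role annotations below are written by hand to mirror the two sentences on the slide, and the `candidates` structure and `answer_who` function are hypothetical names introduced only for this example. A real system would obtain the roles from a semantic role labeller.

```python
# Hand-written role annotations for the two candidate sentences (assumed,
# mirroring the bracketed examples above).
candidates = [
    {
        "text": "Becton Dickinson created the first disposable syringe ...",
        "predicate": "create",
        "roles": {"agent": "Becton Dickinson",
                  "theme": "the first disposable syringe"},
    },
    {
        "text": "The first effective polio vaccine was created in 1952 by Jonas Salk ...",
        "predicate": "create",
        "roles": {"agent": "Jonas Salk",
                  "theme": "the first effective polio vaccine"},
    },
]


def answer_who(predicate, theme, candidates):
    """Return the agents of `predicate` in sentences whose theme matches the question."""
    return [
        c["roles"]["agent"]
        for c in candidates
        if c["predicate"] == predicate and c["roles"].get("theme") == theme
    ]


answers = answer_who("create", "the first effective polio vaccine", candidates)
```

Word matching alone would accept both sentences, since both contain the question's words. Comparing the theme role is what rejects the first one.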
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example
• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
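The two rolesets for ring can be pictured as a small lookup table. This is a sketch only: the dictionary layout and the `describe` helper are assumptions for illustration and do not reproduce the actual PropBank frame file format. It shows why the same numbered label (here ARG1) means different things under different rolesets.

```python
# A toy in-memory frame file for "ring", following the slide's two rolesets.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"ARG0": "Causer of ringing",
                      "ARG1": "Thing rung",
                      "ARG2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"ARG1": "Surrounding entity",
                      "ARG2": "Surrounded entity"},
        },
    }
}


def describe(lemma, roleset_id, labelled_args):
    """Translate numbered labels into the verb-specific role descriptions.

    labelled_args maps a label such as "ARG0" to the annotated text span.
    Labels that the roleset does not define raise ValueError.
    """
    roles = FRAMES[lemma][roleset_id]["roles"]
    unknown = [label for label in labelled_args if label not in roles]
    if unknown:
        raise ValueError(f"{roleset_id} does not define {unknown}")
    return {text: roles[label] for label, text in labelled_args.items()}


bell = describe("ring", "ring.01", {"ARG0": "John", "ARG1": "the bell"})
lake = describe("ring", "ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"})
```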
Hindi PropBank
• Annotating Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA   Causer
ARG0   Agent, experiencer
ARG1   Theme, patient
ARG2   Recipient
ARG3   Instrument
PropBank Tagset: Numbered Arguments with function tags
ARGA-MNS   Indirect causer
ARG0-MNS   Induced causer
ARG0-GOL   Causee with a 'recipient' role
ARG2-ATR   Attribute
ARG2-GOL   Goal
ARG2-SOU   Source
ARG2-LOC   Location
ARG2-DIR   Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
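As a minimal sketch of the labelling rule just stated: once a verb has been classified (by the diagnostic tests, which are not modelled here), the label of its single argument follows directly. The `VERB_CLASS` table and the function name are assumptions for illustration, and the table contains only the three verbs named on the slide.

```python
# Verb classes as given on the slide; in practice they come from the frame files.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
```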
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
– (so → sulaa) sleep → make someone sleep
• Add -vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
[Figure: Causatives, with panels Causative-1 and Causative-2]
Causatives classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)': trust
• Such cases are handled using a noun frame for bharosaa: [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
Complex predicate
compl-pred-2: राम रवी से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Figure: phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with node labels VP-Pred, VP, NP and V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
[Figure: phrase structure tree for आतिफ़ सोएगा (Atif soyegaa)]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
[Figure: phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); the NP darvaazaa and a CASE trace share index 1]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct
[Figure: phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); the NP cuuhe and a CASE trace share index 1]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: phrase structure tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)]
Putting It All Together: Dative Subjects
[Figure: phrase structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa), with a CASE trace (index 1) and an SCR trace (index 2)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Figure: phrase structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi), with a CP headed by ki and an EXTR trace (index 1)]
Relative Clause
[Figure: phrase structure tree for the example below, with SCR traces, PRO and a CP]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
[Figure: phrase structure tree for the example below, with a CASE trace (index 1)]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causative
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using node label
considers
  k1: Atif
  k2: her
  k2s: stupid
Neo-Paninian analysis
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using node label
समझा
  k1: आतिफ़ ने
  k2: सीमा को
  k2s: बेवकूफ़
Neo-Paninian analysis from IIIT Hyderabad. Used for DS in the Hindi-Urdu Treebank
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) but not her as semantic subject
considers
  Subj: Atif
  Obj: her
  Obj2: stupid
[Figure: corresponding phrase structure tree, with node labels S, NP, VP and AdjP]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using node label
considers
  k1: Atif
  k2: her
  k2s: stupid
[Figure: corresponding phrase structure tree, with node labels S, NP, VP, AdjP and SC]
Analysis 3 for Small Clauses: Raising to Object
• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category
considers
  Subj: Atif
  Obj: her (index 1)
  Obj-Pred: stupid
    Subj: e (index 1)
[Figure: corresponding phrase structure tree, with node labels S, NP, VP and AdjP, and with her and the empty category e sharing index 1]
Analysis used for PS in the Hindi-Urdu Treebank
Comparison of Representations
[Figure: five dependency trees for "Atif considers her stupid", labelled Tree 1a, Tree 2a, Tree 1b, Tree 2b and Tree 3, grouped under "Less information" and "Same information"]
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
– Can be very similar across languages
• There can be several possible analyses
– Some have less information
– But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren't DS and PS Representations Complementary? NO
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency
Aren't DS and PS Representations Complementary? NO
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendents
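The convention that each dependency node stands for the phrase made of itself and all its descendants can be made concrete with a short sketch. The function name, the head-index encoding (1-based token positions, 0 for the root) and the choice of example are assumptions for illustration. The sentence is the conjunct verb example maine usase ek prashna kiyaa.

```python
def constituents(heads):
    """Read constituents off a dependency tree.

    heads maps each 1-based token index to the index of its head (0 = root).
    Returns, for every token, the sorted indices of the phrase it heads:
    the token itself plus all of its descendants.
    """
    children = {}
    for dependent, head in heads.items():
        children.setdefault(head, []).append(dependent)

    def span(node):
        covered = {node}
        for child in children.get(node, []):
            covered |= span(child)
        return covered

    return {node: sorted(span(node)) for node in heads}


tokens = ["maine", "usase", "ek", "prashna", "kiyaa"]
# maine -> kiyaa, usase -> kiyaa, ek -> prashna, prashna -> kiyaa, kiyaa = root
heads = {1: 5, 2: 5, 3: 4, 4: 5, 5: 0}

spans = constituents(heads)
phrases = {
    tokens[node - 1]: " ".join(tokens[i - 1] for i in indices)
    for node, indices in spans.items()
}
```

Here `phrases["prashna"]` is the noun phrase "ek prashna" and `phrases["kiyaa"]` is the whole clause, which is the sense in which DS already encodes constituency.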
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
– Does not change trees
– Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
– Contains less information than DS+PB
– DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How: Dependency; Phrase Structure
What:
• Distinguish unergative / unaccusative
• Distinguish temporal / locative adjuncts
• Distinguish unaccusative / transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
Introduction: some facts about Hindi and Urdu
Linguistic properties: morphology, some basic syntax, lexical semantics
Hindi: Some facts
A major language of the Indo-Aryan family
Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
A large population in India speaks Hindi as a second language
Script: Devanagari, a syllabic script
Urdu: Some facts
An Indo-Aryan language
Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
Significant borrowings from Arabic and Persian
It was also known as rekhta (mixed language)
Official language of Pakistan
Official language of states of India
Also spoken in Fiji, Bangladesh, etc.
Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
Script: Perso-Arabic
Hindi-Urdu (Hindustani)
Hindi and Urdu are mutually intelligible
Linguists consider them two registers of the same language
Similar in grammatical structure
Differ in vocabulary, particularly in the formal written varieties
A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
Hindi/Urdu have relatively free word order
The unmarked word order in both languages is subject-object-verb (SOV)
Auxiliary verbs follow the main verb
Nouns are followed by postpositions
Adjectives precede the nouns they modify
In Urdu, adjectives sometimes follow the noun (ezafe constructions)
Extensive use of participles, complex predicates and causatives
Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
Grammatical gender: masculine and feminine
Number: singular and plural
Person: first, second and third
Case: direct, oblique and vocative
Adjectives inflect for number, gender and case
– Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case
Gender: all nouns have inherent gender, e.g. pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
Number:
Singular: pankhaa (fan), lataa (creeper), ghar (house)
Plural: pankhe (fans), lataeM (creepers), ghar (houses)
Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique
Case
Direct nouns are in the nominative and are not followed by a postposition. They occur as subject and/or object.
LaRkaa aayaa 'the boy came'
<ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
laRkii aayii 'the girl came'
<ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
Oblique nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen).
laRke ne roTii khaayii 'the boy ate bread'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread up from the floor'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, the pronouns also inflect for number and case
Sg - dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
Sg - obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
Pl - dir: ye (these), ve (those), jo (who, pl)
Pl - obl:
a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
b. inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)
Pronouns (Contd...)
Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I, erg) and tuune (you, erg)
Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)
Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form:
hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
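The rules above for first and second person pronouns can be written as a small function. This is a sketch under stated assumptions: the function name `attach` and the two lookup tables are hypothetical, only the pronouns and forms listed above are covered, and the postposition is written joined to the pronoun as in the examples.

```python
# Oblique stems used before postpositions other than ne (from the text above).
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitive forms that replace pronoun + kaa.
GENITIVE = {"maiM": "meraa", "tuu": "teraa",
            "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach(pronoun, postposition):
    """Combine a first or second person pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne":
        # maiM and tuu switch to their oblique stems; ham, tum, aap do not change.
        pronoun = OBLIQUE.get(pronoun, pronoun)
    return pronoun + postposition


forms = [attach("maiM", "ne"), attach("maiM", "ko"), attach("tuu", "ko"),
         attach("ham", "ko"), attach("aap", "ne"), attach("tum", "kaa")]
```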
Adjectives
Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

Case →      Direct            Oblique
Number →    Sg       Pl       Sg       Pl
Masc        acchaa   acche    acche    acche
Fem         acchii   acchii   acchii   acchii
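The paradigm in the table is regular enough to state as a rule, sketched below. The function name and argument values are assumptions for illustration. Treating every adjective that does not end in -aa as invariable is also an assumption made here, since the slides only say that some adjectives do not decline.

```python
def inflect_adjective(base, gender, number, case):
    """Inflect an -aa adjective such as acchaa following the table above.

    gender: "m" or "f"; number: "sg" or "pl"; case: "direct" or "oblique".
    """
    if not base.endswith("aa"):
        # Assumption: adjectives not ending in -aa are treated as invariable.
        return base
    stem = base[:-2]
    if gender == "f":
        return stem + "ii"          # feminine is acchii in every cell
    if number == "sg" and case == "direct":
        return stem + "aa"          # only masculine singular direct keeps -aa
    return stem + "e"               # all other masculine cells


table = {
    (g, n, c): inflect_adjective("acchaa", g, n, c)
    for g in ("m", "f")
    for n in ("sg", "pl")
    for c in ("direct", "oblique")
}
```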
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person
Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms
Given below are some examples of various verb forms from Hindi/Urdu

Root         ro       cry
Infinitive   ronaa    to cry
Habitual     rotaa    cry (hab)
Perfective   royaa    cried
Causative    rulaa    cause someone to cry
             rulvaa   make someone cause someone to cry
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs
(a) bacce khaanaa khaa rahe haiM
children_d meal eat prog be.pl.pres
'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
children_d meal eat_nf prog ablit be.pl.pres
'The children can continue to have their meal'
Auxiliaries also carry the gender, number and person information
Postpositions
Postpositions largely mark the case relations
baccoM ne raat meM mez se khaanaa le liyaa
children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'
Urdu Specific Features
Prepositions in Urdu
Ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
qabl az_ayn (qabl azeen): before from_this, 'before this'
qabl az_waqt: before from_time, 'before time'
(b) dar 'in / inside / amidst'
dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from / since / for'
az raahe hamdardi: from way empathy, 'for the sake of courtesy'
az sare nau: from beginning new
khaarij az bahes: beyond from discussion
(d) ta 'to / until / till'
ta waqt: until time
Ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone
EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
Words belonging to various categories can be reduplicated
These expressions are often hyphenated
Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
Nouns: it adds the sense of 'every'
Verbs: it brings the sense of an adverbial participle
Adjectives and adverbs: it adds intensity
Hindi has three types of reduplication: full, partial and redundant
Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X
raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning of 'X etc.'
In Hindi/Urdu the first consonant of X is replaced by v-
For example: khaanaa-vaanaa 'food-etc.'
vaanaa is not a valid word in Hindi
Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
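The regular v- pattern can be sketched as a function over the romanized forms used on these slides. Two assumptions are made and should be read as such: the leading run of non-vowel letters (for example kh in khaanaa) is treated as the first consonant, and vowel-initial words simply get v- prefixed, as in aaloo-vaaloo. The irregular cases listed above (bhaagambhaag and the like) are outside this rule.

```python
VOWELS = set("aeiou")


def echo(word):
    """Form the 'X etc.' echo compound of a romanized Hindi/Urdu word."""
    i = 0
    # Skip the initial consonant letters (assumed to be one consonant, e.g. "kh").
    while i < len(word) and word[i].lower() not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"


examples = [echo(w) for w in ("khaanaa", "jaanaa", "aaloo", "aisaa")]
```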
Some Basic Syntax
Hindi and Urdu are both relatively free word order SOV languages
For case marking, Hindi primarily uses postpositions
The verb agrees either with the subject or with the object
An adjective agrees with the noun it modifies
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Existential
exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'
rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'
rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
compl-pred-2: राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causatives
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition
raam ravi se carcaa karega
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – They are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN
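The tokenization decision above can be sketched in a few lines of Python. This is only an illustration, not the treebank's tokenizer: the function names `tokenize` and `to_ssf_lines` are hypothetical, and the sketch assumes romanized (WX-style) input, since Python's `\w` does not cover Devanagari vowel signs.

```python
import re

# Hypothetical sketch: every punctuation mark becomes its own token, and a
# hyphen directly flanked by word characters is marked JOIN, so a compound
# such as BAI-bahana yields three tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    tokens = []
    for match in TOKEN_RE.finditer(text):
        form = match.group()
        start, end = match.span()
        # A hyphen is compound-internal when both neighbours are alphanumeric.
        is_join = (
            form == "-"
            and start > 0
            and end < len(text)
            and text[start - 1].isalnum()
            and text[end].isalnum()
        )
        tokens.append((form, "JOIN" if is_join else None))
    return tokens

def to_ssf_lines(tokens):
    # ADDR is 1-based, as in the SSF listing; the JOIN mark goes in a third column.
    lines = []
    for addr, (form, mark) in enumerate(tokens, start=1):
        lines.append(f"{addr}\t{form}" + (f"\t{mark}" if mark else ""))
    return lines

if __name__ == "__main__":
    print("\n".join(to_ssf_lines(tokenize("BAI-bahana"))))
```

Keeping the hyphen as a separate JOIN token leaves both members of the compound available for morphological analysis.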
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), suffix.
ADDR  TKN     OTHR
1     usa     <fs af='vaha,pr'>
2     laDake  <fs af='laDakaa,n,m,sg,3,o'>
3     ne      <fs af='ne,psp'>
4     kelaa   <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa <fs af='khaa,v,m,sg,any,yaa'>
6     thaa    <fs af='kelaa,v,e'>
7             <fs as=&STOP,punc>
POS Tagging
ILMT POS tagset adopted; 26 tags in total.
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)
ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs as=&STOP,punc>
      ))
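To make the chunked layout concrete, here is a minimal reader for rows of the shape shown above. It is a sketch under assumptions: whitespace-separated columns in the order ADDR, TKN, CAT, with the OTHR feature structure left out for brevity, and `parse_ssf` is a hypothetical name rather than part of any treebank toolkit.

```python
def parse_ssf(lines):
    """Group SSF rows into chunks: '((' opens a chunk, '))' closes it."""
    chunks, current = [], None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":
            # Closing bracket: the open chunk is complete.
            if current is not None:
                chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":
            # Opening row: ADDR, '((' and the chunk tag (NP, VG, BLK, ...).
            current = {"addr": fields[0], "tag": fields[2], "tokens": []}
        elif current is not None and len(fields) >= 3:
            # Token row inside a chunk: keep the word form and its POS tag.
            current["tokens"].append((fields[1], fields[2]))
    return chunks

SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
""".splitlines()

if __name__ == "__main__":
    for chunk in parse_ssf(SAMPLE):
        print(chunk["addr"], chunk["tag"], chunk["tokens"])
```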
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian grammatical model
Why Paninian Grammar?
Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock
    the
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result
Sabina opened the lock with this key
opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this
k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my
k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this
The lock becomes the karta.
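The two analyses above can be written down as labeled head-to-dependent edges. The sketch below is only an illustration: the `Edge` tuple and the helpers `karaka_edges` and `karta` are hypothetical names, and the labels are copied from the trees on the slides.

```python
from collections import namedtuple

# One labeled dependency edge: head --label--> dependent.
Edge = namedtuple("Edge", "head label dependent")

def karaka_edges(verb, **roles):
    """Build one edge per karaka label for a single verb."""
    return [Edge(verb, label, dependent) for label, dependent in roles.items()]

def karta(edges):
    """Return the dependent carrying the k1 (karta) label."""
    return next(edge.dependent for edge in edges if edge.label == "k1")

# 'Yesterday Sabina opened the lock with this key at my home'
full = karaka_edges("opened", k7t="Yesterday", k1="Sabina",
                    k2="lock", k3="key", k7p="home")
# 'Yesterday the lock opened with this key': no agent is expressed.
agentless = karaka_edges("opened", k7t="yesterday", k1="lock", k3="key")

if __name__ == "__main__":
    print(karta(full), karta(agentless))
```

The same verb carries a different k1 in each sentence, which is the point the slide makes when the lock becomes the karta.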
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) chunk heads only:
khaa
  bhaaii
  phala
(t2) fully expanded:
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened
Semantics of the verb
A verbal root denotes:
• The activity
• The result
Locus of activity: karta
Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is the karta
  – The sentence has no explicit karma
open
  k1: boy
  k2: lock
Sub-actions: Opening of lock
Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• ‘key’ is assigned karta in (b) and karana in (a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with the dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena in Hindi such as complex verbs, causatives, relative clauses and conjunctions
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
  ‘My elder brother eats lots of fruits’
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
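The attribute=value form of the morph analysis above is easy to read into a dictionary. The following is a small sketch, assuming the expanded `name=value` notation used on this slide (not the comma-separated `af` string); `parse_fs` is a hypothetical helper name.

```python
import re

# Matches name=value pairs whose value is a plain word (root=khaa, num=sg, ...).
# The bare 'af=' on the slide has no value and is therefore skipped.
FS_PAIR_RE = re.compile(r"(\w+)=(\w+)")

def parse_fs(fs):
    """Turn '<fs af= root=khaa cat=v ...>' into a dict of morphological features."""
    return dict(FS_PAIR_RE.findall(fs))

if __name__ == "__main__":
    features = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
    print(features["root"], features["TAM"])
```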
An Example (Contd)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
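The bracketed chunk notation above can be parsed mechanically, which makes it easy to pick out the chunk heads that the dependency relations connect. This is a sketch with hypothetical names (`parse_chunks`, `head_of`), and the head rule is a simplifying assumption for this example only: the last noun or pronoun of an NP, and the main verb (VM) of a VG.

```python
import re

# One chunk looks like ((word_TAG word_TAG ...))_CHUNKTAG
CHUNK_RE = re.compile(r"\(\(([^()]*)\)\)_(\w+)")

# Assumed head tags per chunk type (illustrative, not the treebank's full rules).
HEAD_TAGS = {"NP": ("NN", "PRP"), "VG": ("VM",)}

def parse_chunks(text):
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        # Split each token at its last underscore into (form, POS tag).
        words = [tuple(token.rsplit("_", 1)) for token in body.split()]
        chunks.append((chunk_tag, words))
    return chunks

def head_of(chunk):
    chunk_tag, words = chunk
    wanted = HEAD_TAGS.get(chunk_tag, ())
    for form, pos in reversed(words):
        if pos in wanted:
            return form
    return words[-1][0]  # fall back to the last word of the chunk

SENTENCE = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")

if __name__ == "__main__":
    print([head_of(chunk) for chunk in parse_chunks(SENTENCE)])
```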
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or another relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks:
• meraa ‘my’
• bhaaii ‘brother’
• phala ‘fruit’
• khaataa ‘eats’
badZaa ‘big’ and bahut ‘much’ are part of the chunks.
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
Relations in Dependency Scheme
There are three types of relations in the dependency scheme:
• Karaka relations
• Relations other than karakas
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote, for example, purpose and reason.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
• Possibility-I: go by syntactic analysis
  – khilvaa ‘cause to eat’ is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb, morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations:
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in a causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Causative Constructions (Contd …)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
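The label correspondence between the two possibilities can be written as a small lookup. This is only a sketch of the stated mapping (k1, k3, k4 to pk1, mk1, jk1) for the maid sentence; `relabel_causative` is a hypothetical name, and deciding whether a se-phrase is a mediator or a true instrument (such as cammaca se ‘with the spoon’, which stays k3) is an annotator decision that the sketch does not model.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(arguments):
    """Map Possibility-I labels of a causative clause to Possibility-II labels.

    arguments: list of (phrase, label) pairs; other labels (e.g. k2) are kept.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]

# maaz ne aayaa se bacce ko khaanaa khilvaayaa, analysed under Possibility-I
possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]

if __name__ == "__main__":
    print(relabel_causative(possibility_1))
```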
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
• Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – jo ladZakaa ‘the boy’ depends on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause: Alternative-II
Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
‘ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
anubhava karta – k4a (Contd…)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd…)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause
Possibility-I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii
Possibility-II
Conditionals (Contd)
• Possibility-I is not adopted because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Every day, having written a letter, he tears it up’
Participles (vmod) (Contd)
The arguments vaha ‘he’ and pawra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
Participles (vmod) (Contd)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: pawra
  vmod: likhakar
    k1: vaha
    k2: pawra
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
• In the above example vo ‘that’, which is the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don't listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
  ‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
  ‘Ram said that he will come tomorrow’
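Coordinate Conjunction (ccof)
[Figure: dependency tree for the coordinate conjunction example]
Subordinate Conjunction
[Figure: dependency tree for the subordinate conjunction example]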
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
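The conjunct-verb analysis can be made concrete by storing the tree above as labeled edges and reading the complex predicate back off the pof link. The helper names `complex_predicate` and `modifiers_of` are hypothetical; the edges and labels are copied from the tree on the slide.

```python
# (head, label, dependent) edges for: maine usase ek prashna kiyaa
EDGES = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
    ("prashna", "nmod", "ek"),
]

def complex_predicate(edges, verb):
    """Join the pof dependents of a verb with the verb itself."""
    parts = [dep for head, label, dep in edges if head == verb and label == "pof"]
    return " ".join(parts + [verb])

def modifiers_of(edges, word):
    """Return the dependents attached directly to a given word."""
    return [dep for head, _, dep in edges if head == word]

if __name__ == "__main__":
    print(complex_predicate(EDGES, "kiyaa"))   # the noun-verb unit
    print(modifiers_of(EDGES, "prashna"))      # ek modifies the noun only
```

Because prashna is its own node, the modifier ek can attach to it while the pof edge still records that noun and verb form one predicate.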
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
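The filtering step can be sketched as a comparison of role fillers. The role labels below are hand-written from the slide rather than produced by a labeller, and `answer_who` is a hypothetical helper: it keeps the agent of every candidate whose predicate and theme match the question.

```python
# The question 'Who created the first effective polio vaccine?' asks for the
# agent of 'create' given its theme.
QUESTION = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Hand-labelled candidates, following the role assignments on the slide.
CANDIDATES = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(question, candidates):
    """Return the agents of candidates whose predicate and theme match."""
    return [
        candidate["agent"]
        for candidate in candidates
        if candidate["predicate"] == question["predicate"]
        and candidate["theme"].lower() == question["theme"].lower()
    ]

if __name__ == "__main__":
    print(answer_who(QUESTION, CANDIDATES))
```

Both candidate sentences contain every word of the question, so word matching alone cannot separate them; only the theme comparison does.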
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
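The two rolesets for ring can be held in a small lookup table, which makes the verb-specific nature of the numbered arguments explicit. This is an illustrative in-memory structure, not the PropBank frame file format; `FRAMES` and `label_arguments` are hypothetical names, and the role descriptions are copied from the slide.

```python
# Rolesets for 'ring'.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing", "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
    },
}

def label_arguments(roleset_id, arguments):
    """Pair each labelled argument with the role description of its roleset.

    arguments: dict mapping a numbered label (e.g. 'Arg1') to the argument text.
    Raises ValueError if a label is not defined for the roleset.
    """
    roles = FRAMES[roleset_id]["roles"]
    unknown = [label for label in arguments if label not in roles]
    if unknown:
        raise ValueError(f"{roleset_id} has no role(s) {unknown}")
    return [(label, text, roles[label]) for label, text in arguments.items()]

if __name__ == "__main__":
    print(label_arguments("ring.01", {"Arg0": "John", "Arg1": "the bell"}))
    print(label_arguments("ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"}))
```

The same label means different things in different rolesets: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02.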
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875; 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments         Numbered Arguments with function tags
ARGA  Causer               ARGA-MNS  Indirect causer
ARG0  Agent/experiencer    ARG0-MNS  Induced causer
ARG1  Theme/patient        ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient            ARG2-ATR  Attribute
ARG3  Instrument           ARG2-GOL  Goal
                           ARG2-SOU  Source
                           ARG2-LOC  Location
                           ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’
trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
‘Atif read the book’
trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’
unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
‘The door opened’
unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
‘The door will have to open’
Intransitive: Unergative
Unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
‘Atif slept’
unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
‘Atif will have to sleep’
Existential
exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Yesterday night the moon was seen behind the clouds’
dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Yesterday night I saw the moon behind the clouds’
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’
ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so, sulaa) sleep, make someone sleep
• Add –vaa
  – (sulaa, sulvaa) make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
Unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused the child to sleep’
causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’
Causatives
[Figures: PropBank annotation of Causative-1 and Causative-2]
Causatives classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’
Complex predicate
compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: PS tree with node labels VP-Pred, VP, NP, NP, V over the leaves Atif, kitab, paRhegaa]
101212
4
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree diagram. Node labels: VP-Pred, VP, NP-P, NP, V. Terminals: Atif-ne, kitab, paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree diagram. Node labels: VP-Pred, VP, NP, V. Terminals: Atif, soyegaa]
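Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree diagram. Node labels: VP-Pred, VP, NP1, NP, V. Terminals: darvaazaa, CASE1 (empty category coindexed with NP1), khulegaa]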
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram. Node labels: VP-Pred, VP, VP, NP1, NP, NP-P, V. Terminals: us kamre mein, cuuhe, CASE1, hain]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position
राम ने मोहन को किताब दी
[Tree diagram. Node labels: VP-Pred, VP-Pred, VP, NP-P, NP-P, NP, V. Terminals: Ram-ne, Mohan-ko, kitaab, dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, NP1, NP2, V. Terminals: kal raat, baadaloM mein, mujhko, SCR2, caaMd, CASE1, dikhaa]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, NP1, CP1, C, V, V-Aux. Terminals: Ram, jaantaa, hai, EXTR1, ki, Sita, CASE1, der se, aayegi]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, NP1, CP, C, V, V-Aux. Terminals: jo kitaab, tumne, SCR1, dii, thii, PRO, COMP, vah, maine, SCR3, paRhii]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree diagram. Node labels: VP-Pred, VP, NP, NP-P, V', V, V-Aux, N. Terminals: Raam, Ravi-ko, yaad, CASE1, kar, rahaa, thaa]

Causative
Analysis 2a for Small Clauses: General Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb), but not her as semantic subject
[DS tree: considers, with dependents Atif (Subj), her (Obj), stupid (Obj2)]
[PS tree: [S [NP Atif] [VP considers her [AdjP stupid]]]]
Analysis 2b for Small Clauses: Syntactic Monoclausal Analysis
• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject, using a node label
[DS tree: considers, with dependents Atif (k1), her (k2), stupid (k2s)]
[PS tree: node labels S, NP, VP, SC, AdjP; terminals Atif, considers, her, stupid]
Analysis 3 for Small Clauses: Raising to Object
• Structure represents accusative case marking of her and her as semantic subject, but requires an empty category
[DS tree: considers, with dependents Atif (Subj), her1 (Obj), stupid (Obj-Pred); stupid has the dependent e1 (Subj)]
[PS tree: node labels S, NP, VP, S, VP, AdjP; terminals Atif, considers, her1, e1, stupid]
Analysis used for PS in the Hindi-Urdu Treebank
Comparison of Representations
• Less information
• Same information
[Figure: five dependency trees for "Atif considers her stupid", labeled Tree 1a, Tree 1b, Tree 2a, Tree 2b and Tree 3. Arc labels appearing in the trees: Subj, Obj, Obj2, Obj-Pred, ObjPred, Obj-ECM; Tree 3 contains the coindexed pair her1 / e1.]
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
  - Can be very similar across languages
• There can be several possible analyses
  - Some have less information
  - But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren't DS and PS Representations Complementary? NO
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in the projection shows the type of dependency

Aren't DS and PS Representations Complementary? NO (Contd)
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants
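The convention that a dependency node stands for the phrase made of itself and all its descendants can be sketched in a few lines of Python. This is only an illustration: the head-index encoding, the function name `phrase_spans` and the English example sentence are assumptions made for the sketch, not part of the treebank tooling.

```python
from collections import defaultdict

def phrase_spans(heads):
    """Map each word index to the sorted indices of the phrase it heads.

    `heads` maps a word index to the index of its head (None for the root).
    """
    children = defaultdict(list)
    for dep, head in heads.items():
        if head is not None:
            children[head].append(dep)

    def collect(node):
        # A node's phrase is the node itself plus the phrases of its children.
        members = [node]
        for child in children[node]:
            members.extend(collect(child))
        return sorted(members)

    return {node: collect(node) for node in heads}

# Illustrative sentence: "opened" is the root, "the" depends on "lock".
words = ["Sabina", "opened", "the", "lock"]
heads = {0: 1, 1: None, 2: 3, 3: 1}
spans = phrase_spans(heads)
lock_phrase = " ".join(words[i] for i in spans[3])
```

Here `lock_phrase` is the constituent headed by "lock", and the span of the root covers the whole sentence, which is the sense in which constituency is recoverable from DS.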
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  - Does not change trees
  - Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  - Contains less information than DS+PB
  - DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS | PB | PS
How: Dependency; Phrase Structure
What: Distinguish unergative/unaccusative; Distinguish temporal/locative adjuncts; Distinguish unaccusative/transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties
  - Morphology
  - Some basic syntax
  - Lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  - Some adjectives do not decline

Nouns
Nouns in Hindi/Urdu are inflected for number and case.
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  - Singular: pankhaa (fan), lataa (creeper), ghar (house)
  - Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
Direct: nouns are in the nominative and are not followed by a postposition. They occur as subject and/or object.
  laRkaa aayaa 'the boy came'
    <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  laRkii aayii 'the girl came'
    <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen).
  laRke ne roTii khaayii 'the boy ate bread'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, the pronouns also inflect for number and case.
• Sg, dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
• Sg, obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
• Pl, dir: ye (these), ve (those), jo (who, pl)
• Pl, obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)
Pronouns (Contd.)
• Before 'ne', maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don't change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
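The rules on this slide are regular enough to write down as a small lookup. The following Python sketch is only an illustration under stated assumptions: the function name `attach` and the table names are hypothetical, the forms are in the slide's transliteration, and only the masculine singular direct genitive forms listed above are covered.

```python
# Oblique stems used before postpositions other than "ne".
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitive forms that replace pronoun + "kaa".
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}

def attach(pronoun, postposition):
    """Combine a personal pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if pronoun in OBLIQUE and postposition != "ne":
        return OBLIQUE[pronoun] + postposition
    # ham, tum, aap (and maiM/tuu before "ne") keep their form.
    return pronoun + postposition

examples = [attach("maiM", "ne"), attach("maiM", "ko"), attach("ham", "ko")]
```

A morphological analyzer would run this mapping in the other direction, recovering the root maiM and the case marker from a surface form such as mujhko.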
Adjectives
Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun.
Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'.
The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below.

          Direct            Oblique
Gender    Sg       Pl       Sg       Pl
Masc      acchaa   acche    acche    acche
Fem       acchii   acchii   acchii   acchii
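The table collapses to three endings, which a short Python sketch makes explicit. The function name `inflect_adjective` and the stem/feature encoding are assumptions made for illustration; the sketch applies only to declinable adjectives of the acchaa type, since some adjectives do not decline.

```python
def inflect_adjective(stem, gender, number, case):
    """Return the form of a declinable (-aa type) adjective.

    stem:   adjective without its ending, e.g. "acch" for acchaa
    gender: "m" or "f"; number: "sg" or "pl"; case: "direct" or "oblique"
    """
    if gender == "f":
        return stem + "ii"   # feminine is invariant across number and case
    if number == "sg" and case == "direct":
        return stem + "aa"   # masculine singular direct
    return stem + "e"        # every other masculine cell

oblique_form = inflect_adjective("acch", "m", "sg", "oblique")
```

The oblique singular form is the one seen in acche laRke ne, where the adjective agrees with a noun that is followed by a postposition.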
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person. Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu.

Root        ro       'cry'
Infinitive  ronaa    'to cry'
Habitual    rotaa    'cry.hab'
Perfective  royaa    'cried'
Causative   rulaa    'cause someone to cry'
            rulvaa   'make someone cause someone to cry'
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs.
(a) bacce khaanaa khaa rahe haiM
    children.d meal eat prog be.pl.pres
    'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
    children.d meal eat.nf prog abilit be.pl.pres
    'The children can continue to have their meal'
Auxiliaries also carry the gender, number and person information.

Postpositions
Postpositions largely mark the case relations.
    baccoM ne raat meM mez se khaanaa le liyaa
    children.obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions.
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
  ke anusaar 'according to'
  ke alaavaa 'in addition to'
  ke kaaran 'because of'
  ke dvaaraa 'through'
  ke saamne 'in front of'
  ke liye 'for'
  kii or/taraf 'towards'
  kii tarah 'like'
  kii jagah 'in place of'
  se baahar 'out of'
  se pahle 'before'

Urdu-Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
    qabl az_ayn (qabl azeen), before from_this, 'before this'
    qabl az_waqt, before from_time, 'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah), in_this moment
(c) az 'from/since/for'
    az raahe hamdardi, from way empathy, 'for the sake of courtesy'
    az sare nau, from beginning new
    khaarij az bahes, beyond from discussion
(d) ta 'to/until/till'
    ta waqt, until time

ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive, but is not restricted to the genitive alone.
  EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  - Nouns: it adds the sense of 'every'
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication
If the word is X, then its reduplicated form is X-X.
  raam-raam 'Ram-Ram' (proper noun)
  baccaa-baccaa 'child-child' (common noun)
  garam-garam 'hot-hot' (adjective)
  dhiire-dhiire 'slowly-slowly' (adverb)
  jaa-jaa 'go-go' (verb)
  naa-naa 'not-not' (negative particle)
  kyaa-kyaa 'what-what' (question word)
  jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning of 'X etc.' In Hindi/Urdu the first consonant of X is replaced by v-.
For example: khaanaa-vaanaa 'food etc.' (vaanaa is not a valid word in Hindi)
Some more examples of partial reduplication in Hindi are:
  jaanaa 'going' → jaanaa-vaanaa 'going etc.'
  aaloo 'potato' → aaloo-vaaloo 'potato etc.'
  aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run' → bhaagambhaag 'rush'
  jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
  dekh 'see' → dekhaa-dekhii 'in imitation'
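The regular echo-word pattern can be sketched as a small string operation. This is a minimal illustration, not a tool from the treebank project: the function name `echo_word` is hypothetical, the input is assumed to be in the romanization used on the slide, and aspirated digraphs such as kh are treated as a single consonant. Vowel-initial words simply receive the v- prefix, as in aaloo-vaaloo.

```python
# Two-letter spellings that stand for one consonant in the transliteration.
DIGRAPHS = ("kh", "gh", "ch", "jh", "Th", "Dh", "th", "dh", "ph", "bh", "sh")
VOWELS = set("aeiouAEIOU")

def echo_word(word):
    """Form 'X-vX'' by replacing the first consonant of X with v-."""
    if word[0] in VOWELS:
        rest = word          # nothing to replace: prefix v-
    elif word.startswith(DIGRAPHS):
        rest = word[2:]      # drop a digraph consonant such as kh
    else:
        rest = word[1:]      # drop a single-letter consonant
    return f"{word}-v{rest}"

echoed = [echo_word(w) for w in ("khaanaa", "jaanaa", "aaloo", "aisaa")]
```

The irregular items listed above (bhaagambhaag, jhuuTh-muuTh, dekhaa-dekhii) are lexical and would need to be stored as exceptions rather than generated.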
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'

Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'

Relative Clause
rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  'I have read the book which you gave me'
rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  'I have read the book which you gave me'
rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'

Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2: राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused Atif to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause Atif to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

Experiencer verbs: the experiencer argument takes dative case
  raam ko bukhaar hai
  Ram dat fever be.pres
  raam ko caand dikhaa
  Ram dat moon see-unacc.pst
  raam ko dukh hai
  Ram dat sorrow be.pres

Participatory verbs: the second argument of participatory verbs takes the se postposition
  raam ravi se carcaa karegaa
  Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii
  Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa
  Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
  usa   ladake  ne   kelaa   khaayaa    thaa
  that  boy     erg  banana  eat-perf   past
Tokenization
Represented in SSF:
  ADDR  TOKEN
  1     usa
  2     laDake
  3     ne
  4     kelA
  5     khAyA
  6     thA
  7

Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision: create three tokens and mark the hyphen as JOIN
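The decision for hyphenated compounds (three tokens, with the hyphen marked as JOIN) can be sketched as follows. This is a minimal illustration under assumptions: the function name `tokenize` and the (token, mark) pair representation are hypothetical, since the slides do not show how JOIN is stored in the treebank files, and other punctuation is not handled here.

```python
import re

def tokenize(text):
    """Split on whitespace, then split hyphenated compounds into members.

    Returns (token, mark) pairs; the hyphen is kept as its own token
    and carries the mark "JOIN".
    """
    tokens = []
    for word in text.split():
        # The capturing group keeps the hyphen in the split result.
        for piece in re.split(r"(-)", word):
            if piece == "-":
                tokens.append(("-", "JOIN"))
            elif piece:
                tokens.append((piece, None))
    return tokens

compound = tokenize("BAI-bahana")
```

Keeping the hyphen as a marked token lets each member of the compound receive its own morphological analysis while the compound can still be reassembled later.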
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), suffix

  ADDR_  TKN_     OTHR
  1      usa      <fs af='vaha,pr'>
  2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
  3      ne       <fs af='ne,psp'>
  4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
  5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
  6      thaa     <fs af='thaa,v,e'>
  7               <fs as=&STOP,punc>
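Reading the af attribute back into named fields is a simple split. The Python sketch below is an illustration under assumptions: it takes the fields to be comma-separated in the order listed on the slide, treats tam and vibhakti as sharing one slot, and pads values that list fewer fields, as the slide's abbreviated examples do. The names `AF_FIELDS` and `parse_af` are hypothetical.

```python
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
)

def parse_af(value):
    """Split a composite af value into a field-name -> value mapping."""
    parts = value.split(",")
    # Pad short values so every field name is present in the result.
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))

noun = parse_af("laDakaa,n,m,sg,3,o")
```

With this mapping a tool can, for example, select all oblique nouns by checking `noun["case"] == "o"` instead of matching on the raw string.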
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags

  ADDR  TKN     CAT   OTHR
  1     usa     PRP   <fs af='vaha,pron,…'>
  2     laDake  NN    <fs af='laDakA,noun,…'>
  3     ne      PSP   <fs af='ne,psp,…'>
  4     kelA    NN    <fs af='kelA,noun,…'>
  5     khAyA   VM    <fs af='KA,verb,…'>
  6     thA     VAUX  <fs af='thA,verb,…'>
  7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)

  ADDR_  TKN_    CAT_  OTHR
  1      ((      NP
  1.1    usa     PRP   <fs af='vaha,pron,…'>
  1.2    laDake  NN    <fs af='laDakA,noun,…'>
  1.3    ne      PSP   <fs af='ne,psp,…'>
         ))
  2      ((      NP
  2.1    kelA    NN    <fs af='kelA,noun,…'>
         ))
  3      ((      VG
  3.1    khAyA   VM    <fs af='KA,verb,…'>
  3.2    thA     VAUX  <fs af='thA,verb,…'>
         ))
  4      ((      BLK
  4.1            SYM   <fs as=&STOP,punc>
         ))
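How chunking changes the ADDR_ values can be illustrated with a small renderer. This is a sketch under assumptions, not the project's SSF tooling: the function name `render_ssf` is hypothetical, columns are tab-separated, token addresses are written as chunk.position (for example 1.2), and the feature-structure column is omitted.

```python
def render_ssf(chunks):
    """Render (label, [(token, pos_tag), ...]) chunks as SSF-like lines."""
    lines = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{label}")             # chunk opening line
        for j, (token, tag) in enumerate(tokens, start=1):
            lines.append(f"{i}.{j}\t{token}\t{tag}")  # token inside chunk i
        lines.append("\t))")                          # chunk closing line
    return lines

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
ssf_lines = render_ssf(chunks)
```

Before chunking the token thA has address 6; after chunking it is the second token of the third chunk, which is why dependency annotation refers to chunk heads rather than to flat token positions.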
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad <dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?
Indian languages:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (Contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
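Sabina opened the lock
[Dependency tree: opened, with dependents Sabina (k1) and lock (k2); lock has the dependent the]
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key
[Dependency tree: opened, with dependents Sabina (k1), lock (k2) and key (k3); lock has the dependent the; key has the dependent this]
K3 (karaNa): instrument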
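Yesterday Sabina opened the lock with this key at my home
[Dependency tree: opened, with dependents Yesterday (k7t), Sabina (k1), lock (k2), key (k3) and home (k7p); lock has the dependent the; key has the dependent this; home has the dependent my]
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key
[Dependency tree: opened, with dependents yesterday (k7t), lock (k1) and key (k3); lock has the dependent the; key has the dependent this]
lock becomes the karta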
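The karaka-labeled trees on these slides can be written down as labeled arcs, which makes the shift of "lock" from karma to karta easy to check. The Python sketch below is only an illustration: the `Arc` type and the helper `role_of` are hypothetical names, and the arcs are transcribed from the example sentences rather than read from treebank files.

```python
from typing import NamedTuple, Optional

class Arc(NamedTuple):
    head: str
    label: str
    dependent: str

# "Yesterday Sabina opened the lock with this key at my home"
transitive = [
    Arc("opened", "k7t", "Yesterday"),
    Arc("opened", "k1", "Sabina"),
    Arc("opened", "k2", "lock"),
    Arc("opened", "k3", "key"),
    Arc("opened", "k7p", "home"),
]

# "Yesterday the lock opened with this key"
agentless = [
    Arc("opened", "k7t", "yesterday"),
    Arc("opened", "k1", "lock"),
    Arc("opened", "k3", "key"),
]

def role_of(arcs, word) -> Optional[str]:
    """Return the relation label under which `word` attaches, if any."""
    for arc in arcs:
        if arc.dependent == word:
            return arc.label
    return None

roles = (role_of(transitive, "lock"), role_of(agentless, "lock"))
```

The pair in `roles` shows the same word carrying k2 when the agent is expressed and k1 when it is not, which is the point the second example makes.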
Levels of Analysis
• L1: Semantic relations (karakas), e.g. raama: karta
• L2: Morphosyntactic (vibhakti), e.g. raama: prathamaa
• L3: Morphological representation (abstract vibhakti markers), e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4: Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Tree (t1): khaa, with dependents bhaaii and phala]
[Tree (t2): khaa, with dependents bhaaii and phala; bhaaii has the dependents meraa and baDZaa; phala has the dependent bahuta]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the Verb
• A verbal root denotes
  - the activity
  - the result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal Root, branching into activity and result]
karta-karma
• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma
[Dependency tree: open, with dependents boy (k1) and lock (k2)]

Sub-actions: Opening of a lock
[Diagram: Opening of lock, made up of three sub-actions: inserting and turning a key (action 1); key pressing and turning the lever (action 2); latch moving and lock opening (action 3)]
Sub-actions: Opening of a lock
[Dependency tree: open, with dependents boy (k1), lock (k2) and key (k3)]
[Dependency tree: open, with dependents key (k1) and lock (k2)]
[Dependency tree: open, with dependent lock (k1)]
k1: karta (doer); k2: karma (affected); k3: karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.
An Example
Example:
  meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd)
Morph Analysis
  meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  badZaa   <fs af= root=badZaa cat=adj gend=m>
  bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  bahuta   <fs af= root=bahuta cat=adj gend=any>
  phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
  meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
  ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG

An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
  - each node is a chunk, and
  - the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit' and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
[Dependency tree: khaataa, with dependents bhaaii (k1) and phala (k2); bhaaii has the dependent meraa (r6)]

Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas
  - r6: Genitive
  - rt: Purpose
  - rh: Reason
  - nmod_relc: Relative clause
  - rad: Address
Relations which do not fall under dependency relation
  - ccof: Conjunction
  - pof: Complex Predicates
  - fragof: Fragment of

Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
  maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
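The correspondence between the two analyses of the maid example can be sketched as a simple relabelling. This is an illustration under stated assumptions, not the annotation procedure itself: `CAUSATIVE_LABELS` and `relabel_causative` are hypothetical names, and the input is assumed to be a causative clause in which the se-phrase is a mediator-causer. A true instrument such as cammaca se ‘with the spoon’ keeps k3 and must not be passed through this mapping.

```python
# Possibility-I label -> Possibility-II label, as stated above:
# pk1, mk1 and jk1 are marked instead of k1, k3 and k4 respectively.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map (chunk, label) pairs from Possibility-I labels to Possibility-II labels.

    Labels without a causative counterpart (for example k2) are kept unchanged.
    """
    return [(chunk, CAUSATIVE_LABELS.get(label, label)) for chunk, label in args]


# maaz ne aayaa se bacce ko khaanaa khilvaayaa, labelled as in Possibility-I.
possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```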
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
• Possibility-I
– Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
– jo ladZakaa ‘the boy’ depends on vaha ‘he’
– jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
[Figure: dependency tree for the relative clause under Possibility-I]
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
[Figure: dependency tree for the relative clause under Possibility-II]
Relative Clauses (Contd…)
For relative clauses, our current decision: follow Possibility-II.
In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause.
The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation.
The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref.
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
‘ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
[Figure: the abstract node agar-to is the root; agar and to attach to it as paired-ccof; na hotii attaches to agar and aatii attaches to to, each as ccof.]
Possibility-II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd.)
Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node.
For conditionals, our current decision: follow Possibility-II.
In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause.
Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
The argument occurs only once in the sentence but is semantically related to both verbs.
The shared argument always attaches syntactically to the main verb.
For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Having written letters every day, he tears them up’
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
[Figure: dependency tree. PaadZataa hai is the root, with vaha (k1), rojZa (k7t), patra (k2) and likhakar (vmod) as its children; the shared arguments vaha (k1) and patra (k2) are also shown under likhakar.]
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
[Figure: dependency tree. maan luungii is the root, with mai (k1) and NULL__NP (vo) (k2) as its children; kahoge attaches to NULL__NP as nmod__relc, with tum (k1) and jo bhi (k2) as its children.]
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don’t listen to anyone’
No explicit conjunct: insert a NULL element to show the dependencies (if it is essential).
[Figure: NULL_CCP (aur) is the root; badZe_ho_gaye and nahii_sunate attach to it as ccof.]
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
[Figures: dependency trees for the coordinate and the subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd.)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
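The chunking decision can be illustrated with a small Python sketch. The list-of-dicts layout and the `conjunct_verbs` helper are assumptions made for this illustration and are not the treebank's file format; the k1, k2, pof and nmod labels are those shown in the tree for this sentence.

```python
# Chunks for "maine usase ek prashna kiyaa" after the conjunct verb has been split
# into [ek prashna]NP and [kiyaa]VG. Head id 0 marks the root chunk.
chunks = [
    {"id": 1, "tag": "NP", "words": ["maine"], "head": 4, "rel": "k1"},
    {"id": 2, "tag": "NP", "words": ["usase"], "head": 4, "rel": "k2"},
    {"id": 3, "tag": "NP", "words": ["ek", "prashna"], "head": 4, "rel": "pof"},
    {"id": 4, "tag": "VG", "words": ["kiyaa"], "head": 0, "rel": "root"},
]


def conjunct_verbs(chunks):
    """Return (noun or adjective, verb) pairs that are linked by the pof relation."""
    by_id = {chunk["id"]: chunk for chunk in chunks}
    pairs = []
    for chunk in chunks:
        if chunk["rel"] == "pof":
            verb_chunk = by_id[chunk["head"]]
            # Assumption: the head of a chunk is its last word (ek prashna -> prashna).
            pairs.append((chunk["words"][-1], verb_chunk["words"][-1]))
    return pairs


print(conjunct_verbs(chunks))
```

Because ek and prashna sit in the same NP chunk, the modifier relation between them stays inside the chunk, while the pof arc still identifies prashna kiyaa as one semantic unit.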
Conjunct Verbs (Contd.)
[Figure: dependency tree. kiyaa is the root, with maine (k1), usase (k2) and prashna (pof) as its children; ek attaches to prashna as nmod.]
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
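The filtering step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the role labels are written out by hand rather than produced by a semantic role labeller, and `answer_who` is a hypothetical helper name.

```python
QUESTION_THEME = "the first effective polio vaccine"

# Hand-labelled role structures for the two candidate sentences above.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, labelled):
    """Return the agents of `predicate` whose theme matches the theme asked about."""
    return [
        c["agent"]
        for c in labelled
        if c["predicate"] == predicate and c["theme"].lower() == theme.lower()
    ]


print(answer_who("create", QUESTION_THEME, candidates))
```

Both sentences contain the words of the question, so word matching alone cannot separate them; comparing the theme of create is what removes the syringe sentence.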
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
Arg0: Causer of ringing
Arg1: Thing rung
Arg2: Ring for
ring.02: To surround
Arg1: Surrounding entity
Arg2: Surrounded entity
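The two rolesets can be pictured as a small lookup structure. The sketch below is an illustration only: the nested-dict layout and the `describe` helper are assumptions made for this example and do not reproduce the actual frame file format; the roleset identifiers are written ring.01 and ring.02.

```python
# Assumed in-memory layout for the contents of the frame file for "ring".
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
        },
    }
}


def describe(lemma, roleset_id, labelled_args):
    """Pair each labelled argument with its role description from the chosen roleset."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    unknown = sorted(set(labelled_args) - set(roles))
    if unknown:
        raise ValueError(f"{roleset_id} has no role(s): {', '.join(unknown)}")
    return {arg: (text, roles[arg]) for arg, text in labelled_args.items()}


bell = describe("ring", "ring.01", {"Arg0": "John", "Arg1": "the bell"})
lake = describe("ring", "ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"})
print(bell)
print(lake)
```

The same argument number can mean different things in different rolesets: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why roles are defined per roleset.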
Hindi PropBank
• Annotating Hindi PropBank involves three steps:
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files:
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
Numbered arguments            Numbered arguments with function tags
ARGA   Causer                 ARGA-MNS   Indirect causer
ARG0   Agent, experiencer     ARG0-MNS   Induced causer
ARG1   Theme, patient         ARG0-GOL   Causee with a ‘recipient’ role
ARG2   Recipient              ARG2-ATR   Attribute
ARG3   Instrument             ARG2-GOL   Goal
                              ARG2-SOU   Source
                              ARG2-LOC   Location
                              ARG2-DIR   Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.F read.M.Sg.Fut
‘Atif will read the book’
trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.Erg book.F read.F.Sg.Pst
‘Atif read the book’
trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.Dat book.F read.F.Inf compel.F.Pst
‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs:
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
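The labelling rule for the single argument of an intransitive verb can be stated as a lookup. This minimal sketch assumes the verb class has already been decided by the diagnostic tests; `VERB_CLASS` holds only the three verbs named above, and `sole_argument_label` is a hypothetical helper name.

```python
# Verb classes as listed above (romanized forms taken from the slide).
VERB_CLASS = {"Kula": "unaccusative", "Puta": "unaccusative", "haMsa": "unergative"}

# The single argument of an unaccusative takes Arg1; that of an unergative takes Arg0.
SOLE_ARGUMENT = {"unaccusative": "ARG1", "unergative": "ARG0"}


def sole_argument_label(verb):
    """Return the PropBank label for the only argument of an intransitive verb."""
    return SOLE_ARGUMENT[VERB_CLASS[verb]]


for verb in ("Kula", "Puta", "haMsa"):
    print(verb, sole_argument_label(verb))
```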
Intransitive: Unaccusative
unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.M.Sg.D open.M.Sg.Fut
‘The door will open’
unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.M.Sg.Obl.Erg open.Pst
‘The door opened’
unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.M.Sg.Obl.Dat open.Inf compel.Fut
‘The door will have to open’
Intransitive: Unergative
unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.M.Sg.Fut
‘Atif will sleep’
unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif.Erg sleep.M.Sg.Pst
‘Atif slept’
unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.Dat sleep.Inf compel.Fut
‘Atif will have to sleep’
Existential
exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.Pres.Pl
‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).Pst
‘Yesterday night the moon was seen behind the clouds’
dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.Dat moon see(unacc).Pst
‘Yesterday night I saw the moon behind the clouds’
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.Dat book.F give.M.Sg.Fut
‘Ram will give a book to Mohan’
ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.Erg Mohan.Dat book.F give.F.Sg.Pst
‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add -aa
– (so → sulaa) sleep → make someone sleep
• Add -vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.M.Sg.Fut
‘Atif will sleep’
causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.Erg Atif.Acc sleep.Caus.Pst
‘The maid caused the child to sleep’
causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.Erg maid.by Atif.Acc sleep.Caus.Pst
‘The mother made the maid cause the child to sleep’
[Figures: PropBank annotation of causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust(n) do(v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi.Gen wait do Prog.M.Sg be.M.Sg.Pst
‘Ram was waiting for Ravi’
compl-pred-2
राम रवी से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.Inst fight sit.Perf
‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.Hab.M.Sg be.Sg.Pres that Sita late Part come.F.Sg.Fut
‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, usually corresponding to DS’s k1 and k2
[Figure: phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with nodes VP-Pred, VP, NP and V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii), with nodes VP-Pred, VP, NP and V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: phrase structure tree for आतिफ़ सोएगा (Atif soyegaa), with nodes VP-Pred, VP, NP and V]
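Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); the NP darvaazaa carries index 1 and is linked to a CASE element with the same index in the lower position]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
[Figure: phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); the NP cuuhe is linked to a CASE element with index 1, and us kamre mein is adjoined to VP]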
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: phrase structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii), with Mohan-ko adjoined to VP-Pred]
Putting it All Together: Dative Subjects
[Figure: phrase structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); caaMd is linked to a CASE element (index 1), mujhko to an SCR element (index 2), and kal raat and baadaloM mein are adjoined to VP]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Figure: phrase structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); the CP headed by ki is linked to an EXTR element (index 1), and Sita to a CASE element]
Relative Clause
[Figure: phrase structure tree for the sentence below, with a CP containing the relative phrase jo kitaab, a PRO, and SCR elements linking the displaced phrases]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.F you.Erg give.F.Sg.Pst be.F.Sg.Pst that I.Erg read refl.F.Sg.Pst
‘I have read the book which you gave me’
Complex Predicate
[Figure: phrase structure tree for the sentence below; the noun yaad and the verb kar form a V’, and the auxiliaries rahaa and thaa are V-Aux]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.Acc remember do Prog.M.Sg be.M.Sg.Pst
‘Ram was remembering Ravi’
Causative
[Figure: phrase structure tree for the causative construction]
Analysis 3 for Small Clauses: Raising to Object
• The structure represents the accusative case marking of her, and her as semantic subject, but requires an empty category
[Figure: dependency tree for ‘Atif considers her stupid’ with the arcs Subj (Atif), Obj (her, index 1), Obj-Pred (stupid) and Subj (the empty category e, index 1); and the corresponding phrase structure tree with nodes S, NP, VP, S, VP and AdjP, in which her is co-indexed with e]
Analysis used for PS in the Hindi-Urdu Treebank
Comparison of Representations
• Less information
• Same information
[Figure: five alternative dependency analyses of ‘Atif considers her stupid’ (Tree 1a, Tree 2a, Tree 1b, Tree 2b, Tree 3), arranged under the two headings above; they differ in whether her is labelled Subj or Obj, in the label on stupid (Obj, Obj2, ObjPred, Obj-Pred, Obj-ECM), and in whether an empty category e is used]
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
– Can be very similar across languages
• There can be several possible analyses
– Some have less information
– But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren’t DS and PS Representations Complementary? NO!
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in the projection shows the type of dependency
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all its descendants
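The convention for reading constituents off a dependency tree can be sketched in Python. This is an illustration under stated assumptions: the four-word string "meraa bhaaii phala khaataa" is a simplified word order built from the DS example words (meraa depends on bhaaii, and bhaaii and phala depend on khaataa), and `constituent` and `phrases` are hypothetical helper names.

```python
words = ["meraa", "bhaaii", "phala", "khaataa"]
# 1-based position of each word's head; 0 marks the root.
heads = [2, 4, 4, 0]


def constituent(i, heads):
    """Return the sorted 1-based positions of word i and all of its descendants."""
    members = [i]
    for j, head in enumerate(heads, start=1):
        if head == i:
            members.extend(constituent(j, heads))
    return sorted(members)


def phrases(words, heads):
    """Map each word to the phrase it heads, i.e. the yield of its subtree."""
    return {
        words[i - 1]: " ".join(words[k - 1] for k in constituent(i, heads))
        for i in range(1, len(words) + 1)
    }


for head_word, phrase in phrases(words, heads).items():
    print(f"{head_word}: [{phrase}]")
```

Every node yields one phrase, so the dependency tree implicitly contains a bracketing of the sentence even though no phrase labels are stored.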
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
– Does not change trees
– Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
– Contains less information than DS+PB
– DS and PS contain different information
Comparison of DS, PB, PS (Sample)
[Table with columns DS, PB and PS. Row ‘How’: Dependency, Phrase Structure. Row ‘What’: distinguish unergative/unaccusative; distinguish temporal/locative adjuncts; distinguish unaccusative/transitive with empty agent.]
Introduction to Morphology, Syntax and
Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties: Morphology; Some basic Syntax; Lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script
Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
– Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case.
Gender
• All nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
Number
• Singular: pankhaa (fan), lataa (creeper), ghar (house)
• Plural: pankhe (fans), lataeM (creepers), ghar (houses)
Case
• The case roles in Hindi are normally marked by postpositions
• However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
• Direct: nouns are in the nominative and are not followed by a postposition; they occur as subject and/or object
LaRkaa aayaa ‘the boy came’
<ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
laRkii aayii ‘the girl came’
<ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
laRke ne roTii khaayii ‘the boy ate bread’
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread up from the floor’
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case.
• Sg, Dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
• Sg, Obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
• Pl, Dir: ye (these), ve (those), jo (who, pl)
• Pl, Obl:
a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)
Pronouns (Contd…)
• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) don’t change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you, hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
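These rules for the first and second person pronouns can be sketched as a small function. It is only an illustration: `attach` is a hypothetical helper name, and the sketch covers just the forms listed above. Other pronouns, and agreement of the genitive forms with the possessed noun, are outside its scope.

```python
# Oblique stems used before postpositions other than ne.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitive forms that replace pronoun + kaa.
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach(pronoun, postposition):
    """Combine a first or second person pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition
    # ham, tum and aap do not change form; maiM and tuu do not change before ne.
    return pronoun + postposition


for pronoun, postposition in [("maiM", "ne"), ("maiM", "ko"), ("tuu", "ko"),
                              ("ham", "ko"), ("aap", "ne"), ("tum", "kaa")]:
    print(pronoun, "+", postposition, "=", attach(pronoun, postposition))
```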
Adjectives
Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun.
Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below.
          Direct            Oblique
Gender    Sg       Pl       Sg       Pl
Masc      acchaa   acche    acche    acche
Fem       acchii   acchii   acchii   acchii
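The table can be reproduced by a short function. This sketch is illustrative: `inflect_adjective` is a hypothetical name, it covers only adjectives of the acchaa type, and treating every adjective that does not end in -aa as non-declining is an assumption made here to model the remark that some adjectives do not decline.

```python
def inflect_adjective(citation_form, gender, number, case):
    """Inflect an adjective given in its masculine singular direct form.

    gender: "masc" or "fem"; number: "sg" or "pl"; case: "direct" or "oblique".
    """
    if not citation_form.endswith("aa"):
        return citation_form          # assumption: treated as non-declining
    stem = citation_form[:-2]
    if gender == "fem":
        return stem + "ii"            # acchii in all four feminine cells
    if number == "sg" and case == "direct":
        return stem + "aa"            # acchaa
    return stem + "e"                 # acche elsewhere in the masculine


for gender in ("masc", "fem"):
    row = [inflect_adjective("acchaa", gender, number, case)
           for case in ("direct", "oblique") for number in ("sg", "pl")]
    print(gender, row)
```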
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person.
Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms.
Given below are some examples of various verb forms from Hindi/Urdu:
Root         ro       ‘cry’
Infinitive   ronaa    ‘to cry’
Habitual     rotaa    ‘cry.hab’
Perfective   royaa    ‘cried’
Causative    rulaa    ‘cause someone to cry’
             rulvaa   ‘make someone cause someone to cry’
Auxiliaries
Auxiliaries mark Tense, Aspect and Modality information on verbs.
(a) bacce khaanaa khaa rahe haiM
children_d meal eat prog be.pl.pres
‘The children are having a meal’
(b) bacce khaanaa khaate rah sakte haiM
children_d meal eat_nf prog abil be.pl.pres
‘The children can continue to have their meal’
Auxiliaries also carry the gender, number and person information.
Postpositions
Postpositions largely mark the case relations.
baccoM ne raat meM mez se khaanaa le liyaa
children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions.
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
ke anusaar ‘according to’
ke alaavaa ‘in addition to’
ke kaaran ‘because of’
ke dvaaraa ‘through’
ke saamne ‘in front of’
ke liye ‘for’
kii or/taraf ‘towards’
kii tarah ‘like’
kii jagah ‘in place of’
se baahar ‘out of’
se pahle ‘before’
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl ‘before’
qabl az_ayn (qabl azeen) ‘before from_this’: ‘before this’
qabl az_waqt ‘before from_time’: ‘before time’
(b) dar ‘in/inside/amidst’
dar_ayn asnah (dariin asnah) ‘in_this moment’
(c) az ‘from/since/for’
az raahe hamdardi ‘from way empathy’: ‘for the sake of courtesy’
az sare nau ‘from beginning new’
khaarij az bahes ‘beyond from discussion’
(d) ta ‘to/until/till’
ta waqt ‘until time’
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive, but is not restricted to the genitive alone.
EZ N+N: daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
EZ N+Adj: nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
EZ Adj+N: qabil-e-rahem ‘qualified for sympathy’
EZ Adj+Adj: qabil-e-qubul ‘qualified-for-acceptance’
Reduplication: A Morphological Process
Words belonging to various categories can be reduplicated. These expressions are often hyphenated. Reduplication has various morphological functions, depending on the lexical category that is reduplicated. For example:
• Nouns: it adds the sense of 'every'
• Verbs: it brings the sense of an adverbial participle
• Adjectives and adverbs: it adds intensity
Hindi has three types of reduplication: full, partial, and redundant. Reduplication is highly productive in these languages.
Full Reduplication
If the word is X then its reduplicated form is X-X
• raam-raam 'Ram-Ram' (proper noun)
• baccaa-baccaa 'child-child' (common noun)
• garam-garam 'hot-hot' (adjective)
• dhiire-dhiire 'slowly-slowly' (adverb)
• jaa-jaa 'go-go' (verb)
• naa-naa 'not-not' (negative particle)
• kyaa-kyaa 'what-what' (question word)
• jaate-jaate 'going-going' (participle)
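The X-X pattern above can be checked mechanically. The following is a minimal sketch, assuming romanized tokens in which the two copies are joined by a single hyphen; the function names are illustrative and not part of any treebank tool.

```python
def reduplicate(word: str) -> str:
    """Build the full reduplication X-X of a word X."""
    return f"{word}-{word}"


def is_full_reduplication(token: str) -> bool:
    """Return True if a hyphenated token has the form X-X."""
    parts = token.split("-")
    return len(parts) == 2 and parts[0] != "" and parts[0] == parts[1]
```

For instance, `is_full_reduplication("garam-garam")` holds, while an echo form such as `khaanaa-vaanaa` is rejected because its two halves differ.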
Partial Reduplication (Echo Words)
In partial reduplication, an expression X is repeated partially: only a part of the word is repeated, which gives the meaning 'X etc.' In Hindi/Urdu, the first consonant of X is replaced by v-.
For example: khaanaa-vaanaa 'food etc.' (vaanaa is not a valid word in Hindi.)
Some more examples of partial reduplication in Hindi are:
• jaanaa 'going' → jaanaa-vaanaa 'going etc.'
• aaloo 'potato' → aaloo-vaaloo 'potato etc.'
• aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
• bhaag 'run' → bhaagambhaag 'rush'
• jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
• dekh 'see' → dekhaa-dekhii 'in imitation'
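The regular v- pattern can be sketched in a few lines. This is an illustration only, under two assumptions that are not stated on the slide: the input is in the romanization used here, and the whole word-initial consonant sequence (for example the digraph kh) counts as "the first consonant". Vowel-initial words simply receive a v- prefix, as in aaloo-vaaloo. The irregular forms listed above are not covered.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word: str) -> str:
    """Form the echo expression X-vY, where Y is X minus its initial consonants."""
    i = 0
    # Skip the word-initial consonant sequence (assumed to be one consonant).
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"
```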
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
    Atif kitaab paRhegaa
    Atif book.f read.m.sg.fut
    'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
    Atif ne kitaab paRhii
    Atif erg book.f read.f.sg.pst
    'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
    Atif ko kitaab paRhnii paRii
    Atif dat book.f read.f.inf compel.f.pst
    'Atif had to read the book'
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    'Atif will sleep'
unerg-2 आतिफ़ ने सोया
    Atif ne soyaa
    Atif erg sleep.m.sg.pst
    'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
    Atif ko sonaa paRegaa
    Atif dat sleep.inf compel.fut
    'Atif will have to sleep'
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
    darvaazaa khulegaa
    door.m.sg.d open.m.sg.fut
    'The door will open'
unacc-2 दरवाज़े ने खुला
    darvaaze ne khulaa
    door.m.sg.obl erg open.pst
    'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
    darvaaze ko khulnaa paRegaa
    door.m.sg.obl dat open.inf compel.fut
    'The door will have to open'
Existential
exist-1 उस कमरे में चूहे हैं
    us kamre meM cuuhe haiM
    that room in rats be.pres.pl
    'There are rats in that room'
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
    kal raat baadaloM meM caaMd dikhaa
    yesterday night clouds in moon see(unacc).pst
    'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
    kal raat baadaloM meM mujhko caaMd dikhaa
    yesterday night clouds in me.dat moon see(unacc).pst
    'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1 राम मोहन को किताब देगा
    raam mohan ko kitaab degaa
    Ram Mohan dat book.f give.m.sg.fut
    'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
    raam ne mohan ko kitaab dii
    Ram erg Mohan dat book.f give.f.sg.pst
    'Ram gave a book to Mohan'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
    raam jaantaa hai ki siita der se aayegii
    Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
    'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
    merii bahan jo dillii meM rahtii hai kal aa rahii hai
    my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
    'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
    maiMne vah kitaab jo tumne dii thii paRh lii
    I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
    'I have read the book which you gave me'
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
    maiMne vah kitaab paRh lii jo tumne dii thii
    I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
    'I have read the book which you gave me'
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
    jo kitaab tumne dii thii vah maiMne paRh lii
    which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
    'I have read the book which you gave me'
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
    raam ravi kii pratikshaa kar rahaa thaa
    Ram Ravi gen wait do prog.m.sg be.m.sg.pst
    'Ram was waiting for Ravi'
compl-pred-2 राम रवि को याद कर रहा था
    raam ravi ko yaad kar rahaa thaa
    Ram Ravi acc remember do prog.m.sg be.m.sg.pst
    'Ram was remembering Ravi'
Causatives
unerg-1 आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
    aayaa ne Atif ko sulaayaa
    maid erg Atif acc sleep.caus.pst
    'The maid caused Atif to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
    maaN ne aayaa se Atif ko sulvaayaa
    mother erg maid by Atif acc sleep.caus.pst
    'The mother made the maid cause Atif to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case.
    raam ko bukhaar hai        Ram dat fever be.pres
    raam ko caand dikhaa       Ram dat moon see-unacc.pst
    raam ko dukh hai           Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition.
    raam ravi se carcaa karegaa       Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii     Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa             Ravi Mohan with meet.m.sg.fut
References
• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. London and New York: Routledge.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
    usa laDake ne kelaa khaayaa thaa
    that boy erg banana eat-perf past
Tokenization
Represented in SSF:
    ADDR  TOKEN
    1     usa
    2     laDake
    3     ne
    4     kelA
    5     khAyA
    6     thA
    7     (punctuation mark)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - They are productive
  - Morphological analysis is needed for the members of the compounds
  - The issue: whether to create a single token
  - Decision: create three tokens and mark the hyphen as JOIN
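The decision for compounds (three tokens, with the hyphen marked as JOIN) can be illustrated with a small tokenizer. This is a sketch under assumptions: the input is whitespace-separated romanized text, and the `(token, mark)` pair format is hypothetical rather than the treebank's own SSF output.

```python
import re

WORD = re.compile(r"\w+")


def tokenize(text):
    """Split text into (token, mark) pairs; compound-internal hyphens get mark 'JOIN'."""
    tokens = []
    for chunk in text.split():
        # Every run of word characters and every punctuation mark becomes a token.
        pieces = re.findall(r"\w+|[^\w\s]", chunk)
        for i, piece in enumerate(pieces):
            between_words = (
                0 < i < len(pieces) - 1
                and WORD.fullmatch(pieces[i - 1]) is not None
                and WORD.fullmatch(pieces[i + 1]) is not None
            )
            mark = "JOIN" if piece == "-" and between_words else ""
            tokens.append((piece, mark))
    return tokens
```

With this sketch, `tokenize("BAI-bahana")` yields three tokens, the middle one being the hyphen with the `JOIN` mark, while sentence punctuation is tokenized without a mark.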
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.
    ADDR  TKN      OTHR
    1     usa      <fs af='vaha,pr'>
    2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
    3     ne       <fs af='ne,psp'>
    4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
    5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
    6     thaa     <fs af='kelaa,v,e'>
    7              <fs as=&STOP,punc>
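To make the composite af attribute concrete, the sketch below splits an af value into named fields. Two things are assumptions here, since the separators were lost in this copy of the slides: that the fields are comma-separated, and that they appear in the order listed above (root, category, gender, number, person, case, tam or vibhakti, suffix). The field names and the function `parse_af` are illustrative only.

```python
AF_FIELDS = (
    "root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix",
)


def parse_af(af: str) -> dict:
    """Split a comma-separated af value into named fields (assumed order)."""
    values = af.split(",")
    # Pad missing trailing fields with empty strings.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))
```

For example, `parse_af("laDakaa,n,m,sg,3,o")` gives `root='laDakaa'`, `cat='n'`, `case='o'`, and empty strings for the two remaining fields.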
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags
    ADDR  TKN     CAT   OTHR
    1     usa     PRP   <fs af='vaha,pron,…'>
    2     laDake  NN    <fs af='laDakA,noun,…'>
    3     ne      PSP   <fs af='ne,psp,…'>
    4     kelA    NN    <fs af='kelA,noun,…'>
    5     khAyA   VM    <fs af='KA,verb,…'>
    6     thA     VAUX  <fs af='kelA,verb,…'>
    7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)
    ADDR  TKN     CAT   OTHR
    1     ((      NP
    1.1   usa     PRP   <fs af='vaha,pron,…'>
    1.2   laDake  NN    <fs af='laDakA,noun,…'>
    1.3   ne      PSP   <fs af='ne,psp,…'>
          ))
    2     ((      NP
    2.1   kelA    NN    <fs af='kelA,noun,…'>
          ))
    3     ((      VG
    3.1   khAyA   VM    <fs af='KA,verb,…'>
    3.2   thA     VAUX  <fs af='kelA,verb,…'>
          ))
    4     ((      BLK
    4.1           SYM   <fs as=&STOP,punc>
          ))
Paninian Grammatical Model and
Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian grammatical model
Why Paninian Grammar?
• Indian languages have rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
    opened
      k1: Sabina
      k2: lock
        the
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
    opened
      k1: Sabina
      k2: lock
        the
      k3: key
        this
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
    opened
      k7t: Yesterday
      k1: Sabina
      k2: lock
        the
      k3: key
        this
      k7p: home
        my
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
    opened
      k7t: yesterday
      k1: lock
        the
      k3: key
        this
Here lock becomes the karta.
Levels of Analysis
• L1 - Semantic relations: karakas, e.g. raama: karta
• L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) chunk heads only:
    khaa
      bhaaii
      phala
(t2) fully expanded:
    khaa
      bhaaii
        meraa
        baDZaa
      phala
        bahuta
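The bracketed chunk notation can be read back into a simple data structure. The following is a minimal sketch, assuming the tokens inside a chunk are whitespace-separated `word_TAG` items and that each chunk is written as `((...))_LABEL`; the function name `parse_chunks` is illustrative only.

```python
import re

# A chunk is '((' tokens '))' followed by '_' and the chunk label.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks
```

Applied to the three chunks of the example sentence, this returns two NP chunks followed by one VG chunk, each holding its tagged words in order.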
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened
Semantics of the verb
• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma
    open
      k1: boy
      k2: lock
Sub-actions: Opening of lock
[Diagram: "Opening of lock" decomposed into three sub-actions (action 1 to action 3): inserting and turning a key; key pressing and turning the lever; latch moving and lock opening]
Sub-actions: Opening of lock
    open
      k1: boy
      k2: lock
      k3: key
    open
      k1: key
      k2: lock
    open
      k1: lock
k1: karta (doer); k2: karma (affected); k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), that is, which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly:
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
    meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
    badZaa   <fs af= root=badZaa cat=adj gend=m>
    bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
    bahuta   <fs af= root=bahuta cat=adj gend=any>
    phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
    khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
    hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
    meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
    ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here, modifier-modified relations are marked between the heads of the chunks: meraa 'my', bhaaii 'brother', phala 'fruit', and khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
    khaataa
      k1: bhaaii
        r6: meraa
      k2: phala
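The tree for the example sentence can be stored as a list of labeled head-to-dependent edges between chunk heads. This is a minimal sketch; the triple format `(head, relation, dependent)` and the helper names are assumptions made for illustration, not the treebank's storage format.

```python
from collections import defaultdict

# (head, relation, dependent) edges between chunk heads, as in the tree above.
EDGES = [
    ("khaataa", "k1", "bhaaii"),
    ("khaataa", "k2", "phala"),
    ("bhaaii", "r6", "meraa"),
]


def build_tree(edges):
    """Map each head to its list of (relation, dependent) pairs."""
    tree = defaultdict(list)
    for head, label, dep in edges:
        tree[head].append((label, dep))
    return tree


def find_root(edges):
    """The root is the only head that is not itself a dependent."""
    heads = {h for h, _, _ in edges}
    deps = {d for _, _, d in edges}
    (root,) = heads - deps
    return root


def render(tree, node, label="root", depth=0):
    """Return the tree as indented 'relation: word' lines."""
    lines = ["  " * depth + f"{label}: {node}"]
    for child_label, child in tree.get(node, []):
        lines.extend(render(tree, child, child_label, depth + 1))
    return lines
```

Calling `render(build_tree(EDGES), find_root(EDGES))` lists khaataa as the root, with bhaaii (k1) and phala (k2) below it and meraa (r6) below bhaaii.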
Relations in Dependency Scheme
There are 3 types of relations in the dependency scheme:
• Karaka relations
• Relations other than karakas, and
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta: subject/agent/doer
• karma: object/patient
• karana: instrument
• sampradaan: beneficiary
• apaadaan: source
• adhikarana: location in place/time/other
Relations other than karakas
• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod_relc: Relative clause
• rad: Address
Relations which do not fall under dependency relation
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark it as k1
  - aayaa se has karana vibhakti, so mark it as k3
  - bacce ko has sampradan vibhakti, so mark it as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations:
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
Causative Constructions (Contd …)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Causative Constructions (Contd …)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
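The correspondence between the two analyses (k1, k3, k4 under Possibility-I versus pk1, mk1, jk1 under Possibility-II) amounts to a relabeling. The sketch below assumes the arguments have already been labeled under Possibility-I and that the se-marked argument really is the mediator (as with aayaa se, not the instrument cammaca se); the function name and data layout are illustrative only.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Relabel (chunk, relation) pairs of a causative verb; other labels are kept."""
    return [(chunk, CAUSATIVE_RELABEL.get(label, label)) for chunk, label in args]
```

For the example sentence, relabeling `maaz ne` (k1), `aayaa se` (k3), `bacce ko` (k4), and `khaanaa` (k2) gives pk1, mk1, jk1, and k2 respectively.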
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'
Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta: k4a
Ex-1: mujhko dukh hai
      'I.Dat' 'unhappy' 'is'
      'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan proper (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta: k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
      'ram.Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
anubhava karta: k4a (Contd…)
[Dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran: rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Relation samanadhikaran: rs (Contd…)
[Dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
    agar-to
      paired-ccof: agar
        ccof: naa hotii
      paired-ccof: to
        ccof: aatii
Possibility - II
Conditionals (Contd)
• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is semantically realized but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Every day, having written a letter, he tears it up'
Participles (vmod) (Contd)
The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.
Participles (vmod) (Contd)
    PaadZataa hai
      k1: vaha
      k7t: rojZa
      k2: patra
      vmod: likhakar
        k1: vaha (shared)
        k2: patra (shared)
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example, vo 'that', which would be the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd)
    maan luungii
      k1: mai
      k2: NULL__NP (vo)
        nmod__relc: kahoge
          k1: tum
          k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
    NULL_CCP (aur)
      ccof: badZe_ho_gaye
      ccof: nahii_sunate
Non-dependency Relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
    kiyaa
      k1: maine
      k2: usase
      pof: prashna
        nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
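The filtering step can be made concrete with a toy example. The sketch below assumes the two candidate sentences have already been role-labelled as on the slide; the dictionary layout and the function `answer_who` are hypothetical and only illustrate how the theme constraint removes the wrong candidate.

```python
# Role-labelled analyses of the two candidate sentences (labels from the slide).
ANALYSES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, analyses):
    """Return the agents of analyses whose predicate and theme match the question."""
    return [
        a["agent"]
        for a in analyses
        if a["predicate"] == predicate and a["theme"].lower() == theme.lower()
    ]
```

Both sentences contain the question's words, but only the second has the questioned phrase as the theme of create, so `answer_who("create", "the first effective polio vaccine", ANALYSES)` keeps only Jonas Salk.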
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Framefiles
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of framefiles
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its framefile
An example
• John rings the bell
ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
An example
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
An example
• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)
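The two rolesets for ring can be written down as a small lookup table, which shows how a numbered argument only gets its meaning relative to a roleset. This is a sketch: the dictionary layout and the function `explain` are assumptions for illustration and do not reproduce the actual PropBank framefile format.

```python
# Rolesets for the polysemous predicate 'ring', as listed on the slides.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def explain(roleset, labeled_spans):
    """Attach the roleset-specific description to each (span, ArgN) pair."""
    roles = FRAMES[roleset]["roles"]
    return [(span, arg, roles[arg]) for span, arg in labeled_spans]
```

Note that Arg1 is the thing rung under ring.01 but the surrounding entity under ring.02, which is why the roleset must be chosen before the argument labels can be interpreted.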
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  - Creation of framefiles
  - Empty argument insertion
  - Semantic role labelling
Framefiles for Hindi
• Two types of framefiles:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875; 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments, recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
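A label in this tagset has a base (ARGA, ARG0 to ARG3, or ARGM) and an optional three-letter function tag. A minimal sketch of splitting such labels follows; the regular expression and the `parse_label` name are assumptions made for illustration, not part of the PropBank tooling.

```python
import re

# Base is ARG followed by a digit, "A" (causer) or "M" (modifier);
# the function tag, if present, is three capital letters after a hyphen.
LABEL_RE = re.compile(r"^(ARG(?:[0-9]|A|M))(?:-([A-Z]{3}))?$")


def parse_label(label):
    """Split a PropBank-style label into (base, function_tag, kind)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a recognised label: {label!r}")
    base, tag = match.group(1), match.group(2)
    kind = "modifier" if base == "ARGM" else "numbered"
    return base, tag, kind


examples = [parse_label(x) for x in ("ARG1", "ARG2-LOC", "ARGA-MNS", "ARGM-TMP")]
```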
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitaab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’
trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’
trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’
Intransitive Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’
[Figure: PropBank annotations of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’
[Figure: PropBank annotations of causative-1 and causative-2]

Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) + do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’
compl-pred-2  राम रवी से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siitaa der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree: VP-Pred, VP, NP (Atif), V (soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree: VP-Pred, VP, NP-1 (darvaazaa), trace CASE-1, V (khulegaa)]

Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree: VP, NP-P (us kamre mein), VP-Pred, VP, NP-1 (cuuhe), trace CASE-1, V (hain)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position
राम ने मोहन को किताब दी
[Tree: VP-Pred, NP-P (Ram-ne), VP-Pred, NP-P (Mohan-ko), VP, NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree: VP, NP (kal raat), VP, NP-P (baadaloM mein), VP, NP-2 (mujhko), VP-Pred, NP-1 (caaMd), VP-Pred, trace SCR-2, VP, trace CASE-1, V (dikhaa)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree: matrix VP-Pred with NP (Ram), V (jaantaa), V-Aux (hai) and trace EXTR-1; extraposed CP-1 headed by C (ki) containing VP-Pred with NP-1 (Sita), NP-P (der se), trace CASE-1 and V (aayegi)]

Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
[Tree: relative CP with C (COMP), NP-1 (jo kitaab), NP-P (tumne), PRO, trace SCR-1, V (dii), V-Aux (thii); main clause VP-Pred with NP (vah), NP-P (maine), trace SCR-3, V (paRh lii)]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
[Tree: VP-Pred, VP, NP (Raam), NP-P-1 (Ravi-ko), V’, NP with N (yaad) and trace CASE-1, V (kar), V-Aux (rahaa), V-Aux (thaa)]

Causative
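Comparison of Representations
• Less information
• Same information
[Figure: five dependency trees (Tree 1a, Tree 1b, Tree 2a, Tree 2b, Tree 3) for "Atif considers her stupid". In every tree Atif is Subj of considers. The trees differ in how stupid and her are labelled: stupid as Obj with her as its Subj; stupid as Obj2 with her as Obj; stupid as Obj-Pred with her-1 as Obj and a co-indexed empty e-1 as Subj; stupid as ObjPred with her as Obj; stupid as Obj-ECM with her as Subj.]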
Summary: Syntactic Phenomena, Representation Types, Analyses
• Syntactic phenomena are the empirical data of syntax as part of the science of language
  – Can be very similar across languages
• There can be several possible analyses
  – Some have less information
  – But there can be different analyses that represent the same information differently
• The analyses can be similar in DS and PS
• Lots of choices in treebank design
Aren’t DS and PS Representations Complementary? NO
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency

Aren’t DS and PS Representations Complementary? NO
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants
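The convention that a dependency node stands for the phrase made of the word and all its descendants can be made concrete with a short sketch. The sentence is the tutorial's own example (meraa baDzaa bhaaii bahuta phala khaataa hai); the word-level attachments in `children`, in particular hanging the auxiliary hai under khaataa, are an assumption made for illustration, as are the function names.

```python
def yield_of(head, children):
    """Positions of a head word and all of its descendants, in sentence order."""
    positions = [head]
    for child in children.get(head, []):
        positions.extend(yield_of(child, children))
    return sorted(positions)


words = ["meraa", "baDzaa", "bhaaii", "bahuta", "phala", "khaataa", "hai"]

# head position -> positions of its direct dependents (assumed attachments)
children = {
    5: [2, 4, 6],  # khaataa -> bhaaii, phala, hai
    2: [0, 1],     # bhaaii  -> meraa, baDzaa
    4: [3],        # phala   -> bahuta
}


def phrase(head):
    """The constituent headed by the word at position `head`."""
    return " ".join(words[i] for i in yield_of(head, children))
```

Here `phrase(2)` recovers the noun phrase headed by bhaaii and `phrase(5)` recovers the whole clause, so constituency can be read off the dependency tree without being stored separately.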
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information

The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  – Does not change trees
  – Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  – Contains less information than DS+PB
  – DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How:  Dependency; Phrase Structure
What: Distinguish unergative/unaccusative; distinguish temporal/locative adjuncts; distinguish unaccusative/transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012

Outline
• Introduction: some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structure
• They differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline

Nouns
Nouns in Hindi/Urdu are inflected for number and case.
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: direct and oblique.
Case
• Direct: nouns are in the nominative and are not followed by a postposition. They occur as subject and/or object.
  laRkaa aayaa ‘the boy came’
  <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  laRkii aayii ‘the girl came’
  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen).
  laRke ne roTii khaayii ‘the boy ate bread’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  laRke ne roTii ko zamiin se uThaayaa ‘the boy picked the bread up from the floor’
  <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, the pronouns also inflect for number and case.
• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrog.), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrog.), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl.)
• Pl, oblique:
  (a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrog.), kinhiiM (who, indef.)
  (b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef.)

Pronouns (Contd.)
• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don’t change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
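The rules for first- and second-person pronouns before postpositions can be written down as a small lookup. The sketch below covers only the five pronouns and the behaviour stated above; the table and function names are hypothetical, and treating aap + kaa as regular is an assumption, since the slide does not list a form for it.

```python
# Oblique stems used before postpositions other than "ne".
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitive forms that replace pronoun + kaa.
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}
# Pronouns that keep their form before every postposition.
INVARIANT = {"ham", "tum", "aap"}


def attach(pronoun, postposition):
    """Combine one of maiM, tuu, ham, tum, aap with a postposition."""
    if pronoun not in OBLIQUE and pronoun not in INVARIANT:
        raise ValueError(f"pronoun not covered by this sketch: {pronoun}")
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition == "ne" or pronoun in INVARIANT:
        return pronoun + postposition
    return OBLIQUE[pronoun] + postposition


forms = [attach(p, pp) for p, pp in
         [("maiM", "ne"), ("maiM", "ko"), ("tuu", "ko"), ("tum", "ko"), ("ham", "kaa")]]
```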
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below

         Direct            Oblique
         Sg       Pl       Sg       Pl
Masc     acchaa   acche    acche    acche
Fem      acchii   acchii   acchii   acchii
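The table reduces to a three-way choice of ending, which the following sketch encodes. It assumes, for illustration only, that declinable adjectives are exactly those whose citation form ends in -aa; the slides note that some adjectives do not decline, and this heuristic will misclassify some of them. The function name is hypothetical.

```python
def inflect_adjective(citation, gender, number, case):
    """Inflect an acchaa-type adjective.

    gender: "m" or "f"; number: "sg" or "pl"; case: "direct" or "oblique".
    """
    if not citation.endswith("aa"):
        return citation              # treated as non-declining (assumption)
    stem = citation[:-2]
    if gender == "f":
        return stem + "ii"           # one feminine form throughout
    if number == "sg" and case == "direct":
        return stem + "aa"           # masculine singular direct
    return stem + "e"                # all other masculine cells


row_masc = [inflect_adjective("acchaa", "m", n, c)
            for c in ("direct", "oblique") for n in ("sg", "pl")]
row_fem = [inflect_adjective("acchaa", "f", n, c)
           for c in ("direct", "oblique") for n in ("sg", "pl")]
```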
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person. Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu, so only certain moods, aspects and tenses are marked in the verb forms. Some examples of verb forms from Hindi/Urdu:

Root        ro      cry
Infinitive  ronaa   to cry
Habitual    rotaa   cry.hab
Perfective  royaa   cried
Causative   rulaa   cause someone to cry
            rulvaa  make someone cause someone to cry
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs.
(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    ‘The children are having a meal’
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog abil be.pl.pres
    ‘The children can continue to have their meal’
Auxiliaries also carry the gender, number and person information.

Postpositions
Postpositions largely mark the case relations.
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table abl food take refl.pst
Hindi also has compound postpositions.
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
ke anusaar ‘according to’
ke alaavaa ‘in addition to’
ke kaaran ‘because of’
ke dvaaraa ‘through’
ke saamne ‘in front of’
ke liye ‘for’
kii or/taraf ‘towards’
kii tarah ‘like’
kii jagah ‘in place of’
se baahar ‘out of’
se pahle ‘before’

Urdu-Specific Features
• Prepositions in Urdu
• ezafe in Urdu

Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions:
(a) qabl ‘before’
    qabl az_ayn (qabl azeen)       before from_this         ‘before this’
    qabl az_waqt                   before from_time         ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah)   in_this moment
(c) az ‘from/since/for’
    az raahe hamdardi              from way empathy         ‘for the sake of courtesy’
    az sare nau                    from beginning new
    khaarij az bahes               beyond from discussion
(d) ta ‘to/until/till’
    ta waqt                        until time

ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone.
EZ N+N:      daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
EZ N+Adj:    nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
EZ Adj+N:    qabil-e-rahem ‘qualified for sympathy’
EZ Adj+Adj:  qabil-e-qubul ‘qualified-for-acceptance’
Reduplication: a morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages

Full Reduplication
If the word is X, then its reduplicated form is X-X.
raam-raam ‘Ram-Ram’ (proper noun)
baccaa-baccaa ‘child-child’ (common noun)
garam-garam ‘hot-hot’ (adjective)
dhiire-dhiire ‘slowly-slowly’ (adverb)
jaa-jaa ‘go-go’ (verb)
naa-naa ‘not-not’ (negative particle)
kyaa-kyaa ‘what-what’ (question word)
jaate-jaate ‘going-going’ (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning ‘X etc.’ In Hindi/Urdu the first consonant of X is replaced by v-.
For example: khaanaa-vaanaa ‘food etc.’
vaanaa is not a valid word in Hindi. Some more examples of partial reduplication in Hindi are:
jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
bhaag ‘run’ → bhaagambhaag ‘rush’
jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
dekh ‘see’ → dekhaa-dekhii ‘in imitation’
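Both regular patterns, X-X and the v- echo word, are simple string operations on the romanized form. The sketch below is illustrative only: the function names are hypothetical, it treats everything before the first vowel letter as the initial consonant (so that kh counts as one consonant), and it does not attempt the irregular forms such as bhaagambhaag.

```python
VOWELS = set("aeiouAEIOU")


def full_reduplication(word):
    """X -> X-X."""
    return f"{word}-{word}"


def echo_word(word):
    """X -> X-vY, where Y is X without its initial consonant (if any)."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1                      # skip the initial consonant letters
    return f"{word}-v{word[i:]}"


echoes = [echo_word(w) for w in ("khaanaa", "jaanaa", "aaloo", "aisaa")]
```

For vowel-initial words such as aaloo and aisaa nothing is removed and v- is simply prefixed, which matches the examples above.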
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies
Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitaab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’
trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’
trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’

Intransitive Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’
Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’

Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds’

Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’
Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’

Relative Clause
rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          ‘My sister who stays in Delhi is coming tomorrow’
Relative Clause
rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          ‘I have read the book which you gave me’
rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          ‘I have read the book which you gave me’
rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          ‘I have read the book which you gave me’

Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’
compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              ‘Ram was remembering Ravi’
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep’
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep’
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep’

Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
• Experiencer verbs: the experiencer argument takes dative case
    raam ko bukhaar hai     Ram dat fever be.pres
    raam ko caand dikhaa    Ram dat moon see-unacc.pst
    raam ko dukh hai        Ram dat sorrow be.pres
• Participatory verbs: the second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega      Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii   Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa           Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues
  – Compounds
  – Punctuations
For example:
    usa   ladake  ne   kelaa   khaayaa   thaa
    that  boy     erg  banana  eat-perf  past

Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)
Tokenization Issues
• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – They are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN
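The three-token decision for hyphenated compounds can be sketched as follows. This is not the treebank's tokenizer: `tokenize` is a hypothetical helper, it splits only on whitespace and hyphens (other punctuation is not handled here), and storing JOIN in a third "mark" field is an assumption about how the mark might be recorded.

```python
def tokenize(sentence):
    """Return (ADDR, TOKEN, MARK) rows; hyphens become separate JOIN tokens."""
    rows = []
    for item in sentence.split():
        for k, part in enumerate(item.split("-")):
            if k > 0:
                rows.append(("-", "JOIN"))   # the hyphen is its own token
            if part:
                rows.append((part, None))
    return [(addr, tok, mark) for addr, (tok, mark) in enumerate(rows, start=1)]


compound = tokenize("BAI-bahana")
plain = tokenize("usa laDake ne kelA khAyA thA")
```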
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, TAM (tense/aspect/modality) or vibhakti (case marker), and suffix.
ADDR_  TKN_     OTHR
1      usa      <fs af='vahapr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOP,punc>
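Reading an af value back into named fields is a one-line split once the field order is fixed. The sketch below assumes comma-separated values in the order listed on the slide (root, category, gender, number, person, case, TAM or vibhakti, suffix); the field names in `AF_FIELDS` and the padding of short values with empty strings are choices made here for illustration.

```python
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_vibhakti", "suffix")


def parse_af(af):
    """Split an af string into a dict keyed by the assumed field names."""
    values = af.split(",")
    if len(values) > len(AF_FIELDS):
        raise ValueError(f"too many fields in af value: {af!r}")
    values += [""] * (len(AF_FIELDS) - len(values))   # pad missing trailing fields
    return dict(zip(AF_FIELDS, values))


noun = parse_af("laDakaa,n,m,sg,3,o")
```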
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
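The change in ADDR_ values after chunking can be illustrated with a small sketch that wraps POS-tagged tokens in chunk brackets and renumbers them as chunk.position. The `chunk` function and the hand-supplied `spans` are hypothetical; the actual chunk boundaries come from the annotation, not from this code.

```python
def chunk(tagged, spans):
    """Build SSF-like rows (ADDR, TOKEN, CAT) from tagged tokens and chunk spans.

    tagged: list of (token, pos); spans: list of (chunk_tag, start, end),
    with `end` exclusive.
    """
    rows = []
    for c, (tag, start, end) in enumerate(spans, start=1):
        rows.append((str(c), "((", tag))                 # chunk opens
        for k, (token, pos) in enumerate(tagged[start:end], start=1):
            rows.append((f"{c}.{k}", token, pos))        # token inside chunk
        rows.append(("", "))", ""))                      # chunk closes
    return rows


tagged = [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP"),
          ("kelA", "NN"), ("khAyA", "VM"), ("thA", "VAUX")]
rows = chunk(tagged, [("NP", 0, 3), ("NP", 3, 4), ("VG", 4, 6)])
```

Token 4 of the flat representation (kelA) becomes 2.1 here, which is the renumbering the slide refers to.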
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012

Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?
Indian languages have
• Rich morphology
• Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
    opened
      k1  Sabina
      k2  lock
            the
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key
    opened
      k1  Sabina
      k2  lock
            the
      k3  key
            this
k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
    opened
      k7t  Yesterday
      k1   Sabina
      k2   lock
             the
      k3   key
             this
      k7p  home
             my
k7t (kaladhikaraNa): time
k7p (deshadhikaraNa): place

Yesterday the lock opened with this key
    opened
      k7t  yesterday
      k1   lock
             the
      k3   key
             this
lock becomes the karta
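The two trees can be compared directly if each is stored as a list of labelled arcs. In the sketch below the karaka labels are those of the slides, while the `mod` label on determiners and possessives is a placeholder invented for illustration (the slides leave those arcs unlabelled), and `karaka` is a hypothetical lookup helper.

```python
# Each arc is (head, label, dependent).
WITH_AGENT = [
    ("opened", "k7t", "Yesterday"), ("opened", "k1", "Sabina"),
    ("opened", "k2", "lock"), ("opened", "k3", "key"), ("opened", "k7p", "home"),
    ("lock", "mod", "the"), ("key", "mod", "this"), ("home", "mod", "my"),
]
WITHOUT_AGENT = [
    ("opened", "k7t", "yesterday"), ("opened", "k1", "lock"),
    ("opened", "k3", "key"),
    ("lock", "mod", "the"), ("key", "mod", "this"),
]


def karaka(arcs, head, label):
    """Return the dependent attached to `head` with `label`, or None."""
    for h, lab, dep in arcs:
        if h == head and lab == label:
            return dep
    return None


karta_1 = karaka(WITH_AGENT, "opened", "k1")
karta_2 = karaka(WITHOUT_AGENT, "opened", "k1")
```

When the agent is not expressed, the k1 arc points at lock and no k2 arc remains, which is what "lock becomes the karta" describes.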
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Figure: (t1) chunk-level tree: khaa with dependents bhaaii and phala; (t2) expanded tree: khaa with dependents bhaaii (modified by meraa and baDZaa) and phala (modified by bahuta)]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb
• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root branching into activity and result]

karta-karma
• The boy opened the lock
  – k1: karta
  – k2: karma
• karta/karma sometimes correspond to agent/theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
[Figure: open with dependents boy (k1) and lock (k2)]

Sub-actions: Opening of a lock
Opening of a lock consists of:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of a lock
[Figure: three trees: open with boy (k1), lock (k2), key (k3); open with key (k1), lock (k2); open with lock (k1)]
k1: karta (doer); k2: karma (affected); k3: karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
  – The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)
Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
Morph analysis
POS tagging
Chunking
Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
Objective
The Scheme
• Morph Analysis
• POS Tagging
• Chunking
• Dependency Relations
Dependency Scheme
Relations in Dependency Scheme
Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
We are developing treebanks for Hindi/Urdu
We follow the Paninian framework as the annotation scheme
We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions etc. in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
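The bracketed chunk notation above is regular enough to read mechanically. The following is a minimal sketch, assuming each chunk is written as ((word_POS ...))_TAG on one line; the function names and the head-picking rule (rightmost token that is not VAUX) are hypothetical illustrations, not part of the treebank tooling.

```python
import re

# Matches one chunk written as ((tok_POS tok_POS ...))_CHUNKTAG
CHUNK_RE = re.compile(r"\(\(\s*(.*?)\s*\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos), ...]) pairs."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks


def guess_head(tokens):
    """Hypothetical heuristic: the rightmost token that is not an auxiliary."""
    content = [word for word, pos in tokens if pos != "VAUX"]
    return content[-1]


example = (
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)

chunks = parse_chunks(example)
heads = [guess_head(tokens) for _, tokens in chunks]
print(heads)
```

On this sentence the heuristic picks the same four heads that the dependency slides use; it is only an illustration and would need real head rules for other chunk types.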
An Example (Contd)
Dependency Relation
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
In our dependency tree
• each node is a chunk, and
• each edge represents the relation between the connected nodes, labeled with a karaka or other relation
A chunk represents a set of adjacent words which are in dependency relations with each other
All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks
• meraa 'my'
• bhaaii 'brother'
• phala 'fruit' and
• khaataa 'eats'
badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
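The tree for the example sentence can be held as a plain list of labeled edges. This is a small sketch, assuming a (dependent, relation, head) triple per edge; the helper names are illustrative and not taken from the treebank's own format.

```python
# Each edge is (dependent, relation, head), using chunk heads as nodes.
edges = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def children(head, edges):
    """Return (dependent, relation) pairs attached to the given head."""
    return [(dep, rel) for dep, rel, h in edges if h == head]


def find_root(edges):
    """The root is the only node that is a head but never a dependent."""
    heads = {h for _, _, h in edges}
    dependents = {d for d, _, _ in edges}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots.pop()


def show(node, edges, rel="root", depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + f"{node} ({rel})"]
    for dep, dep_rel in children(node, edges):
        lines.extend(show(dep, edges, dep_rel, depth + 1))
    return lines


tree_lines = show(find_root(edges), edges)
print("\n".join(tree_lines))
```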
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme
• Karaka relations
• Relations other than karakas, and
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations mark participants directly involved in the action denoted by the verb
Relations other than karakas denote, for example, purpose and reason
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six
• karta - subject/agent/doer
• karma - object/patient
• karana - instrument
• sampradaan - beneficiary
• apaadaan - source
• adhikarana - location in place/time/other
Relations other than karakas
r6 - Genitive
rt - Purpose
rh - Reason
nmod__relc - Relative clause
rad - Address
Relations which do not fall under dependency relation
ccof - Conjunction
pof - Complex Predicates
fragof - Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd...)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
For causatives, our current decision: Follow Possibility-II
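The label substitution stated above (pk1, mk1, jk1 in place of k1, k3, k4) can be written as a simple lookup. This is only a sketch of the substitution itself, under the assumption that the annotator has already decided which phrases are causal participants; a true instrument such as cammaca se in the first example keeps k3 and should not be passed through it. The function name is hypothetical.

```python
# Mapping stated on the slide: pk1, mk1, jk1 replace k1, k3, k4.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Relabel (phrase, label) pairs; labels outside the mapping are kept."""
    return [
        (phrase, CAUSATIVE_RELABEL.get(label, label))
        for phrase, label in arguments
    ]


# Possibility-I labels for the maid example (surface analysis).
possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]

possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```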
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd...)
For relative clauses, our current decision: Follow Possibility-II
In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta - k4a
Ex-1: mujhko dukh hai
'I-Dat' 'unhappy' 'is'
'I am unhappy'
Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
Here dukh 'unhappy' is the karta
Here mujhko 'to me' is a subtype of sampradan
This sampradan is different from the sampradan (k4, beneficiary)
We call it anubhava karta, represented by k4a
anubhava karta - k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
'ram-Dat' 'moon' 'appeared'
'The moon was visible to Ram'
[Dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran - rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii
Possibility - II
Conditionals (Contd)
Possibility-I is ruled out because the head of the tree would be agar-to, which is an abstract node, i.e. not a lexical node
For conditionals, our current decision: Follow Possibility-II
In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
The argument occurs only once in the sentence but is semantically related to both verbs
The shared argument always attaches syntactically to the main verb
For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Having written letters every day, he tears them up'
Participles (vmod) (Contd)
The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: pawra
  vmod: likhakar
    k1: vaha
    k2: pawra
(7) Ellipsis
How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
In the above example vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
No explicit conjunct
Insert a NULL element to show the dependencies (if it is essential)
NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
ccof - Conjunction
pof - Complex Predicates
fragof - Fragment of
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation
It is used for coordinating as well as subordinating conjunctions
The dependency trees show the conjuncts as heads
In coordination, the conjunct is the head and takes the coordinated elements as its children
In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd...)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
The annotation scheme should be able to account for this relation in the dependency tree
If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
Conjunct verbs are chunked separately, but semantically they constitute a single unit
The POF relation captures the fact that the noun-verb sequence is a conjunct verb by linking the two
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
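The filtering step described above can be illustrated with a toy example. This is a sketch only, assuming the two candidate sentences have already been role-labelled by some SRL system; the dictionary layout and the function name are hypothetical.

```python
# Role-labelled candidates for the predicate "create" (labels as on the slide).
candidates = [
    {"agent": "Becton Dickinson", "theme": "the first disposable syringe"},
    {"agent": "Jonas Salk", "theme": "the first effective polio vaccine"},
]


def who_created(theme, labelled):
    """Return agents of 'create' whose theme matches the questioned theme."""
    wanted = theme.lower().strip()
    return [c["agent"] for c in labelled if c["theme"].lower().strip() == wanted]


answers = who_created("The first effective polio vaccine", candidates)
print(answers)
```

Word overlap alone would accept both sentences, since both contain the phrase in the question; matching on the theme role keeps only the second.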
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
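The two rolesets for ring can be stored as a small lookup table and used to check an annotation. This is a sketch under the assumption that rolesets are keyed as ring.01 and ring.02; the dictionary layout and the checking function are illustrative and do not reproduce the actual frame file format.

```python
# Rolesets for "ring", with role descriptions as given on the slide.
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {
            "ARG0": "causer of ringing",
            "ARG1": "thing rung",
            "ARG2": "ring for",
        },
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {
            "ARG1": "surrounding entity",
            "ARG2": "surrounded entity",
        },
    },
}


def unlicensed_labels(roleset_id, annotation):
    """Return labels used in the annotation that the roleset does not define."""
    allowed = FRAMES[roleset_id]["roles"]
    return sorted(label for label in annotation if label not in allowed)


bell = {"ARG0": "John", "ARG1": "the bell"}
lake = {"ARG1": "Tall aspen trees", "ARG2": "the lake"}

print(unlicensed_labels("ring.01", bell))
print(unlicensed_labels("ring.02", bell))
```

The second call reports ARG0, since the 'surround' sense has no causer role.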
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling
Frame files for Hindi
• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  - pro: dropped (null) arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA      Causer
ARG0      Agent/experiencer
ARG1      Theme/patient
ARG2      Recipient
ARG3      Instrument
PropBank Tagset: Numbered Arguments with function tags
ARGA-MNS  Indirect causer
ARG0-MNS  Induced causer
ARG0-GOL  Causee with a 'recipient' role
ARG2-ATR  Attribute
ARG2-GOL  Goal
ARG2-SOU  Source
ARG2-LOC  Location
ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
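The labelling rule for intransitives stated above reduces to a lookup on verb class. This is a minimal sketch, assuming the class of each verb has already been decided by the diagnostic tests; the table and function name are hypothetical, and only the three verbs named on the slide are listed.

```python
# Verb classes for the examples named on the slide (romanized as given).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Unaccusatives label their single argument ARG1, unergatives ARG0."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
print(labels)
```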
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
'The door will have to open'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
[Annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so, sulaa) 'sleep', 'make someone sleep'
• Add -vaa
  - (sulaa, sulvaa) 'make someone sleep', 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
[Annotated trees for causative-1 and causative-2]
Causatives classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)': 'trust'
• Such cases are handled using a noun frame for bharosaa
[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.inst fight sit.perf
'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree: VP-Pred dominates NP (Atif) and VP; VP dominates NP (kitab) and V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree: VP-Pred dominates NP-P (Atif ne) and VP; VP dominates NP (kitab) and V (paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree: VP-Pred dominates NP (Atif) and VP; VP dominates V (soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree: VP-Pred dominates NP1 (darvaazaa) and VP; VP dominates the trace CASE1 and V (khulegaa)]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree: NP1 (cuuhe) in the higher position, with trace CASE1 and V (hain) below; NP-P (us kamre mein) attached as an adjunct]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Tree: NP-P (Ram ne) in the higher position; NP-P (Mohan ko) adjoined to VP-Pred; VP dominates NP (kitaab) and V (dii)]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree for: kal raat baadaloM mein mujhko caaMd dikhaa, with NP1 (caaMd) and trace CASE1, and NP2 (mujhko) and trace SCR2]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree for: Ram jaantaa hai ki Sita der se aayegi, with the ki-clause CP1 extraposed (trace EXTR1)]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree not recovered; node labels include CP, C (COMP), PRO, NP1 (jo kitaab), SCR1, SCR3]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree not recovered; node labels include NP (Raam), NP-P (Ravi ko), N (yaad) with trace CASE1, V (kar), V-Aux (rahaa), V-Aux (thaa)]
Causative
Aren't DS and PS Representations Complementary? NO
• Syntactic dependency can be encoded in PS, and typically is
• Usual convention: attachment in projection shows type of dependency
Aren't DS and PS Representations Complementary? NO
• Syntactic constituency is represented in DS
• Usual convention: each node is the word and the head of the phrase containing it and all descendants
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS, adds information about lexical semantics
  - Does not change trees
  - Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  - Contains less information than DS+PB
  - DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How: Dependency; Phrase Structure
What: Distinguish unergative/unaccusative; Distinguish temporal/locative adjuncts; Distinguish unaccusative/transitive with empty agent
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
Introduction: Some facts about Hindi and Urdu
Linguistic properties: Morphology, Some basic Syntax, Lexical semantics
Hindi: Some facts
A major language of the Indo-Aryan family
Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
Also spoken outside India in Fiji, Mauritius, Guyana etc.
Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
A large population in India speaks Hindi as a second language
Script: Devanagari, a syllabic script
Urdu: Some facts
An Indo-Aryan language
Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
Significant borrowings from Arabic and Persian
It was also known as rekhta (mixed language)
Official language of Pakistan
Official language of states of India
Also spoken in Fiji, Bangladesh etc.
Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
Script: Perso-Arabic
Hindi-Urdu (Hindustani)
Hindi and Urdu are mutually intelligible
Linguists consider them two registers of the same language
Similar in grammatical structure
Differ in vocabulary, particularly in the formal written varieties
A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic characteristics of Hindi/Urdu
Hindi/Urdu have relatively free word order
The unmarked word order in both languages is subject-object-verb (SOV)
Auxiliary verbs follow the main verb
Nouns are followed by postpositions
Adjectives precede the nouns they modify
In Urdu, adjectives sometimes follow the noun (ezafe constructions)
Extensive use of participles, complex predicates and causatives
Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties
Grammatical gender: masculine and feminine
Number: singular and plural
Person: first, second and third
Case: direct, oblique and vocative
Adjectives inflect for number, gender and case
- Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case
Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
Number
Singular: pankhaa (fan), lataa (creeper), ghar (house)
Plural: pankhe (fans), lataeM (creepers), ghar (houses)
Case
The case roles in Hindi are normally marked by postpositions
However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
Direct: nouns are in the nominative and are not followed by a postposition
They occur as subject and/or object
laRkaa aayaa 'the boy came' <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
laRkii aayii 'the girl came' <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
laRke ne roTii khaayii 'the boy ate bread' <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor' <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case
Sg - Dir: yaha (this), vaha (that), jo (who/which), kaun (who, interrogative), kyaa (what) and kuch (some)
Sg - Obl: isa (this), usa (that), jisa (which), kisa (which, interrogative), koii (someone) and kisii (someone)
Pl - Dir: ye (these), ve (those), jo (who, pl)
Pl - Obl:
a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrogative), kinhiiM (who, indefinite)
b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indefinite)
Pronouns (Contd…)
• Before ‘ne’, maiM (I) and tuu (you, sg) don’t change. For example: maiMne (I, erg) and tuune (you, erg).
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg).
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) don’t change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg); hamko (to us), tumko (to you, sg/pl), aapko (to you-hon).
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead, they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your).
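The pronoun rules above can be sketched as a small lookup. This is only an illustration: the function name `attach_postposition` and the two tables are hypothetical, they cover just the pronouns listed on the slide, and the genitive table holds only the masculine singular direct forms given there.

```python
# Oblique stems and irregular genitives, copied from the forms listed above.
OBLIQUE_STEM = {"maiM": "mujh", "tuu": "tujh"}
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}

def attach_postposition(pronoun, postposition):
    """Return the pronoun + postposition form for the cases described above."""
    # kaa (gen) does not attach to maiM, tuu, ham, tum: use the irregular form.
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    # maiM and tuu switch to the oblique stem, except before ne (erg).
    if postposition != "ne" and pronoun in OBLIQUE_STEM:
        return OBLIQUE_STEM[pronoun] + postposition
    # ham, tum and aap (and maiM/tuu before ne) keep their form.
    return pronoun + postposition

for pron, post in [("maiM", "ne"), ("maiM", "ko"), ("ham", "ko"), ("tum", "kaa")]:
    print(pron, "+", post, "=", attach_postposition(pron, post))
```

A table-driven design is used because the irregular forms cannot be derived by rule; anything not in the tables falls through to plain concatenation.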
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun.
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy, erg’.
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below.

Case →      Direct             Oblique
Number →    Sg       Pl        Sg       Pl
Gender ↓
Masc        acchaa   acche     acche    acche
Fem         acchii   acchii    acchii   acchii
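The table above can be expressed as a short function. This is a minimal sketch under the assumption that the adjective is cited in its masculine singular direct form ending in -aa; the name `inflect_adjective` and the feature labels are hypothetical. Adjectives that do not end in -aa are returned unchanged, which corresponds to the non-declining adjectives mentioned earlier.

```python
def inflect_adjective(citation_form, gender, number, case):
    """Inflect an -aa adjective following the acchaa table.

    gender: "m" or "f"; number: "sg" or "pl"; case: "direct" or "oblique".
    """
    if not citation_form.endswith("aa"):
        return citation_form          # treated as non-declining
    base = citation_form[:-2]
    if gender == "f":
        return base + "ii"            # feminine is acchii in every cell
    if number == "sg" and case == "direct":
        return base + "aa"            # only masc sg direct keeps -aa
    return base + "e"                 # all other masculine cells

print(inflect_adjective("acchaa", "m", "sg", "oblique"))
```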
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person.
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms.
Given below are some examples of verb forms from Hindi/Urdu:

Root         ro       ‘cry’
Infinitive   ronaa    ‘to cry’
Habitual     rotaa    ‘cry, hab’
Perfective   royaa    ‘cried’
Causative    rulaa    ‘cause someone to cry’
             rulvaa   ‘make someone cause someone to cry’
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs.
(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    ‘The children are having a meal’
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog abilit be.pl.pres
    ‘The children can continue to have their meal’
Auxiliaries also carry the gender, number and person information.

Postpositions
Postpositions largely mark the case relations.
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions.
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
  ke anusaar ‘according to’
  ke alaavaa ‘in addition to’
  ke kaaran ‘because of’
  ke dvaaraa ‘through’
  ke saamne ‘in front of’
  ke liye ‘for’
  kii or/taraf ‘towards’
  kii tarah ‘like’
  kii jagah ‘in place of’
  se baahar ‘out of’
  se pahle ‘before’

Urdu-Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl ‘before’
    qabl az_ayn (qabl azeen)     qabl az_waqt
    before from_this             before from_time
    ‘before this’                ‘before time’
(b) dar ‘in/inside/amidst’
    dar_ayn asnah (dariin asnah)
    in_this moment
(c) az ‘from/since/for’
    az raahe hamdardi            az sare nau            khaarij az bahes
    from way empathy             from beginning new     beyond from discussion
    ‘for the sake of courtesy’
(d) ta ‘to/until/till’
    ta waqt
    until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive, but is not restricted to the genitive alone.
  EZ N+N:      daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
  EZ N+Adj:    nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
  EZ Adj+N:    qabil-e-rahem ‘qualified for sympathy’
  EZ Adj+Adj:  qabil-e-qubul ‘qualified-for-acceptance’
Reduplication: A Morphological Process
• Words belonging to various categories can be reduplicated. These expressions are often hyphenated.
• Reduplication has various morphological functions depending on the lexical category that is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant.
• Reduplication is highly productive in these languages.

Full Reduplication
If the word is X, then its reduplicated form is X-X.
  raam-raam ‘Ram-Ram’ (proper noun)
  baccaa-baccaa ‘child-child’ (common noun)
  garam-garam ‘hot-hot’ (adjective)
  dhiire-dhiire ‘slowly-slowly’ (adverb)
  jaa-jaa ‘go-go’ (verb)
  naa-naa ‘not-not’ (negative particle)
  kyaa-kyaa ‘what-what’ (question word)
  jaate-jaate ‘going-going’ (participle)
Partial Reduplication (Echo Words)
In partial reduplication, an expression X is repeated partially: only a part of the word is repeated, which gives the meaning ‘X etc.’ In Hindi/Urdu, the first consonant of X is replaced by v-.
For example: khaanaa-vaanaa ‘food etc.’ (vaanaa is not a valid word in Hindi).
Some more examples of partial reduplication in Hindi:
  jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
  aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
  aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’
There are also examples that do not follow this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’:
  bhaag ‘run’ → bhaagambhaag ‘rush’
  jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
  dekh ‘see’ → dekhaa-dekhii ‘in imitation’
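The regular v- pattern can be sketched in a few lines. This is an illustrative approximation, not a full model of echo formation: the function name `echo_form` is hypothetical, it works on the romanized forms used here, it treats a digraph such as kh as one initial consonant by stripping everything before the first vowel letter, and it simply prefixes v- to vowel-initial words. The irregular pairs listed above are not covered.

```python
VOWELS = set("aeiouAEIOU")

def echo_form(word):
    """Build 'X-vX' where the initial consonant of X is replaced by v-."""
    i = 0
    # Skip the initial consonant letters (e.g. 'kh' in khaanaa, 'j' in jaanaa).
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word + "-v" + word[i:]

for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(echo_form(w))
```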
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages.
• For case marking, Hindi primarily uses postpositions.
• The verb agrees either with the subject or with the object.
• An adjective agrees with the noun it modifies.
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’

Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’

Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’

Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Relative Clause
rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  ‘My sister, who stays in Delhi, is coming tomorrow’
Relative Clause
rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  ‘I have read the book which you gave me’
rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  ‘I have read the book which you gave me’
rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
compl-pred-2: राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs: the experiencer argument takes dative case.
  raam ko bukhaar hai
  Ram dat fever be.pres
  raam ko caand dikhaa
  Ram dat moon see-unacc.pst
  raam ko dukh hai
  Ram dat sorrow be.pres
Participatory verbs: the second argument of participatory verbs takes the se postposition.
  raam ravi se carcaa karega
  Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii
  Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa
  Ravi Mohan with meet.m.sg.fut
References
• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues
  – Compounds
  – Punctuation
For example:
  usa    ladake   ne    kelaa    khaayaa    thaa
  that   boy      erg   banana   eat-perf   past
Tokenization
Represented in SSF:
ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7      .

Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized.
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – They are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN
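The tokenization decision above can be sketched as follows. This is a minimal illustration, not the treebank's actual tokenizer: the function name `tokenize` and the way the JOIN mark is stored (a third column) are assumptions made for the example.

```python
import re

def tokenize(sentence):
    """Split a romanized sentence into (ADDR, TOKEN, MARK) rows.

    Every punctuation mark becomes its own token; a hyphen inside a
    compound yields three tokens, with the hyphen marked as JOIN.
    """
    pieces = re.findall(r"\w+|[^\w\s]", sentence)
    rows = []
    for addr, piece in enumerate(pieces, start=1):
        mark = "JOIN" if piece == "-" else ""
        rows.append((addr, piece, mark))
    return rows

for row in tokenize("BAI-bahana Aye ."):
    print(*row, sep="\t")
```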
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality)/vibhakti (case marker), and suffix.
ADDR_   TKN_      OTHR
1       usa       <fs af='vaha,pr'>
2       laDake    <fs af='laDakaa,n,m,sg,3,o'>
3       ne        <fs af='ne,psp'>
4       kelaa     <fs af='kelaa,n,m,sg,3,o'>
5       khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6       thaa      <fs af='kelaa,v,e'>
7       .         <fs as=&STOP,punc>
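Reading an af value back into named features could look like the sketch below. The field names in `AF_FIELDS` and the function `parse_af` are hypothetical labels for the eight components listed above, and the code assumes the components are comma-separated and given in that order. The values on the slide are abbreviated, so missing trailing fields are filled with empty strings.

```python
# Hypothetical names for the components of af, in the order listed above.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")

def parse_af(af_value):
    """Split a comma-separated af value into a dict of named features."""
    parts = [p.strip() for p in af_value.split(",")]
    # Pad abbreviated values so every field name gets an entry.
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))

features = parse_af("laDakaa,n,m,sg,3,o")
print(features["root"], features["gend"], features["num"], features["case"])
```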
POS Tagging
ILMT POS tagset adopted; 26 tags in total.
ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7      .        SYM    <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging.
• Dependency relations are marked between the chunk heads.
• Chunking restructures the tree (i.e. the value of ADDR_ will change).
ADDR_   TKN_     CAT_   OTHR
1       ((       NP
1.1     usa      PRP    <fs af='vaha,pron,…'>
1.2     laDake   NN     <fs af='laDakA,noun,…'>
1.3     ne       PSP    <fs af='ne,psp,…'>
        ))
2       ((       NP
2.1     kelA     NN     <fs af='kelA,noun,…'>
        ))
3       ((       VG
3.1     khAyA    VM     <fs af='KA,verb,…'>
3.2     thA      VAUX   <fs af='kelA,verb,…'>
        ))
4       ((       BLK
4.1     .        SYM    <fs as=&STOP,punc>
        ))
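The renumbering that chunking causes can be sketched as below. This is only an illustration of the address scheme shown above (chunk N, tokens N.1, N.2, ...); the function name `to_chunked_rows` and the tab-separated layout are assumptions, and the feature-structure column is left out.

```python
def to_chunked_rows(chunks):
    """Render chunks as SSF-like rows with nested addresses.

    chunks: list of (chunk_label, [(token, pos_tag), ...]).
    """
    rows = []
    for i, (label, words) in enumerate(chunks, start=1):
        rows.append(f"{i}\t((\t{label}")              # chunk opening row
        for j, (token, pos) in enumerate(words, start=1):
            rows.append(f"{i}.{j}\t{token}\t{pos}")   # token row, address i.j
        rows.append("\t))")                           # chunk closing row
    return rows

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
print("\n".join(to_chunked_rows(chunks)))
```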
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
• The corpus
  – News articles: 350k
  – Tourism articles: 25-30k
  – Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model
• Why Paninian grammar? Indian languages have:
  – Rich morphology
  – Relatively flexible word order. For example:
    a) baccaa phala khaataa hai
       ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
    b) phala baccaa khaataa hai
    c) phala khaataa hai baccaa
    d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky, 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1 → Sabina
  k2 → lock (the)
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): the locus of result

Sabina opened the lock with this key
opened
  k1 → Sabina
  k2 → lock (the)
  k3 → key (this)
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t → Yesterday
  k1 → Sabina
  k2 → lock (the)
  k3 → key (this)
  k7p → home (my)
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key
opened
  k7t → yesterday
  k1 → lock (the)
  k3 → key (this)
Here lock becomes the karta.
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
(t1) khaa → bhaaii, phala
(t2) khaa → bhaaii (→ meraa, baDZaa), phala (→ bahuta)
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the Verb
• A verbal root denotes:
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma
[Diagram: verbal root → activity, result]
karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
  [Tree: open → k1 boy, k2 lock]
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

Sub-actions: Opening of a Lock
Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of a Lock
  open → k1 boy, k2 lock, k3 key
  open → k1 key, k2 lock
  open → k1 lock
k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri).
  – The lock opened: the karma is raised to the role of karta (doer; karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker’s Intention (vivakshaa)
• Every sentence reflects the speaker’s intention
  – Participants are assigned various relations accordingly:
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
Example:
  meraa badZaa bhaaii bahuta phala khaataa hai
  ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
  ‘My elder brother eats lots of fruits’

An Example (Contd)
Morph Analysis:
  meraa     <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  badZaa    <fs af= root=badZaa cat=adj gend=m>
  bhaaii    <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  bahuta    <fs af= root=bahuta cat=adj gend=any>
  phala     <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  khaataa   <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  hai       <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging:
  meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking:
  ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation.
• A chunk represents a set of adjacent words which are in dependency relations with each other.
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.

Dependency Scheme (Contd)
• Here, modifier-modified relations are marked between the heads of the chunks:
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’ and
  – khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks.
Dependency Scheme (Contd)
khaataa
  k1 → bhaaii
    r6 → meraa
  k2 → phala
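One simple way to hold such a chunk-level dependency tree in code is as a list of labeled edges. The sketch below is an assumption about representation, not the treebank's storage format; the names `EDGES`, `children` and `find_root` are hypothetical.

```python
# (head, relation, dependent) triples for the example tree above.
EDGES = [
    ("khaataa", "k1", "bhaaii"),
    ("khaataa", "k2", "phala"),
    ("bhaaii", "r6", "meraa"),
]

def children(head, edges):
    """Return the (relation, dependent) pairs attached to a head."""
    return [(rel, dep) for h, rel, dep in edges if h == head]

def find_root(edges):
    """Return the single node that is a head but never a dependent."""
    heads = {h for h, _, _ in edges}
    dependents = {d for _, _, d in edges}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots.pop()

print(find_root(EDGES), children("khaataa", EDGES))
```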
Relations in Dependency Scheme
There are three types of relations in the dependency scheme:
• Karaka relations,
• Relations other than karakas, and
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly.
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other

Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of

Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
  maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child’
Issue
• Possibility-I: go by syntactic analysis
  – khilvaa ‘cause to eat’ is the verb root
  – maaz ne has karta vibhakti, so mark it as k1
  – aayaa se has karana vibhakti, so mark it as k3
  – bacce ko has sampradan vibhakti, so mark it as k4

Causative Constructions (Contd…)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations:
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator-causer’ (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd…)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
• For causatives, our current decision: follow Possibility-II.
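The relabeling from the Possibility-I labels to the Possibility-II labels can be sketched as a lookup. This is only an illustration of the k1/k3/k4 to pk1/mk1/jk1 correspondence stated above, not the annotators' procedure: the function `relabel_causative` and its `instruments` parameter are hypothetical, and the caller has to say which se-marked arguments are genuine instruments (such as cammaca ‘spoon’), since those keep k3.

```python
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(arguments, instruments=frozenset()):
    """Map syntactic labels to causative labels for a causative verb.

    arguments: list of (dependent, label) pairs from the syntactic analysis.
    instruments: dependents that are true instruments and stay k3.
    """
    relabeled = []
    for dependent, label in arguments:
        if dependent in instruments:
            relabeled.append((dependent, label))
        else:
            relabeled.append((dependent, CAUSATIVE_RELABEL.get(label, label)))
    return relabeled

syntactic = [("maaz", "k1"), ("aayaa", "k3"), ("bacce", "k4"), ("khaanaa", "k2")]
print(relabel_causative(syntactic))
```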
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother’
Issue
• Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause: Possibility-I

Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause: Alternative-II

Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II.
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadzaa hai ‘is standing’ of the relative clause.
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation.
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref.
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
  ‘I-Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
• Here dukh ‘unhappy’ is the karta.
• Here mujhko ‘to me’ is a subtype of sampradan.
• This sampradan is different from the sampradan (k4, beneficiary).
• We call it anubhava karta, represented by k4a.

anubhava karta – k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon’
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘ram-Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram’

anubhava karta – k4a (Contd…): Ex-2
anubhava karta – k4a (Contd…): Ex-3
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.

Relation samanadhikaran – rs (Contd…): Ex-1
Relation samanadhikaran – rs (Contd): Ex-2
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party’
Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause

Possibility-I
agar-to
  paired-ccof → agar
    ccof → naa hotii
  paired-ccof → to
    ccof → aatii

Possibility-II
Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause.
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.

(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is semantically realized but not syntactically.
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Every day he writes a letter and then tears it up’

Participles (vmod) (Contd)
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.

Participles (vmod) (Contd)
Paadzataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakar
    k1 → vaha
    k2 → patra
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node for the relative clause ‘tum jo bhi kahoge’, is missing.
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.

Ellipsis (Contd)
maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
    nmod__relc → kahoge
      k1 → tum
      k2 → jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don't listen to anyone”
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate

Non-dependency Relations
• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjuncts as heads.
• In coordination, the conjunct is the head and takes the coordinated elements as its children.
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child.

Conjunction (ccof) (Contd…)
Coordinate conjunction
  Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
    ‘Ram ate food and Sita ate an apple’
Subordinate conjunction
  Ex: raam ne kahaa ki vo kal aayegaa
    ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
    ‘Ram said that he will come tomorrow’

Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
  ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
  ‘I asked him a question’
• Within the conjunct verb sequence prashna kiyaa ‘questioned’, the adjective ek ‘one’ modifies the noun prashna ‘question’, not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation). That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb.
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd)
kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
    nmod → ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
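The filtering step can be sketched in a few lines. The data layout and the function name answer_who are assumptions made for illustration; the role labels are the ones given on the slide, and a real system would obtain them from an SRL component rather than from hand-written dictionaries.

```python
# Two candidate sentences with hand-written (hypothetical) SRL output.
CANDIDATES = [
    {
        "predicate": "create",
        "roles": {
            "agent": "Becton Dickinson",
            "theme": "the first disposable syringe",
        },
    },
    {
        "predicate": "create",
        "roles": {
            "agent": "Jonas Salk",
            "theme": "the first effective polio vaccine",
        },
    },
]


def answer_who(predicate, theme, candidates):
    """Return the agents of `predicate` whose theme matches the question."""
    return [
        c["roles"]["agent"]
        for c in candidates
        if c["predicate"] == predicate
        and c["roles"].get("theme") == theme
        and "agent" in c["roles"]
    ]


print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Both sentences contain the words of the question, so word matching cannot separate them; only the theme role rules out the syringe sentence.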
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for
ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
An example
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for
ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for
ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
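A frame file with two rolesets can be pictured as a nested mapping. The sketch below is an assumption about layout only (real PropBank frame files are XML documents); the roleset glosses and role descriptions are copied from the ring example, and describe is a hypothetical helper.

```python
# Hypothetical in-memory view of the frame file for "ring".
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "Arg0": "Causer of ringing",
            "Arg1": "Thing rung",
            "Arg2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "Arg1": "Surrounding entity",
            "Arg2": "Surrounded entity",
        },
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to its verb-specific role description.

    Raises KeyError if a label is not defined for the chosen roleset.
    """
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}


print(describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
print(describe("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same label (Arg1) means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset has to be chosen before the arguments are interpreted.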
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling
Frame files for Hindi
• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments           Numbered Arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent, experiencer     ARG0-MNS  Induced causer
ARG1  Theme, patient         ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
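When working with these labels programmatically, it helps to separate the base argument from its function tag. The following is a minimal sketch, assuming labels are written with a hyphen (as in ARG2-GOL); split_label and is_modifier are hypothetical helper names.

```python
def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); a plain 'ARG1' gives ('ARG1', None)."""
    base, sep, tag = label.partition("-")
    return base, (tag if sep else None)


def is_modifier(label):
    """Modifier arguments all share the base ARGM."""
    return split_label(label)[0] == "ARGM"


for label in ["ARGA", "ARG0-GOL", "ARG2-LOC", "ARGM-LOC"]:
    print(label, split_label(label), is_modifier(label))
```

Note that LOC appears both as a function tag on a numbered argument (ARG2-LOC) and as a modifier (ARGM-LOC); keeping the base and the tag apart makes that distinction explicit.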
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  Atif will read the book
trans-2  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  Atif read the book
trans-3  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  Atif had to read the book
Unaccusative & Unergative
• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
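The labelling rule for intransitives can be written as a lookup. The verb classes below are just the three examples from the slide; in practice the class would come from the frame file (and ultimately from the diagnostic tests), so the table and the function name are illustrative assumptions.

```python
# Verb classes for the three example verbs on the slide (WX-style romanization).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label of the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```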
Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  The door will open
unacc-2  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  The door opened
unacc-3  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  The door will have to open
Intransitive Unergative
unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
unerg-2  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  Atif slept
unerg-3  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  Atif will have to sleep
Existential
exist-1  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  Yesterday night the moon was seen behind the clouds
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  Yesterday night I saw the moon behind the clouds
[Figures: PropBank annotations for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  Ram will give a book to Mohan
ditrans-2  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  Ram gave a book to Mohan
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
causative-1  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
[Figures: PropBank annotations for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa, 'trust (n) do (v)' = 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
Complex predicate
compl-pred-2  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  Ram knows that Sita will arrive late
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram for आतिफ़ किताब पढ़ेगा. Node labels: VP-Pred, VP, NP, V. Leaves: Atif, kitab, paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram for आतिफ़ ने किताब पढ़ी. Node labels: VP-Pred, VP, NP-P, NP, V. Leaves: Atif-ne, kitab, paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Tree diagram for आतिफ़ सोएगा. Node labels: VP-Pred, VP, NP, V. Leaves: Atif, soyegaa]
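Intransitive Clause: Unaccusative
• The argument starts in a lower position (because of lexical semantics) and moves to a higher position (because the higher position has no occupant)
[Tree diagram for दरवाज़ा खुलेगा. Node labels: VP-Pred, VP, NP (index 1), V; trace CASE (index 1). Leaves: darvaazaa, khulegaa]
Existentials
• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct
[Tree diagram for उस कमरे में चूहे हैं. Node labels: VP-Pred, VP, NP (index 1), NP-P, V; trace CASE (index 1). Leaves: us kamre mein, cuuhe, hain]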
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position
[Tree diagram for राम ने मोहन को किताब दी. Node labels: VP-Pred, VP, NP-P, NP, V. Leaves: Ram-ne, Mohan-ko, kitaab, dii]
Putting it All Together: Dative Subjects
[Tree diagram for कल रात बादलों में मुझको चाँद दिखा. Node labels: VP-Pred, VP, NP, NP-P, V; traces CASE (index 1), SCR (index 2). Leaves: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Tree diagram for राम जानता है कि सीता देर से आएगी. Node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, CP (index 1), C; traces EXTR (index 1), CASE (index 1). Leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi]
Relative Clause
[Tree diagram for the sentence below. Node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C; empty elements SCR (index 1), SCR (index 3), PRO, COMP. Leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
[Tree diagram for the sentence below. Node labels: VP-Pred, VP, NP, NP-P, V, V', V-Aux, N; trace CASE (index 1). Leaves: Raam, Ravi-ko, yaad, kar, rahaa, thaa]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causative
What Does This Mean for NLP?
• Treebanks are not naturally occurring data
• The guidelines are painstakingly produced by linguists and represent a formal description of the language
• Annotators understand a sentence, determine what syntactic phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)
• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present
• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation is chosen by the researcher
• There is already lots of linguistics in our resources; we just need to make use of that linguistic information
The Hindi Treebank
• DS: dependency, annotated by hand
• PB: annotated by hand on top of DS; adds information about lexical semantics
  - Does not change trees
  - Adds labels to arcs and features to nodes
• PS: phrase structure, derived automatically from DS+PB
  - Contains less information than DS+PB
  - DS and PS contain different information
Comparison of DS, PB, PS (Sample)
Columns: DS, PB, PS
How: Dependency; Phrase Structure
What:
  Distinguish unergative/unaccusative
  Distinguish temporal/locative adjuncts
  Distinguish unaccusative/transitive with empty agent
Introduction to Morphology, Syntax and
Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
Introduction
  Some facts about Hindi and Urdu
Linguistic properties
  Morphology
  Some basic Syntax
  Lexical semantics
Hindi: Some facts
A major language of the Indo-Aryan family
Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
Also spoken outside India in Fiji, Mauritius, Guyana, etc.
Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
A large population in India speaks Hindi as a second language
Script: Devanagari, a syllabic script
Urdu: Some facts
An Indo-Aryan language
Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
Significant borrowings from Arabic and Persian
It was also known as rekhta (mixed language)
Official language of Pakistan
Official language of states of India
Also spoken in Fiji, Bangladesh, etc.
Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
Script: Perso-Arabic
Hindi-Urdu (Hindustani)
Hindi and Urdu are mutually intelligible
Linguists consider them two registers of the same language
Similar in grammatical structures
Differ in vocabulary, particularly in the formal written varieties
A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
Hindi/Urdu have relatively free word order
The unmarked word order in both languages is subject-object-verb (SOV)
Auxiliary verbs follow the main verb
Nouns are followed by postpositions
Adjectives precede the nouns they modify
In Urdu, adjectives sometimes follow the noun (ezafe constructions)
Extensive use of participles, complex predicates and causatives
Reduplication and echo-compounding are used productively in Hindi and Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
  Grammatical gender: masculine and feminine
  Number: singular and plural
  Person: first, second and third
  Case: direct, oblique and vocative
  Adjectives inflect for number, gender and case
    - Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case
Gender: All nouns have inherent gender: pankhaa (fan.masc), lataa (creeper.fem), ghar (house.masc)
Number:
  Singular: pankhaa (fan), lataa (creeper), ghar (house)
  Plural: pankhe (fans), lataeM (creepers), ghar (houses)
Case:
  The case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
Direct nouns are in the nominative and are not followed by a postposition
They occur as subject and/or object
  laRkaa aayaa 'the boy came'
    <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  laRkii aayii 'the girl came'
    <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
Oblique nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
  laRke ne roTii khaayii 'the boy ate bread'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case
Sg, dir: yaha (this), vaha (that), jo (who/which), kaun (who.interro), kyaa (what) and kuch (some)
Sg, obl: isa (this), usa (that), jisa (which), kisa (which.interro), koii (someone) and kisii (someone)
Pl, dir: ye (these), ve (those), jo (who.pl)
Pl, obl:
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who.interro), kinhiiM (who.indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people.indef)
Pronouns (Contd.)
Before ne, maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don't change form:
  hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
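The rules for first and second person pronouns can be sketched as a small function. It covers only the five pronouns discussed here (maiM, tuu, ham, tum, aap) in romanized form; the function name attach and the two lookup tables are assumptions made for illustration, and gender/number agreement of the genitive forms is ignored.

```python
# Oblique stems used before postpositions other than ne.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular forms that replace pronoun + kaa (masculine singular shown).
GENITIVE = {
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}


def attach(pronoun, postposition):
    """Combine a first/second person pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition == "ne":
        return pronoun + "ne"          # no stem change before ne
    return OBLIQUE.get(pronoun, pronoun) + postposition


for pronoun, postposition in [("maiM", "ne"), ("maiM", "ko"),
                              ("tum", "ko"), ("ham", "kaa")]:
    print(pronoun, "+", postposition, "->", attach(pronoun, postposition))
```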
Adjectives
Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

Case →      Direct            Oblique
Number →    Sg       Pl       Sg       Pl
Gender ↓
Masc        acchaa   acche    acche    acche
Fem         acchii   acchii   acchii   acchii
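The paradigm above reduces to a three-way choice, which can be sketched as follows. This is a simplification under stated assumptions: it works on romanized forms, treats every adjective ending in -aa as declining like acchaa, and leaves all other adjectives unchanged; inflect_adjective is a hypothetical name.

```python
def inflect_adjective(base, gender, number, case):
    """Inflect an acchaa-type adjective.

    base:   masculine singular direct form, e.g. "acchaa"
    gender: "m" or "f"
    number: "sg" or "pl"
    case:   "direct" or "oblique"
    """
    if not base.endswith("aa"):
        return base                    # assumed not to decline
    stem = base[:-2]
    if gender == "f":
        return stem + "ii"             # one form for all feminine cells
    if number == "sg" and case == "direct":
        return stem + "aa"
    return stem + "e"                  # masculine plural and/or oblique


for gender in ("m", "f"):
    for case in ("direct", "oblique"):
        for number in ("sg", "pl"):
            print(gender, case, number,
                  inflect_adjective("acchaa", gender, number, case))
```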
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and the agreement features of gender, number and person
Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked on the verb forms
Given below are some examples of various verb forms from Hindi/Urdu:
  Root         ro       'cry'
  Infinitive   ronaa    'to cry'
  Habitual     rotaa    'cry.hab'
  Perfective   royaa    'cried'
  Causative    rulaa    'cause someone to cry'
               rulvaa   'make someone cause someone to cry'
Auxiliaries
Auxiliaries mark Tense, Aspect and Modality information on verbs
(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog ablit be.pl.pres
    'The children can continue to have their meal'
Auxiliaries also carry the gender, number and person information
Postpositions
Postpositions largely mark the case relations
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
  ke anusaar   'according to'
  ke alaavaa   'in addition to'
  ke kaaran    'because of'
  ke dvaaraa   'through'
  ke saamne    'in front of'
  ke liye      'for'
  kii or/taraf 'towards'
  kii tarah    'like'
  kii jagah    'in place of'
  se baahar    'out of'
  se pahle     'before'
Urdu Specific Features
Prepositions in Urdu
ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
    qabl az_ayn (qabl azeen): before from_this, 'before this'
    qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
    az raahe hamdardi: from way empathy, 'for the sake of courtesy'
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
    ta waqt: until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone
  EZ N+N:     daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj:   nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N:   qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
Words belonging to various categories can be reduplicated
These expressions are often hyphenated
Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  Nouns: it adds the sense of 'every'
  Verbs: it brings the sense of an adverbial participle
  Adjectives and adverbs: it adds intensity
Hindi has three types of reduplication: full, partial and redundant
Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X
  raam-raam 'Ram-Ram' (proper noun)
  baccaa-baccaa 'child-child' (common noun)
  garam-garam 'hot-hot' (adjective)
  dhiire-dhiire 'slowly-slowly' (adverb)
  jaa-jaa 'go-go' (verb)
  naa-naa 'not-not' (negative particle)
  kyaa-kyaa 'what-what' (question word)
  jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
In partial reduplication, an expression X is repeated partially. Only a part of the given word is repeated, which gives the meaning 'X etc.'
In Hindi/Urdu, the first consonant of X is replaced by v-
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi
Some more examples of partial reduplication in Hindi are:
  jaanaa 'going' → jaanaa-vaanaa 'going etc.'
  aaloo 'potato' → aaloo-vaaloo 'potato etc.'
  aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run' → bhaagambhaag 'rush'
  jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
  dekh 'see' → dekhaa-dekhii 'in imitation'
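The regular patterns (full reduplication and v- echo words) can be sketched on romanized input. This is an approximation, not a morphological analyzer: it assumes the transliteration used on the slides, treats every leading non-vowel letter as part of "the first consonant" (so kh counts as one consonant), prefixes v- to vowel-initial words, and does not handle the irregular cases listed above. The function names are hypothetical.

```python
VOWELS = "aeiou"  # vowel letters of the romanization used here


def full_reduplicate(word):
    """X -> X-X"""
    return f"{word}-{word}"


def echo_word(word):
    """X -> X-vY, where Y is X without its initial consonant letters."""
    i = 0
    while i < len(word) and word[i].lower() not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"


print(full_reduplicate("garam"))
for word in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(echo_word(word))
```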
Some Basic Syntax
Hindi and Urdu are both relatively free word order SOV languages
For case marking, Hindi primarily uses postpositions
The verb agrees either with the subject or with the object
An adjective agrees with the noun it modifies
Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  Atif will read the book
trans-2  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  Atif read the book
trans-3  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  Atif had to read the book
Intransitive Unergative
unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
unerg-2  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  Atif slept
unerg-3  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  Atif will have to sleep
Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  The door will open
unacc-2  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  The door opened
unacc-3  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  The door will have to open
Existential
exist-1  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  There are rats in that room
Dative Subject
unacc-4  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  Yesterday night the moon was seen behind the clouds
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  Yesterday night I saw the moon behind the clouds
Ditransitive
ditrans-1  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  Ram will give a book to Mohan
ditrans-2  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  Ram gave a book to Mohan
Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  Ram knows that Sita will arrive late
Relative Clause
rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  My sister who stays in Delhi is coming tomorrow
Relative Clause
rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  I have read the book which you gave me
rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  I have read the book which you gave me
rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  I have read the book which you gave me
Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  Ram was waiting for Ravi
compl-pred-2  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  Ram was remembering Ravi
Causatives
unerg-1  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
causative-1  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case
  raam ko bukhaar hai        Ram dat fever be.pres
  raam ko caand dikhaa       Ram dat moon see-unacc.pst
  raam ko dukh hai           Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition
  raam ravi se carcaa karega      Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii   Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa           Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis,
POS and Chunks in the Hindi/Urdu Treebanks
Outline
Tokenization
Morphological Representation
POS tagging
Chunking
Inter-chunk dependency annotation
Intra-chunk dependencies
Tokenization
Automatic
Issues: Compounds, Punctuations
For example:
  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
Punctuations: All punctuations are to be tokenized
Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  Compounds internally contain a punctuation
  Are productive
  Morphological analysis of the members of the compounds
  The issue: whether to create a single token
  Decision: Create three tokens; mark the hyphen as JOIN
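The tokenization decision for compounds can be sketched with a regular expression. This is an illustration only: the slides do not show how JOIN is stored in the treebank files, so the third column below is an assumption, as are the function names.

```python
import re


def tokenize(text):
    """Split on whitespace and make every punctuation mark its own token."""
    return re.findall(r"\w+|[^\w\s]", text)


def ssf_rows(tokens):
    """Number tokens from 1 and flag hyphens as JOIN."""
    rows = []
    for addr, token in enumerate(tokens, start=1):
        note = "JOIN" if token == "-" else ""
        rows.append((addr, token, note))
    return rows


for row in ssf_rows(tokenize("BAI-bahana")):
    print(*row)
```

With three tokens, each member of the compound can receive its own morphological analysis, while the JOIN mark keeps the information that they were written as one unit.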
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOP,punc>
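Reading an af value amounts to splitting a comma-separated string and naming the positions. The sketch below assumes the field order given in the description of af above, with tam and vibhakti sharing one slot; the field names and the function parse_af are illustrative, and the comma separators are a reconstruction, since the slide text lost its punctuation.

```python
# Assumed field order of the composite af attribute.
AF_FIELDS = [
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
]


def parse_af(value):
    """Turn 'laDakaa,n,m,sg,3,o' into a dict of named features.

    Trailing fields that are absent from the string are simply omitted.
    """
    return dict(zip(AF_FIELDS, value.split(",")))


print(parse_af("laDakaa,n,m,sg,3,o"))
print(parse_af("ne,psp"))
```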
POS Tagging
ILMT POS tagset adopted
Total 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
Chunking is introduced to save effort in manual tagging
Dependency relations are marked between the chunk heads
Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
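A chunked sentence in this layout can be read back with a short line-based parser. The sketch below is not the official SSF reader: it assumes whitespace-separated columns, dotted addresses for tokens inside a chunk (1.1, 1.2, ...), and it leaves out the OTHR column and the BLK chunk to keep the sample small.

```python
SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""


def read_chunks(text):
    """Return a list of {'tag': ..., 'tokens': [(word, pos), ...]} dicts."""
    chunks, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":                      # chunk closes
            chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":   # chunk opens
            current = {"tag": fields[2], "tokens": []}
        elif current is not None:                  # token inside a chunk
            current["tokens"].append((fields[1], fields[2]))
    return chunks


for chunk in read_chunks(SAMPLE):
    print(chunk["tag"], [word for word, _ in chunk["tokens"]])
```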
Paninian Grammatical Model and
Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
  Some basic concepts
  Some Hindi constructions
    Causatives
    Co-ordination
    Unaccusatives
    Relative clauses
Conclusions
Introduction
Treebank: one of the most important linguistic resources
Utility in various NLP tasks such as parsing, natural language understanding, etc.
Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
  News articles: 350k
  Tourism articles: 25-30k
  Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
Indian languages
  Rich morphology
  Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
Dated around 500 BC
Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
Based on the spoken form (Kiparsky 1993)
Focuses on language as a means of communication
Panini's Grammar (contd.)
Treats a sentence as a series of modifier-modified relations
Every sentence has a primary modified (generally a verb)
Relations between verbs and their participants are called 'karaka'
Other relations, such as reason, purpose, genitive, etc.
The relations are expressed through explicit markers called vibhakti
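Sabina opened the lock
[Dependency tree: opened is the root; Sabina (k1) and lock (k2) are its dependents; 'the' attaches to lock]
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
[Dependency tree: opened is the root; Sabina (k1), lock (k2) and key (k3) are its dependents; 'the' attaches to lock and 'this' to key]
K3 (karaNa): instrument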
Yesterday Sabina opened the lock with this key at my home
opened
Yesterday Sabina lock key home
the this my
k7t k1 k2 k3
k7p
K7t (deshadhikaraNa) time K7p (kaladhikaraNa) place
Yesterday the lock opened with this key
opened
yesterday lock key
the this
k7t k1
k3
lock becomes the karta
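
The labelled trees above can be written down as plain data. The sketch below is only an illustration of that idea: the dictionary layout and the helper name `karta` are assumptions made here, not part of the treebank tools.

```python
# Each analysis lists (karaka label, dependent) pairs attached to the verb
# "opened", copied from the example trees on the slides.
ANALYSES = {
    "Sabina opened the lock": [("k1", "Sabina"), ("k2", "lock")],
    "Sabina opened the lock with this key": [
        ("k1", "Sabina"), ("k2", "lock"), ("k3", "key")],
    "Yesterday the lock opened with this key": [
        ("k7t", "yesterday"), ("k1", "lock"), ("k3", "key")],
}


def karta(sentence):
    """Return the dependent labelled k1 (karta) in the stored analysis."""
    for label, dependent in ANALYSES[sentence]:
        if label == "k1":
            return dependent
    return None
```

Looking up the karta of the first and last sentences returns `Sabina` and `lock` respectively, which is the shift the last slide points out.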
Levels of Analysis
• L1 – Semantic relations: karakas, e.g. raama: karta
• L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) chunk-level tree:
khaa
  bhaaii
  phala

(t2) fully expanded tree:
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
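
The chunked notation above is regular enough to read mechanically. The following is a minimal sketch, assuming the `((word_TAG ...))_LABEL` layout shown on the slide; the head rule (last NN in an NP, the VM in a VG) is a simplifying assumption made for this example only.

```python
import re


def parse_chunks(text):
    """Parse '((w_TAG w_TAG))_LABEL ...' into (label, [(word, tag), ...])."""
    chunks = []
    for body, label in re.findall(r"\(\((.*?)\)\)_(\w+)", text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def head(chunk):
    """Pick the chunk head: the main verb (VM) in a VG, else the last noun."""
    label, tokens = chunk
    wanted = "VM" if label == "VG" else "NN"
    return [word for word, tag in tokens if tag == wanted][-1]


TEXT = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP "
        "((khaataa_VM hai_VAUX))_VG")
CHUNKS = parse_chunks(TEXT)
HEADS = [head(c) for c in CHUNKS]
```

`HEADS` holds the three words between which the inter-chunk relations of tree (t1) are marked.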
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb
• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

Verbal root
  activity
  result
karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta/karma sometimes correspond to agent/theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: opening of lock
• Action 1: inserting and turning a key
• Action 2: key pressing and turning the lever
• Action 3: latch moving and lock opening
Sub-actions: opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
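
As a rough illustration of this role shift, the sketch below labels the participants of 'open' given the participant the speaker presents as the doer. The role names (`agent`, `instrument`, `object`), the default label table and the function name are assumptions introduced here for illustration; the slides do not define such a procedure.

```python
# Default karaka label for each participant of 'open' when it is NOT
# presented as the doer (assumed for this sketch).
DEFAULT_LABEL = {"agent": "k1", "instrument": "k3", "object": "k2"}


def assign_karakas(expressed, focus):
    """Label expressed participants; the focused one becomes karta (k1).

    expressed: dict mapping a role name to the word that realises it.
    focus: the role the speaker chooses to present as the doer (vivaksha).
    """
    if "agent" in expressed and focus != "agent":
        raise ValueError("sketch assumes an expressed agent is always karta")
    labels = {}
    for role, word in expressed.items():
        labels[word] = "k1" if role == focus else DEFAULT_LABEL[role]
    return labels
```

With all three participants expressed the result matches 'Sabina opened the lock with this key'; dropping the agent and focusing on the instrument or on the object gives the labels for 'The key opened the lock' and 'The lock opened'.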
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd.)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
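
The feature structures above are flat attribute=value lists, so they can be turned into dictionaries directly. This is a minimal sketch for the layout as it appears on the slide (an `fs` keyword followed by space-separated pairs); the function name `parse_fs` is hypothetical, and an attribute whose value is missing on the slide is kept as an empty string.

```python
def parse_fs(analysis):
    """Parse '<fs key=value ...>' into a dict; missing values become ''."""
    body = analysis.strip().lstrip("<").rstrip(">").split()
    if not body or body[0] != "fs":
        raise ValueError("expected a feature structure starting with 'fs'")
    features = {}
    for item in body[1:]:
        key, _, value = item.partition("=")
        features[key] = value
    return features


KHAATAA = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
```

`KHAATAA["root"]` gives the verb root and `KHAATAA["TAM"]` the tense-aspect-mood marker recorded for khaataa.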
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)
• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit'
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd.)

khaataa
  k1: bhaii
    r6: meraa
  k2: phala
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
Some Hindi Constructions

(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

Issue
• Possibility-I: go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)
• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd.)
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
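
The substitution described for Possibility-II (pk1, mk1, jk1 in place of k1, k3, k4) can be sketched as a simple relabelling. The function name and input layout below are hypothetical; the sketch applies only to the mediated causative with an animate se-phrase such as aayaa se, and does not attempt to tell it apart from a true instrument such as cammaca se 'with the spoon'.

```python
# Mapping stated on the slide: k1 -> pk1 (causer), k3 -> mk1 (mediator-causer),
# k4 -> jk1 (causee). All other labels are left unchanged.
CAUSATIVE_MAP = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(dependents):
    """Convert Possibility-I labels to Possibility-II labels.

    dependents: list of (phrase, label) pairs attached to the causative verb.
    """
    return [(phrase, CAUSATIVE_MAP.get(label, label))
            for phrase, label in dependents]


POSSIBILITY_I = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
POSSIBILITY_II = relabel_causative(POSSIBILITY_I)
```

`POSSIBILITY_II` carries the labels shown in the second example sentence above, with khaanaa staying k2.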
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'

Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)
• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
'I-Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa (base verb)
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa (derived intransitive verb)
'ram-Dat' 'moon' 'appeared'
'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'

Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause

Possibility-I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II
[Figure: dependency tree for Possibility-II]

Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written letters, he tears them up'

Participles (vmod) (Contd.)
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
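
The NULL insertion for a missing conjunct can be sketched as follows. The function name and the tuple layout are assumptions made for this illustration; the node name `NULL_CCP` and the `ccof` label are taken from the slide.

```python
def coordinate(clause_heads, conjunction=None):
    """Attach clause heads under a conjunct node with the ccof label.

    If the sentence has no explicit conjunction, a NULL_CCP node is
    inserted as the head so that the dependencies can still be shown.
    """
    head = conjunction if conjunction is not None else "NULL_CCP"
    return head, [("ccof", clause) for clause in clause_heads]


# No explicit conjunct: 'bacce badZe ho gaye hai, kisii kii baat nahii sunate'
ELLIPTED = coordinate(["badZe_ho_gaye", "nahii_sunate"])
# Explicit conjunct 'aur': the lexical conjunction itself is the head.
EXPLICIT = coordinate(["badZe_ho_gaye", "nahii_sunate"], conjunction="aur")
```

The two calls differ only in the head node, which mirrors the idea that the null element stands in for the conjunction that the speaker left out.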
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'

[Figures: dependency trees for the coordinate conjunction (ccof) and the subordinate conjunction]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson]agent created the [first disposable syringe]theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]theme was created in 1952 by [Jonas Salk]agent at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
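
The filtering step can be sketched with the two candidate sentences from the slides. The dictionaries below hand-code the role labels that an SRL system would be expected to produce; the data layout and the function name `answer` are assumptions for this illustration, not part of PropBank.

```python
# The question asks for the agent of 'create' whose theme is the vaccine.
QUESTION = {"predicate": "create",
            "theme": "the first effective polio vaccine"}

# Role-labelled candidates, hand-coded from the two sentences on the slide.
CANDIDATES = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(question, candidates):
    """Return the agents of candidates whose predicate and theme match."""
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"] == question["theme"]]
```

Both candidates contain the same words as the question, but only the second has the vaccine as the theme of create, so only its agent is returned.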
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Framefiles
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of framefiles
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its framefile

An example
• [John]ARG0 rings [the bell]ARG1 (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2 (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
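
A framefile with two rolesets can be pictured as a small lookup table. The sketch below is not the PropBank file format; it is a hypothetical in-memory layout (roleset identifiers written here as `ring.01` and `ring.02`) that only illustrates how verb-specific roles are resolved once a roleset has been chosen.

```python
# Hypothetical in-memory view of the framefile for 'ring', with the role
# descriptions copied from the slide.
FRAMEFILE = {
    "ring.01": {"gloss": "Make sound of bell",
                "roles": {"Arg0": "Causer of ringing",
                          "Arg1": "Thing rung",
                          "Arg2": "Ring for"}},
    "ring.02": {"gloss": "To surround",
                "roles": {"Arg1": "Surrounding entity",
                          "Arg2": "Surrounded entity"}},
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to its role description in the roleset."""
    roles = FRAMEFILE[roleset_id]["roles"]
    return {text: roles[label] for text, label in labelled_args}


BELL = describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
LAKE = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
```

The same label, Arg1, means 'Thing rung' under one roleset and 'Surrounding entity' under the other, which is why the roleset has to be selected before the numbered arguments can be interpreted.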
Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  – Creation of framefiles
  – Empty argument insertion
  – Semantic role labelling

Framefiles for Hindi
• Two types of framefiles
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments

Numbered arguments            Numbered arguments with function tags
ARGA   Causer                 ARGA-MNS   Indirect causer
ARG0   Agent/experiencer      ARG0-MNS   Induced causer
ARG1   Theme/patient          ARG0-GOL   Causee with a 'recipient' role
ARG2   Recipient              ARG2-ATR   Attribute
ARG3   Instrument             ARG2-GOL   Goal
                              ARG2-SOU   Source
                              ARG2-LOC   Location
                              ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'
trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative
unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'
unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'
unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'
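
The labelling rule for the single argument of an intransitive verb can be sketched as a lookup. The small lexicon below uses the verb roots mentioned on the slides (with `so` 'sleep' taken from the unergative examples); the table and the function name are hypothetical, and a real annotation would rely on the diagnostic tests and the framefiles instead of a hand-written list.

```python
# Hypothetical verb-class lexicon built from the examples on the slides.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
    "so": "unergative",       # sleep
}


def sole_argument_label(verb_root):
    """Return Arg1 for unaccusatives and Arg0 for unergatives."""
    if VERB_CLASS[verb_root] == "unaccusative":
        return "Arg1"
    return "Arg0"
```

For example, `darvaazaa` in 'darvaazaa khulegaa' would receive `sole_argument_label("Kula")`, i.e. Arg1, while Atif in 'Atif soyegaa' would receive Arg0.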
Existential
exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject
unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night, the moon was seen behind the clouds'
dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night, I saw the moon behind the clouds'

[Figures: PropBank analyses for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so, sulaa): sleep, make someone sleep
• Add -vaa
  – (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused Atif to sleep'
causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause Atif to sleep'

[Figures: PropBank analyses for causative-1 and causative-2; causative classes]
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)': trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate
compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause
compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation

Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा
[Tree diagram. Node labels: VP-Pred, VP, NP, V. Leaves: Atif, kitab, paRhegaa.]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[Tree diagram. Node labels: VP-Pred, VP, NP, V. Leaves: Atif-ne, kitab, paRhii.]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative there simply is no object

आतिफ़ सोएगा
[Tree diagram. Node labels: VP-Pred, VP, NP, V. Leaves: Atif, soyegaa.]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[Tree diagram. Node labels: VP-Pred, VP, NP, V. Trace: CASE1. Leaves: darvaazaa, khulegaa.]

Existentials
• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

उस कमरे में चूहे हैं
[Tree diagram. Node labels: VP-Pred, VP, NP, V. Trace: CASE1. Leaves: us kamre mein, cuuhe, hain.]

Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी
[Tree diagram. Node labels: VP-Pred, VP, NP, V. Leaves: Ram-ne, Mohan-ko, kitaab, dii.]
Putting it All Together: Dative Subjects
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

कल रात बादलों में मुझको चाँद दिखा
[Tree diagram. Node labels: VP-Pred, VP, NP, V. Traces: CASE1, SCR2. Leaves: kal raat, baadaloM mein, mujhko, caaMd, dikhaa.]

Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
[Tree diagram. Node labels: VP-Pred, VP, NP, V, V-Aux, CP, C. Traces: CASE1, EXTR1. Leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi.]
Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree diagram. Node labels: VP-Pred, VP, NP, V, V-Aux, CP, C. Empty elements: PRO, COMP. Traces: SCR1, SCR3. Leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii.]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree diagram. Node labels: VP-Pred, VP, NP, N, V, V', V-Aux. Trace: CASE1. Leaves: Raam, Ravi-ko, yaad, kar, rahaa, thaa.]

Causative
[Tree diagram]
Comparison of DS, PB, PS (Sample)

Columns: DS, PB, PS

How:
• Dependency
• Phrase Structure

What:
• Distinguish unergative/unaccusative
• Distinguish temporal/locative adjuncts
• Distinguish unaccusative/transitive with empty agent
Introduction to Morphology, Syntax and
Lexical Semantics of Hindi and Urdu

Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012

Outline
• Introduction
• Some facts about Hindi and Urdu
• Linguistic properties
  – Morphology
  – Some basic syntax
  – Lexical semantics
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states, including Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India, in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari, a syllabic script

Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani

Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)
Morphology
• Hindi and Urdu have the following morphological properties
  – Grammatical gender: masculine and feminine
  – Number: singular and plural
  – Person: first, second and third
  – Case: direct, oblique and vocative
  – Adjectives inflect for number, gender and case
    · Some adjectives do not decline

Nouns
• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender
  – pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case
  – The case roles in Hindi are normally marked by postpositions
  – However, Hindi nouns reflect two cases morphologically: direct and oblique
Case
• Direct nouns are in the nominative and are not followed by a postposition
  – They occur denoting subject and/or object
  – laRkaa aayaa 'the boy came'
    <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  – laRkii aayii 'the girl came'
    <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
  – laRke ne roTii khaayii 'the boy ate bread'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  – laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case.
• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrog), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrog), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrog), kinhiiM (who, indef)
  b. inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)
Pronouns (Contd...)
• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) don't change form:
  hamne (we, erg), tumne (you, erg), aapne (you-hon, erg)
  hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
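The pronoun rules on this slide can be sketched as a small lookup. This is only an illustration: the function name `attach` and the two tables are assumptions made for the example, and they cover nothing beyond the forms listed above.

```python
# Illustrative sketch: tables contain only the forms given on the slide.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}  # oblique stems before most postpositions
GENITIVE = {                               # irregular genitive forms (no kaa attached)
    "maiM": "meraa",
    "tuu": "teraa",
    "ham": "hamaaraa",
    "tum": "tumhaaraa",
}


def attach(pronoun: str, postposition: str) -> str:
    """Combine a personal pronoun with a postposition (hypothetical helper)."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition == "ne":
        # maiM and tuu keep their direct form before ergative ne
        return pronoun + "ne"
    # ham, tum and aap are not in OBLIQUE, so they stay unchanged
    return OBLIQUE.get(pronoun, pronoun) + postposition


for pron, psp in [("maiM", "ne"), ("maiM", "ko"), ("tum", "ko"), ("ham", "kaa")]:
    print(pron, "+", psp, "=", attach(pron, psp))
```

A table-driven approach fits here because the irregular forms are a small closed set, while the regular case is plain concatenation.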
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The forms that an adjective (e.g. acchaa 'good') takes for number, gender and case are given below

          Direct             Oblique
          Sg       Pl        Sg       Pl
  Masc    acchaa   acche     acche    acche
  Fem     acchii   acchii    acchii   acchii
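The adjective paradigm above reduces to three endings, which a short sketch can make explicit. The function name and its stem-based interface (`"acch"` for acchaa) are assumptions for illustration; adjectives that do not decline are outside its scope.

```python
def inflect_adjective(stem: str, gender: str, number: str, case: str) -> str:
    """Inflect a declinable -aa adjective given its stem, e.g. 'acch' for acchaa.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if gender == "f":
        return stem + "ii"   # feminine is -ii in every cell of the table
    if number == "sg" and case == "direct":
        return stem + "aa"   # masculine singular direct
    return stem + "e"        # all remaining masculine cells


# Reproduce the table row by row.
for gender in ("m", "f"):
    row = [
        inflect_adjective("acch", gender, number, case)
        for case in ("direct", "oblique")
        for number in ("sg", "pl")
    ]
    print(gender, row)
```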
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu
• Thus only certain moods, aspects and tenses are marked in the verb forms. Some examples of verb forms from Hindi/Urdu:
  Root        ro      'cry'
  Infinitive  ronaa   'to cry'
  Habitual    rotaa   'cry (hab)'
  Perfective  royaa   'cried'
  Causative   rulaa   'cause someone to cry'
              rulvaa  'make someone cause someone to cry'
Auxiliaries
• Auxiliaries mark tense, aspect and modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children.d meal eat prog be.pl.pres
      'The children are having a meal'
  (b) bacce khaanaa khaate rah sakte haiM
      children.d meal eat.nf prog abil be.pl.pres
      'The children can continue to have their meal'
• Auxiliaries also carry gender, number and person information
Postpositions
• Postpositions largely mark the case relations
      baccoM ne raat meM mez se khaanaa le liyaa
      children.obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions
Compound Post-positions
Compound postpositions are formed by combining the postpositions ke, kii and se with other words, as follows:
  ke anusaar    'according to'
  ke alaavaa    'in addition to'
  ke kaaran     'because of'
  ke dvaaraa    'through'
  ke saamne     'in front of'
  ke liye       'for'
  kii or/taraf  'towards'
  kii tarah     'like'
  kii jagah     'in place of'
  se baahar     'out of'
  se pahle      'before'
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
• Unlike Hindi, Urdu has prepositions as well
• Some Urdu examples with prepositions:
  (a) qabl 'before'
      qabl az_ayn (qabl azeen)   before from_this   'before this'
      qabl az_waqt               before from_time   'before time'
  (b) dar 'in/inside/amidst'
      dar_ayn asnah (dariin asnah)   in_this moment
  (c) az 'from/since/for'
      az raahe hamdardi   from way empathy   'for the sake of courtesy'
      az sare nau         from beginning new
      khaarij az bahes    beyond from discussion
  (d) ta 'to/until/till'
      ta waqt   until time
ezafe in Urdu
• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone
  EZ N+N      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A Morphological Process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category that is reduplicated. For example:
  - Nouns: it adds the sense of 'every'
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
• If the word is X, then its reduplicated form is X-X
  raam-raam      'Ram-Ram' (proper noun)
  baccaa-baccaa  'child-child' (common noun)
  garam-garam    'hot-hot' (adjective)
  dhiire-dhiire  'slowly-slowly' (adverb)
  jaa-jaa        'go-go' (verb)
  naa-naa        'not-not' (negative particle)
  kyaa-kyaa      'what-what' (question word)
  jaate-jaate    'going-going' (participle)
Partial Reduplication (Echo words)
• In partial reduplication an expression X is repeated partially
• Only a part of the given word is repeated, which gives the meaning 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-
  For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi
• Some more examples of partial reduplication in Hindi:
  jaanaa 'going'    → jaanaa-vaanaa 'going etc.'
  aaloo 'potato'    → aaloo-vaaloo 'potato etc.'
  aisaa 'like this' → aisaa-vaisaa 'like this etc.'
• There are also examples which do not follow this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run'  → bhaagambhaag 'rush'
  jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
  dekh 'see'   → dekhaa-dekhii 'in imitation'
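The regular v- echo pattern can be sketched as a string operation on the romanized forms used on this slide. Two assumptions are made for illustration: the whole initial consonant run (such as the digraph "kh") is treated as the "first consonant", and vowel-initial words simply receive a v- prefix, which matches aaloo-vaaloo and aisaa-vaisaa. The irregular cases (bhaagambhaag, jhuuTh-muuTh, dekhaa-dekhii) are not covered.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word: str) -> str:
    """Replace the initial consonant run of a romanized word with 'v'."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return "v" + word[i:]


def echo_compound(word: str) -> str:
    """Build the hyphenated echo compound meaning 'X etc.'."""
    return f"{word}-{echo_word(word)}"


for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_compound(w))
```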
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees with either the subject or the object
• An adjective agrees with the noun it modifies
Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitaab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'
trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'
trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'
Intransitive Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'
Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'
Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1   राम मोहन को किताब देगा
            raam mohan ko kitaab degaa
            Ram Mohan dat book.f give.m.sg.fut
            'Ram will give a book to Mohan'
ditrans-2   राम ने मोहन को किताब दी
            raam ne mohan ko kitaab dii
            Ram erg Mohan dat book.f give.f.sg.pst
            'Ram gave a book to Mohan'
Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1    मेरी बहन जो दिल्ली में रहती है कल आ रही है
            merii bahan jo dillii meM rahtii hai kal aa rahii hai
            my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
            'My sister, who stays in Delhi, is coming tomorrow'
Relative Clause
rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me'
rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me'
rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me'
Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'
compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'
Lexical Semantics
• Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
• The experiencer argument takes dative case
  raam ko bukhaar hai
  Ram dat fever be.pres
  raam ko caand dikhaa
  Ram dat moon see-unacc.pst
  raam ko dukh hai
  Ram dat sorrow be.pres
Participatory verbs
• The second argument of participatory verbs takes the se postposition
  raam ravi se carcaa karega
  Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii
  Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa
  Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues
  - Compounds
  - Punctuation
For example:
  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF:
  ADDR  TOKEN
  1     usa
  2     laDake
  3     ne
  4     kelA
  5     khAyA
  6     thA
  7     (sentence-final punctuation)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - They are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision: create three tokens and mark the hyphen as JOIN
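The tokenization decision above (split a hyphenated compound into three tokens and mark the hyphen as JOIN) can be sketched as follows. This is a minimal illustration, not the treebank's actual tokenizer: the function name, the regular expression and the `(token, tag)` output shape are all assumptions.

```python
import re


def tokenize(sentence: str):
    """Split into word and punctuation tokens.

    Returns (token, tag) pairs; tag is 'JOIN' for a hyphen that sits
    directly between two word characters, otherwise None.
    """
    out = []
    for m in re.finditer(r"\w+|[^\w\s]", sentence):
        tok = m.group()
        joins = (
            tok == "-"
            and m.start() > 0
            and sentence[m.start() - 1].isalnum()
            and m.end() < len(sentence)
            and sentence[m.end()].isalnum()
        )
        out.append((tok, "JOIN" if joins else None))
    return out


for token, tag in tokenize("BAI-bahana Aye ."):
    print(token, tag or "")
```

Keeping the hyphen as its own token lets each member of the compound receive its own morphological analysis, while the JOIN tag records that the three tokens form one compound.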
Morph Analysis and its Representation
• af defines the composite attribute consisting of root, category, gender, number, person, case, TAM (tense/aspect/modality) or vibhakti (case marker), and suffix
  ADDR  TKN      OTHR
  1     usa      <fs af='vaha,pr'>
  2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
  3     ne       <fs af='ne,psp'>
  4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
  5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
  6     thaa     <fs af='kelaa,v,e'>
  7              <fs as=&STOP,punc>
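Reading the composite af attribute amounts to splitting a comma-separated string into named fields. The sketch below assumes the eight-field order described above (root, category, gender, number, person, case, TAM or vibhakti, suffix); the field names and the padding of short values with empty strings are illustrative choices, not part of the SSF specification.

```python
# Hypothetical field names for the eight positions of the af attribute.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "vib_tam", "suffix")


def parse_af(af: str) -> dict:
    """Split an af value such as 'laDakaa,n,m,sg,3,o,,' into named fields.

    Missing trailing fields are filled with empty strings.
    """
    values = [v.strip() for v in af.split(",")]
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


analysis = parse_af("laDakaa,n,m,sg,3,o,,")
print(analysis["root"], analysis["cat"], analysis["case"])
```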
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags
  ADDR  TKN     CAT   OTHR
  1     usa     PRP   <fs af='vaha,pron,...'>
  2     laDake  NN    <fs af='laDakA,noun,...'>
  3     ne      PSP   <fs af='ne,psp,...'>
  4     kelA    NN    <fs af='kelA,noun,...'>
  5     khAyA   VM    <fs af='KA,verb,...'>
  6     thA     VAUX  <fs af='kelA,verb,...'>
  7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)
  ADDR  TKN     CAT   OTHR
  1     ((      NP
  1.1   usa     PRP   <fs af='vaha,pron,...'>
  1.2   laDake  NN    <fs af='laDakA,noun,...'>
  1.3   ne      PSP   <fs af='ne,psp,...'>
        ))
  2     ((      NP
  2.1   kelA    NN    <fs af='kelA,noun,...'>
        ))
  3     ((      VG
  3.1   khAyA   VM    <fs af='KA,verb,...'>
  3.2   thA     VAUX  <fs af='kelA,verb,...'>
        ))
  4     ((      BLK
  4.1           SYM   <fs as=&STOP,punc>
        ))
Rambow
Paninian Grammatical Model and
Hindi/Urdu Treebanks
Dipti Misra Sharma IIIT Hyderabad
ltdiptiiiitacingt
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian grammatical model
Why Paninian grammar? Indian languages have
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky 1993>
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
  opened
    k1: Sabina
    k2: lock
      the
• k1 (karta): the doer of the action (the locus of activity)
• k2 (karma): locus of result
Sabina opened the lock with this key
  opened
    k1: Sabina
    k2: lock
      the
    k3: key
      this
• k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
  opened
    k7t: Yesterday
    k1: Sabina
    k2: lock
      the
    k3: key
      this
    k7p: home
      my
• k7t (kaalaadhikaraNa): time
• k7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
  opened
    k7t: yesterday
    k1: lock
      the
    k3: key
      this
• lock becomes the karta
Levels of Analysis
• L1, semantic relations: karakas, e.g. raama: karta
• L2, morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3, morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4, phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example Contd
   ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
(t1) khaa
       bhaaii
       phala
(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened
Semantics of the verb
• A verbal root denotes
  - the activity
  - the result
• Locus of activity: karta
• Locus of result: karma
[Diagram: verbal root, branching into activity and result]
karta-karma
• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma
  open
    k1: boy
    k2: lock
Sub-actions: Opening of lock
Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
  open
    k1: boy
    k2: lock
    k3: key
  open
    k1: key
    k2: lock
  open
    k1: lock
k1: karta (doer); k2: karma (affected); k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  - The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
  - The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  - Which sub-action the speaker wants to focus on
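The shift of karaka roles with the speaker's intention can be sketched as a function that takes the expressed participants plus the one the speaker promotes to karta. The thematic labels ('agent', 'instrument', 'patient'), the function name and the default mapping are assumptions made for this illustration; the treebank assigns these labels by manual annotation, not by rule.

```python
# Default karaka label for each participant when it is NOT promoted to karta.
DEFAULT = {"agent": "k1", "instrument": "k3", "patient": "k2"}


def assign_karakas(participants: dict, karta: str) -> dict:
    """Map each expressed participant to a karaka label.

    participants: thematic label -> word, only for expressed participants.
    karta: the thematic label the speaker chooses to present as the doer.
    """
    return {
        word: ("k1" if role == karta else DEFAULT[role])
        for role, word in participants.items()
    }


# Sabina opened the lock with the key
print(assign_karakas({"agent": "Sabina", "instrument": "key", "patient": "lock"}, "agent"))
# The key opened the lock
print(assign_karakas({"instrument": "key", "patient": "lock"}, "instrument"))
# The lock opened
print(assign_karakas({"patient": "lock"}, "patient"))
```

Passing the karta choice explicitly mirrors the point of the slide: the same participants can receive different labels depending on which sub-action the speaker focuses on.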
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• 'key' gets assigned karta in (b) and karana in (a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
• meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa   <fs af= root=badZaa cat=adj gend=m>
• bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta   <fs af= root=bahuta cat=adj gend=any>
• phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
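The bracketed chunk notation used in this example, `((word_TAG ...))_LABEL`, is regular enough to be read back with a short parser. The sketch below is an assumption-laden illustration of that slide notation only (it is not a reader for full SSF files), and the function name is hypothetical.

```python
import re

# One chunk: "((" body "))_" label, where body is space-separated word_TAG items.
CHUNK_RE = re.compile(r"\(\(\s*(.*?)\s*\)\)_(\w+)")


def parse_chunks(text: str):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) tuples."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        # rsplit keeps any underscore inside the word itself intact
        words = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, words))
    return chunks


line = "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
for label, words in parse_chunks(line):
    print(label, words)
```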
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
  khaataa
    k1: bhaaii
      r6: meraa
    k2: phala
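A labeled dependency tree of this kind can be held as a list of (head, label, dependent) triples. The sketch below stores the example tree that way and prints it as an indented outline; the triple representation and the helper names are assumptions for illustration, not the treebank's storage format.

```python
from collections import defaultdict

# (head, relation label, dependent) for the example sentence's chunk heads
EDGES = [
    ("khaataa", "k1", "bhaaii"),
    ("khaataa", "k2", "phala"),
    ("bhaaii", "r6", "meraa"),
]


def children_of(edges):
    """Group edges by head so each node's dependents can be looked up."""
    tree = defaultdict(list)
    for head, label, dep in edges:
        tree[head].append((label, dep))
    return tree


def render(tree, node, label="root", depth=0):
    """Return the subtree under `node` as a list of indented lines."""
    lines = [f"{'  ' * depth}{label}: {node}"]
    for child_label, child in tree.get(node, []):
        lines.extend(render(tree, child, child_label, depth + 1))
    return lines


print("\n".join(render(children_of(EDGES), "khaataa")))
```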
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other
Relations other than karakas
• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address
Relations which do not fall under dependency relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
    maaz ne aayaa se bacce ko khaanaa khilvaayaa
    'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
    'Mother caused the maid to feed the child'
Issue
• Possibility-I: go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd...)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator-causer' (mk1): the mediator-causer in a causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
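The relabeling under Possibility-II can be written as a simple mapping from the surface-syntactic labels to the causative labels. This sketch is illustrative only: it assumes the caller has already decided that the se-marked phrase is a mediator-causer (as with aayaa se) and not a true instrument (as with cammaca se, which keeps k3), a decision the function itself does not make.

```python
# Surface-syntactic label -> causative label, as stated on the slide.
CAUSATIVE_MAP = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Relabel (phrase, label) pairs for a causative verb; other labels pass through."""
    return [(phrase, CAUSATIVE_MAP.get(label, label)) for phrase, label in args]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_1))
```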
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'
Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - jo ladZakaa 'the boy' depends on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd...)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta: k4a
Ex-1: mujhko dukh hai
      'I.Dat' 'unhappy' 'is'
      'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the regular sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta: k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa            [base verb]
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa      [derived intransitive verb]
      'ram.Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
(4) Relation samanadhikaran: rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause
Possibility - I
  agar-to
    paired-ccof: agar
      ccof: naa hotii
    paired-ccof: to
      ccof: aatii
Possibility - II
Conditionals (Contd)
• Possibility-I is not possible because agar-to is the head of the tree, which is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Every day, having written a letter, he tears it up'
Participles (vmod) (Contd)
• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'
  Paadzataa hai
    k1: vaha
    k7t: rojZa
    k2: pawra
    vmod: likhakar
      k1: vaha
      k2: pawra
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd)
  maan luungii
    k1: mai
    k2: NULL__NP (vo)
      nmod__relc: kahoge
        k1: tum
        k2: jo bhi
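Inserting a null head for an elided correlative can be sketched on an edge list of (head, label, dependent) triples. Everything about the interface here is an assumption made for illustration: the function name, the `role` parameter (k2 in this example) and the idea that an explicit correlative, when present, is already attached to the main verb.

```python
def attach_relative_clause(edges, main_verb, relc_head, role, correlative=None):
    """Attach a relative clause to its correlative, inserting NULL_NP if it is elided.

    edges: list of (head, label, dependent) triples; a new list is returned.
    """
    new_edges = list(edges)
    head = correlative
    if head is None:
        # The correlative (e.g. vo 'that') is missing: add a null node in its place.
        head = "NULL_NP"
        new_edges.append((main_verb, role, head))
    new_edges.append((head, "nmod__relc", relc_head))
    return new_edges


edges = [
    ("maan luungii", "k1", "mai"),
    ("kahoge", "k1", "tum"),
    ("kahoge", "k2", "jo bhi"),
]
for edge in attach_relative_clause(edges, "maan luungii", "kahoge", role="k2"):
    print(edge)
```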
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
  NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate
Non-dependency Relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd...)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
      'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
      'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
      'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; the adjective does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now form part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Framefile definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
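The filtering step can be sketched as follows. This is a minimal illustration, assuming the two candidate sentences have already been labelled with agent and theme for the predicate create; the `frames` list and the function name `answer_who_created` are hypothetical and do not come from any SRL toolkit.

```python
# Role-labelled candidates for the predicate "create" (labels as on the slide).
frames = [
    {"agent": "Becton Dickinson", "theme": "the first disposable syringe"},
    {"agent": "Jonas Salk", "theme": "the first effective polio vaccine"},
]


def answer_who_created(theme, frames):
    """Return the agents of those frames whose theme matches the question."""
    return [frame["agent"] for frame in frames if frame["theme"] == theme]


print(answer_who_created("the first effective polio vaccine", frames))
```

Word matching alone would accept both sentences, since both contain every content word of the question; the role check discards the Becton Dickinson sentence because its theme is the syringe.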
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
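The ring example can be written down as a small lookup structure. This is only a sketch of the idea of a frame file with several rolesets; the dictionary layout and the `describe` function are hypothetical and do not reproduce the actual PropBank frame file format.

```python
# One frame file for the polysemous predicate "ring", with two rolesets.
RING_FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"ARG0": "causer of ringing", "ARG1": "thing rung", "ARG2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"ARG1": "surrounding entity", "ARG2": "surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled phrase to the role description of the chosen roleset."""
    roles = RING_FRAMES[roleset_id]["roles"]
    return {phrase: roles[label] for phrase, label in labelled_args}


print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same numbered label (ARG1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is why the roleset has to be chosen before the arguments are interpreted.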
Hindi PropBank
• Annotating Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA    Causer
ARG0    Agent, experiencer
ARG1    Theme, patient
ARG2    Recipient
ARG3    Instrument
PropBank Tagset: Numbered Arguments with function tags
ARGA-MNS    Indirect causer
ARG0-MNS    Induced causer
ARG0-GOL    Causee with a 'recipient' role
ARG2-ATR    Attribute
ARG2-GOL    Goal
ARG2-SOU    Source
ARG2-LOC    Location
ARG2-DIR    Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP    Temporal
ARGM-MNR    Manner
ARGM-LOC    Location
ARGM-PRP    Purpose
ARGM-CAU    Cause
ARGM-DIS    Discourse
ARGM-ADV    Adverb
ARGM-MNS    Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Intransitive: Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
[Figure panels: unacc-4, dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
– (so → sulaa) sleep → make someone sleep
• Add -vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Causatives [Figure panels: Causative-1, Causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
Complex predicate
compl-pred-2 राम रवी से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree figure; nodes: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree figure; nodes: VP-Pred, VP, NP-P (NP Atif, -ne), NP (kitab), V (paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree figure; nodes: VP-Pred, VP, NP (Atif), V (soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree figure; nodes: VP-Pred, VP, NP1 (darvaazaa), CASE1, V (khulegaa)]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree figure; nodes: VP-Pred, VP, NP1 (cuuhe), CASE1, V (hain), VP, NP-P (us kamre mein)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Tree figure; nodes: VP-Pred, NP-P (Ram -ne), NP (kitaab), V (dii), VP-Pred, NP-P (Mohan -ko), VP]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree figure; nodes: VP-Pred, NP1 (caaMd), CASE1, V (dikhaa), VP, NP-P (baadaloM mein), NP (kal raat), VP-Pred, SCR2, NP2 (mujhko)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree figure; nodes: VP-Pred, VP, NP (Ram), EXTR1, V (jaantaa), V-Aux (hai), CP1, C (ki), VP-Pred, NP1 (Sita), CASE1, NP-P (der se), V (aayegi)]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree figure; nodes: VP-Pred, NP-P (tumne), SCR1, V (dii), PRO, NP1 (jo kitaab), CP, C, COMP, V-Aux (thii), VP-Pred, NP-P (maine), V (paRhii), SCR3, NP (vah)]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree figure; nodes: VP-Pred, VP, NP (Raam), NP-P1 (Ravi -ko), V (kar), V-Aux (rahaa), V-Aux (thaa), V', NP, CASE1, N (yaad)]
Causative
Introduction to Morphology, Syntax and Lexical Semantics of Hindi and Urdu
Dipti Misra Sharma <dipti@iiit.ac.in>
LTRC, IIIT Hyderabad, India
Dec 8, 2012
COLING 2012
Outline
Introduction: some facts about Hindi and Urdu
Linguistic properties: Morphology; Some basic Syntax; Lexical semantics
Hindi: Some facts
A major language of the Indo-Aryan family
Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
Also spoken outside India in Fiji, Mauritius, Guyana, etc.
Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
A large population in India speaks Hindi as a second language
Script: Devanagari, a syllabic script
Urdu: Some facts
An Indo-Aryan language
Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
Significant borrowings from Arabic and Persian
It was also known as rekhta (mixed language)
Official language of Pakistan
Official language of states of India
Also spoken in Fiji, Bangladesh, etc.
Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
Script: Perso-Arabic
Hindi-Urdu (Hindustani)
Hindi and Urdu are mutually intelligible
Linguists consider them two registers of the same language
Similar in grammatical structures
Differ in vocabulary, particularly in the formal written varieties
A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic characteristics of Hindi/Urdu
Hindi/Urdu have relatively free word order
The unmarked word order in both languages is subject-object-verb (SOV)
Auxiliary verbs follow the main verb
Nouns are followed by postpositions
Adjectives precede the nouns they modify
In Urdu, adjectives sometimes follow the noun (ezafe constructions)
Extensive use of participles, complex predicates and causatives
Reduplication and echo-compounding are productively used in Hindi and Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
Grammatical gender: masculine and feminine
Number: singular and plural
Person: first, second and third
Case: direct, oblique and vocative
Adjectives inflect for number, gender and case
– Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case
Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
Number:
Singular: pankhaa (fan), lataa (creeper), ghar (house)
Plural: pankhe (fans), lataeM (creepers), ghar (houses)
Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
Direct nouns are in the nominative and are not followed by a postposition. They occur as subject and/or object.
laRkaa aayaa 'the boy came'
<ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
laRkii aayii 'the girl came'
<ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
Oblique nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
laRke ne roTii khaayii 'the boy ate bread'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread up from the floor'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case
Sg – dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
Sg – obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
Pl – dir: ye (these), ve (those), jo (who, pl)
Pl – obl:
(a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
(b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)
Pronouns (Contd…)
Before ne, maiM (I) and tuu (you, sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)
Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
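The rules for personal pronouns before postpositions can be sketched as a small function. It is an illustration restricted to the forms listed above; the function name `attach` and the two lookup tables are hypothetical, and gender/number agreement of the genitive forms is ignored.

```python
# Oblique stems used before postpositions other than 'ne'.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitive forms that replace pronoun + 'kaa'.
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach(pronoun, postposition):
    """Combine a personal pronoun with a postposition (romanized forms)."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition
    # ham, tum and aap never change; maiM and tuu stay unchanged before 'ne'.
    return pronoun + postposition


for pronoun, postposition in [("maiM", "ne"), ("maiM", "ko"), ("ham", "ko"), ("tum", "kaa")]:
    print(pronoun, postposition, "->", attach(pronoun, postposition))
```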
Adjectives
Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy.obl erg'
The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

Case →        Direct            Oblique
Number →      Sg      Pl        Sg      Pl
Gender ↓
Masc          acchaa  acche     acche   acche
Fem           acchii  acchii    acchii  acchii
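The table can be restated as a short function. This is a sketch under a simplifying assumption that is not stated on the slide: only adjectives whose citation form ends in -aa are treated as declinable, and everything else is returned unchanged. The function name `inflect_adjective` is hypothetical.

```python
def inflect_adjective(citation_form, gender, number, case):
    """Inflect a romanized adjective for gender (m/f), number (sg/pl), case (direct/oblique)."""
    if not citation_form.endswith("aa"):
        # Assumption: adjectives not ending in -aa do not decline.
        return citation_form
    stem = citation_form[:-2]
    if gender == "f":
        return stem + "ii"      # feminine: same form in every cell
    if number == "sg" and case == "direct":
        return stem + "aa"      # masculine singular direct
    return stem + "e"           # all other masculine cells


for gender in ("m", "f"):
    for case in ("direct", "oblique"):
        for number in ("sg", "pl"):
            print(gender, case, number, inflect_adjective("acchaa", gender, number, case))
```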
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person
Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms
Given below are some examples of various verb forms from Hindi/Urdu
Root: ro 'cry'
Infinitive: ronaa 'to cry'
Habitual: rotaa 'cry.hab'
Perfective: royaa 'cried'
Causative: rulaa 'cause someone to cry'; rulvaa 'make someone cause someone to cry'
Auxiliaries
Auxiliaries mark Tense, Aspect and Modality information on verbs
(a) bacce khaanaa khaa rahe haiM
Children_d meal eat prog be.pl.pres
'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
Children_d meal eat_nf prog ablit be.pl.pres
'The children can continue to have their meal'
Auxiliaries also carry the gender, number and person information
Postpositions
Postpositions largely mark the case relations
baccoM ne raat meM mez se khaanaa le liyaa
children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'
Urdu Specific Features
Prepositions in Urdu
ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
qabl az_ayn (qabl azeen): before from_this, 'before this'
qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
az raahe hamdardi: from way empathy, 'for the sake of courtesy'
az sare nau: from beginning new
khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
ta waqt: until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive, but is not restricted to the genitive alone
EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
Words belonging to various categories can be reduplicated. These expressions are often hyphenated
Reduplication has various morphological functions, depending on the lexical category that is reduplicated. For example:
Nouns: it adds the sense of 'every'
Verbs: it brings the sense of an adverbial participle
Adjectives and adverbs: it adds intensity
Hindi has three types of reduplication: full, partial and redundant
Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X
raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning of 'X etc.'
In Hindi/Urdu the first consonant of X is replaced by v-
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi
Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
Some Basic Syntax
Hindi and Urdu are both relatively free word order SOV languages
For case marking, Hindi primarily uses postpositions
The verb agrees either with the subject or with the object
The adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Intransitive: Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Intransitive: Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case
raam ko bukhaar hai: Ram dat fever be.pres
raam ko caand dikhaa: Ram dat moon see-unacc.pst
raam ko dukh hai: Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition
raam ravi se carcaa karega: Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii: Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa: Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
Tokenization
Morphological Representation
POS tagging
Chunking
Inter-chunk dependency annotation
Intra-chunk dependencies
Tokenization
Automatic
Issues: compounds, punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
Punctuation: all punctuation marks are to be tokenized
Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
Compounds internally contain a punctuation mark
They are productive
Morphological analysis applies to the members of the compound
The issue: whether to create a single token
Decision: create three tokens and mark the hyphen as JOIN
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality)/vibhakti (case marker), suffix
ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaave'>
7              <fs as=&STOPpunc>
POS Tagging
ILMT POS tagset adopted; total 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOPpunc>
Chunking
Chunking is introduced to save effort in manual tagging
Dependency relations are marked between the chunk heads
Chunking restructures the tree (i.e. the value of ADDR changes)
ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs as=&STOPpunc>
      ))
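How chunking changes the addresses can be sketched with a small helper that numbers chunks and their member tokens. This is only an illustration of the addressing scheme shown above; the function name `chunk_ssf`, the tab-separated layout and the omission of the feature-structure column are simplifications assumed for the example.

```python
def chunk_ssf(chunks):
    """Render (label, [(token, pos), ...]) chunks as SSF-like lines with nested addresses."""
    lines = []
    for chunk_no, (label, tokens) in enumerate(chunks, start=1):
        lines.append(f"{chunk_no}\t((\t{label}")          # chunk opening line
        for token_no, (token, pos) in enumerate(tokens, start=1):
            lines.append(f"{chunk_no}.{token_no}\t{token}\t{pos}")
        lines.append("\t))")                               # chunk closing line
    return lines


sentence = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
print("\n".join(chunk_ssf(sentence)))
```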
Rambow
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
Some basic concepts
Some Hindi constructions: Causatives, Co-ordination, Unaccusatives, Relative clauses
Conclusions
Introduction
Treebank: one of the most important linguistic resources
Utility in various NLP tasks such as parsing, natural language understanding, etc.
Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
News articles: 350k
Tourism articles: 25-30k
Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
Indian languages: rich morphology, relatively flexible word order
For example:
a) baccaa phala khaataa hai
'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
Dated around 500 BC
Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
Based on the spoken form (Kiparsky 1993)
Focuses on language as a means of communication
Panini's Grammar contd
Treats a sentence as a series of modifier-modified relations
Every sentence has a primary modified (generally a verb)
Relations between verbs and their participants are called 'karaka'
Other relations: reason, purpose, genitive, etc.
The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock
    the
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this
lock becomes the karta
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
Morph analysis
POS tagging
Identify minimal constituents (chunks/bags) and their heads
Mark the relations across chunks (head to head relation)
Chunk-internal dependencies are left unspecified
The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) khaa
  bhaaii
  phala
(t2) khaa
  bhaaii
      meraa
      baDZaa
  phala
      bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions:
Sabina opened the lock with the key
The key opened the lock
The lock opened
Semantics of the verb
• A verbal root denotes:
– The activity
– The result
• Locus of activity: karta
• Locus of result: karma
Verbal root
  activity
  result
karta - karma
• The boy opened the lock
– k1 – karta
– k2 – karma
• karta, karma sometimes correspond to agent, theme
– Not always: 'The door opened'
– The door is karta
– The sentence has no explicit karma
open
  k1: boy
  k2: lock
Sub-actions - Opening of lock
Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions - Opening of lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
– The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer - karaNa-kartri)
– The lock opened: the karma is raised to the role of karta (doer - karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action the speaker wants to focus on
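The shift of karta under vivaksha can be mimicked with a toy function. This is only a sketch under stated assumptions: the role names `agent`, `instrument`, `affected`, the function `assign_karakas` and the rule that an expressed agent must itself be the karta are illustrative choices, not part of the annotation guidelines. The speaker's focus is passed in explicitly because, as the lock examples show, it cannot be derived from the participants alone.

```python
# Default karaka label for each (hypothetical) participant role.
DEFAULT_LABEL = {"agent": "k1", "affected": "k2", "instrument": "k3"}


def assign_karakas(expressed, focus):
    """Label the expressed participants of one 'open' event.

    expressed: dict mapping role name -> word, only for participants
               the speaker chose to express.
    focus:     the role the speaker presents as karta (vivaksha).
    """
    if focus not in expressed:
        raise ValueError("the focused participant must be expressed")
    # Assumption: when the agent is expressed, it is the karta.
    if "agent" in expressed and focus != "agent":
        raise ValueError("an expressed agent is expected to be the karta")
    labels = {}
    for role, word in expressed.items():
        labels[word] = "k1" if role == focus else DEFAULT_LABEL[role]
    return labels


full = assign_karakas(
    {"agent": "Sabina", "affected": "lock", "instrument": "key"}, "agent")
key_as_karta = assign_karakas(
    {"instrument": "key", "affected": "lock"}, "instrument")
lock_as_karta = assign_karakas(
    {"affected": "lock", "instrument": "key"}, "affected")
```

The third call corresponds to "the lock opened with this key", where the lock is karta while the key keeps its instrument label.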
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
– Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
– 'key' gets assigned karta (in b), karana (in a) based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
We are developing treebanks for HindiUrdu
Following Paninian framework as the annotation scheme
We show how the scheme handles some phenomena such as complex verbs causatives relative clauses conjunctions etc in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m >
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any >
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3 >
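Feature structures written this way are easy to read into a dictionary. The sketch below assumes exactly the flat `name=value` layout shown on the slide (with an empty `af=`); the function name `parse_fs` is hypothetical and the code is not the treebank's own reader.

```python
import re


def parse_fs(fs):
    """Turn '<fs name=value ...>' into a dict; empty values become ''."""
    body = fs.strip()
    if body.startswith("<fs"):
        body = body[3:]
    body = body.rstrip(">").strip()
    # Each attribute is a word, '=', then an optional run of non-space characters.
    return {name: value for name, value in re.findall(r"(\w+)=(\S*)", body)}


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
badZaa = parse_fs("<fs af= root=badZaa cat=adj gend=m >")
```

With the analysis in a dictionary, agreement checks such as comparing `gend` on the verb and on its karta become simple lookups.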
An Example (Contd)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
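The bracketed chunk notation can be read back mechanically, which is useful because dependency relations are later marked between chunk heads. The following sketch assumes the `((word_TAG ...))_LABEL` layout shown above; the head rule (main verb `VM` in a verb group, otherwise the last word of the chunk) is a simplifying assumption for illustration, not the treebank's head-computation procedure.

```python
import re

# One chunk: two opening brackets, tagged words, two closing brackets, a label.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def head(label, tokens):
    """Assumed head rule: VM inside a verb group, else the last token."""
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]


text = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
chunks = parse_chunks(text)
heads = [head(label, tokens) for label, tokens in chunks]
```

For this sentence the assumed rule picks meraa, bhaaii, phala and khaataa, the same words between which the inter-chunk relations are marked.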
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree
– each node is a chunk, and
– the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks:
– meraa 'my'
– bhaaii 'brother'
– phala 'fruit' and
– khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
      r6: meraa
  k2: phala
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas, and
– Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
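The three groups of labels can be told apart mechanically. The sketch below uses only labels that appear on these slides; the function `relation_type` and the pattern used to recognise karaka labels (a `k` followed by a digit, plus the causative kartas pk1, jk1, mk1) are assumptions for illustration and do not cover the full tagset.

```python
import re

OTHER_RELATIONS = {
    "r6": "Genitive", "rt": "Purpose", "rh": "Reason",
    "nmod__relc": "Relative clause", "rad": "Address",
}
NON_DEPENDENCY = {
    "ccof": "Conjunction", "pof": "Complex Predicates", "fragof": "Fragment of",
}
CAUSATIVE_KARTAS = {"pk1", "jk1", "mk1"}


def relation_type(label):
    """Classify a label as 'karaka', 'other' or 'non-dependency'."""
    if label in NON_DEPENDENCY:
        return "non-dependency"
    if label in OTHER_RELATIONS:
        return "other"
    # Assumed pattern: k1, k2, k3, k4, k4a, k7t, k7p, ...
    if label in CAUSATIVE_KARTAS or re.fullmatch(r"k\d\w*", label):
        return "karaka"
    raise ValueError(f"label not covered by this sketch: {label}")


kinds = [relation_type(x) for x in ("k1", "k7t", "k4a", "pk1", "r6", "pof")]
```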
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
– khilvaa 'cause to eat' is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd...)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations:
– prayojaka karta 'causer' (pk1): the causer in a causative construction
– prayojya karta 'causee' (jk1): the causee in a causative construction
– madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Causative Constructions (Contd...)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II
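The move from Possibility-I labels to Possibility-II labels is a fixed substitution once an annotator has decided that the clause has a causer, mediator and causee reading. A minimal sketch follows; the function `relabel_causative` and its `causative_reading` flag are hypothetical, and the flag stands in for a human decision: in the spoon example the same verb form keeps k1 and k3, so the verb's morphology alone does not trigger the relabelling.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_MAP = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args, causative_reading):
    """args: list of (phrase, label) pairs under Possibility-I.

    Returns the Possibility-II labelling when the annotator has judged
    the clause to involve causer, mediator and causee; otherwise the
    input is returned unchanged.
    """
    if not causative_reading:
        return list(args)
    return [(phrase, CAUSATIVE_MAP.get(label, label)) for phrase, label in args]


maid = [("maaz ne", "k1"), ("aayaa se", "k3"),
        ("bacce ko", "k4"), ("khaanaa", "k2")]
maid_relabelled = relabel_causative(maid, causative_reading=True)

spoon = [("maaz ne", "k1"), ("cammaca se", "k3"), ("khaanaa", "k2")]
spoon_unchanged = relabel_causative(spoon, causative_reading=False)
```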
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
– Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
– The dependency of jo ladZakaa 'the boy' is on vaha 'he'
– jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd...)
• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
'I.Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4 – beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa (base verb)
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa (derived intransitive verb)
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
anubhava karta – k4a (Contd...)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd...)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
  paired-ccof: agar
      ccof: na hotii
  paired-ccof: to
      ccof: aatii
Possibility - II
Conditionals (Contd)
• Possibility-I is not possible because agar-to is the head of the tree, which is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is semantically realized but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Having written letters every day, he tears them up'
Participles (vmod) (Contd)
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with another participle verb likhakar 'having written'
Participles (vmod) (contd)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
      k1: vaha
      k2: patra
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that' is missing, which becomes the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
      nmod__relc: kahoge
          k1: tum
          k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd...)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• It means the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• It captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
      nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
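The filtering step can be sketched in a few lines once both candidate sentences carry role labels. The dictionaries below are hand-written stand-ins for the output of a semantic role labeller, and the helper names `normalize` and `answer` are hypothetical; no real SRL system or API is being called.

```python
def normalize(phrase):
    """Lower-case a phrase and drop a leading 'the' so themes can be compared."""
    text = phrase.lower().strip()
    if text.startswith("the "):
        text = text[4:]
    return text


def answer(question, candidates):
    """Return the agents of candidates whose predicate and theme match the question."""
    return [
        c["agent"] for c in candidates
        if c["predicate"] == question["predicate"]
        and normalize(c["theme"]) == normalize(question["theme"])
    ]


# "Who created the first effective polio vaccine?"
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Hand-labelled versions of the two candidate sentences (illustrative only).
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "The first effective polio vaccine"},
]

result = answer(question, candidates)
```

Both sentences match the question on words alone; only the comparison of themes separates them.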
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
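A frame file is essentially a lookup table from roleset IDs to role descriptions. The sketch below stores the two rolesets of ring in a nested dictionary; the layout and the helper `describe` are assumptions for illustration and do not reproduce the actual PropBank frame file format.

```python
# One entry per lemma; each roleset has a gloss and its numbered roles.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing",
                      "Arg1": "Thing rung",
                      "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity",
                      "Arg2": "Surrounded entity"},
        },
    }
}


def describe(lemma, roleset, labelled_args):
    """Map each labelled argument to its verb-specific role description."""
    roles = FRAMES[lemma][roleset]["roles"]
    return {text: roles[arg] for text, arg in labelled_args}


bell = describe("ring", "ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
lake = describe("ring", "ring.02",
                [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
```

The same label Arg1 resolves to "Thing rung" under ring.01 and to "Surrounding entity" under ring.02, which is why the roleset ID has to be chosen before the arguments are labelled.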
Hindi PropBank
• Annotating Hindi PropBank involves three steps:
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
'The door will have to open'
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
[Figures: PropBank annotation for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
– (so, sulaa) sleep, make someone sleep
• Add -vaa
– (sulaa, sulvaa) make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Causatives
[Figures: PropBank annotation for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)': trust
• Such cases are handled using a noun frame for bharosaa
[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.inst fight sit.perf
'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
• Language-universal principles
• Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach
Basic Principles of PS
• PS represents relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: phrase structure tree with node labels VP-Pred, VP, NP, V over Atif kitab paRhegaa]
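Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Figure: phrase structure tree with node labels VP-Pred, VP, NP-P, NP, V over Atif -ne kitab paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
आतिफ़ सोएगा
[Figure: phrase structure tree with node labels VP-Pred, VP, NP, V over Atif soyegaa]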
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
दरवाज़ा खुलेगा
[Figure: phrase structure tree with node labels VP-Pred, VP, NP1, NP, CASE1, V over darvaazaa khulegaa]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct
उस कमरे में चूहे हैं
[Figure: phrase structure tree with node labels VP-Pred, VP, NP1, NP, NP-P, CASE1, V over us kamre mein cuuhe hain]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: phrase structure tree with node labels VP-Pred, VP, NP-P, NP, V over Ram -ne Mohan -ko kitaab dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: phrase structure tree with node labels VP-Pred, VP, NP1, NP2, NP-P, NP, CASE1, SCR2, V over kal raat baadaloM mein mujhko caaMd dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: phrase structure tree with node labels VP-Pred, VP, NP, NP1, NP-P, CP1, C, EXTR1, CASE1, V, V-Aux over Ram jaantaa hai ki Sita der se aayegi]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Figure: phrase structure tree with node labels VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux over jo kitaab tumne dii thii vah maine paRhii]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Figure: phrase structure tree with node labels VP-Pred, VP, V', NP, NP-P, N, CASE1, V, V-Aux over Raam Ravi -ko yaad kar rahaa thaa]
Causative
Hindi: Some facts
• A major language of the Indo-Aryan family
• Official language of 11 Indian states: Uttar Pradesh, Uttarakhand, Bihar, Delhi, Jharkhand, Chhattisgarh, Himachal Pradesh, Haryana, Rajasthan and Madhya Pradesh
• Also spoken outside India in Fiji, Mauritius, Guyana, etc.
• Number of speakers who returned Hindi as their mother tongue (in India): 422,048,642 (41.03%) (2001 Census of India report)
• A large population in India speaks Hindi as a second language
• Script: Devanagari – a syllabic script
Urdu: Some facts
• An Indo-Aryan language
• Evolved in India around the eighth to tenth centuries from khariboli, a dialect spoken in and around Delhi
• Significant borrowings from Arabic and Persian
• It was also known as rekhta (mixed language)
• Official language of Pakistan
• Official language of states of India
• Also spoken in Fiji, Bangladesh, etc.
• Number of speakers in India: 51,536,111 (5.01%) (2001 Census of India report)
• Script: Perso-Arabic
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them as two registers of the same language
• Similar in grammatical structures
• Differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both the languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, sometimes adjectives follow the noun (ezafe constructions)
• Large use of participles, complex predicates and causatives
• Reduplication and echo-compounding are productively used in Hindi/Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
– Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case
Gender
• All nouns have inherent gender: pankhaa (fan.masc), lataa (creeper.fem), ghar (house.masc)
Number
• Singular: pankhaa (fan), lataa (creeper), ghar (house)
• Plural: pankhe (fans), lataeM (creepers), ghar (houses)
Case
• The case roles in Hindi are normally marked by postpositions
• However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
• Direct nouns are in the nominative and are not followed by a postposition
• They occur denoting subject and/or object
laRkaa aayaa 'the boy came'
<ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
laRkii aayii 'the girl came'
<ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
laRke ne roTii khaayii 'the boy ate bread'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
<laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, the pronouns also inflect for number and case
• Sg - dir: yaha (this), vaha (that), jo (who/which), kaun (who.interro), kyaa (what) and kuch (some)
• Sg - obl: isa (this), usa (that), jisa (which), kisa (which.interro), koii (someone) and kisii (someone)
• Pl - dir: ye (these), ve (those), jo (who.pl)
• Pl - obl:
a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who.interro), kinhiiM (who.indef)
b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people.indef)
Pronouns (Contd...)
• Before 'ne', maiM (I) and tuu (you.sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don't change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
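The rules for first and second person pronouns can be written down as a short function. This is a sketch that covers only the forms named on this slide; the function name `attach` is hypothetical, gender and number agreement of the genitive forms is ignored, and anything not listed (including aap with kaa) simply falls through to plain concatenation, which is an assumption.

```python
# Oblique stems used before postpositions other than 'ne'.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitives that replace pronoun + kaa.
GENITIVE = {"maiM": "meraa", "tuu": "teraa",
            "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach(pronoun, postposition):
    """Combine a first/second person pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition
    # ham, tum, aap never change; maiM and tuu stay unchanged before 'ne'.
    return pronoun + postposition


forms = [attach("maiM", "ne"), attach("maiM", "ko"),
         attach("tum", "ko"), attach("ham", "kaa")]
```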
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also have oblique form: acche laRke ne 'good boy erg'
• The transformations that an adjective (e.g. acchaa 'good') undergoes with regard to number, gender and case are given below
Case rarr Ditect Oblique
NumberrarrGender darr
Sg Pl Sg Pl
Masc acchaa acche acche acche
Fem acchii acchii acchii acchii
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person. Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu, so only certain moods, aspects and tenses are marked on the verb form itself. Some examples of verb forms from Hindi/Urdu:
  Root        ro       'cry'
  Infinitive  ronaa    'to cry'
  Habitual    rotaa    'cry.hab'
  Perfective  royaa    'cried'
  Causative   rulaa    'cause someone to cry'
              rulvaa   'make someone cause someone to cry'
Auxiliaries
Auxiliaries mark tense, aspect and modality information on verbs.
(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog ablit be.pl.pres
    'The children can continue to have their meal'
Auxiliaries also carry the gender, number and person information.
Postpositions
Postpositions largely mark the case relations.
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions.
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
  ke anusaar    'according to'
  ke alaavaa    'in addition to'
  ke kaaran     'because of'
  ke dvaaraa    'through'
  ke saamne     'in front of'
  ke liye       'for'
  kii or/taraf  'towards'
  kii tarah     'like'
  kii jagah     'in place of'
  se baahar     'out of'
  se pahle      'before'
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions:
(a) qabl 'before'
      qabl az_ayn (qabl azeen)      before from_this       'before this'
      qabl az_waqt                  before from_time       'before time'
(b) dar 'in/inside/amidst'
      dar_ayn asnah (dariin asnah)  in_this moment
(c) az 'from/since/for'
      az raahe hamdardi             from way empathy       'for the sake of courtesy'
      az sare nau                   from beginning new
      khaarij az bahes              beyond from discussion
(d) ta 'to/until/till'
      ta waqt                       until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone.
  EZ N+N      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
Words belonging to various categories can be reduplicated. These expressions are often hyphenated. Reduplication has various morphological functions depending on the lexical category that is reduplicated. For example:
• Nouns: it adds the sense of 'every'
• Verbs: it brings the sense of an adverbial participle
• Adjectives and adverbs: it adds intensity
Hindi has three types of reduplication: full, partial and redundant. Reduplication is highly productive in these languages.
Full Reduplication
If the word is X, then its reduplicated form is X-X.
  raam-raam       'Ram-Ram'        (proper noun)
  baccaa-baccaa   'child-child'    (common noun)
  garam-garam     'hot-hot'        (adjective)
  dhiire-dhiire   'slowly-slowly'  (adverb)
  jaa-jaa         'go-go'          (verb)
  naa-naa         'not-not'        (negative particle)
  kyaa-kyaa       'what-what'      (question word)
  jaate-jaate     'going-going'    (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of the word is repeated, which gives the meaning 'X etc.' In Hindi/Urdu the first consonant of X is replaced by v-.
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi. Some more examples of partial reduplication in Hindi:
  jaanaa 'going'     → jaanaa-vaanaa 'going etc.'
  aaloo 'potato'     → aaloo-vaaloo 'potato etc.'
  aisaa 'like this'  → aisaa-vaisaa 'like this etc.'
There are also examples that do not follow this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run'    → bhaagambhaag 'rush'
  jhuuTh 'lie'   → jhuuTh-muuTh 'just like that (without meaning it)'
  dekh 'see'     → dekhaa-dekhii 'in imitation'
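The regular patterns above (X-X for full reduplication, v- replacing the first consonant for echo words) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be a non-empty word in the roman transliteration used on these slides, and the list of aspirate digraphs (kh, gh, ...) is an assumption about that transliteration. Irregular forms such as bhaagambhaag are not covered.

```python
# Assumption: lowercase a/e/i/o/u are the only vowel letters in this
# transliteration, and aspirated consonants are written as two letters.
VOWELS = set("aeiou")
DIGRAPHS = ("kh", "gh", "ch", "jh", "Th", "Dh", "th", "dh", "ph", "bh", "sh")


def full_reduplication(word: str) -> str:
    """X -> X-X."""
    return f"{word}-{word}"


def echo_word(word: str) -> str:
    """Replace the first consonant with v-; prefix v- to vowel-initial words."""
    if word[0] in VOWELS:
        stem = word              # aaloo -> v + aaloo
    elif word.startswith(DIGRAPHS):
        stem = word[2:]          # khaanaa -> v + aanaa
    else:
        stem = word[1:]          # jaanaa -> v + aanaa
    return "v" + stem


def partial_reduplication(word: str) -> str:
    """X -> X-echo(X), read as 'X etc.'"""
    return f"{word}-{echo_word(word)}"
```

For instance, `partial_reduplication("khaanaa")` yields the form khaanaa-vaanaa cited above.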
Some Basic Syntax
Hindi and Urdu are both relatively free word order SOV languages
For case marking Hindi primarily uses postpositions The verb agrees either with subject or with object Adjective agrees with the noun it modifies
Simple Transitive
trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitaab paRhegaa
          Atif book.f read.m.sg.fut
          'Atif will read the book'
trans-2   आतिफ़ ने किताब पढ़ी
          Atif ne kitaab paRhii
          Atif erg book.f read.f.sg.pst
          'Atif read the book'
trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif ko kitaab paRhnii paRii
          Atif dat book.f read.f.inf compel.f.pst
          'Atif had to read the book'
Intransitive Unergative
unerg-1   आतिफ़ सोएगा
          Atif soyegaa
          Atif sleep.m.sg.fut
          'Atif will sleep'
unerg-2   आतिफ़ ने सोया
          Atif ne soyaa
          Atif erg sleep.m.sg.pst
          'Atif slept'
unerg-3   आतिफ़ को सोना पड़ेगा
          Atif ko sonaa paRegaa
          Atif dat sleep.inf compel.fut
          'Atif will have to sleep'
Intransitive Unaccusative
unacc-1   दरवाज़ा खुलेगा
          darvaazaa khulegaa
          door.m.sg.d open.m.sg.fut
          'The door will open'
unacc-2   दरवाज़े ने खुला
          darvaaze ne khulaa
          door.m.sg.obl erg open.pst
          'The door opened'
unacc-3   दरवाज़े को खुलना पड़ेगा
          darvaaze ko khulnaa paRegaa
          door.m.sg.obl dat open.inf compel.fut
          'The door will have to open'
Existential
exist-1   उस कमरे में चूहे हैं
          us kamre meM cuuhe haiM
          that room in rats be.pres.pl
          'There are rats in that room'
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1   राम मोहन को किताब देगा
            raam mohan ko kitaab degaa
            Ram Mohan dat book.f give.m.sg.fut
            'Ram will give a book to Mohan'
ditrans-2   राम ने मोहन को किताब दी
            raam ne mohan ko kitaab dii
            Ram erg Mohan dat book.f give.f.sg.pst
            'Ram gave a book to Mohan'
Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siitaa der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1    मेरी बहन जो दिल्ली में रहती है कल आ रही है
            merii bahan jo dillii meM rahtii hai kal aa rahii hai
            my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
            'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2    मैंने वह किताब जो तुमने दी थी पढ़ ली
            maiMne vah kitaab jo tumne dii thii paRh lii
            I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
            'I have read the book which you gave me'
rel-cl-3    मैंने वह किताब पढ़ ली जो तुमने दी थी
            maiMne vah kitaab paRh lii jo tumne dii thii
            I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
            'I have read the book which you gave me'
rel-cl-4    जो किताब तुमने दी थी वह मैंने पढ़ ली
            jo kitaab tumne dii thii vah maiMne paRh lii
            which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
            'I have read the book which you gave me'
Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'
compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs: the experiencer argument takes dative case.
    raam ko bukhaar hai           Ram dat fever be.pres
    raam ko caand dikhaa          Ram dat moon see-unacc.pst
    raam ko dukh hai              Ram dat sorrow be.pres
Participatory verbs: the second argument of a participatory verb takes the postposition se.
    raam ravi se carcaa karegaa   Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa         Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
    usa   ladake  ne   kelaa   khaayaa   thaa
    that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF:
  ADDR  TOKEN
  1     usa
  2     laDake
  3     ne
  4     kelA
  5     khAyA
  6     thA
  7
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – They are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.
  ADDR_  TKN_     OTHR
  1      usa      <fs af='vaha,pr'>
  2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
  3      ne       <fs af='ne,psp'>
  4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
  5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
  6      thaa     <fs af='kelaave'>
  7               <fs as=&STOP,punc>
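As a rough illustration of how the composite af attribute can be unpacked, the sketch below splits the comma-separated value into the eight fields named above. The function name `parse_af` and the dictionary keys are hypothetical, not part of any SSF tool. The comma separators and single quotes are an assumption about the notation, since the slide text lost its punctuation in extraction.

```python
import re
from itertools import zip_longest

# Field order as listed on the slide; the key names are illustrative only.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix")


def parse_af(fs: str) -> dict:
    """Split the af='...' value of a feature structure into named fields.

    Fields that are missing or empty are returned as None.
    """
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        raise ValueError(f"no af attribute in {fs!r}")
    values = match.group(1).split(",")[:len(AF_FIELDS)]
    return {name: (value or None)
            for name, value in zip_longest(AF_FIELDS, values)}
```

Calling `parse_af("<fs af='laDakaa,n,m,sg,3,o'>")` gives a dictionary whose `root` is `laDakaa` and whose `case` is `o`, with the trailing fields left as `None`.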
POS Tagging
ILMT POS tagset adopted. Total: 26 tags.
  ADDR  TKN     CAT   OTHR
  1     usa     PRP   <fs af='vaha,pron,…'>
  2     laDake  NN    <fs af='laDakA,noun,…'>
  3     ne      PSP   <fs af='ne,psp,…'>
  4     kelA    NN    <fs af='kelA,noun,…'>
  5     khAyA   VM    <fs af='KA,verb,…'>
  6     thA     VAUX  <fs af='kelA,verb,…'>
  7             SYM   <fs as=&STOP,punc>
Chunking
Chunking is introduced to save effort in manual tagging. Dependency relations are marked between the chunk heads. Chunking restructures the tree (i.e. the value of ADDR_ will change).
  ADDR_  TKN_    CAT_  OTHR
  1      ((      NP
  1.1    usa     PRP   <fs af='vaha,pron,…'>
  1.2    laDake  NN    <fs af='laDakA,noun,…'>
  1.3    ne      PSP   <fs af='ne,psp,…'>
         ))
  2      ((      NP
  2.1    kelA    NN    <fs af='kelA,noun,…'>
         ))
  3      ((      VG
  3.1    khAyA   VM    <fs af='KA,verb,…'>
  3.2    thA     VAUX  <fs af='kelA,verb,…'>
         ))
  4      ((      BLK
  4.1            SYM   <fs as=&STOP,punc>
         ))
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
  – Some Hindi constructions: causatives, co-ordination, unaccusatives, relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian grammatical model
Why Paninian grammar? Indian languages have
• Rich morphology
• Relatively flexible word order. For example:
    a) baccaa phala khaataa hai    'child' 'fruit' 'eat_hab' 'pres'
    b) phala baccaa khaataa hai
    c) phala khaataa hai baccaa
    d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky 1993>
• Focuses on language as a means of communication
Panini's Grammar (Contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified element (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
  opened
    k1 → Sabina
    k2 → lock (the)
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
  opened
    k1 → Sabina
    k2 → lock (the)
    k3 → key (this)
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
  opened
    k7t → Yesterday
    k1  → Sabina
    k2  → lock (the)
    k3  → key (this)
    k7p → home (my)
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
  opened
    k7t → yesterday
    k1  → lock (the)
    k3  → key (this)
lock becomes the karta
Levels of Analysis
L1 – Semantic relations (karakas), e.g. raama: karta
L2 – Morphosyntactic (vibhakti), e.g. raama: prathamaa
L3 – Morphological representation (abstract vibhakti markers), e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd)
   ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
(t1) khaa
       bhaaii
       phala
(t2) khaa
       bhaaii
         meraa
         baDzaa
       phala
         bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
    Sabina opened the lock with the key
    The key opened the lock
    The lock opened
Semantics of the verb
A verbal root denotes
  – the activity
  – the result
Locus of activity: karta
Locus of result: karma
  Verbal root → activity, result
karta-karma
The boy opened the lock
  – k1: karta
  – k2: karma
karta and karma sometimes correspond to agent and theme
  – Not always: "The door opened"
  – The door is karta
  – The sentence has no explicit karma
  open
    k1 → boy
    k2 → lock
Sub-actions: Opening of lock
Opening of lock
  – Inserting and turning a key (action 1)
  – key pressing and turning the lever (action 2)
  – latch moving and lock opening (action 3)
Sub-actions: Opening of lock
  open
    k1 → boy
    k2 → lock
    k3 → key
  open
    k1 → key
    k2 → lock
  open
    k1 → lock
k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
    The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer, karaNa-kartri).
    The lock opened: the karma is raised to the role of karta (doer, karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
      (a) I opened the lock with this key
      (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a) based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
    meraa  badZaa   bhaaii     bahuta  phala     khaataa    hai
    'my'   'elder'  'brother'  'lots'  'fruits'  'eat+HAB'  'PRES'
    'My elder brother eats lots of fruit'
An Example (Contd)
Morph Analysis
    meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
    badZaa   <fs af= root=badZaa cat=adj gend=m>
    bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
    bahuta   <fs af= root=bahuta cat=adj gend=any>
    phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
    khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
    hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
    meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
    ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
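The bracketed chunk notation above is regular enough to be read mechanically. The following is a minimal sketch, not the treebank's own tooling: `parse_chunks` is a hypothetical helper that assumes whitespace-separated `word_TAG` tokens inside `((...))_CHUNKTAG` groups, exactly as printed on the slide.

```python
import re

# One chunk: "((" tokens "))" followed by "_" and the chunk tag.
CHUNK_RE = re.compile(r"\(\(\s*(.*?)\s*\)\)_(\w+)")


def parse_chunks(text: str) -> list:
    """Return a list of (chunk_tag, [(word, pos_tag), ...]) tuples."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        # Split each token on its last underscore into word and POS tag.
        tokens = [tuple(token.rsplit("_", 1)) for token in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks
```

Applied to the four chunks above, this yields three NP chunks and one VG chunk, each with its word and POS pairs in surface order.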
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit'
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
  khaataa
    k1 → bhaaii
           r6 → meraa
    k2 → phala
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
  – karta: subject/agent/doer
  – karma: object/patient
  – karana: instrument
  – sampradaan: beneficiary
  – apaadaan: source
  – adhikarana: location in place/time/other
Relations other than karakas
  – r6: Genitive
  – rt: Purpose
  – rh: Reason
  – nmod_relc: Relative clause
  – rad: Address
Relations which do not fall under dependency relation
ccof ndash Conjunction pof ndash Complex Predicates fragof ndash Fragment of
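A labeled dependency tree of the kind described in this section can be held in a very small data structure. The sketch below is illustrative only: the `Node` class and `edges` function are hypothetical names, and the tree built at the end is the khaataa example from the earlier slide (k1 to bhaaii, k2 to phala, r6 from bhaaii to meraa).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A chunk head with its labeled dependents."""
    head: str
    children: list = field(default_factory=list)  # list of (label, Node)

    def attach(self, label: str, child: "Node") -> "Node":
        """Add child under this node with the given relation label."""
        self.children.append((label, child))
        return child


def edges(node: Node):
    """Yield (head, label, dependent) triples in depth-first order."""
    for label, child in node.children:
        yield (node.head, label, child.head)
        yield from edges(child)


# The example tree: khaataa --k1--> bhaaii --r6--> meraa, khaataa --k2--> phala
root = Node("khaataa")
bhaaii = root.attach("k1", Node("bhaaii"))
root.attach("k2", Node("phala"))
bhaaii.attach("r6", Node("meraa"))
```

Listing `edges(root)` walks the tree head by head, which is enough to print or compare annotations; a real treebank reader would also carry the chunk-internal tokens.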
Some Hindi Constructions
(1) Causative Constructions
    maaz      ne     aayaa   se    bacce    ko     khaanaa  khilvaayaa
    'mother'  'Erg'  'maid'  'by'  'child'  'Acc'  'food'   'eat-Caus'
    'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd…)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  – prayojaka karta 'causer' (pk1): the causer in a causative construction
  – prayojya karta 'causee' (jk1): the causee in a causative construction
  – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
Causative Constructions (Contd…)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Causative Constructions (Contd…)
    Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
        'Mother fed the child with the spoon'
    Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
        'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives our current decision: follow Possibility-II
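The label substitution stated for causatives (pk1, mk1, jk1 in place of k1, k3, k4) can be written down as a simple mapping. This is a hedged sketch with hypothetical names, not the annotation tool: it applies the substitution blindly, whereas in the scheme a se-marked phrase keeps k3 when it is a true instrument (cammaca se 'with the spoon') and becomes mk1 only when it is a mediator (aayaa se 'by the maid'). That distinction is an annotator's decision and is not modeled here.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments: list) -> list:
    """Map (phrase, label) pairs to causative labels; other labels pass through."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]
```

For the maid example, the input `[("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]` comes back with pk1, mk1, jk1 and an unchanged k2.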
(2) Relative Clauses (nmod__relc)
    Ex: jo     ladZakaa  vahaaz   khadZaa  hai   vaha  meraa  bhaaii     hai
        'who'  'boy'     'there'  'stand'  'is'  'he'  'my'   'brother'  'is'
        'The boy who is standing there is my brother'
Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clauses (Contd…)
• For relative clauses our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
    Ex-1: mujhko   dukh       hai
          'I.Dat'  'unhappy'  'is'
          'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan proper (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
    Ex-2: raam ne (agent) caaMd dekhaa               (base verb)
          'ram' 'Erg' 'moon' 'saw'
          'Ram saw the moon'
    Ex-3: raam ko (experiencer) caaMd dikhaa         (derived intransitive verb)
          'ram.Dat' 'moon' 'appeared'
          'The moon was visible to Ram'
(4) Relation samanadhikaran – rs
    Ex-1: raam ne kahaa ki vo kal aayegaa
          'Ram said that he will come tomorrow'
    Ex-2: raam ne yaha kahaa ki vo kal aayegaa
          'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
(5) Conditionals
    Ex: agara  vaha   biimaara  na     hotii       to      paartii  me    jZarUra       aatii
        'if'   'she'  'sick'    'not'  'happened'  'then'  'party'  'in'  'definitely'  'come'
        'Had she not been sick, she would definitely have come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
  agar-to
    paired-ccof → agar
                    ccof → naa hotii
    paired-ccof → to
                    ccof → aatii
Possibility-II
  [tree diagram not recoverable from the slide text]
Conditionals (Contd)
• Possibility-I is not possible because agar-to is the head of the tree, which is an abstract node, i.e. it is not a lexical node
• For conditionals our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles an argument of a (main) verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
    Ex: vaha  rojZa    patra     likhakara         PaadZataa  hai
        'he'  'daily'  'letter'  'having written'  'tear'     'is'
        'Every day he writes a letter and tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'
Participles (vmod) (Contd)
  PaadZataa hai
    k1   → vaha
    k7t  → rojZa
    k2   → patra
    vmod → likhakar
             k1 → vaha
             k2 → patra
(7) Ellipsis
• How to show dependencies when the head is missing?
    Ex: tum    jo bhi      kahoge      (vo)    mai  maan luungii
        'you'  'whatever'  'will say'  'that'  'I'  'will believe'
        'I will believe whatever you say'
• In the above example vo 'that', which is the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd)
  maan luungii
    k1 → mai
    k2 → NULL__NP (vo)
           nmod__relc → kahoge
                          k1 → tum
                          k2 → jo bhi
Ellipsis (Contd)
    Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
        'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
        'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
  NULL_CCP (aur)
    ccof → badZe_ho_gaye
    ccof → nahii_sunate
Non-dependency Relations
  – ccof: Conjunction
  – pof: Complex Predicates
  – fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
    Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
        'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
        'Ram ate food and Sita ate an apple'
Subordinate Conjunction
    Ex: raam ne kahaa ki vo kal aayegaa
        'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
        'Ram said that he will come tomorrow'
(2) Conjunct Verbs
    Ex: maine    usase       ek     prashna     kiyaa
        'I-erg'  'him-inst'  'one'  'question'  'did'
        'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
  kiyaa
    k1  → maine
    k2  → usase
    pof → prashna
            nmod → ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
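The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Proposition` class, the `answer` function and the exact-string matching of themes are hypothetical and are not part of any PropBank tool; the role labels are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    """One predicate with its role-labelled arguments (hypothetical structure)."""
    predicate: str
    roles: dict  # role name -> argument text


def answer(question_predicate, question_theme, candidates):
    """Return the agent of the first candidate whose predicate and theme match."""
    for prop in candidates:
        if (prop.predicate == question_predicate
                and prop.roles.get("theme") == question_theme):
            return prop.roles.get("agent")
    return None  # every candidate was filtered out


# The two candidate sentences from the slides, already role-labelled.
candidates = [
    Proposition("create", {"agent": "Becton Dickinson",
                           "theme": "first disposable syringe"}),
    Proposition("create", {"agent": "Jonas Salk",
                           "theme": "first effective polio vaccine"}),
]

result = answer("create", "first effective polio vaccine", candidates)
print(result)
```

Word matching alone cannot separate the two candidates, since both contain "created" and "the first effective polio vaccine"; only the theme label does.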
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Framefiles
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01 Make sound of bell
Arg0 Causer of ringing
Arg1 Thing rung
Arg2 Ring for
An example
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
ring.01 Make sound of bell
Arg0 Causer of ringing
Arg1 Thing rung
Arg2 Ring for
ring.02 To surround
Arg1 Surrounding entity
Arg2 Surrounded entity
An example
• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
ring.01 Make sound of bell
Arg0 Causer of ringing
Arg1 Thing rung
Arg2 Ring for
ring.02 To surround
Arg1 Surrounding entity
Arg2 Surrounded entity
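A frame file with two rolesets can be pictured as a small lookup table. The sketch below is an assumption-laden illustration: the dictionary layout and the `describe` helper are hypothetical (actual PropBank frame files are distributed in their own file format); only the roleset glosses and role descriptions are taken from the slides.

```python
# Hypothetical in-memory view of the frame file for "ring".
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Pair each labelled argument with the role description of its roleset."""
    roles = FRAME_FILE[roleset_id]["roles"]
    described = []
    for label, text in labelled_args.items():
        if label not in roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        described.append((text, label, roles[label]))
    return described


bell = describe("ring.01", {"Arg0": "John", "Arg1": "the bell"})
lake = describe("ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"})
for row in bell + lake:
    print(row)
```

The same label (Arg1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is why the roleset ID has to be chosen before the arguments are labelled.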
Hindi PropBank
• Annotating Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA   Causer
ARG0   Agent, experiencer
ARG1   Theme, patient
ARG2   Recipient
ARG3   Instrument

Numbered Arguments with function tags
ARGA-MNS   Indirect causer
ARG0-MNS   Induced causer
ARG0-GOL   Causee with a 'recipient' role
ARG2-ATR   Attribute
ARG2-GOL   Goal
ARG2-SOU   Source
ARG2-LOC   Location
ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means
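Because the labels above are built from a base argument plus an optional function tag, they can be taken apart mechanically. The following is a minimal sketch; it assumes the two parts are joined by a hyphen (as in ARG2-GOL), and the helper names `parse_label` and `is_modifier` are hypothetical.

```python
def parse_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); a bare 'ARG1' gives ('ARG1', None)."""
    base, _, tag = label.partition("-")
    return base, (tag or None)


def is_modifier(label):
    """Modifier arguments are the ones whose base is ARGM."""
    return parse_label(label)[0] == "ARGM"


for label in ["ARG1", "ARG2-GOL", "ARGA-MNS", "ARGM-TMP"]:
    print(label, parse_label(label), is_modifier(label))
```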
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
[Figure: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add –aa
– (so, sulaa) sleep, make someone sleep
• Add –vaa
– (sulaa, sulvaa) make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Causatives
[Figure: PropBank annotation of causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)': trust
• Such cases are handled using a noun frame for bharosaa: [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
Complex predicate
compl-pred-2 राम रवि से लड़ बैठा
raam ravi se laD baiThaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
• Language-universal principles
• Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Phrase-structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Phrase-structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Phrase-structure tree for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]
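Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Phrase-structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP, V; the NP darvaazaa is coindexed (1) with a CASE trace in the lower position]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
[Phrase-structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP-P, NP, V; the NP cuuhe is coindexed (1) with a CASE trace, and the NP-P us kamre mein is adjoined]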
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Phrase-structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]
Putting it All Together: Dative Subjects
[Phrase-structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP-P, NP, V; caaMd is coindexed (1) with a CASE trace, mujhko (2) with an SCR trace]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Phrase-structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP-P, NP, V, V-Aux, CP, C; the CP headed by ki is coindexed (1) with an EXTR trace, and Sita (1) with a CASE trace]
Relative Clause
[Phrase-structure tree for rel-cl-4 below; node labels: VP-Pred, VP, NP-P, NP, V, V-Aux, CP, C (COMP), PRO; traces: SCR-1, SCR-3]
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
[Phrase-structure tree for the example below; node labels: VP-Pred, VP, NP-P, NP, N, V, V', V-Aux; the nominal yaad is coindexed (1) with a CASE trace]
compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causative
Hindi-Urdu (Hindustani)
• Hindi and Urdu are mutually intelligible
• Linguists consider them two registers of the same language
• They are similar in grammatical structure
• They differ in vocabulary, particularly in the formal written varieties
• A mixed variety of the two is used as a lingua franca in India and is also known as Hindustani
Some Basic Characteristics of Hindi/Urdu
• Hindi/Urdu have relatively free word order
• The unmarked word order in both languages is subject-object-verb (SOV)
• Auxiliary verbs follow the main verb
• Nouns are followed by postpositions
• Adjectives precede the nouns they modify
• In Urdu, adjectives sometimes follow the noun (ezafe constructions)
• Extensive use of participles, complex predicates and causatives
• Reduplication and echo-compounding are used productively in Hindi/Urdu (in fact, in almost all the Indian languages)
Morphology
Hindi and Urdu have the following morphological properties:
• Grammatical gender: masculine and feminine
• Number: singular and plural
• Person: first, second and third
• Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
– Some adjectives do not decline
Nouns
Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender: pankhaa (fan, masc), lataa (creeper, fem), ghar (house, masc)
• Number
– Singular: pankhaa (fan), lataa (creeper), ghar (house)
– Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case: the case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
• Direct nouns are in the nominative and are not followed by a postposition. They occur denoting subject and/or object
laRkaa aayaa 'the boy came' <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
laRkii aayii 'the girl came' <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
laRke ne roTii khaayii 'the boy ate bread' <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor' <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, the pronouns also inflect for number and case
• Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrog.), kyaa (what) and kuch (some)
• Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrog.), koii (someone) and kisii (someone)
• Pl, direct: ye (these), ve (those), jo (who, pl)
• Pl, oblique:
(a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrog.), kinhiiM (who, indef.)
(b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef.)
Pronouns (Contd…)
• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
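The rules for combining personal pronouns with postpositions can be sketched as a small lookup. This is an illustrative simplification, not a morphological analyzer: the function name `attach` and the table names are hypothetical, only the pronouns listed above are covered, and the genitive is returned in the single form cited above (meraa, teraa, hamaaraa, tumhaaraa) without gender or number agreement.

```python
# Oblique stems used before postpositions other than 'ne' and 'kaa'.
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}

# Irregular genitive forms that replace pronoun + kaa.
GENITIVE = {"maiM": "meraa", "tuu": "teraa",
            "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach(pronoun, postposition):
    """Combine a personal pronoun with a postposition (romanized forms)."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]                 # irregular genitive
    if postposition != "ne" and pronoun in OBLIQUE:
        return OBLIQUE[pronoun] + postposition   # mujhko, tujhko
    return pronoun + postposition                # maiMne, hamko, aapne


for pron, post in [("maiM", "ne"), ("maiM", "ko"), ("tuu", "ko"),
                   ("ham", "ko"), ("tum", "kaa"), ("aap", "ne")]:
    print(pron, "+", post, "=", attach(pron, post))
```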
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

          Direct            Oblique
          Sg       Pl       Sg       Pl
Masc      acchaa   acche    acche    acche
Fem       acchii   acchii   acchii   acchii
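The table can be read as a three-way rule, sketched below. The function `inflect_adjective` is hypothetical, and it assumes that any adjective ending in -aa declines like acchaa and that every other adjective is invariant; real lexicons need exceptions to both assumptions. The invariant example laal 'red' is added here for illustration and does not come from the slides.

```python
def inflect_adjective(base, gender, number, case):
    """Inflect a romanized adjective following the acchaa paradigm.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if not base.endswith("aa"):
        return base                # assumed invariant (does not decline)
    stem = base[:-2]
    if gender == "f":
        return stem + "ii"         # one feminine form in all four cells
    if number == "sg" and case == "direct":
        return stem + "aa"         # masculine singular direct
    return stem + "e"              # every other masculine cell


for g in ("m", "f"):
    for c in ("direct", "oblique"):
        for n in ("sg", "pl"):
            print(g, c, n, inflect_adjective("acchaa", g, n, c))
```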
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu
• Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu

Root         ro      'cry'
Infinitive   ronaa   'to cry'
Habitual     rotaa   'cry.hab'
Perfective   royaa   'cried'
Causative    rulaa   'cause someone to cry'
             rulvaa  'make someone cause someone to cry'
Auxiliaries
• Auxiliaries mark Tense, Aspect and Modality information on verbs
(a) bacce khaanaa khaa rahe haiM
children_d meal eat prog be.pl.pres
'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
children_d meal eat_nf prog ablit be.pl.pres
'The children can continue to have their meal'
• Auxiliaries also carry the gender, number and person information
Postpositions
• Postpositions largely mark the case relations
baccoM ne raat meM mez se khaanaa le liyaa
children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
qabl az_ayn (qabl azeen): before from_this, 'before this'
qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
az raahe hamdardi: from way empathy, 'for the sake of courtesy'
az sare nau: from beginning new
khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
ta waqt: until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive, but is not restricted to the genitive alone
EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A Morphological Process
• Words belonging to various categories can be reduplicated. These expressions are often hyphenated
• Reduplication has various morphological functions, depending on the lexical category which is reduplicated. For example:
– Nouns: it adds the sense of 'every'
– Verbs: it brings the sense of an adverbial participle
– Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X
raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
• In partial reduplication, an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi
• Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'
• There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
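The regular echo-word pattern can be sketched as a string operation on the romanized forms. The function `echo_word` is hypothetical and makes two assumptions beyond the slides: the "first consonant" is taken to be the whole run of non-vowel letters at the start of the word (so the digraph kh counts as one consonant), and a vowel-initial word simply receives v- in front. Irregular pairs such as bhaagambhaag are outside its scope.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word):
    """Return 'X-vX'', where X' is X with its initial consonant replaced by v."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1                      # skip the initial consonant letters
    return f"{word}-v{word[i:]}"


for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(echo_word(w))
```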
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
• Experiencer verbs: the experiencer argument takes dative case
raam ko bukhaar hai (Ram dat fever be.pres)
raam ko caand dikhaa (Ram dat moon see-unacc.pst)
raam ko dukh hai (Ram dat sorrow be.pres)
• Participatory verbs: the second argument of the participatory verbs takes the se postposition
raam ravi se carcaa karega (Ram Ravi to discussion do.m.sg.fut)
siitaa raam se shaadi karegii (Sita Ram with marriage do.f.sg.fut)
ravi mohan se milegaa (Ravi Mohan with meet.m.sg.fut)
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues
– Compounds
– Punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR   TOKEN
1      usa
2      laDake
3      ne
4      kelA
5      khAyA
6      thA
7      .
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
– Compounds internally contain a punctuation mark
– They are productive
– Morphological analysis of the members of the compounds
– The issue: whether to create a single token
• Decision
– Create three tokens
– Mark the hyphen as JOIN
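The tokenization decision for compounds can be sketched as follows. This is a hypothetical illustration, not the treebank's tokenizer: the function `tokenize`, the regular expression and the `(token, mark)` output pairs are assumptions, and how JOIN is actually stored in the treebank files is not specified on the slides. The input is assumed to be in WX-style romanization, which uses only ASCII letters.

```python
import re


def tokenize(text):
    """Split romanized text into (token, mark) pairs.

    Punctuation becomes its own token; a hyphen inside a compound is
    kept as a separate token and marked 'JOIN'.
    """
    tokens = []
    for word in text.split():
        # Runs of letters, or any single non-letter, non-space character.
        pieces = re.findall(r"[A-Za-z]+|[^A-Za-z\s]", word)
        for i, piece in enumerate(pieces):
            inside = 0 < i < len(pieces) - 1
            mark = "JOIN" if piece == "-" and inside else None
            tokens.append((piece, mark))
    return tokens


compound = tokenize("BAI-bahana")
sentence = tokenize("usa laDake ne kelA khAyA thA.")
print(compound)
print(sentence)
```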
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix

ADDR   TKN       OTHR
1      usa       <fs af='vaha,pr'>
2      laDake    <fs af='laDakaa,n,m,sg,3,o'>
3      ne        <fs af='ne,psp'>
4      kelaa     <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa   <fs af='khaa,v,m,sg,any,yaa'>
6      thaa      <fs af='kelaa,v,e'>
7      .         <fs as=&STOP,punc>
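Reading an af value back into named fields might look like the sketch below. It is written under explicit assumptions: the fields are taken to be comma-separated and to appear in exactly the order listed above, with empty strings for unspecified values; the names `AF_FIELDS` and `parse_af` are hypothetical, and the sample value is padded to eight fields for illustration.

```python
# Field order as listed above (eight positions).
AF_FIELDS = ("root", "category", "gender", "number",
             "person", "case", "tam_or_vibhakti", "suffix")


def parse_af(value):
    """Turn a comma-separated af string into a field dictionary."""
    parts = value.split(",")
    if len(parts) != len(AF_FIELDS):
        raise ValueError(f"expected {len(AF_FIELDS)} fields, got {len(parts)}")
    return dict(zip(AF_FIELDS, parts))


# Hypothetical fully padded value for laDake: the last two fields are empty.
analysis = parse_af("laDakaa,n,m,sg,3,o,,")
print(analysis)
```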
POS Tagging
• ILMT POS tagset adopted
• Total 26 tags

ADDR   TKN      CAT    OTHR
1      usa      PRP    <fs af='vaha,pron,…'>
2      laDake   NN     <fs af='laDakA,noun,…'>
3      ne       PSP    <fs af='ne,psp,…'>
4      kelA     NN     <fs af='kelA,noun,…'>
5      khAyA    VM     <fs af='KA,verb,…'>
6      thA      VAUX   <fs af='kelA,verb,…'>
7      .        SYM    <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)

ADDR   TKN      CAT    OTHR
1      ((       NP
1.1    usa      PRP    <fs af='vaha,pron,…'>
1.2    laDake   NN     <fs af='laDakA,noun,…'>
1.3    ne       PSP    <fs af='ne,psp,…'>
       ))
2      ((       NP
2.1    kelA     NN     <fs af='kelA,noun,…'>
       ))
3      ((       VG
3.1    khAyA    VM     <fs af='KA,verb,…'>
3.2    thA      VAUX   <fs af='kelA,verb,…'>
       ))
4      ((       BLK
4.1    .        SYM    <fs as=&STOP,punc>
       ))
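How chunking changes the ADDR column can be sketched in a few lines. The function `to_ssf` is hypothetical; it assumes that a token's address is the chunk number followed by a dot and the token's position inside the chunk, and it writes tab-separated lines without the feature-structure column.

```python
def to_ssf(chunks):
    """Render (label, [(token, pos), ...]) chunks as SSF-like lines."""
    lines = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{label}")             # chunk opens, address i
        for j, (token, pos) in enumerate(tokens, start=1):
            lines.append(f"{i}.{j}\t{token}\t{pos}")  # nested address i.j
        lines.append("\t))")                          # chunk closes
    return lines


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
ssf_lines = to_ssf(chunks)
print("\n".join(ssf_lines))
```

Before chunking kelA had address 4; once the first three tokens are grouped into one NP it becomes 2.1, which is the restructuring the slide refers to.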
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
– Some basic concepts
• Some Hindi constructions
– Causatives
– Co-ordination
– Unaccusatives
– Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
• Indian languages
– Rich morphology
– Relatively flexible word order
For example:
a) baccaa phala khaataa hai ('child' 'fruit' 'eat_hab' 'pres')
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti
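Sabina opened the lock
[Dependency tree: opened is the root, with Sabina (k1) and lock (k2) as dependents; the attaches to lock]
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): the locus of result
Sabina opened the lock with this key
[Dependency tree: opened is the root, with Sabina (k1), lock (k2) and key (k3) as dependents; the attaches to lock, this to key]
K3 (karaNa): instrument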
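Yesterday Sabina opened the lock with this key at my home
[Dependency tree: opened is the root, with Yesterday (k7t), Sabina (k1), lock (k2), key (k3) and home (k7p) as dependents; the attaches to lock, this to key, my to home]
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
[Dependency tree: opened is the root, with yesterday (k7t), lock (k1) and key (k3) as dependents; the attaches to lock, this to key]
Here lock becomes the karta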
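The two analyses can be written down as labelled edges and queried for their karta. This is only a sketch: the `Edge` tuple and the `dependents` helper are hypothetical, and the determiners (the, this, my) are left out because the slides do not give labels for their edges.

```python
from collections import namedtuple

# One labelled dependency: the dependent modifies the head.
Edge = namedtuple("Edge", "head relation dependent")

with_agent = [
    Edge("opened", "k7t", "Yesterday"),
    Edge("opened", "k1", "Sabina"),
    Edge("opened", "k2", "lock"),
    Edge("opened", "k3", "key"),
    Edge("opened", "k7p", "home"),
]

without_agent = [
    Edge("opened", "k7t", "yesterday"),
    Edge("opened", "k1", "lock"),
    Edge("opened", "k3", "key"),
]


def dependents(edges, head, relation):
    """Return all dependents of `head` that bear the given relation."""
    return [e.dependent for e in edges
            if e.head == head and e.relation == relation]


print(dependents(with_agent, "opened", "k1"))
print(dependents(without_agent, "opened", "k1"))
print(dependents(without_agent, "opened", "k2"))
```

The verb is the same in both sentences, but the k1 edge points at a different participant, and the second tree has no k2 at all.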
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example Contd
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
[Trees: (t1) khaa with the chunk heads bhaaii and phala as dependents; (t2) the expanded tree, in which meraa and baDZaa attach to bhaaii and bahuta attaches to phala]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
Sabina opened the lock with the key
The key opened the lock
The lock opened
Semantics of the verb
• A verbal root denotes
– The activity
– The result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
– k1: karta
– k2: karma
• karta and karma sometimes correspond to agent and theme
– Not always: The door opened
– The door is karta
– The sentence has no explicit karma
[Dependency tree: open is the root, with boy (k1) and lock (k2) as dependents]
Sub-actions: Opening of lock
Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
[Three dependency trees: open with boy (k1), lock (k2) and key (k3); open with key (k1) and lock (k2); open with lock (k1)]
k1: karta (doer); k2: karma (affected); k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
– The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer, karaNa-kartri)
– The lock opened: the karma is raised to the role of karta (doer, karma-kartri)
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
– Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
– 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses and conjunctions
An Example
– meraa badZaa bhaaii bahuta phala khaataa hai
  ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
  ‘My elder brother eats lots of fruits’
An Example (Contd)
Morph Analysis
– meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
– badZaa <fs af= root=badZaa cat=adj gend=m>
– bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
– bahuta <fs af= root=bahuta cat=adj gend=any>
– phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
– khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
– hai <fs af= root=hai cat=v gend=any num=any pers=3>
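To make the morph analysis concrete, here is a minimal Python sketch that reads one feature structure into a dictionary. It assumes the analyses are written as `<fs af= root=... cat=... >` with angle brackets and space-separated `name=value` pairs, as on the slide once the brackets are restored. The function name `parse_fs` is hypothetical and is not part of any treebank tool.

```python
import re


def parse_fs(fs: str) -> dict:
    """Collect name=value pairs from a feature-structure string.

    Empty values (such as the bare 'af=' on the slide) are dropped.
    """
    pairs = re.findall(r"(\w+)=([^\s,'>]*)", fs)
    return {name: value for name, value in pairs if value}


analysis = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(analysis["root"], analysis["cat"], analysis["TAM"])
```

A dictionary keeps the attribute names from the slide (root, cat, gend, num, pers, case, TAM) visible, which is convenient when comparing analyses of different words.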
An Example (Contd)
POS Tagging
– meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
– ((meraa_PRP))_NP
  ((badZaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
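The bracketed chunk notation above can be read mechanically. The sketch below assumes exactly the slide's notation, `((word_TAG ...))_CHUNKTAG`, with no nested chunks. `parse_chunks` is a hypothetical helper written for this illustration.

```python
import re

# One chunk: ((tok_TAG tok_TAG ...))_CHUNKTAG, without nesting (an assumption).
CHUNK_RE = re.compile(r"\(\(([^()]*)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks


example = (
    "((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
for chunk_tag, tokens in parse_chunks(example):
    print(chunk_tag, tokens)
```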
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’ and
  – khaataa ‘eats’
• badZaa ‘big’ and bahuta ‘much’ are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
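The same tree can be held in a small data structure. The following is a sketch only: each chunk head is stored with its parent and relation label, and the tree is printed with indentation. The names `HEAD_OF`, `dependents` and `render` are illustrative and do not come from the treebank tools.

```python
# dependent chunk head -> (parent chunk head, relation label)
HEAD_OF = {
    "bhaaii": ("khaataa", "k1"),   # karta
    "phala": ("khaataa", "k2"),    # karma
    "meraa": ("bhaaii", "r6"),     # genitive
}


def dependents(head):
    """All (dependent, relation) pairs attached to the given head."""
    return [(dep, rel) for dep, (parent, rel) in HEAD_OF.items() if parent == head]


def render(node, label="root", depth=0):
    """Return the subtree under node as indented 'label: node' lines."""
    lines = ["  " * depth + f"{label}: {node}"]
    for dep, rel in dependents(node):
        lines.extend(render(dep, rel, depth + 1))
    return lines


print("\n".join(render("khaataa")))
```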
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations link the verb to the participants directly involved in the action it denotes
• Relations other than karakas denote purpose, reason, and the like
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
    maaz ne aayaa se bacce ko khaanaa khilvaayaa
    ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
    ‘Mother caused the maid to feed the child’
Issue
– Possibility-I: Go by syntactic analysis
  – khilvaa ‘cause to eat’ is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd…)
Possibility-II
– The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
– The Paninian framework provides the relations
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
– Do we mark the above dependency roles?
– If we mark these relations, then the root will be khaa ‘eat’
Causative Constructions (Contd…)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilvaayaa
    ‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilvaayaa
    ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
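The relabelling from Possibility-I to Possibility-II amounts to a small label mapping. The sketch below assumes the argument chunks already carry Possibility-I labels and that the ‘se’ phrase is the mediator-causer; an instrument such as cammaca se would keep k3. The names `CAUSATIVE_RELABEL` and `relabel_causative` are made up for this illustration.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map (chunk, label) pairs to causative labels; other labels pass through."""
    return [(chunk, CAUSATIVE_RELABEL.get(label, label)) for chunk, label in arguments]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_1))
```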
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
    ‘The boy who is standing there is my brother’
Issue
– Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – jo ladZakaa ‘the boy’ depends on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
– The verb khadZaa hai ‘is standing’ is the root of the relative clause
– The modifier of vaha ‘he’ in the main clause is the entire relative clause
– Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
      ‘I.Dat’ ‘unhappy’ ‘is’
      ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4 – beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
      ‘ram’ ‘Erg’ ‘moon’ ‘saw’
      ‘Ram saw the moon’
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
      ‘ram.Dat’ ‘moon’ ‘appeared’
      ‘The moon was visible to Ram’
anubhava karta – k4a (Contd…)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd…)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
    ‘Had she not been sick, she would definitely have come to the party’
Issue
– Possibility-I: Abstract node
– Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
  paired-ccof: agar
    ccof: na hotii
  paired-ccof: to
    ccof: aatii
Possibility - II
Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
    ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
    ‘Every day he writes a letter and then tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
    ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
    ‘The children have grown up; they don’t listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
– Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
      ‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
– Ex: raam ne kahaa ki vo kal aayegaa
      ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
      ‘Ram said that he will come tomorrow’
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
    ‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
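As a rough illustration of what a frame file encodes, the two rolesets for ring can be written as a small Python dictionary and used to look up the meaning of each labelled argument. Real PropBank frame files are XML; the dictionary layout and the names `FRAME_FILE` and `describe` are simplifications assumed for this sketch.

```python
# Simplified stand-in for the frame file of 'ring' (two rolesets from the slides).
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Attach the roleset's role description to each (text, label) pair."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return [(text, label, roles[label]) for text, label in labelled_args]


print(describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")]))
print(describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")]))
```

The same label (Arg1) means something different under each roleset, which is why roles are defined per verb sense rather than globally.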
Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments             Numbered Arguments with function tags
ARGA  Causer                   ARGA-MNS  Indirect causer
ARG0  Agent/experiencer        ARG0-MNS  Induced causer
ARG1  Theme/patient            ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient                ARG2-ATR  Attribute
ARG3  Instrument               ARG2-GOL  Goal
                               ARG2-SOU  Source
                               ARG2-LOC  Location
                               ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
  – unaccusatives, e.g. Kula (open), Puta (explode)
  – unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – e.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’
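The labelling rule for intransitives can be stated as a tiny lookup. This sketch assumes the verb class has already been decided by the diagnostic tests and covers only the three verbs named on the slide. `VERB_CLASS` and `sole_argument_label` are hypothetical names.

```python
# Verb classes as listed on the slide (WX-style romanization).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Single argument is ARG1 for unaccusatives and ARG0 for unergatives."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```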
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’
[Figures: annotated structures for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
Causatives
[Figures: annotated structures for causative-1 and causative-2]
Causatives classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’; ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
Complex predicate
compl-pred-2: राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambowcclscolumbiaedu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    – Language-universal principles
    – Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree diagram; node labels as extracted: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree diagram; terminals as extracted: Atif ne, kitab, paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree diagram; node labels as extracted: VP-Pred, VP, NP (Atif), V (soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree diagram; node labels as extracted: VP-Pred, VP, NP1, NP (darvaazaa), CASE1, V (khulegaa)]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free), and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram; node labels as extracted: VP-Pred, VP, NP1, NP (cuuhe), CASE1, V (hain), adjunct (us kamre mein)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Tree diagram; terminals as extracted: Ram ne, Mohan ko, kitaab, dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram; terminals and traces as extracted: kal raat, baadaloM mein, mujhko, caaMd, dikhaa, CASE1, SCR2]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram; terminals and traces as extracted: Ram, jaantaa, hai, ki, Sita, der se, aayegi, EXTR1, CASE1, CP1]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’
[Tree diagram; terminals and traces as extracted: jo kitaab, tumne, dii, thii, vah, maine, paRhii, PRO, COMP, SCR1, SCR3]
Complex Predicate
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’
[Tree diagram; terminals and traces as extracted: Raam, Ravi ko, yaad, kar, rahaa, thaa, CASE1]
Causative
Morphology
• Hindi and Urdu have the following morphological properties
  – Grammatical gender: masculine and feminine
  – Number: singular and plural
  – Person: first, second and third
  – Case: direct, oblique and vocative
• Adjectives inflect for number, gender and case
  – Some adjectives do not decline
Nouns
• Nouns in Hindi/Urdu are inflected for number and case
• Gender: all nouns have inherent gender
  – pankhaa (fan.masc), lataa (creeper.fem), ghar (house.masc)
• Number
  – Singular: pankhaa (fan), lataa (creeper), ghar (house)
  – Plural: pankhe (fans), lataeM (creepers), ghar (houses)
• Case
  – The case roles in Hindi are normally marked by postpositions. However, Hindi nouns reflect two cases morphologically: Direct and Oblique
Case
• Direct: nouns are in the nominative and are not followed by a postposition; they occur as subject and/or object
  – laRkaa aayaa ‘the boy came’
    <laRkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  – laRkii aayii ‘the girl came’
    <laRkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
• Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen)
  – laRke ne roTii khaayii ‘the boy ate bread’
    <laRkaa,n,m,sg,3,obl> <roTii,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  – laRke ne roTii ko zamiin se uThaayaa ‘the boy picked up the bread from the floor’
    <laRkaa,n,m,sg,3,obl> <roTii,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
• Morphologically, like nouns, pronouns also inflect for number and case
  – Sg – Dir: yaha (this), vaha (that), jo (who/which), kaun (who.interro), kyaa (what) and kuch (some)
  – Sg – Obl: isa (this), usa (that), jisa (which), kisa (which.interro), koii (someone) and kisii (someone)
  – Pl – Dir: ye (these), ve (those), jo (who.pl)
  – Pl – Obl
    a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who.interro), kinhiiM (who.indef)
    b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people.indef)
Pronouns (Contd…)
• Before ‘ne’, maiM (I) and tuu (you.sg) don’t change. For example: maiMne (I.erg) and tuune (you.erg)
• Before other postpositions, maiM (I) and tuu (you.sg) change to the oblique forms mujh (me) and tujh (you.sg). Thus: mujhko (to me) and tujhko (to you.sg)
• Before all postpositions, ham (we), tum (you.sg/pl) and aap (you-hon) don’t change form
  – hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), hamko (to us), tumko (to you.sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don’t attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
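The personal-pronoun rules above fit in a short function. This is a sketch under stated assumptions: it covers only maiM, tuu, ham, tum and aap in the romanization used here, it returns only the masculine singular genitive form (meraa and so on), and it writes the postposition attached to the pronoun as in the slide's examples. The names `OBLIQUE`, `GENITIVE` and `attach` are invented for this illustration.

```python
# Oblique stems used before postpositions other than 'ne' (from the slide).
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitive forms that replace pronoun + kaa (masculine singular only).
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}


def attach(pronoun, postposition):
    """Combine a personal pronoun with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if postposition == "ne":
        return pronoun + "ne"              # no oblique change before 'ne'
    return OBLIQUE.get(pronoun, pronoun) + postposition


for pronoun, postposition in [("maiM", "ne"), ("maiM", "ko"), ("tum", "ko"), ("ham", "kaa")]:
    print(pronoun, postposition, "->", attach(pronoun, postposition))
```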
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne ‘good boy erg’
• The forms that an adjective (e.g. acchaa ‘good’) takes with regard to number, gender and case are given below

Case →       Direct            Oblique
Number →     Sg       Pl       Sg       Pl
Gender ↓
Masc         acchaa   acche    acche    acche
Fem          acchii   acchii   acchii   acchii
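The table can be read as a simple rule, sketched below for illustration. The function assumes, beyond what the slide states, that declinable adjectives are exactly those whose citation form ends in -aa in this romanization. Everything else is returned unchanged, standing in for the adjectives that do not decline. `inflect_adjective` is a hypothetical name.

```python
def inflect_adjective(citation_form, gender, number, case):
    """Inflect an -aa adjective following the acchaa table.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    Assumption: forms not ending in -aa are treated as indeclinable.
    """
    if not citation_form.endswith("aa"):
        return citation_form
    stem = citation_form[:-2]
    if gender == "f":
        return stem + "ii"                  # acchii in all four cells
    if number == "sg" and case == "direct":
        return stem + "aa"                  # acchaa
    return stem + "e"                       # acche elsewhere in the masculine


print(inflect_adjective("acchaa", "m", "sg", "oblique"))
print(inflect_adjective("acchaa", "f", "pl", "direct"))
```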
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms
• Given below are some examples of verb forms from Hindi/Urdu

Root         ro       ‘cry’
Infinitive   ronaa    ‘to cry’
Habitual     rotaa    ‘cry.hab’
Perfective   royaa    ‘cried’
Causative    rulaa    ‘cause someone to cry’
             rulvaa   ‘make someone cause someone to cry’
Auxiliaries
• Auxiliaries mark Tense, Aspect and Modality information on verbs
  (a) bacce khaanaa khaa rahe haiM
      children_d meal eat prog be.pl.pres
      ‘The children are having a meal’
  (b) bacce khaanaa khaate rah sakte haiM
      children_d meal eat_nf prog ablit be.pl.pres
      ‘The children can continue to have their meal’
• Auxiliaries also carry the gender, number and person information
Postpositions
• Postpositions largely mark the case relations
      baccoM ne raat meM mez se khaanaa le liyaa
      children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions
Compound Post-positions
• Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows
  – ke anusaar ‘according to’
  – ke alaavaa ‘in addition to’
  – ke kaaran ‘because of’
  – ke dvaaraa ‘through’
  – ke saamne ‘in front of’
  – ke liye ‘for’
  – kii or/taraf ‘towards’
  – kii tarah ‘like’
  – kii jagah ‘in place of’
  – se baahar ‘out of’
  – se pahle ‘before’
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
• Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
  (a) qabl ‘before’
      qabl az_ayn (qabl azeen) ‘before from_this’: ‘before this’
      qabl az_waqt ‘before from_time’: ‘before time’
  (b) dar ‘in/inside/amidst’
      dar_ayn asnah (dariin asnah) ‘in_this moment’
  (c) az ‘from/since/for’
      az raahe hamdardi ‘from way empathy’: ‘for the sake of courtesy’
      az sare nau ‘from beginning new’
      khaarij az bahes ‘beyond from discussion’
  (d) ta ‘to/until/till’
      ta waqt ‘until time’
ezafe in Urdu
• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone
  – EZ N+N: daur-e-hukumat ‘period-of-rule’, hukumat-e-hind ‘government-of-India’
  – EZ N+Adj: nasl-e-insani ‘race-of-humanity’, lamha-e-aakhar ‘the last moment’
  – EZ Adj+N: qabil-e-rahem ‘qualified for sympathy’
  – EZ Adj+Adj: qabil-e-qubul ‘qualified-for-acceptance’
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  – Nouns: it adds the sense of ‘every’
  – Verbs: it brings the sense of an adverbial participle
  – Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
• If the word is X then its reduplicated form is X-X
  – raam-raam ‘Ram-Ram’ (proper noun)
  – baccaa-baccaa ‘child-child’ (common noun)
  – garam-garam ‘hot-hot’ (adjective)
  – dhiire-dhiire ‘slowly-slowly’ (adverb)
  – jaa-jaa ‘go-go’ (verb)
  – naa-naa ‘not-not’ (negative particle)
  – kyaa-kyaa ‘what-what’ (question word)
  – jaate-jaate ‘going-going’ (participle)
Partial Reduplication (Echo words)
• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of ‘X etc.’
• In Hindi/Urdu the first consonant of X is replaced by v-
  – For example: khaanaa-vaanaa ‘food etc.’
  – vaanaa is not a valid word in Hindi
• Some more examples of partial reduplication in Hindi are:
  – jaanaa ‘going’ → jaanaa-vaanaa ‘going etc.’
  – aaloo ‘potato’ → aaloo-vaaloo ‘potato etc.’
  – aisaa ‘like this’ → aisaa-vaisaa ‘like this etc.’
• There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of ‘etc.’
  – bhaag ‘run’ → bhaagambhaag ‘rush’
  – jhuuTh ‘lie’ → jhuuTh-muuTh ‘just like that (without meaning it)’
  – dekh ‘see’ → dekhaa-dekhii ‘in imitation’
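The regular v- pattern is easy to sketch in code. The function below works on the romanized forms used here and assumes that the "first consonant" is the run of letters before the first vowel, so that a digraph such as kh counts as one consonant. Vowel-initial words simply gain v-, as in aaloo-vaaloo. The irregular forms (bhaagambhaag and the like) are not covered. `echo_word` is a hypothetical name.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word):
    """Build the 'X etc.' echo form X-vY for a romanized Hindi/Urdu word."""
    first_vowel = 0
    while first_vowel < len(word) and word[first_vowel] not in VOWELS:
        first_vowel += 1
    # Drop the initial consonant letters (if any) and prefix v-.
    return f"{word}-v{word[first_vowel:]}"


for word in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(echo_word(word))
```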
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'
Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Relative Clause
rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          'My sister who stays in Delhi is coming tomorrow'
rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me'
rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me'
rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me'
Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'
compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'

Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

Experiencer verbs
The experiencer argument takes dative case.
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres

Participatory verbs
The second argument of participatory verbs takes the se postposition.
raam ravi se carcaa karegaa
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7

Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - They are productive
  - Morphological analysis of the members of the compounds is needed
  - The issue: whether to create a single token
• Decision
  - Create three tokens
  - Mark the hyphen as JOIN
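The decision above (three tokens, with the hyphen marked as JOIN) can be sketched as follows. This is an illustrative snippet under assumptions, not the treebank's tokenizer: the function name and the WORD label for the non-hyphen tokens are invented for the example, and only the JOIN label comes from the slide.

```python
import re


def tokenize_compound(token):
    """Split a hyphenated compound into its members plus a JOIN token."""
    # The capturing group keeps the hyphen itself in the split result.
    parts = re.split(r"(-)", token)
    return [(p, "JOIN" if p == "-" else "WORD") for p in parts if p]
```

With this sketch, `tokenize_compound("BAI-bahana")` returns three tokens, so each member of the compound remains available for separate morphological analysis.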
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
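As a rough illustration of how the composite af attribute can be unpacked, the sketch below splits a comma-separated af value into named fields. The field names and their order follow the description on this slide; treating missing trailing fields as empty strings is an assumption of the example, not a statement about the SSF specification.

```python
# Field order as described on the slide; the short names are chosen for this sketch.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam", "suffix")


def parse_af(af):
    """Turn 'laDakaa,n,m,sg,3,o' into a dict keyed by field name."""
    values = af.split(",")
    # Pad with empty strings when trailing fields are absent (assumption).
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))
```

For example, `parse_af("laDakaa,n,m,sg,3,o")["case"]` gives `"o"` (oblique), matching the second row of the table.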
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>

Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
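The way chunking changes the ADDR_ values can be sketched with a small helper that renumbers tokens as chunk.position. This is only an illustration of the addressing shown in the table; the function name, the tab-separated layout and the input structure are assumptions, and the feature-structure column is left out.

```python
def ssf_lines(chunks):
    """Render (chunk_tag, [(word, pos), ...]) pairs with SSF-style addresses."""
    lines = []
    for i, (tag, tokens) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{tag}")              # chunk opener, address i
        for j, (word, pos) in enumerate(tokens, start=1):
            lines.append(f"{i}.{j}\t{word}\t{pos}")   # token address i.j
        lines.append("\t))")                         # chunk closer
    return lines


sentence = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
```

Calling `ssf_lines(sentence)` gives kelA the address 2.1 instead of its flat token address 4, which is the restructuring the slide refers to.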
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012

Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian grammatical model
Why Paninian Grammar?
• Indian languages have rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai   'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1  Sabina
  k2  lock
        the
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key
opened
  k1  Sabina
  k2  lock
        the
  k3  key
        this
k3 (karaNa): instrument
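A labeled dependency tree of this kind is easy to hold in a small data structure. The sketch below is one possible encoding, written for illustration only; the class and function names are assumptions, and the determiners (the, this) are left out because the slide shows them without a relation label.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)  # list of (label, Node)

    def add(self, label, child):
        """Attach child under this node with a karaka label; return the child."""
        self.children.append((label, child))
        return child


def relations(node):
    """Flatten the tree into (head, label, dependent) triples, depth-first."""
    triples = []
    for label, child in node.children:
        triples.append((node.word, label, child.word))
        triples.extend(relations(child))
    return triples


opened = Node("opened")
opened.add("k1", Node("Sabina"))   # karta: doer
opened.add("k2", Node("lock"))     # karma: locus of result
opened.add("k3", Node("key"))      # karaNa: instrument
```

Listing `relations(opened)` gives one triple per edge of the second tree above.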
Yesterday Sabina opened the lock with this key at my home
opened
  k7t  Yesterday
  k1   Sabina
  k2   lock
         the
  k3   key
         this
  k7p  home
         my
k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key
opened
  k7t  yesterday
  k1   lock
         the
  k3   key
         this
Here lock becomes the karta.
Levels of Analysis
L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) chunk-level tree: khaa → bhaaii, phala
(t2) expanded tree: khaa → bhaaii (→ meraa, baDZaa), phala (→ bahuta)
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb
• A verbal root denotes:
  - the activity
  - the result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root → activity, result]
karta-karma
• The boy opened the lock
  - k1: karta
  - k2: karma
• karta/karma sometimes correspond to agent/theme
  - Not always: "The door opened"
  - The door is karta
  - The sentence has no explicit karma
open
  k1  boy
  k2  lock

Sub-actions: Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open            open            open
  k1  boy         k1  key         k1  lock
  k2  lock        k2  lock
  k3  key
k1: karta (doer)   k2: karma (affected)   k3: karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: "Sabina opened the lock."
• However, the speaker may decide not to express the role of the agent. Hence:
  - "The key opened the lock": the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  - "The lock opened": the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly:
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'

An Example (Contd.)
Morph Analysis
meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa   <fs af= root=badZaa cat=adj gend=m>
bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta   <fs af= root=bahuta cat=adj gend=any>
phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd.)
POS Tagging
meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)
• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd.)
khaataa
  k1  bhaaii
        r6  meraa
  k2  phala

Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly

Karaka relations mark participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta: subject/agent/doer
• karma: object/patient
• karana: instrument
• sampradaan: beneficiary
• apaadaan: source
• adhikarana: location in place/time/other

Relations other than karakas
• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address

Relations which do not fall under dependency relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types
Some Hindi Constructions

(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark it as k1
  - aayaa se has karana vibhakti, so mark it as k3
  - bacce ko has sampradan vibhakti, so mark it as k4
• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations:
    prayojaka karta 'causer' (pk1): the causer in a causative construction
    prayojya karta 'causee' (jk1): the causee in a causative construction
    madhyastha karta 'mediator causer' (mk1): the mediator-causer in a causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'

Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
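The relabeling from Possibility-I to Possibility-II can be written down as a simple mapping. The snippet is a sketch for illustration, with hypothetical names; it assumes the se-phrase really is the mediator-causer. A true instrument, such as cammaca se 'with the spoon' in the first example, keeps k3 and should not be passed through this mapping.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map syntactic labels to causative labels; other labels pass through."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
```

Applying `relabel_causative(possibility_1)` reproduces the pk1, mk1, jk1 labels of the second example and leaves khaanaa as k2.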
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'

Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)
• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta: k4a
Ex-1: mujhko dukh hai
      'I-Dat' 'unhappy' 'is'
      'I am unhappy'
• The ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta: k4a (Contd.)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
      'ram-Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo
lsquoHad she been not sick she would have definitely come to the partyrsquo
Issue
$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause
101212
29
Possibility - I
agar-to paired-ccof paired-ccof
agar to ccof ccof
naa hotii aatii
Possibility - II
101212
30
Conditionals (Contd) Possibility-I is not possible because agar-to is the head of the tree
which is an abstract node ie it is not a lexical node For conditionals our current decision Follow Possibility-II In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo
clause Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo
clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Having written letters every day, he tears them up'
The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.

Participles (vmod) (Contd.)
PaadZataa hai
  k1    vaha
  k7t   rojZa
  k2    patra
  vmod  likhakar
          k1  vaha   (shared)
          k2  patra  (shared)
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example vo 'that', which becomes the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd.)
maan luungii
  k1  mai
  k2  NULL__NP (vo)
        nmod__relc  kahoge
                      k1  tum
                      k2  jo bhi

Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof  badZe_ho_gaye
  ccof  nahii_sunate

Non-dependency Relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'
[Figures: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd.)
kiyaa
  k1   maine
  k2   usase
  pof  prashna
         nmod  ek
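To see how the pof relation keeps the conjunct verb recoverable even though noun and verb sit in separate chunks, the sketch below rebuilds the predicate from dependency triples. The triple format and the function name are assumptions made for this illustration; the labels are those of the tree above.

```python
# (head, relation, dependent) triples for the tree above.
RELATIONS = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
    ("prashna", "nmod", "ek"),
]


def predicate(verb, relations):
    """Join a verb with its pof dependents to recover the conjunct verb."""
    nominals = [dep for head, label, dep in relations
                if head == verb and label == "pof"]
    return " ".join(nominals + [verb])
```

Here `predicate("kiyaa", RELATIONS)` gives "prashna kiyaa", while ek remains attached to prashna alone.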
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
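The filtering step can be made concrete with a toy example. The snippet below is only a sketch: the role-labeled candidates are written by hand to mirror the two sentences above, and the function name and dictionary keys are assumptions, not the output format of any particular SRL system.

```python
# Hand-written role labels mirroring the two candidate sentences.
CANDIDATES = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def who_created(theme, candidates):
    """Return agents of 'create' whose theme matches the questioned theme."""
    return [c["agent"] for c in candidates
            if c["predicate"] == "create" and c["theme"] == theme]
```

Both sentences contain every word of the question, but only one of them has the questioned phrase as the theme of create, so `who_created("the first effective polio vaccine", CANDIDATES)` keeps Jonas Salk and drops Becton Dickinson.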
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
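A frame file is essentially a lookup from roleset to role descriptions. The sketch below encodes the two ring rolesets from the slide as a plain dictionary; the dictionary layout and the function name are assumptions chosen for illustration and do not reflect the actual frame file format.

```python
# Roleset definitions copied from the slide; the layout is an assumption.
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"Arg0": "causer of ringing",
                  "Arg1": "thing rung",
                  "Arg2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"Arg1": "surrounding entity",
                  "Arg2": "surrounded entity"},
    },
}


def describe(roleset, labeled_args):
    """Map each labeled argument to its verb-specific role description."""
    roles = FRAMES[roleset]["roles"]
    return {text: roles[arg] for text, arg in labeled_args}
```

The same label can mean different things under different rolesets: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02.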
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi
• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments          Numbered Arguments with function tags
ARGA: Causer                ARGA-MNS: Indirect causer
ARG0: Agent/experiencer     ARG0-MNS: Induced causer
ARG1: Theme/patient         ARG0-GOL: Causee with a 'recipient' role
ARG2: Recipient             ARG2-ATR: Attribute
ARG3: Instrument            ARG2-GOL: Goal
                            ARG2-SOU: Source
                            ARG2-LOC: Location
                            ARG2-DIR: Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP: Temporal
ARGM-MNR: Manner
ARGM-LOC: Location
ARGM-PRP: Purpose
ARGM-CAU: Cause
ARGM-DIS: Discourse
ARGM-ADV: Adverb
ARGM-MNS: Means

Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitaab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'
trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'
trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs:
  - unaccusatives, e.g. Kula (open), Puta (explode)
  - unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - e.g. animacy test, cognate object test, among others

Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Intransitive Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) 'sleep' → 'make someone sleep'
• Add -vaa
  - (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'
[Figures: PropBank annotation of causative-1 and causative-2]
Causatives: classes

Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'
compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree for आतिफ़ किताब पढ़ेगा; node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree for आतिफ़ ने किताब पढ़ी; node labels: VP-Pred, VP, NP (Atif-ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Tree for आतिफ़ सोएगा; node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Tree for दरवाज़ा खुलेगा; node labels: VP-Pred, VP, NP1 (darvaazaa), CASE1, V (khulegaa)]

Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
[Tree for उस कमरे में चूहे हैं; node labels: VP-Pred, VP, NP (us kamre mein) adjoined to VP, NP1 (cuuhe), CASE1, V (hain)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree for राम ने मोहन को किताब दी; node labels: VP-Pred, VP, NP (Ram-ne), NP (Mohan-ko) adjoined to VP-Pred, NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree; node labels: VP-Pred, VP, NP (kal raat), NP (baadaloM mein), NP2 (mujhko), SCR2, NP1 (caaMd), CASE1, V (dikhaa)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree; node labels: VP-Pred, VP, NP (Ram), V (jaantaa), V-Aux (hai), EXTR1, CP1, C (ki), NP1 (Sita), CASE1, der se, V (aayegi)]

Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree; node labels: VP-Pred, VP, CP, C (COMP), NP1 (jo kitaab), NP (tumne), SCR1, PRO, V (dii), V-Aux (thii), NP (vah), SCR3, NP (maine), V (paRhii)]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree; node labels: VP-Pred, VP, NP (Raam), NP1 (Ravi-ko), V', NP, CASE1, N (yaad), V (kar), V-Aux (rahaa), V-Aux (thaa)]

Causative
[Tree not recoverable]
Case
Direct: nouns are in the nominative and are not followed by a postposition. They occur denoting subject and/or object.
  laRkaa aayaa 'the boy came'   <ladkaa,n,m,sg,3,d> <aa,v,m,sg,3,yaa>
  laRkii aayii 'the girl came'  <ladkii,n,f,sg,3,d> <aa,v,f,sg,3,yaa>
Oblique: nouns are objects of a postposition such as ne (erg), ko (acc/dative), se (instr), meM (loc), par (loc) and kaa (gen).
  laRke ne roTii khaayii 'the boy ate bread'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,d> <khaa,v,f,sg,3,yaa>
  laRke ne roTii ko zamiin se uThaayaa 'the boy picked the bread from the floor'
    <laRkaa,n,m,sg,3,obl> <roTi,n,f,sg,3,obl> <zamiin,n,f,sg,3,obl> <uThaa,v,m,sg,3,yaa>
Pronouns
Morphologically, like nouns, the pronouns also inflect for number and case.
  Sg, direct: yaha (this), vaha (that), jo (who/which), kaun (who, interrogative), kyaa (what) and kuch (some)
  Sg, oblique: isa (this), usa (that), jisa (which), kisa (which, interrogative), koii (someone) and kisii (someone)
  Pl, direct: ye (these), ve (those), jo (who, pl)
  Pl, oblique:
    (a) except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interrogative), kinhiiM (who, indefinite)
    (b) before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indefinite)

Pronouns (Contd.)
Before ne, maiM (I) and tuu (you, sg) don't change. For example: maiMne (I.erg) and tuune (you.erg).
Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus: mujhko (to me) and tujhko (to you, sg).
Before all postpositions, ham (we), tum (you, sg/pl) and aap (you, hon) don't change form: hamne (we.erg), tumne (you.erg), aapne (you-hon.erg), humko (to us), tumko (to you, sg/pl), aapko (to you, hon).
maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your).
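The rules for the personal pronouns before postpositions can be written as a small lookup. The sketch below covers only the five pronouns and the forms listed above; the function name `attach` and the table names are hypothetical, and the romanization follows the slide.

```python
# Oblique stems used before postpositions other than ne (from the slide).
OBLIQUE = {"maiM": "mujh", "tuu": "tujh"}
# Irregular genitive forms that replace pronoun + kaa (from the slide).
GENITIVE = {"maiM": "meraa", "tuu": "teraa", "ham": "hamaaraa", "tum": "tumhaaraa"}
# Pronouns that keep their form before every postposition.
UNCHANGING = {"ham", "tum", "aap"}


def attach(pronoun: str, postposition: str) -> str:
    """Combine one of maiM, tuu, ham, tum, aap with a postposition."""
    if postposition == "kaa" and pronoun in GENITIVE:
        return GENITIVE[pronoun]
    if pronoun in UNCHANGING or postposition == "ne":
        return pronoun + postposition
    # Raises KeyError for pronouns this sketch does not cover.
    return OBLIQUE[pronoun] + postposition
```

For instance, `attach("maiM", "ne")` keeps the direct form, while `attach("maiM", "ko")` switches to the oblique stem.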
Adjectives
Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun.
Postpositions are attached only to the nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy (erg)'.
The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below.

          Direct            Oblique
          Sg       Pl       Sg       Pl
Masc      acchaa   acche    acche    acche
Fem       acchii   acchii   acchii   acchii
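The adjective table reduces to three endings, which a short function can make explicit. This is a sketch for adjectives of the acchaa type only; the function name `inflect_adjective` and the feature values ("m"/"f", "sg"/"pl", "direct"/"oblique") are illustrative choices, not treebank conventions.

```python
def inflect_adjective(stem: str, gender: str, number: str, case: str) -> str:
    """Inflect an acchaa-type adjective, e.g. stem 'acch'.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if gender == "f":
        return stem + "ii"          # feminine is invariant across number and case
    if number == "sg" and case == "direct":
        return stem + "aa"          # masculine singular direct
    return stem + "e"               # every other masculine cell
```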
Verbs
Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person. Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu.
Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu.

Root        ro       'cry'
Infinitive  ronaa    'to cry'
Habitual    rotaa    'cry.hab'
Perfective  royaa    'cried'
Causative   rulaa    'cause someone to cry'
            rulvaa   'make someone cause someone to cry'
Auxiliaries
Auxiliaries mark Tense, Aspect and Modality information on verbs.
(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog ability be.pl.pres
    'The children can continue to have their meal'
Auxiliaries also carry the gender, number and person information.

Postpositions
Postpositions largely mark the case relations.
    baccoM ne raat meM mez se khaanaa le liyaa
    children_obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions.
Compound Post-positions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
  ke anusaar    'according to'
  ke alaavaa    'in addition to'
  ke kaaran     'because of'
  ke dvaaraa    'through'
  ke saamne     'in front of'
  ke liye       'for'
  kii or/taraf  'towards'
  kii tarah     'like'
  kii jagah     'in place of'
  se baahar     'out of'
  se pahle      'before'

Urdu Specific Features
  Prepositions in Urdu
  ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
    qabl az_ayn (qabl azeen)   before from_this   'before this'
    qabl az_waqt               before from_time   'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah)   in_this moment
(c) az 'from/since/for'
    az raahe hamdardi   from way empathy   'for the sake of courtesy'
    az sare nau         from beginning new
    khaarij az bahes    beyond from discussion
(d) ta 'to/until/till'
    ta waqt   until time

ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive, but is not restricted to the genitive alone.
  EZ N+N      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
Words belonging to various categories can be reduplicated. These expressions are often hyphenated.
Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  Nouns: it adds the sense of 'every'
  Verbs: it brings the sense of an adverbial participle
  Adjectives and adverbs: it adds intensity
Hindi has three types of reduplication: full, partial and redundant. Reduplication is highly productive in these languages.

Full Reduplication
If the word is X, then its reduplicated form is X-X.
  raam-raam      'Ram-Ram'        (proper noun)
  baccaa-baccaa  'child-child'    (common noun)
  garam-garam    'hot-hot'        (adjective)
  dhiire-dhiire  'slowly-slowly'  (adverb)
  jaa-jaa        'go-go'          (verb)
  naa-naa        'not-not'        (negative particle)
  kyaa-kyaa      'what-what'      (question word)
  jaate-jaate    'going-going'    (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of a given word is repeated, which gives the meaning of 'X etc.'. In Hindi/Urdu, the first consonant of X is replaced by v-.
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi. Some more examples of partial reduplication in Hindi are:
  jaanaa 'going'     → jaanaa-vaanaa 'going etc.'
  aaloo 'potato'     → aaloo-vaaloo 'potato etc.'
  aisaa 'like this'  → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run'   → bhaagambhaag 'rush'
  jhuuTh 'lie'  → jhuuth-muuTh 'just like that (without meaning it)'
  dekh 'see'    → dekhaa-dekhii 'in imitation'
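The regular echo-word pattern is simple enough to state as code. The sketch below works on the romanized forms used here and assumes that the "first consonant" is the run of non-vowel letters at the start of the word (so kh counts as one consonant, and a vowel-initial word simply receives v-). The function name `echo_word` is hypothetical, and the irregular cases listed above are deliberately not handled.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word: str) -> str:
    """Form 'X-vX' by replacing the initial consonant of X with v-."""
    i = 0
    # Skip the leading consonant letters (none for vowel-initial words).
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"
```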
Some Basic Syntax
Hindi and Urdu are both relatively free word order SOV languages.
For case marking, Hindi primarily uses postpositions.
The verb agrees either with the subject or with the object.
The adjective agrees with the noun it modifies.
Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'
trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'
trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'

Intransitive Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'
Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'

Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'
Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Relative Clause
rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me'
rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me'
rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me'

Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'
compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs: the experiencer argument takes dative case
  raam ko bukhaar hai   Ram dat fever be.pres
  raam ko caand dikhaa  Ram dat moon see-unacc.pst
  raam ko dukh hai      Ram dat sorrow be.pres
Participatory verbs: the second argument of the participatory verbs takes the se postposition
  raam ravi se carcaa karegaa    Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii  Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa          Ravi Mohan with meet.m.sg.fut
References
• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks

Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues
  - Compounds
  - Punctuations
For example:
  usa   ladake  ne   kelaa   khaayaa   thaa  .
  that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF:
  ADDR  TOKEN
  1     usa
  2     laDake
  3     ne
  4     kelA
  5     khAyA
  6     thA
  7

Tokenization Issues
• Punctuations: all punctuations are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision: create three tokens and mark the hyphen as JOIN
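The two tokenization decisions (every punctuation mark is its own token; a hyphenated compound becomes three tokens with the hyphen marked JOIN) can be sketched as follows. The function name `tokenize` and the third "note" column are illustrative assumptions, not the treebank's actual tooling or SSF layout. Note that this simple version marks every hyphen as JOIN, including one that is not inside a compound.

```python
import re


def tokenize(sentence: str) -> list[tuple[int, str, str]]:
    """Return (address, token, note) rows; note is 'JOIN' for a hyphen."""
    # A token is either a run of word characters or one punctuation character.
    pieces = re.findall(r"\w+|[^\w\s]", sentence)
    return [
        (addr, tok, "JOIN" if tok == "-" else "")
        for addr, tok in enumerate(pieces, start=1)
    ]
```

With this sketch, `tokenize("BAI-bahana")` yields three rows, the middle one carrying the JOIN note.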
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

  ADDR  TKN      OTHR
  1     usa      <fs af='vaha,pr'>
  2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
  3     ne       <fs af='ne,psp'>
  4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
  5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
  6     thaa     <fs af='kelaa,v,e'>
  7              <fs as=&STOP,punc>
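Reading an af value back into named features is a matter of splitting the string by position. The sketch below assumes the fields are comma-separated and appear in the order listed above (root, category, gender, number, person, case, tam or vibhakti, suffix). The separators did not survive in this copy of the slides, so the comma convention, the field names in `AF_FIELDS` and the function name `parse_af` are all assumptions made for illustration.

```python
# Assumed field order, following the description of the af attribute above.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")


def parse_af(af: str) -> dict[str, str]:
    """Split a comma-separated af value into a feature dictionary."""
    values = af.split(",")
    # Pad short values so that every field is present (possibly empty).
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))
```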
POS Tagging
ILMT POS tagset adopted; 26 tags in total.

  ADDR  TKN     CAT   OTHR
  1     usa     PRP   <fs af='vaha,pron,…'>
  2     laDake  NN    <fs af='laDakA,noun,…'>
  3     ne      PSP   <fs af='ne,psp,…'>
  4     kelA    NN    <fs af='kelA,noun,…'>
  5     khAyA   VM    <fs af='KA,verb,…'>
  6     thA     VAUX  <fs af='kelA,verb,…'>
  7             SYM   <fs as=&STOP,punc>
Chunking
Chunking is introduced to save effort in manual tagging. Dependency relations are marked between the chunk heads. Chunking restructures the tree (i.e. the value of ADDR_ will change).

  ADDR_  TKN_    CAT_  OTHR
  1      ((      NP
  1.1    usa     PRP   <fs af='vaha,pron,…'>
  1.2    laDake  NN    <fs af='laDakA,noun,…'>
  1.3    ne      PSP   <fs af='ne,psp,…'>
         ))
  2      ((      NP
  2.1    kelA    NN    <fs af='kelA,noun,…'>
         ))
  3      ((      VG
  3.1    khAyA   VM    <fs af='KA,verb,…'>
  3.2    thA     VAUX  <fs af='kelA,verb,…'>
         ))
  4      ((      BLK
  4.1            SYM   <fs as=&STOP,punc>
         ))
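The way chunking changes the ADDR_ values can be illustrated with a small helper that numbers chunks 1, 2, 3 and the tokens inside each chunk 1.1, 1.2 and so on. This is a sketch under the assumption that addresses are dotted pairs; the function name `ssf_rows` and the tuple layout are hypothetical, and the feature-structure column is omitted.

```python
def ssf_rows(chunks):
    """chunks: list of (chunk_label, [(token, pos_tag), ...]).

    Returns (address, token, category) rows with chunk brackets.
    """
    rows = []
    for c, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(c), "((", label))           # chunk opening row
        for t, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{c}.{t}", token, pos))    # token inside chunk c
        rows.append(("", "))", ""))                  # chunk closing row
    return rows


example = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
```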
Rambow
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012

Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
  - Some Hindi constructions: causatives, co-ordination, unaccusatives, relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?
Indian languages:
• Rich morphology
• Relatively flexible word order
For example:
  a) baccaa phala khaataa hai    'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
  opened
    k1: Sabina
    k2: lock (the)
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key
  opened
    k1: Sabina
    k2: lock (the)
    k3: key (this)
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
  opened
    k7t: Yesterday
    k1: Sabina
    k2: lock (the)
    k3: key (this)
    k7p: home (my)
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key
  opened
    k7t: yesterday
    k1: lock (the)
    k3: key (this)
Here lock becomes the karta.
Levels of Analysis
L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Trees: (t1) khaa with dependents bhaaii and phala; (t2) the expanded tree, in which bhaaii has dependents meraa and baDZaa, and phala has dependent bahuta]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb
A verbal root denotes:
• The activity
• The result
Locus of activity: karta
Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
The boy opened the lock
• k1 - karta
• k2 - karma
  open
    k1: boy
    k2: lock
karta and karma sometimes correspond to agent and theme
• Not always: The door opened
• The door is the karta
• The sentence has no explicit karma

Sub-actions: Opening of lock
  Inserting and turning a key (action 1)
  Key pressing and turning the lever (action 2)
  Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
  open
    k1: boy
    k2: lock
    k3: key
  open
    k1: key
    k2: lock
  open
    k1: lock
k1 - karta (doer), k2 - karma (affected), k3 - karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
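The shift of karta described above can be sketched as a function of which participants the speaker chooses to express. This illustration covers only the three "open" sentences from the slides; the function name `assign_karakas` and its parameters are hypothetical and do not come from the annotation guidelines.

```python
def assign_karakas(agent=None, instrument=None, affected=None):
    """Assign k1/k2/k3 given the participants the speaker expresses."""
    if agent is not None:
        # Sabina opened the lock (with the key)
        roles = {"k1": agent, "k2": affected, "k3": instrument}
    elif instrument is not None:
        # The key opened the lock: the instrument is raised to karta
        roles = {"k1": instrument, "k2": affected}
    else:
        # The lock opened: the karma is raised to karta
        roles = {"k1": affected}
    # Drop roles for participants that were not expressed.
    return {label: word for label, word in roles.items() if word is not None}
```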
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
  meraa  badZaa   bhaaii     bahuta  phala     khaataa    hai
  'my'   'elder'  'brother'  'lots'  'fruits'  'eat+HAB'  'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd.)
Morph Analysis
  meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  badZaa   <fs af= root=badZaa cat=adj gend=m>
  bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  bahuta   <fs af= root=bahuta cat=adj gend=any>
  phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd.)
POS Tagging
  meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
  ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG

An Example (Contd.)
Dependency Relation
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
In our dependency tree
• each node is a chunk, and
• each edge represents the relation between the connected nodes, labeled with the karaka or other relation.
A chunk represents a set of adjacent words which are in dependency relations with each other.
All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.

Dependency Scheme (Contd.)
Here modifier-modified relations are marked between the heads of the chunks:
• meraa 'my'
• bhaaii 'brother'
• phala 'fruit' and
• khaataa 'eats'
badZaa 'big' and bahut 'much' are part of the chunks.
Dependency Scheme (Contd.)
  khaataa
    k1: bhaaii
      r6: meraa
    k2: phala
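The tree above is small enough to store as a list of labelled edges between chunk heads. The following sketch shows one possible in-memory form; the names `EDGES`, `dependents` and `root` are illustrative, and the treebank itself stores this information in SSF, not in this structure.

```python
# (dependent, relation, head) triples for the example sentence.
EDGES = [
    ("bhaaii", "k1", "khaataa"),   # karta of 'eats'
    ("phala", "k2", "khaataa"),    # karma of 'eats'
    ("meraa", "r6", "bhaaii"),     # genitive modifier of 'brother'
]


def dependents(head: str) -> dict[str, str]:
    """Map each relation label to the dependent attached to `head`."""
    return {rel: dep for dep, rel, h in EDGES if h == head}


def root() -> str:
    """The root is the only head that is not itself a dependent."""
    heads = {h for _, _, h in EDGES}
    deps = {d for d, _, _ in EDGES}
    (only_root,) = heads - deps
    return only_root
```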
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas, and
• Relations which do not fall under dependency relations directly, but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta - subject/agent/doer
• karma - object/patient
• karana - instrument
• sampradaan - beneficiary
• apaadaan - source
• adhikarana - location in place/time/other

Relations other than karakas
• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod_relc - Relative clause
• rad - Address

Relations which do not fall under dependency relation
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of

Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
  maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd.)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
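The move from the Possibility-I labels to the Possibility-II labels for the maid sentence amounts to a fixed relabelling, sketched below. The mapping is taken from the text (k1, k3, k4 become pk1, mk1, jk1); the names `CAUSATIVE_RELABEL` and `to_possibility_two` are hypothetical. The sketch does not decide whether a se-marked phrase is a mediator or a true instrument such as cammaca se 'with the spoon', which keeps k3 in the example above, so it should only be applied to analyses like the maid sentence.

```python
# Possibility-I label -> Possibility-II label, as stated in the text.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def to_possibility_two(labels: dict[str, str]) -> dict[str, str]:
    """Relabel the participants of a causative verb; other labels are kept."""
    return {phrase: CAUSATIVE_RELABEL.get(label, label)
            for phrase, label in labels.items()}


# Syntactic (Possibility-I) analysis of the maid sentence.
possibility_one = {
    "maaz ne": "k1",
    "aayaa se": "k3",
    "bacce ko": "k4",
    "khaanaa": "k2",
}
```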
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'
Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I
[Figure: dependency tree for Possibility-I]

Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Possibility-II
[Figure: dependency tree for Possibility-II]

Relative Clauses (Contd.)
For relative clauses, our current decision: follow Possibility-II.
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta - k4a
Ex-1 mujhko dukh hai
  'I.Dat' 'unhappy' 'is'
  'I am unhappy'
• The ko vibhakti in mujhko 'to me' shows that it is not a karta
• Here dukh 'unhappy' is the karta
• mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a
anubhava karta - k4a (Contd.)
Ex-2 raam ne (agent) caaMd dekhaa [base verb]
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
Ex-3 raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
  'ram.Dat' 'moon' 'appeared'
  'The moon was visible to Ram'
anubhava karta - k4a (Contd.)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran - rs
Ex-1 raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
Ex-2 raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Relation samanadhikaran - rs (Contd.)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex agara vaha biimaara na hotii to paartii me jZarUra aatii
  'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
  'Had she not been sick, she would definitely have come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
agar-to paired-ccof paired-ccof
agar to ccof ccof
naa hotii aatii
Possibility - II
Conditionals (Contd.)
• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision is to follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd.)
Ex vaha rojZa patra likhakara PaadZataa hai
  'he' 'daily' 'letter' 'having written' 'tear' 'is'
  'Having written a letter, he tears it up every day'
Participles (vmod) (Contd.)
The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'
Participles (vmod) (contd)
Paadzataa hai k1 k7t k2 vmod
vaha rojZa pawra likhakar k1 k2
vaha pawra
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd)
maan luungii k1 k2
mai NULL__NP (vo) nmod__relc
kahoge k1 k2
tum jo bhi
Ellipsis (Contd.)
Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd.)
Coordinate Conjunction
Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF ('Part OF' relation), i.e. the noun or adjective in a conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the sequence is a conjunct verb
Conjunct Verbs (Contd)
kiyaa k1 k2 pof
maine usase prashna
nmod
ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
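The filtering step can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not part of any PropBank tooling: the `Proposition` class, the role names and the `answer_who` helper are all assumptions made for the example, and the role labels are written in by hand rather than produced by a labeller.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    """One role-labelled predicate instance (hypothetical structure)."""
    predicate: str
    roles: dict  # role name -> argument text


# The two candidate sentences from the slide, labelled by hand.
CANDIDATES = [
    Proposition(
        predicate="create",
        roles={"agent": "Becton Dickinson",
               "theme": "the first disposable syringe"},
    ),
    Proposition(
        predicate="create",
        roles={"agent": "Jonas Salk",
               "theme": "the first effective polio vaccine"},
    ),
]


def answer_who(predicate, theme, propositions):
    """Return the agents of propositions whose predicate and theme match."""
    return [
        p.roles["agent"]
        for p in propositions
        if p.predicate == predicate
        and p.roles.get("theme", "").lower() == theme.lower()
        and "agent" in p.roles
    ]


print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Matching on the theme rather than on surface words is what removes the Becton Dickinson sentence, even though it contains every word of the question.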
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01 Make sound of bell
  Arg0 Causer of ringing
  Arg1 Thing rung
  Arg2 Ring for
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01 Make sound of bell
  Arg0 Causer of ringing
  Arg1 Thing rung
  Arg2 Ring for
ring.02 To surround
  Arg1 Surrounding entity
  Arg2 Surrounded entity
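As a rough illustration of how a frame file ties numbered arguments to verb-specific meanings, the two rolesets for "ring" can be held in a small lookup table. The dictionary layout and the `describe` helper below are assumptions made for this sketch; real PropBank frame files are stored in their own format, which is not reproduced here.

```python
# Hypothetical in-memory version of the frame file for "ring".
FRAME_FILE_RING = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing",
                  "ARG1": "Thing rung",
                  "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity",
                  "ARG2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to the role description of its roleset.

    Raises ValueError if a label is not defined for that roleset.
    """
    roles = FRAME_FILE_RING[roleset_id]["roles"]
    unknown = [label for label in labelled_args if label not in roles]
    if unknown:
        raise ValueError(f"{roleset_id} has no roles {unknown}")
    return {text: roles[label] for label, text in labelled_args.items()}


print(describe("ring.01", {"ARG0": "John", "ARG1": "the bell"}))
print(describe("ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"}))
```

The same label (ARG1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is the point of defining roles per roleset rather than globally.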
Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling
Frame files for Hindi
• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
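Labels in this tagset have two parts: a base (numbered argument or ARGM) and an optional function tag. A minimal sketch of splitting and describing a label follows; it assumes the hyphenated spelling (e.g. `ARG2-GOL`), and the `split_label` and `describe_label` helpers are hypothetical names introduced only for this example.

```python
# Descriptions copied from the tagset slides; the dictionaries are illustrative.
NUMBERED = {
    "ARGA": "Causer", "ARG0": "Agent, experiencer", "ARG1": "Theme, patient",
    "ARG2": "Recipient", "ARG3": "Instrument",
}
WITH_FUNCTION_TAG = {
    "ARGA-MNS": "Indirect causer", "ARG0-MNS": "Induced causer",
    "ARG0-GOL": "Causee with a 'recipient' role", "ARG2-ATR": "Attribute",
    "ARG2-GOL": "Goal", "ARG2-SOU": "Source", "ARG2-LOC": "Location",
    "ARG2-DIR": "Direction",
    "ARGM-TMP": "Temporal", "ARGM-MNR": "Manner", "ARGM-LOC": "Location",
    "ARGM-PRP": "Purpose", "ARGM-CAU": "Cause", "ARGM-DIS": "Discourse",
    "ARGM-ADV": "Adverb", "ARGM-MNS": "Means",
}


def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); plain 'ARG0' gives ('ARG0', None)."""
    base, sep, function_tag = label.partition("-")
    return base, (function_tag if sep else None)


def describe_label(label):
    """Look up the description of a numbered, function-tagged or modifier label."""
    if label in WITH_FUNCTION_TAG:
        return WITH_FUNCTION_TAG[label]
    return NUMBERED[label]


print(split_label("ARG2-GOL"), describe_label("ARG2-GOL"))
print(split_label("ARG0"), describe_label("ARG0"))
```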
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  Atif will read the book
trans-2 आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  Atif read the book
trans-3 आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  Atif had to read the book
Unaccusative & Unergative
• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  The door will open
unacc-2 दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  The door opened
unacc-3 दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  The door will have to open
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
unerg-2 आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  Atif slept
unerg-3 आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  Atif will have to sleep
Existential
exist-1 उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  Yesterday night the moon was seen behind the clouds
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  Yesterday night I saw the moon behind the clouds
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1 राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  Ram will give a book to Mohan
ditrans-2 राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  Ram gave a book to Mohan
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so -> sulaa) sleep -> make someone sleep
• Add -vaa
  - (sulaa -> sulvaa) make someone sleep -> cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
causative-1 आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Causatives
[Figures: PropBank annotation of causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)', 'trust'
• Such cases are handled using a noun frame for bharosaa
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
Complex predicate
compl-pred-2 राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  Ram knows that Sita will arrive late
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Figure: phrase-structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with nodes VP-Pred, VP, NP, V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: phrase-structure tree for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: phrase-structure tree for आतिफ़ सोएगा (Atif soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: phrase-structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa), with a CASE trace co-indexed with the NP]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
[Figure: phrase-structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: phrase-structure tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)]
Putting it All Together: Dative Subjects
[Figure: phrase-structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa), with CASE and SCR traces]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Figure: phrase-structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi), with an extraposed CP headed by ki]
Relative Clause
[Figure: phrase-structure tree for the relative clause example below, with SCR traces, PRO and a CP containing jo kitaab]
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
Complex Predicate
[Figure: phrase-structure tree for the complex predicate example below, with the noun yaad inside V']
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  Ram was remembering Ravi
Causative
Pronouns
Morphologically, like nouns, pronouns also inflect for number and case.
• Sg - dir: yaha (this), vaha (that), jo (who/which), kaun (who, interro), kyaa (what) and kuch (some)
• Sg - obl: isa (this), usa (that), jisa (which), kisa (which, interro), koii (someone) and kisii (someone)
• Pl - dir: ye (these), ve (those), jo (who, pl)
• Pl - obl
  a. except before ne (erg): ina (these), una (those), jina (whoever), kina (who, interro), kinhiiM (who, indef)
  b. before ne (erg): inhoM (these), unhoM (those), jinhoM (who), kinhoM (who) and kinhiiM (some people, indef)
Pronouns (Contd.)
• Before 'ne', maiM (I) and tuu (you, sg) don't change. For example: maiMne (I, erg) and tuune (you, erg)
• Before other postpositions, maiM (I) and tuu (you, sg) change to the oblique forms mujh (me) and tujh (you, sg). Thus mujhko (to me) and tujhko (to you, sg)
• Before all postpositions, ham (we), tum (you, sg/pl) and aap (you-hon) don't change form: hamne (we, erg), tumne (you, erg), aapne (you-hon, erg), hamko (to us), tumko (to you, sg/pl), aapko (to you-hon)
• maiM (I), tuu (you), ham (we) and tum (you) don't attach to the kaa (gen) postposition. Instead they have the irregular forms meraa (my), teraa (your), hamaaraa (our) and tumhaaraa (your)
Adjectives
• Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun
• Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'
• The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below

Case ->      Direct            Oblique
Number ->    Sg       Pl       Sg       Pl
Masc         acchaa   acche    acche    acche
Fem          acchii   acchii   acchii   acchii
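The table above can be read as a three-way rule, sketched below. The function name and argument values are assumptions for illustration, and the sketch only covers aa-ending adjectives of the acchaa type shown on the slide; it is not a general Hindi/Urdu morphological generator.

```python
def inflect_adjective(stem, gender, number, case):
    """Inflect an aa-ending adjective stem (e.g. 'acch' for acchaa).

    gender: 'masc' or 'fem'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if gender == "fem":
        return stem + "ii"          # feminine is invariant in the table
    if number == "sg" and case == "direct":
        return stem + "aa"          # masculine singular direct
    return stem + "e"               # all other masculine cells


for gender in ("masc", "fem"):
    for case in ("direct", "oblique"):
        for number in ("sg", "pl"):
            print(gender, case, number, inflect_adjective("acch", gender, number, case))
```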
Verbs
• Verbs in Hindi/Urdu are inflected for tense, aspect, mood (TAM) and the agreement features of gender, number and person
• Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu
• Thus only certain moods, aspects and tenses are marked in the verb forms
Given below are some examples of various verb forms from Hindi/Urdu
Root        ro      cry
Infinitive  ronaa   to cry
Habitual    rotaa   cry.hab
Perfective  royaa   cried
Causative   rulaa   cause someone to cry
            rulvaa  make someone cause someone to cry
Auxiliaries
• Auxiliaries mark Tense, Aspect and Modality information on verbs
(a) bacce khaanaa khaa rahe haiM
    Children_d meal eat prog be.pl.pres
    The children are having a meal
(b) bacce khaanaa khaate rah sakte haiM
    Children_d meal eat_nf prog abil be.pl.pres
    The children can continue to have their meal
• Auxiliaries also carry the gender, number and person information
Postpositions
• Postpositions largely mark the case relations
  baccoM ne raat meM mez se khaanaa le liyaa
  children_obl erg night in table ablat food take refl.pst
• Hindi also has compound postpositions
Compound Post-positions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
• ke anusaar 'according to'
• ke alaavaa 'in addition to'
• ke kaaran 'because of'
• ke dvaaraa 'through'
• ke saamne 'in front of'
• ke liye 'for'
• kii or/taraf 'towards'
• kii tarah 'like'
• kii jagah 'in place of'
• se baahar 'out of'
• se pahle 'before'
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
    qabl az_ayn (qabl azeen)   before from_this   'before this'
    qabl az_waqt               before from_time   'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah)   in_this moment
(c) az 'from/since/for'
    az raahe hamdardi   from way empathy         'for the sake of courtesy'
    az sare nau         from beginning new
    khaarij az bahes    beyond from discussion
(d) ta 'to/until/till'
    ta waqt   until time
ezafe in Urdu
• Urdu has what is referred to as ezafe
• It normally marks a genitive, but is not restricted to the genitive alone
EZ N+N      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  - Nouns: it adds the sense of 'every'
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X
• raam-raam 'Ram-Ram' (proper noun)
• baccaa-baccaa 'child-child' (common noun)
• garam-garam 'hot-hot' (adjective)
• dhiire-dhiire 'slowly-slowly' (adverb)
• jaa-jaa 'go-go' (verb)
• naa-naa 'not-not' (negative particle)
• kyaa-kyaa 'what-what' (question word)
• jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi
Some more examples of partial reduplication in Hindi are:
• jaanaa 'going' -> jaanaa-vaanaa 'going etc.'
• aaloo 'potato' -> aaloo-vaaloo 'potato etc.'
• aisaa 'like this' -> aisaa-vaisaa 'like this etc.'
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
• bhaag 'run' -> bhaagambhaag 'rush'
• jhuuTh 'lie' -> jhuuth-muuTh 'just like that (without meaning it)'
• dekh 'see' -> dekhaa-dekhii 'in imitation'
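The regular patterns (full reduplication and v- echo words) are simple enough to sketch over romanized forms. The function names are hypothetical, and the sketch assumes that the run of consonant letters at the start of a romanized word (such as "kh") stands for a single consonant and that a vowel-initial word simply gains a v-. Neither assumption would survive words beginning with a true consonant cluster, and the irregular forms listed above are deliberately not handled.

```python
VOWELS = set("aeiouAEIOU")


def full_reduplication(word):
    """X -> X-X, e.g. garam -> garam-garam."""
    return f"{word}-{word}"


def echo_word(word):
    """X -> X-vX', replacing the initial consonant (if any) with v-."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1                      # skip the letters of the initial consonant
    return f"{word}-v{word[i:]}"


for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(echo_word(w))
print(full_reduplication("dhiire"))
```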
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  Atif will read the book
trans-2 आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  Atif read the book
trans-3 आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  Atif had to read the book
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
unerg-2 आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  Atif slept
unerg-3 आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  Atif will have to sleep
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  The door will open
unacc-2 दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  The door opened
unacc-3 दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  The door will have to open
Existential
exist-1 उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  There are rats in that room
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  Yesterday night the moon was seen behind the clouds
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  Yesterday night I saw the moon behind the clouds
Ditransitive
ditrans-1 राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  Ram will give a book to Mohan
ditrans-2 राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  Ram gave a book to Mohan
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  Ram knows that Sita will arrive late
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  My sister who stays in Delhi is coming tomorrow
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  I have read the book which you gave me
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  I have read the book which you gave me
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  I have read the book which you gave me
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  Ram was waiting for Ravi
compl-pred-2 राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  Ram was remembering Ravi
Causatives
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
causative-1 आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
• The experiencer argument takes dative case
  raam ko bukhaar hai     Ram dat fever be.pres
  raam ko caand dikhaa    Ram dat moon see-unacc.pst
  raam ko dukh hai        Ram dat sorrow be.pres
Participatory verbs
• The second argument of participatory verbs takes the se postposition
  raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut
References
• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues
  - Compounds
  - Punctuation
For example
  usa   laDake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
• Decision
  - Create three tokens
  - Mark the hyphen as JOIN
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) / vibhakti (case marker), and suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
POS Tagging
• ILMT POS tagset adopted
• Total 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,...'>
2     laDake  NN    <fs af='laDakA,noun,...'>
3     ne      PSP   <fs af='ne,psp,...'>
4     kelA    NN    <fs af='kelA,noun,...'>
5     khAyA   VM    <fs af='KA,verb,...'>
6     thA     VAUX  <fs af='kelA,verb,...'>
7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,...'>
1.2    laDake  NN    <fs af='laDakA,noun,...'>
1.3    ne      PSP   <fs af='ne,psp,...'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,...'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,...'>
3.2    thA     VAUX  <fs af='kelA,verb,...'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
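To make the chunk layout concrete, here is a minimal sketch that reads the bracketed lines into a list of chunks. It assumes whitespace-separated columns, `((` opening a chunk and `))` closing it, as in the example above. It is not the treebank's own reader, and the `parse_chunks` name and the returned dictionary layout are assumptions made for the example. Feature structures are omitted from the sample for brevity.

```python
SAMPLE = """
1    ((      NP
1.1  usa     PRP
1.2  laDake  NN
1.3  ne      PSP
     ))
2    ((      NP
2.1  kelA    NN
     ))
3    ((      VG
3.1  khAyA   VM
3.2  thA     VAUX
     ))
"""


def parse_chunks(text):
    """Parse chunk-bracketed lines into a list of {addr, tag, tokens} dicts."""
    chunks, current = [], None
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if "((" in fields:                      # chunk opens: ADDR (( TAG
            current = {"addr": fields[0], "tag": fields[-1], "tokens": []}
        elif fields[0] == "))":                 # chunk closes
            chunks.append(current)
            current = None
        else:                                   # token line: ADDR TOKEN CAT ...
            addr, token, cat = fields[:3]
            current["tokens"].append({"addr": addr, "token": token, "cat": cat})
    return chunks


for chunk in parse_chunks(SAMPLE):
    print(chunk["tag"], [t["token"] for t in chunk["tokens"]])
```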
Paninian Grammatical Model and
HindiUrdu Treebanks
Dipti Misra Sharma IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline

• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank

The corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k

Dependency grammar framework: Paninian grammatical model

Why Paninian grammar? Indian languages have
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this

k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

Here the lock becomes the karta.
Levels of Analysis

L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) tree over chunk heads:
khaa
  bhaaii
  phala

(t2) expanded tree:
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb

A verbal root denotes
– The activity
– The result

Locus of activity: karta
Locus of result: karma

[Figure: verbal root branching into activity and result]
karta-karma

The boy opened the lock
– k1: karta
– k2: karma

open
  k1: boy
  k2: lock

karta and karma sometimes correspond to agent and theme
– Not always: in "The door opened", the door is the karta
– That sentence has no explicit karma

Sub-actions: Opening of lock

Opening of lock
– Inserting and turning a key (action 1)
– Key pressing and turning the lever (action 2)
– Latch moving and lock opening (action 3)
Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
  – The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta, or the other karaka roles, can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)

Every sentence reflects the speaker's intention
– Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
– 'key' is assigned karta in (b) and karana in (a), based on what the speaker wants to express

Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
We are developing treebanks for HindiUrdu
Following Paninian framework as the annotation scheme
We show how the scheme handles some phenomena such as complex verbs causatives relative clauses conjunctions etc in Hindi
An Example

meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'

An Example (Contd)

Morph Analysis

meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa   <fs af= root=badZaa cat=adj gend=m>
bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta   <fs af= root=bahuta cat=adj gend=any>
phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai      <fs af= root=hai cat=v gend=any num=any pers=3>
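The expanded feature structures are simple key=value lists, so they are easy to read programmatically. A minimal sketch, assuming the key=value layout shown on this slide (the helper name `parse_fs` is hypothetical and is not part of any treebank toolkit):

```python
import re


def parse_fs(fs):
    """Turn '<fs af= root=khaa cat=v ...>' into a dict of features.

    Only key=value pairs with a non-empty value are kept, so the bare
    'af=' prefix used on the slide is skipped.
    """
    return dict(re.findall(r"(\w+)=(\w+)", fs))


features = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(features["root"], features["TAM"])
```

A dictionary keeps the attribute names explicit, which is convenient when some attributes (such as case or TAM) are absent for a given category.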
An Example (Contd)

POS Tagging

meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

Chunking

((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme

The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis. In our dependency tree:
– each node is a chunk, and
– each edge represents the relation between the connected nodes, labeled with a karaka or other relation.

A chunk represents a set of adjacent words which are in dependency relations with each other. All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.

Dependency Scheme (Contd)

Here modifier-modified relations are marked between the heads of the chunks:
– meraa 'my'
– bhaaii 'brother'
– phala 'fruit'
– khaataa 'eats'

badZaa 'big' and bahut 'much' are part of the chunks.
Dependency Scheme (Contd)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
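The tree for this example can be held as a list of labeled head-dependent edges between chunk heads. The sketch below is only an illustration of that idea; the triple layout and the function names `children` and `render` are assumptions made here, not the treebank's storage format.

```python
# (head, relation, dependent) triples between chunk heads for
# "meraa baDzaa bhaaii bahuta phala khaataa hai"
EDGES = [
    ("khaataa", "k1", "bhaaii"),
    ("khaataa", "k2", "phala"),
    ("bhaaii", "r6", "meraa"),
]


def children(head, edges):
    """Return (relation, dependent) pairs attached to a head."""
    return [(rel, dep) for h, rel, dep in edges if h == head]


def render(head, edges, depth=0, rel=None):
    """Render the subtree under `head` as indented lines."""
    label = f"{rel}: " if rel else ""
    lines = ["  " * depth + label + head]
    for r, d in children(head, edges):
        lines.extend(render(d, edges, depth + 1, r))
    return lines


print("\n".join(render("khaataa", EDGES)))
```

Because only chunk heads appear as nodes, chunk-internal words such as baDzaa and bahuta are absent from the edge list.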
Relations in Dependency Scheme

There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas
– Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly

Karaka relations link the verb to the participants directly involved in the action it denotes.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations

Only six:
– karta – subject/agent/doer
– karma – object/patient
– karana – instrument
– sampradaan – beneficiary
– apaadaan – source
– adhikarana – location in place/time/other

Relations other than karakas

r6 – Genitive
rt – Purpose
rh – Reason
nmod__relc – Relative clause
rad – Address
Relations which do not fall under dependency relation

ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
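The three groups of labels can be collected into one lookup table. This is a sketch for illustration: the groupings follow the slides, but the label k5 for apaadaan does not appear on them and is an assumption here, as is the function name `relation_type`.

```python
KARAKA = {
    "k1": "karta",
    "k2": "karma",
    "k3": "karana",
    "k4": "sampradaan",
    "k5": "apaadaan",  # assumption: this label is not shown on the slides
    "k7t": "adhikarana (time)",
    "k7p": "adhikarana (place)",
}
OTHER = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}


def relation_type(label):
    """Classify a relation label into one of the three groups."""
    if label in KARAKA:
        return "karaka"
    if label in OTHER:
        return "non-karaka"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    return "unknown"


print(relation_type("k2"), relation_type("r6"), relation_type("pof"))
```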
Dependency Relation Types
Some Hindi Constructions

(1) Causative Constructions

maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

Issue

Possibility-I: Go by syntactic analysis
– khilvaa 'cause to eat' is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)
Causative Constructions (Contd …)

Possibility-II
– The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
– The Paninian framework provides the relations:
  – prayojaka karta 'causer' (pk1): the causer in a causative construction
  – prayojya karta 'causee' (jk1): the causee in a causative construction
  – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)

Possibility-II
– Do we mark the above dependency roles?
– If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'

Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'

As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.

For causatives, our current decision: follow Possibility-II

Causative Constructions (Contd …)
(2) Relative Clauses (nmod__relc)

Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'

Issue

Possibility-I
– Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
– The dependency of jo ladZakaa 'the boy' is on vaha 'he'
– jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause Possibility-I
Relative Clauses (nmod__relc)

Possibility-II
– The verb khadZaa hai 'is standing' is the root of the relative clause
– The modifier of vaha 'he' in the main clause is the entire relative clause
– Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd…)

For relative clauses, our current decision: follow Possibility-II
– In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
– The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
– The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a

Ex-1: mujhko dukh hai
'I.Dat' 'unhappy' 'is'
'I am unhappy'

– Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
– Here dukh 'unhappy' is the karta
– Here mujhko 'to me' is a subtype of sampradan
– This sampradan is different from the sampradan (k4, beneficiary)
– We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)

Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'

Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
anubhava karta – k4a (Contd…)

[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs

Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'

Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'

– In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. the karma
– In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals

Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would definitely have come to the party'

Issue
– Possibility-I: Abstract node
– Possibility-II: One clause depends on the other clause
Possibility - I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility - II
Conditionals (Contd)

– Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
– For conditionals, our current decision: follow Possibility-II
– In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
– Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

– In non-adjectival participles an argument of the main verb is shared with another verb (the participle)
– The argument occurs only once in the sentence but is semantically related to both verbs
– The shared argument syntactically always attaches to the main verb
– For the other verb this argument is semantically realized but not syntactically
Participles (vmod) (Contd)

Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up'
Participles (vmod) (Contd)

The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.

Participles (vmod) (contd)

Paadzataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha
    k2: patra
(7) Ellipsis

How do we show dependencies when the head is missing?

Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'

– In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
– We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)

Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
"The children have grown up; they don't listen to anyone"

– No explicit conjunct
– Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
(1) Conjunction (ccof)

– The ccof relation doesn't reflect a dependency relation
– It is used for coordinating as well as subordinating conjunctions
– The dependency trees will show the conjuncts as heads
– In coordination, the conjunct is the head and takes the coordinated elements as its children
– In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

Coordinate Conjunction

Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
'Ram ate food and Sita ate an apple'

Subordinate Conjunction

Ex: raam ne kahaa ki vo kal aayegaa
'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs

Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'

– The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
– The annotation scheme should be able to account for this relation in the dependency tree
– If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

– To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
– The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
– That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
– This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
– Conjunct verbs are chunked separately, but semantically they constitute a single unit
– Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
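Since the noun and the light verb sit in separate chunks, the conjunct verb has to be recovered from the pof edge. A minimal sketch of that lookup, using the same hypothetical (head, relation, dependent) triple layout as an assumption; the function name `conjunct_verbs` is also invented for illustration:

```python
# Edges for "maine usase ek prashna kiyaa", as in the tree above.
EDGES = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
    ("prashna", "nmod", "ek"),
]


def conjunct_verbs(edges):
    """Return 'noun verb' strings for every pof edge in the tree."""
    return [f"{dep} {head}" for head, rel, dep in edges if rel == "pof"]


print(conjunct_verbs(EDGES))
```

The modifier ek stays attached to prashna, so reading off the conjunct verb does not disturb the noun's own dependents.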
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder
Contents

1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Framefiles

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• John rings the bell

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example

• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example

• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example

• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
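The two rolesets for ring can be pictured as a small lexicon keyed by roleset ID. The sketch below is illustrative only: real PropBank frame files are XML documents, and the dictionary layout and the `describe` helper are assumptions made for this example.

```python
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {
            "ARG0": "causer of ringing",
            "ARG1": "thing rung",
            "ARG2": "ring for",
        },
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {
            "ARG1": "surrounding entity",
            "ARG2": "surrounded entity",
        },
    },
}


def describe(roleset_id, labeled_args):
    """Expand (text, label) pairs with the role descriptions of a roleset."""
    roles = FRAMES[roleset_id]["roles"]
    out = []
    for text, label in labeled_args:
        if label not in roles:
            raise KeyError(f"{label} is not defined for {roleset_id}")
        out.append(f"{text}: {label} ({roles[label]})")
    return out


print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

Note that the same label (ARG1) means different things in the two rolesets, which is why roles are looked up per roleset rather than globally.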
Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Framefiles for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent/experiencer     ARG0-MNS  Induced causer
ARG1  Theme/patient         ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments

ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
'Atif read the book'

trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
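The labelling rule for intransitives reduces to a lookup on the verb's class. A minimal sketch, assuming the class of each verb has already been decided by the diagnostic tests; the table holds only the three verbs named on the slide, and the function name `single_argument_label` is hypothetical.

```python
# Verb classes as listed on the slide; deciding the class of a new verb
# requires the diagnostic tests and is not modelled here.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def single_argument_label(verb_root):
    """Return the PropBank label for the sole argument of an intransitive."""
    verb_class = VERB_CLASS.get(verb_root)
    if verb_class == "unaccusative":
        return "ARG1"
    if verb_class == "unergative":
        return "ARG0"
    raise KeyError(f"no class recorded for verb root {verb_root!r}")


print(single_argument_label("Kula"), single_argument_label("haMsa"))
```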
Intransitive: Unaccusative

unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
'The door opened'

unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
'The door will have to open'
Intransitive: Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
'Atif slept'

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
'Atif will have to sleep'

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

[Figures: PropBank annotation of unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) 'sleep' → 'make someone sleep'
• Add -vaa
  – (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

Causatives

[Figures: PropBank annotation of causative-1 and causative-2]
Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa 'trust(n) do(v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa
Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

Complex predicate

compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.inst fight sit.perf
'Ram regrettably fought with Ravi'
Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation

Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा
[Tree figure: node labels VP-Pred, VP, NP, V; leaves Atif, kitab, paRhegaa]
Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[Tree figure: node labels VP-Pred, VP, NP-P, NP, V; leaves Atif, ne, kitab, paRhii]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

आतिफ़ सोएगा
[Tree figure: node labels VP-Pred, VP, NP, V; leaves Atif, soyegaa]
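Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[Tree figure: node labels VP-Pred, VP, NP, V; leaves darvaazaa, khulegaa; NP 1 coindexed with the trace CASE 1]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

उस कमरे में चूहे हैं
[Tree figure: node labels VP-Pred, VP, NP, NP-P, V; leaves us kamre mein, cuuhe, hain; NP 1 coindexed with the trace CASE 1]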
Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position

राम ने मोहन को किताब दी
[Tree figure: node labels VP-Pred, VP, NP-P, NP, V; leaves Ram, ne, Mohan, ko, kitaab, dii]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
[Tree figure: node labels VP-Pred, VP, NP-P, NP, V; leaves kal raat, baadaloM mein, mujhko, caaMd, dikhaa; NP 1 coindexed with the trace CASE 1, NP 2 with the trace SCR 2]

• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
[Tree figure: node labels VP-Pred, VP, NP, NP-P, CP, C, V, V-Aux; leaves Ram, jaantaa, hai, ki, Sita, der se, aayegi; CP 1 coindexed with the trace EXTR 1, NP 1 with the trace CASE 1]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Tree figure: node labels VP-Pred, VP, NP-P, NP, CP, C, V, V-Aux; leaves jo kitaab, tumne, dii, thii, PRO, COMP, vah, maine, paRhii; traces SCR 1 and SCR 3, index NP 1]
Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Tree figure: node labels VP-Pred, VP, NP-P, NP, V', V, V-Aux, N; leaves Raam, Ravi, ko, yaad, kar, rahaa, thaa; trace CASE 1]

Causative
Adjectives

Morphologically, an adjective is inflected for gender, number and case, as it agrees with the following noun.

Postpositions are attached only to nouns. Adjectives preceding these nouns also take the oblique form: acche laRke ne 'good boy erg'.

The forms that an adjective (e.g. acchaa 'good') takes with regard to number, gender and case are given below.

Case       Direct            Oblique
Number     Sg       Pl       Sg       Pl
Masc       acchaa   acche    acche    acche
Fem        acchii   acchii   acchii   acchii
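The paradigm has only three distinct endings, which a few lines of code can reproduce. This is a sketch under a stated assumption: that the pattern shown for acchaa carries over to other adjectives ending in -aa. Invariant adjectives are not handled, and the function name `inflect_adjective` is hypothetical.

```python
def inflect_adjective(base, gender, number, case):
    """Inflect an -aa adjective such as 'acchaa'.

    gender: 'm' or 'f'; number: 'sg' or 'pl'; case: 'direct' or 'oblique'.
    """
    if not base.endswith("aa"):
        raise ValueError("only adjectives ending in -aa are handled")
    stem = base[:-2]
    if gender == "f":
        return stem + "ii"      # feminine is invariant across the table
    if number == "sg" and case == "direct":
        return stem + "aa"      # the only cell that keeps -aa
    return stem + "e"           # all other masculine cells


for case in ("direct", "oblique"):
    for number in ("sg", "pl"):
        print(case, number,
              inflect_adjective("acchaa", "m", number, case),
              inflect_adjective("acchaa", "f", number, case))
```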
Verbs

Verbs in Hindi/Urdu are inflected for tense, aspect and mood (TAM) and for the agreement features of gender, number and person. Tense, aspect and mood are mostly expressed by auxiliaries in Hindi/Urdu. Thus only certain moods, aspects and tenses are marked in the verb forms. Given below are some examples of various verb forms from Hindi/Urdu.

Root        ro       'cry'
Infinitive  ronaa    'to cry'
Habitual    rotaa    'cry.hab'
Perfective  royaa    'cried'
Causative   rulaa    'cause someone to cry'
            rulvaa   'make someone cause someone to cry'
Auxiliaries

Auxiliaries mark tense, aspect and modality information on verbs.

(a) bacce khaanaa khaa rahe haiM
    children_d meal eat prog be.pl.pres
    'The children are having a meal'
(b) bacce khaanaa khaate rah sakte haiM
    children_d meal eat_nf prog abilit be.pl.pres
    'The children can continue to have their meal'

Auxiliaries also carry the gender, number and person information.

Postpositions

Postpositions largely mark the case relations.

baccoM ne raat meM mez se khaanaa le liyaa
children_obl erg night in table ablat food take refl.pst

Hindi also has compound postpositions.
Compound Post-positions

Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:

ke anusaar       'according to'
ke alaavaa       'in addition to'
ke kaaran        'because of'
ke dvaaraa       'through'
ke saamne        'in front of'
ke liye          'for'
kii or/taraf     'towards'
kii tarah        'like'
kii jagah        'in place of'
se baahar        'out of'
se pahle         'before'

Urdu Specific Features

• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions:
(a) qabl 'before'
    qabl az_ayn (qabl azeen)       before from_this         'before this'
    qabl az_waqt                   before from_time         'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah)   in_this moment
(c) az 'from/since/for'
    az raahe hamdardi              from way empathy         'for the sake of courtesy'
    az sare nau                    from beginning new
    khaarij az bahes               beyond from discussion
(d) ta 'to/until/till'
    ta waqt                        until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive, but is not restricted to the genitive alone.
    EZ N+N       daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
    EZ N+Adj     nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
    EZ Adj+N     qabil-e-rahem 'qualified for sympathy'
    EZ Adj+Adj   qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A Morphological Process
Words belonging to various categories can be reduplicated. These expressions are often hyphenated. Reduplication has various morphological functions, depending on the lexical category that is reduplicated. For example:
• Nouns: it adds the sense of 'every'
• Verbs: it brings the sense of an adverbial participle
• Adjectives and adverbs: it adds intensity
Hindi has three types of reduplication: full, partial and redundant. Reduplication is highly productive in these languages.
Full Reduplication
If the word is X, then its reduplicated form is X-X.
    raam-raam       'Ram-Ram'         (proper noun)
    baccaa-baccaa   'child-child'     (common noun)
    garam-garam     'hot-hot'         (adjective)
    dhiire-dhiire   'slowly-slowly'   (adverb)
    jaa-jaa         'go-go'           (verb)
    naa-naa         'not-not'         (negative particle)
    kyaa-kyaa       'what-what'       (question word)
    jaate-jaate     'going-going'     (participle)
Partial Reduplication (Echo Words)
In partial reduplication, an expression X is repeated partially. Only a part of the given word is repeated, which gives the meaning 'X etc.'. In Hindi/Urdu, the first consonant of X is replaced by v-.
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi. Some more examples of partial reduplication in Hindi:
    jaanaa 'going'      →  jaanaa-vaanaa 'going etc.'
    aaloo 'potato'      →  aaloo-vaaloo 'potato etc.'
    aisaa 'like this'   →  aisaa-vaisaa 'like this etc.'
There are also examples which do not follow this pattern. The meaning of such words changes substantially and does not carry the sense of 'etc.':
    bhaag 'run'    →  bhaagambhaag 'rush'
    jhuuTh 'lie'   →  jhuuTh-muuTh 'just like that (without meaning it)'
    dekh 'see'     →  dekhaa-dekhii 'in imitation'
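The regular echo-word pattern above can be sketched in a few lines of Python. This is only an illustration over the romanized forms used on these slides: the function names, the vowel set and the list of two-letter consonants are assumptions made for the example, and the irregular cases (bhaagambhaag, jhuuTh-muuTh, dekhaa-dekhii) are deliberately not covered.

```python
# Illustrative sketch: works on the romanized transliteration used in these
# slides, not on Devanagari text. All names here are hypothetical.
VOWELS = set("aeiouAEIOU")
# Assumption: two-letter consonants in the transliteration count as one consonant.
DIGRAPHS = ("kh", "gh", "ch", "jh", "th", "dh", "ph", "bh", "sh", "Th", "Dh")


def echo_form(word: str) -> str:
    """Return the echo part of a word: its first consonant replaced by 'v'."""
    if not word:
        return word
    if word[0] in VOWELS:
        return "v" + word        # vowel-initial: aaloo -> vaaloo
    if word.startswith(DIGRAPHS):
        return "v" + word[2:]    # khaanaa -> vaanaa
    return "v" + word[1:]        # jaanaa -> vaanaa


def partial_reduplicate(word: str) -> str:
    """Build the hyphenated 'X etc.' expression for a word X."""
    return f"{word}-{echo_form(word)}"


for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
    print(partial_reduplicate(w))
```

Words that already begin with v-, and the exceptional forms listed above, would need a lexicon rather than a rule.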
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages.
• For case marking, Hindi primarily uses postpositions.
• The verb agrees either with the subject or with the object.
• An adjective agrees with the noun it modifies.
Simple Transitive
trans-1   आतिफ़ किताब पढ़ेगा
          Atif kitaab paRhegaa
          Atif book.f read.m.sg.fut
          'Atif will read the book'
trans-2   आतिफ़ ने किताब पढ़ी
          Atif ne kitaab paRhii
          Atif erg book.f read.f.sg.pst
          'Atif read the book'
trans-3   आतिफ़ को किताब पढ़नी पड़ी
          Atif ko kitaab paRhnii paRii
          Atif dat book.f read.f.inf compel.f.pst
          'Atif had to read the book'

Intransitive Unergative
unerg-1   आतिफ़ सोएगा
          Atif soyegaa
          Atif sleep.m.sg.fut
          'Atif will sleep'
unerg-2   आतिफ़ ने सोया
          Atif ne soyaa
          Atif erg sleep.m.sg.pst
          'Atif slept'
unerg-3   आतिफ़ को सोना पड़ेगा
          Atif ko sonaa paRegaa
          Atif dat sleep.inf compel.fut
          'Atif will have to sleep'
Intransitive Unaccusative
unacc-1   दरवाज़ा खुलेगा
          darvaazaa khulegaa
          door.m.sg.d open.m.sg.fut
          'The door will open'
unacc-2   दरवाज़े ने खुला
          darvaaze ne khulaa
          door.m.sg.obl erg open.pst
          'The door opened'
unacc-3   दरवाज़े को खुलना पड़ेगा
          darvaaze ko khulnaa paRegaa
          door.m.sg.obl dat open.inf compel.fut
          'The door will have to open'

Existential
exist-1   उस कमरे में चूहे हैं
          us kamre meM cuuhe haiM
          that room in rats be.pres.pl
          'There are rats in that room'
Dative Subject
unacc-4      कल रात बादलों में चाँद दिखा
             kal raat baadaloM meM caaMd dikhaa
             yesterday night clouds in moon see(unacc).pst
             'Yesterday night the moon was seen behind the clouds'
dat-subj-1   कल रात बादलों में मुझको चाँद दिखा
             kal raat baadaloM meM mujhko caaMd dikhaa
             yesterday night clouds in me.dat moon see(unacc).pst
             'Yesterday night I saw the moon behind the clouds'

Ditransitive
ditrans-1    राम मोहन को किताब देगा
             raam mohan ko kitaab degaa
             Ram Mohan dat book.f give.m.sg.fut
             'Ram will give a book to Mohan'
ditrans-2    राम ने मोहन को किताब दी
             raam ne mohan ko kitaab dii
             Ram erg Mohan dat book.f give.f.sg.pst
             'Ram gave a book to Mohan'
Complement Clause
compl-cl-1   राम जानता है कि सीता देर से आएगी
             raam jaantaa hai ki siita der se aayegii
             Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
             'Ram knows that Sita will arrive late'

Relative Clause
rel-cl-1     मेरी बहन जो दिल्ली में रहती है कल आ रही है
             merii bahan jo dillii meM rahtii hai kal aa rahii hai
             my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
             'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2   मैंने वह किताब जो तुमने दी थी पढ़ ली
           maiMne vah kitaab jo tumne dii thii paRh lii
           I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
           'I have read the book which you gave me'
rel-cl-3   मैंने वह किताब पढ़ ली जो तुमने दी थी
           maiMne vah kitaab paRh lii jo tumne dii thii
           I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
           'I have read the book which you gave me'
rel-cl-4   जो किताब तुमने दी थी वह मैंने पढ़ ली
           jo kitaab tumne dii thii vah maiMne paRh lii
           which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
           'I have read the book which you gave me'

Complex Predicate
compl-pred-1   राम रवि की प्रतीक्षा कर रहा था
               raam ravi kii pratikshaa kar rahaa thaa
               Ram Ravi gen wait do prog.m.sg be.m.sg.pst
               'Ram was waiting for Ravi'
compl-pred-2   राम रवि को याद कर रहा था
               raam ravi ko yaad kar rahaa thaa
               Ram Ravi acc remember do prog.m.sg be.m.sg.pst
               'Ram was remembering Ravi'
Causatives
unerg-1       आतिफ़ सोएगा
              Atif soyegaa
              Atif sleep.m.sg.fut
              'Atif will sleep'
causative-1   आया ने आतिफ़ को सुलाया
              aayaa ne Atif ko sulaayaa
              maid erg Atif acc sleep.caus.pst
              'The maid caused the child to sleep'
causative-2   माँ ने आया से आतिफ़ को सुलवाया
              maaN ne aayaa se Atif ko sulvaayaa
              mother erg maid by Atif acc sleep.caus.pst
              'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

Experiencer verbs
The experiencer argument takes dative case.
    raam ko bukhaar hai          Ram dat fever be.pres
    raam ko caand dikhaa         Ram dat moon see-unacc.pst
    raam ko dukh hai             Ram dat sorrow be.pres

Participatory verbs
The second argument of participatory verbs takes the se postposition.
    raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
    usa    laDake   ne    kelaa    khaayaa    thaa
    that   boy      erg   banana   eat-perf   past
Tokenization
Represented in SSF:
    ADDR   TOKEN
    1      usa
    2      laDake
    3      ne
    4      kelA
    5      khAyA
    6      thA
    7      (sentence-final punctuation)

Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized.
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – They are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens and mark the hyphen as JOIN
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, TAM (tense/aspect/modality) or vibhakti (case marker), and suffix.
    ADDR_   TKN_             OTHR
    1       usa              <fs af='vaha,pr'>
    2       laDake           <fs af='laDakaa,n,m,sg,3,o'>
    3       ne               <fs af='ne,psp'>
    4       kelaa            <fs af='kelaa,n,m,sg,3,o'>
    5       khaayaa          <fs af='khaa,v,m,sg,any,yaa'>
    6       thaa             <fs af='thaa,v,e'>
    7       (punctuation)    <fs af='&STOP,punc'>
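As a rough illustration of how the composite af attribute can be consumed, the sketch below splits an af string into named fields. It assumes the values are comma-separated in the order listed above; the field names and the parse_af helper are hypothetical and do not belong to any SSF toolkit.

```python
# Hypothetical field names, following the order given on the slide:
# root, category, gender, number, person, case, TAM/vibhakti, suffix.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vib", "suffix")


def parse_af(af: str) -> dict:
    """Split a comma-separated af value into a dict of named fields.

    Missing trailing fields are filled with empty strings, since the
    slide examples abbreviate the attribute.
    """
    values = [v.strip() for v in af.split(",")]
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


print(parse_af("laDakaa,n,m,sg,3,o"))
print(parse_af("ne,psp"))
```

Keeping the result as a plain dict makes it easy to compare individual features (for example gender or case) across tokens when checking agreement.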
POS Tagging
ILMT POS tagset adopted; 26 tags in total.
    ADDR   TKN             CAT    OTHR
    1      usa             PRP    <fs af='vaha,pron,…'>
    2      laDake          NN     <fs af='laDakA,noun,…'>
    3      ne              PSP    <fs af='ne,psp,…'>
    4      kelA            NN     <fs af='kelA,noun,…'>
    5      khAyA           VM     <fs af='KA,verb,…'>
    6      thA             VAUX   <fs af='thA,verb,…'>
    7      (punctuation)   SYM    <fs af='&STOP,punc'>
Chunking
Chunking is introduced to save effort in manual tagging. Dependency relations are marked between the chunk heads. Chunking restructures the tree (i.e. the value of ADDR_ will change).
    ADDR_   TKN_            CAT_   OTHR
    1       ((              NP
    1.1     usa             PRP    <fs af='vaha,pron,…'>
    1.2     laDake          NN     <fs af='laDakA,noun,…'>
    1.3     ne              PSP    <fs af='ne,psp,…'>
            ))
    2       ((              NP
    2.1     kelA            NN     <fs af='kelA,noun,…'>
            ))
    3       ((              VG
    3.1     khAyA           VM     <fs af='KA,verb,…'>
    3.2     thA             VAUX   <fs af='thA,verb,…'>
            ))
    4       ((              BLK
    4.1     (punctuation)   SYM    <fs af='&STOP,punc'>
            ))
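The chunked layout above can be read back into a simple data structure. The following is a minimal sketch, assuming whitespace-separated columns and a simplified input without the feature-structure column; the parse_chunks function and the dict layout are illustrative choices, not an official SSF reader.

```python
# Simplified chunked input modelled on the slide (feature structures omitted).
SSF_EXAMPLE = """
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""


def parse_chunks(ssf_text: str) -> list:
    """Group token lines into chunks delimited by '((' and '))'."""
    chunks, current = [], None
    for line in ssf_text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        if len(parts) >= 3 and parts[1] == "((":
            # Chunk opening line: ADDR, '((', chunk tag.
            current = {"tag": parts[2], "tokens": []}
        elif parts[0] == "))":
            # Chunk closing line.
            if current is not None:
                chunks.append(current)
            current = None
        elif current is not None and len(parts) >= 3:
            # Token line inside a chunk: ADDR, token, POS tag.
            current["tokens"].append((parts[1], parts[2]))
    return chunks


for chunk in parse_chunks(SSF_EXAMPLE):
    print(chunk["tag"], [token for token, _ in chunk["tokens"]])
```

Each chunk keeps its tag and its (token, POS) pairs, which is enough to pick out chunk heads before marking inter-chunk dependencies.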
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
The corpus:
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian grammatical model
Why Paninian grammar? Indian languages have:
• Rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai    'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified element (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
    opened
      k1  Sabina
      k2  lock (modifier: the)

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key
    opened
      k1  Sabina
      k2  lock (modifier: the)
      k3  key (modifier: this)

K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
    opened
      k7t  Yesterday
      k1   Sabina
      k2   lock (modifier: the)
      k3   key (modifier: this)
      k7p  home (modifier: my)

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key
    opened
      k7t  yesterday
      k1   lock (modifier: the)
      k3   key (modifier: this)

Here the lock becomes the karta.
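The two analyses above can be written down as labelled head-dependent triples, which makes the shift of 'lock' from karma to karta easy to see. This is only a sketch: the Relation tuple and the role_of helper are names invented for the example, and the unlabelled modifiers (the, this, my) are left out.

```python
from collections import namedtuple

# A dependency arc: the head word, the karaka label, and the dependent word.
Relation = namedtuple("Relation", "head label dependent")

# "Yesterday Sabina opened the lock with this key at my home"
with_agent = [
    Relation("opened", "k7t", "yesterday"),
    Relation("opened", "k1", "Sabina"),
    Relation("opened", "k2", "lock"),
    Relation("opened", "k3", "key"),
    Relation("opened", "k7p", "home"),
]

# "Yesterday the lock opened with this key"
without_agent = [
    Relation("opened", "k7t", "yesterday"),
    Relation("opened", "k1", "lock"),
    Relation("opened", "k3", "key"),
]


def role_of(relations, word):
    """Return the karaka label under which a word attaches, or None."""
    for rel in relations:
        if rel.dependent == word:
            return rel.label
    return None


print(role_of(with_agent, "lock"), role_of(without_agent, "lock"))
```

The same word receives k2 in the first sentence and k1 in the second, while the instrument keeps its k3 label in both.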
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
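For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) Inter-chunk tree: khaa is the head, with bhaaii and phala as its dependents.
(t2) Fully expanded tree: khaa is the head, with bhaaii and phala as its dependents; meraa and baDZaa attach to bhaaii, and bahuta attaches to phala.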
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
    Sabina opened the lock with the key
    The key opened the lock
    The lock opened

Semantics of the verb
A verbal root denotes:
• The activity
• The result
Locus of activity: karta
Locus of result: karma

    Verbal root
      activity    result
karta-karma
The boy opened the lock
• k1 – karta
• k2 – karma
karta/karma sometimes correspond to agent/theme
• Not always: The door opened
• The door is karta
• The sentence has no explicit karma

    open
      k1  boy
      k2  lock

Sub-actions: Opening of a lock
Opening of lock:
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of a lock
    open
      k1  boy
      k2  lock
      k3  key

    open
      k1  key
      k2  lock

    open
      k1  lock

k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock.
  – The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened.
  – The karma is raised to the role of karta (doer: karma-kartri)
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly:
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  – Morph analysis
  – POS tagging
  – Chunking
  – Dependency relations
• Dependency scheme
• Relations in the dependency scheme
• Some Hindi constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
Example:
    meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)
Morph Analysis
    meraa     <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
    badZaa    <fs af= root=badZaa cat=adj gend=m>
    bhaaii    <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
    bahuta    <fs af= root=bahuta cat=adj gend=any>
    phala     <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
    khaataa   <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
    hai       <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
    meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
    ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or another relation.
• A chunk represents a set of adjacent words which are in dependency relations with each other.
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.

Dependency Scheme (Contd)
Here, modifier-modified relations are marked between the heads of the chunks:
• meraa 'my'
• bhaaii 'brother'
• phala 'fruit'
• khaataa 'eats'
badZaa 'big' and bahut 'much' are part of the chunks.
Dependency Scheme (Contd)
    khaataa
      k1  bhaaii
            r6  meraa
      k2  phala

Relations in Dependency Scheme
There are 3 types of relations in the dependency scheme:
• Karaka relations
• Relations other than karakas
• Relations which do not fall under dependency relations directly, but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other

Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relations
• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of

Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
    maaz ne aayaa se bacce ko khaanaa khilvaayaa
    'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
    'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark it as k1
  – aayaa se has karana vibhakti, so mark it as k3
  – bacce ko has sampradan vibhakti, so mark it as k4
Causative Constructions (Contd…)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations:
  – prayojaka karta 'causer' (pk1): the causer in a causative construction
  – prayojya karta 'causee' (jk1): the causee in a causative construction
  – madhyastha karta 'mediator causer' (mk1): the mediator-causer in a causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Causative Constructions (Contd…)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.

(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'
Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I

Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause: Possibility-II

Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
      'I.Dat' 'unhappy' 'is'
      'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa              [base verb]
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa        [derived intransitive verb]
      'ram.Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa 'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa 'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)
Ex-1
Relation samanadhikaran – rs (Contd)
Ex-2

(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would definitely have come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
    agar-to
      paired-ccof  agar
                     ccof  naa hotii
      paired-ccof  to
                     ccof  aatii

Possibility - II
Conditionals (Contd)
• Possibility-I is not possible, because agar-to would be the head of the tree, and it is an abstract node, i.e. not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Every day, having written a letter, he tears it up'
Participles (vmod) (Contd)
The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.

Participles (vmod) (Contd)
    PaadZataa hai
      k1    vaha
      k7t   rojZa
      k2    patra
      vmod  likhakar
              k1  vaha
              k2  patra
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)
    maan luungii
      k1  mai
      k2  NULL__NP (vo)
            nmod__relc  kahoge
                          k1  tum
                          k2  jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
    NULL_CCP (aur)
      ccof  badZe_ho_gaye
      ccof  nahii_sunate

Non-dependency Relations
• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, it takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; the adjective does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now form part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
    kiyaa
      k1   maine
      k2   usase
      pof  prashna
             nmod  ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
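The filtering step can be sketched as follows. The role labels for the two candidate sentences are written by hand here, exactly as on the slide; the dictionary layout and the answer_agents function are assumptions made for this illustration, not the output format of any particular SRL system.

```python
# The question asks for the agent of 'create' with a given theme.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Hand-labelled candidates, following the role labels shown on the slide.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_agents(question, candidates):
    """Keep only candidates whose predicate and theme match the question."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == question["predicate"]
        and c["theme"].lower() == question["theme"].lower()
    ]


print(answer_agents(question, candidates))
```

Both sentences contain every word of the question, so word matching alone cannot separate them; comparing the theme role is what rules out the first candidate.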
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame Files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example
• John rings the bell
    ring.01   Make sound of bell
      Arg0    Causer of ringing
      Arg1    Thing rung
      Arg2    Ring for
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
An example
• [John: ARG0] rings [the bell: ARG1]
• [Tall aspen trees: ARG1] ring [the lake: ARG2]
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
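A frame file with two rolesets can be pictured as a small lookup table. The sketch below is an assumption-laden illustration, not the PropBank file format: the dictionary layout and the helper `describe_arguments` are hypothetical, while the roleset glosses and role descriptions are copied from the ring example.

```python
# Hypothetical in-memory view of the frame file for "ring".
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe_arguments(roleset_id, labelled_args):
    """Map each labelled span to its roleset-specific description.

    Raises ValueError when a label is not defined for the chosen roleset.
    """
    roles = FRAME_FILE[roleset_id]["roles"]
    unknown = [label for label in labelled_args if label not in roles]
    if unknown:
        raise ValueError(f"{roleset_id} does not define {unknown}")
    return {span: roles[label] for label, span in labelled_args.items()}


bell = describe_arguments("ring.01", {"Arg0": "John", "Arg1": "the bell"})
lake = describe_arguments("ring.02", {"Arg1": "Tall aspen trees",
                                      "Arg2": "the lake"})
```

The same label (Arg1) means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset ID has to be chosen before the arguments are interpreted.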
Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling
Frame files for Hindi
• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments, recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex predicates
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  Atif will read the book
trans-2 आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  Atif read the book
trans-3 आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  Atif had to read the book
Unaccusative & Unergative
• Distinction between intransitive verbs
  - unaccusatives, e.g. Kula (open), Puta (explode)
  - unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
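The labelling rule for intransitives reduces to a lookup on verb class. The following is a minimal sketch, assuming a hand-built class table: `VERB_CLASS` and `sole_argument_label` are hypothetical names, and in practice the class of each verb would come from the diagnostic tests, not from a hard-coded dictionary.

```python
# Verb classes for the three example verbs on the slide (illustrative only).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "Arg1" if verb_class == "unaccusative" else "Arg0"
```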
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  The door will open
unacc-2 दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  The door opened
unacc-3 दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  The door will have to open
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
unerg-2 आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  Atif slept
unerg-3 आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  Atif will have to sleep
Existential
exist-1 उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  Yesterday night the moon was seen behind the clouds
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  Yesterday night I saw the moon behind the clouds
[Figure: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1 राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  Ram will give a book to Mohan
ditrans-2 राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  Ram gave a book to Mohan
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so, sulaa): sleep, make someone sleep
• Add -vaa
  - (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
causative-1 आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
[Figure: PropBank annotation of causative-1 and causative-2]
Causatives classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)': trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2 राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  Ram knows that Sita will arrive late
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Figure: phrase-structure tree for "Atif kitab paRhegaa" (आतिफ़ किताब पढ़ेगा); node labels VP-Pred, VP, NP, V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: phrase-structure tree for "Atif-ne kitab paRhii" (आतिफ़ ने किताब पढ़ी); node labels VP-Pred, VP, NP, V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: phrase-structure tree for "Atif soyegaa" (आतिफ़ सोएगा); node labels VP-Pred, VP, NP, V]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: phrase-structure tree for "darvaazaa khulegaa" (दरवाज़ा खुलेगा); node labels VP-Pred, VP, NP1, CASE1, V]
Existentials
• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct
उस कमरे में चूहे हैं
[Figure: phrase-structure tree for "us kamre mein cuuhe hain"; node labels VP-Pred, VP, NP1, CASE1, V]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: phrase-structure tree for "Ram-ne Mohan-ko kitaab dii" (राम ने मोहन को किताब दी); node labels VP-Pred, VP, NP, V]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: phrase-structure tree for "kal raat baadaloM mein mujhko caaMd dikhaa"; node labels VP-Pred, VP, NP, NP1, NP2, CASE1, SCR2, V]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: phrase-structure tree for "Ram jaantaa hai ki Sita der se aayegi"; node labels VP-Pred, VP, NP, NP1, V, V-Aux, CP1, C, EXTR1, CASE1]
Relative Clause
[Figure: phrase-structure tree for the sentence below; node labels VP-Pred, VP, NP, NP1, V, V-Aux, CP, C, COMP, PRO, SCR1, SCR3]
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
Complex Predicate
[Figure: phrase-structure tree for the sentence below; node labels VP-Pred, VP, VP1, NP, N, V, V', V-Aux, CASE1]
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  Ram was remembering Ravi
Causative
Auxiliaries
Auxiliaries mark Tense, Aspect and Modality information on verbs
(a) bacce khaanaa khaa rahe haiM
    children.d meal eat prog be.pl.pres
    The children are having a meal
(b) bacce khaanaa khaate rah sakte haiM
    children.d meal eat.nf prog ablit be.pl.pres
    The children can continue to have their meal
Auxiliaries also carry the gender, number and person information
Postpositions
Postpositions largely mark the case relations
    baccoM ne raat meM mez se khaanaa le liyaa
    children.obl erg night in table ablat food take refl.pst
Hindi also has compound postpositions
Compound Postpositions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words, as follows:
  ke anusaar     'according to'
  ke alaavaa     'in addition to'
  ke kaaran      'because of'
  ke dvaaraa     'through'
  ke saamne      'in front of'
  ke liye        'for'
  kii or/taraf   'towards'
  kii tarah      'like'
  kii jagah      'in place of'
  se baahar      'out of'
  se pahle       'before'
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
    qabl az_ayn (qabl azeen): before from_this, 'before this'
    qabl az_waqt: before from_time, 'before time'
(b) dar 'in, inside, amidst'
    dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from, since, for'
    az raahe hamdardi: from way empathy, 'for the sake of courtesy'
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta 'to, until, till'
    ta waqt: until time
ezafe in Urdu
Urdu has what is referred to as ezafe. It normally marks a genitive but is not restricted to the genitive alone.
  EZ N+N      daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
  EZ N+Adj    nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
  EZ Adj+N    qabil-e-rahem 'qualified for sympathy'
  EZ Adj+Adj  qabil-e-qubul 'qualified-for-acceptance'
Reduplication: a morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  - Nouns: it adds the sense of 'every'
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
If the word is X, then its reduplicated form is X-X
  raam-raam      'Ram-Ram' (proper noun)
  baccaa-baccaa  'child-child' (common noun)
  garam-garam    'hot-hot' (adjective)
  dhiire-dhiire  'slowly-slowly' (adverb)
  jaa-jaa        'go-go' (verb)
  naa-naa        'not-not' (negative particle)
  kyaa-kyaa      'what-what' (question word)
  jaate-jaate    'going-going' (participle)
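The X-X pattern is easy to check mechanically on hyphenated, transliterated forms. A minimal sketch, assuming the expression is written with a single hyphen as in the list above; the function name `is_full_reduplication` is hypothetical.

```python
def is_full_reduplication(expression):
    """Return True when the expression has the form X-X for a non-empty X."""
    left, sep, right = expression.partition("-")
    return sep == "-" and left != "" and left == right
```

A check like this only recognises the surface shape; it says nothing about which morphological function (distributive, adverbial, intensifying) the reduplication carries.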
Partial Reduplication (Echo words)
• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu the first consonant of X is replaced by v-
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi
Some more examples of partial reduplication in Hindi are:
  jaanaa 'going'     → jaanaa-vaanaa 'going etc.'
  aaloo 'potato'     → aaloo-vaaloo 'potato etc.'
  aisaa 'like this'  → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  bhaag 'run'   → bhaagambhaag 'rush'
  jhuuTh 'lie'  → jhuuth-muuTh 'just like that (without meaning it)'
  dekh 'see'    → dekhaa-dekhii 'in imitation'
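The regular echo-word pattern can be sketched as a string operation on the transliterated form. This is an illustration under assumptions, not a morphological analyser: it treats every leading non-vowel letter as part of the first consonant (so the digraph kh counts as one consonant), prefixes v- to vowel-initial words as in aaloo-vaaloo, and ignores the irregular cases listed above. The names `VOWELS` and `echo_word` are hypothetical.

```python
# Vowel letters of the romanisation used on the slides (assumed set).
VOWELS = set("aeiouAEIOU")


def echo_word(word):
    """Build 'X-vX'' by replacing the leading consonant(s) of X with v."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"
```

For example, `echo_word("khaanaa")` drops `kh` and yields `khaanaa-vaanaa`, while `echo_word("aaloo")` has nothing to drop and yields `aaloo-vaaloo`.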
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  Atif will read the book
trans-2 आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  Atif read the book
trans-3 आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  Atif had to read the book
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
unerg-2 आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  Atif slept
unerg-3 आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  Atif will have to sleep
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  The door will open
unacc-2 दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  The door opened
unacc-3 दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  The door will have to open
Existential
exist-1 उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  There are rats in that room
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  Yesterday night the moon was seen behind the clouds
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  Yesterday night I saw the moon behind the clouds
Ditransitive
ditrans-1 राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  Ram will give a book to Mohan
ditrans-2 राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  Ram gave a book to Mohan
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  Ram knows that Sita will arrive late
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  My sister who stays in Delhi is coming tomorrow
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  I have read the book which you gave me
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  I have read the book which you gave me
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  I have read the book which you gave me
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  Ram was waiting for Ravi
compl-pred-2 राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  Ram was remembering Ravi
Causatives
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  Atif will sleep
causative-1 आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
  The experiencer argument takes dative case
    raam ko bukhaar hai            Ram dat fever be.pres
    raam ko caand dikhaa           Ram dat moon see-unacc.pst
    raam ko dukh hai               Ram dat sorrow be.pres
Participatory verbs
  The second argument of participatory verbs takes the se postposition
    raam ravi se carcaa karega     Ram Ravi to discussion do.m.sg.fut
    siitaa raam se shaadi karegii  Sita Ram with marriage do.f.sg.fut
    ravi mohan se milegaa          Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues
  - Compounds
  - Punctuation
For example:
  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF:
  ADDR  TOKEN
  1     usa
  2     laDake
  3     ne
  4     kelA
  5     khAyA
  6     thA
  7     (sentence-final punctuation)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision: create three tokens and mark the hyphen as JOIN
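The three-token decision for hyphenated compounds can be sketched with a regular expression. This is a simplified illustration, not the treebank's tokenizer: it assumes transliterated input, treats every hyphen as compound-internal, and returns plain tuples of address, token and tag. The function name `tokenize` and the tuple layout are hypothetical; only the JOIN tag comes from the slide.

```python
import re


def tokenize(sentence):
    """Split into word and punctuation tokens; tag hyphens as JOIN."""
    pieces = re.findall(r"\w+|[^\w\s]", sentence)
    rows = []
    for addr, piece in enumerate(pieces, start=1):
        tag = "JOIN" if piece == "-" else ""
        rows.append((addr, piece, tag))
    return rows


rows = tokenize("bAlikA-vixyAlaya .")
```

Keeping the hyphen as its own JOIN token lets the two members of the compound receive separate morphological analyses while still recording that they belong together.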
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix
  ADDR  TKN      OTHR
  1     usa      <fs af='vaha,pr'>
  2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
  3     ne       <fs af='ne,psp'>
  4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
  5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
  6     thaa     <fs af='kelaave'>
  7              <fs as=&STOP,punc>
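Reading an af value back into named fields might look like the sketch below. It rests on assumptions: that the af value is a comma-separated string, that the fields come in the order listed on the slide, and that missing trailing fields are simply empty. The field names in `AF_FIELDS`, the function `parse_af` and the sample string are illustrative, not taken from the SSF specification.

```python
# Field order as listed on the slide; the key names are hypothetical.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vib", "suffix")


def parse_af(af_value):
    """Split a comma-separated af value into a dict of named fields."""
    values = af_value.split(",")
    # Pad with empty strings so that short values still yield every key.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


analysis = parse_af("laDakaa,n,m,sg,3,o")
```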
POS Tagging
ILMT POS tagset adopted; total 26 tags
  ADDR  TKN     CAT   OTHR
  1     usa     PRP   <fs af='vaha,pron,…'>
  2     laDake  NN    <fs af='laDakA,noun,…'>
  3     ne      PSP   <fs af='ne,psp,…'>
  4     kelA    NN    <fs af='kelA,noun,…'>
  5     khAyA   VM    <fs af='KA,verb,…'>
  6     thA     VAUX  <fs af='kelA,verb,…'>
  7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)
  ADDR  TKN     CAT   OTHR
  1     ((      NP
  1.1   usa     PRP   <fs af='vaha,pron,…'>
  1.2   laDake  NN    <fs af='laDakA,noun,…'>
  1.3   ne      PSP   <fs af='ne,psp,…'>
        ))
  2     ((      NP
  2.1   kelA    NN    <fs af='kelA,noun,…'>
        ))
  3     ((      VG
  3.1   khAyA   VM    <fs af='KA,verb,…'>
  3.2   thA     VAUX  <fs af='kelA,verb,…'>
        ))
  4     ((      BLK
  4.1           SYM   <fs as=&STOP,punc>
        ))
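How chunking changes the ADDR values can be sketched as follows. This is an illustration under assumptions: the dotted address form (1.1, 1.2) and the tab-separated row layout are guesses at the SSF layout, the feature-structure column is omitted, and `to_ssf_rows` is a hypothetical helper.

```python
def to_ssf_rows(chunks):
    """Render (label, [(token, pos), ...]) chunks as SSF-like rows.

    Each chunk gets an integer address; its tokens get dotted addresses.
    """
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append(f"{i}\t((\t{label}")
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append(f"{i}.{j}\t{token}\t{pos}")
        rows.append("\t))")
    return rows


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
rows = to_ssf_rows(chunks)
```

The token that was address 4 before chunking (kelA) becomes 2.1 here, which is the renumbering the slide refers to.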
Rambow
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework
• The grammatical model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian grammatical model
Why Paninian Grammar? Indian languages have
• Rich morphology
• Relatively flexible word order
For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
  opened
    k1: Sabina
    k2: lock
          the
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
  opened
    k1: Sabina
    k2: lock
          the
    k3: key
          this
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
  opened
    k7t: Yesterday
    k1: Sabina
    k2: lock
          the
    k3: key
          this
    k7p: home
          my
K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place
Yesterday the lock opened with this key
  opened
    k7t: yesterday
    k1: lock
          the
    k3: key
          this
lock becomes the karta
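The two analyses above can be held in a small labelled-tree structure. The sketch below is illustrative only: the `Node` class and the helpers `triples` and `karta` are hypothetical, determiners are left out because the slides give them no relation label, and the relation names are the ones on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A word with labelled dependents (relation, child)."""
    word: str
    children: list = field(default_factory=list)

    def attach(self, relation, word):
        child = Node(word)
        self.children.append((relation, child))
        return child


def triples(node):
    """List (head, relation, dependent) triples in pre-order."""
    out = []
    for relation, child in node.children:
        out.append((node.word, relation, child.word))
        out.extend(triples(child))
    return out


def karta(node):
    """Return the k1 dependent of a verb node, or None if there is none."""
    for relation, child in node.children:
        if relation == "k1":
            return child.word
    return None


# "Yesterday Sabina opened the lock with this key at my home"
with_agent = Node("opened")
for rel, word in [("k7t", "Yesterday"), ("k1", "Sabina"), ("k2", "lock"),
                  ("k3", "key"), ("k7p", "home")]:
    with_agent.attach(rel, word)

# "Yesterday the lock opened with this key"
without_agent = Node("opened")
for rel, word in [("k7t", "yesterday"), ("k1", "lock"), ("k3", "key")]:
    without_agent.attach(rel, word)
```

Comparing `karta(with_agent)` and `karta(without_agent)` shows the shift described on the slide: once the agent is not expressed, the lock carries the k1 label.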
Levels of Analysis
  L1 - Semantic relations: karakas, e.g. raama: karta
  L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
  L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
  L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
  meraa baDzaa bhaaii bahuta phala khaataa hai
  => meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
  => ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example (Contd)
  ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
  (t1) khaa
         bhaaii
         phala
  (t2) khaa
         bhaaii
           meraa
           baDZaa
         phala
           bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened
Semantics of the verb
• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma
[Figure: "Verbal Root" branching into "activity" and "result"]
karta-karma
• The boy opened the lock
  - k1: karta
  - k2: karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma
  open
    k1: boy
    k2: lock
Sub-actions: Opening of lock
Opening of lock
  - Inserting and turning a key (action 1)
  - Key pressing and turning the lever (action 2)
  - Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
  open
    k1: boy
    k2: lock
    k3: key
  open
    k1: lock
  open
    k1: key
    k2: lock
k1: karta (doer); k2: karma (affected); k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  - The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), that is, which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) and karana (in a) based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• The annotation scheme follows the Paninian framework
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses and conjunctions in Hindi
An Example
  meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
  meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  badZaa   <fs af= root=badZaa cat=adj gend=m>
  bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  bahuta   <fs af= root=bahuta cat=adj gend=any>
  phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
  meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
  ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations; hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
  khaataa
    k1: bhaaii
          r6: meraa
    k2: phala
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other
Relations other than karakas
  - r6: Genitive
  - rt: Purpose
  - rh: Reason
  - nmod_relc: Relative clause
  - rad: Address
Relations which do not fall under dependency relation
  - ccof: Conjunction
  - pof: Complex Predicates
  - fragof: Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
  maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
Issue
• Possibility-I: go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
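The move from the Possibility-I labels to the Possibility-II labels can be sketched as a relabelling table. This is only an illustration: the mapping k1 to pk1, k3 to mk1 and k4 to jk1 is the one stated above, but the function `relabel_causative` is hypothetical, and it assumes the caller has already decided that the se-marked phrase is a mediator (aayaa 'maid') and not a true instrument (cammaca 'spoon'), which keeps k3.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Relabel (phrase, relation) pairs of a causative clause.

    Relations outside the mapping (for example k2) are left unchanged.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(rel, rel))
            for phrase, rel in arguments]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
```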
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'
Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
[Figure: Relative Clause, Possibility-I]
• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
[Figure: Relative Clause, Possibility-II]
Relative Clauses (Contd)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta - k4a
Ex-1: mujhko dukh hai
'I.Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta - k4a (Contd.)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
anubhava karta - k4a (Contd.)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran - rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Relation samanadhikaran - rs (Contd.)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
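Possibility - I
[Figure: tree headed by the abstract node agar-to; agar and to attach to it by paired-ccof, and naa hotii and aatii attach below them by ccof]
Possibility - II
[Figure: dependency tree for Possibility-II]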
Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is semantically realized but not syntactically
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up'
Participles (vmod) (Contd.)
The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'
Participles (vmod) (Contd.)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'
[Figures: dependency trees for the Coordinate Conjunction (ccof) and Subordinate Conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd.)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
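The chunking decision for conjunct verbs can be illustrated with a short sketch. It assumes a simplified representation that is not the treebank's file format: chunks are (tag, tokens) pairs, the last token of a chunk is taken as its head (true for this example only), and the helper names `intra_chunk_arcs` and `conjunct_verbs` are hypothetical.

```python
from typing import Dict, List, Tuple

# Chunks for: maine usase ek prashna kiyaa
# [ek prashna]NP and [kiyaa]VG are kept apart, as decided on the slide.
chunks: List[Tuple[str, List[str]]] = [
    ("NP", ["maine"]),
    ("NP", ["usase"]),
    ("NP", ["ek", "prashna"]),
    ("VG", ["kiyaa"]),
]

# Inter-chunk relations between chunk heads: dependent -> (head, label)
inter_chunk: Dict[str, Tuple[str, str]] = {
    "maine": ("kiyaa", "k1"),
    "usase": ("kiyaa", "k2"),
    "prashna": ("kiyaa", "pof"),
}


def intra_chunk_arcs(chunk_list: List[Tuple[str, List[str]]],
                     label: str = "nmod") -> Dict[str, Tuple[str, str]]:
    """Attach every non-head token to the head (last token) of its chunk."""
    arcs: Dict[str, Tuple[str, str]] = {}
    for _tag, tokens in chunk_list:
        head = tokens[-1]
        for tok in tokens[:-1]:
            arcs[tok] = (head, label)
    return arcs


def conjunct_verbs(arcs: Dict[str, Tuple[str, str]]) -> List[str]:
    """Recover noun-verb units by following pof arcs."""
    return [f"{dep} {head}" for dep, (head, lab) in arcs.items() if lab == "pof"]
```

Because ek and prashna share an NP chunk, the modifier relation is available inside the chunk, while the pof arc still lets a consumer treat prashna kiyaa as one semantic unit.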
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
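The filtering step can be sketched in a few lines. The role labels below are copied by hand from the two example sentences; in a real system they would come from a semantic role labeller. The `answer_who` function and the dictionary layout are hypothetical and only illustrate the idea of matching on the theme rather than on surface words.

```python
from typing import Dict, List, Optional

# Hand-written role labels for the two candidate sentences.
candidates: List[Dict[str, str]] = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "first effective polio vaccine"},
]


def answer_who(predicate: str, theme: str,
               labelled: List[Dict[str, str]]) -> Optional[str]:
    """Return the agent of the first frame whose predicate and theme match."""
    for frame in labelled:
        if frame["predicate"] == predicate and frame["theme"] == theme:
            return frame["agent"]
    return None
```

Both sentences contain the words of the question, so word matching cannot separate them; only the second has the vaccine as the theme of create.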
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
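The two rolesets of ring can be written down as a small lookup table. This is a sketch only: real PropBank frame files are stored in their own file format, and the `FRAMES` dictionary and the `describe` and `unlicensed` helpers are hypothetical names introduced here. The roleset identifiers are written as ring.01 and ring.02.

```python
from typing import Dict, List, Tuple

# Roleset definitions copied from the slide.
FRAMES: Dict[str, Dict] = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing",
                  "ARG1": "Thing rung",
                  "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity",
                  "ARG2": "Surrounded entity"},
    },
}


def describe(roleset: str, labelled_args: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each labelled argument text to its verb-specific role description."""
    roles = FRAMES[roleset]["roles"]
    return {text: roles[label] for label, text in labelled_args}


def unlicensed(roleset: str, labels: List[str]) -> List[str]:
    """Return the labels that the roleset does not define."""
    roles = FRAMES[roleset]["roles"]
    return [label for label in labels if label not in roles]
```

For instance, `describe("ring.01", [("ARG0", "John"), ("ARG1", "the bell")])` pairs John with the causer role, while `unlicensed("ring.02", ["ARG0", "ARG1"])` reports that the surround sense has no ARG0.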
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA      Causer
ARG0      Agent, experiencer
ARG1      Theme, patient
ARG2      Recipient
ARG3      Instrument
PropBank Tagset: Numbered Arguments with function tags
ARGA-MNS  Indirect causer
ARG0-MNS  Induced causer
ARG0-GOL  Causee with a 'recipient' role
ARG2-ATR  Attribute
ARG2-GOL  Goal
ARG2-SOU  Source
ARG2-LOC  Location
ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figures: PropBank annotation for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Causatives
[Figures: PropBank annotation for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with node labels VP-Pred, VP, NP and V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa), with nodes NP1 and CASE1]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain), with nodes NP1 and CASE1]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: PS tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)]
Putting it All Together: Dative Subjects
[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa), with nodes NP1, CASE1, NP2 and SCR2]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi), with nodes CP1, EXTR1, NP1 and CASE1]
Relative Clause
[Figure: PS tree for the sentence below, with nodes CP, COMP, PRO, NP1, SCR1 and SCR3]
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
Complex Predicate
[Figure: PS tree for the sentence below, with nodes VP-Pred, VP, V', V-Aux, NP and CASE1]
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
Causative
Compound Post-positions
Compound postpositions are formed by connecting the postpositions ke, kii and se with other words as follows:
ke anusaar 'according to'
ke alaavaa 'in addition to'
ke kaaran 'because of'
ke dvaaraa 'through'
ke saamne 'in front of'
ke liye 'for'
kii or/taraf 'towards'
kii tarah 'like'
kii jagah 'in place of'
se baahar 'out of'
se pahle 'before'
Urdu Specific Features
• Prepositions in Urdu
• ezafe in Urdu
Urdu has Prepositions
Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
  qabl az_ayn (qabl azeen): before from_this, 'before this'
  qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
  dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
  az raahe hamdardi: from way empathy, 'for the sake of courtesy'
  az sare nau: from beginning new
  khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
  ta waqt: until time
ezafe in Urdu
• Urdu has what is referred to as ezafe
• It normally marks a genitive but is not restricted to the genitive alone
EZ N+N: daur-e-hukumat 'period-of-rule', hukumat-e-hind 'government-of-India'
EZ N+Adj: nasl-e-insani 'race-of-humanity', lamha-e-aakhar 'the last moment'
EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  - Nouns: it adds the sense of 'every'
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial and redundant
• Reduplication is highly productive in these languages
Full Reduplication
If the word is X then its reduplicated form is X-X
raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
• In partial reduplication an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi.
Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuth-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
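The regular patterns of full and partial reduplication can be sketched as string operations on the romanized forms used in the examples. This is a simplified illustration with hypothetical function names. It assumes that the leading run of non-vowel letters stands for one consonant (as with kh in khaanaa), and that a vowel-initial word simply receives v-, as in aaloo-vaaloo. The irregular forms such as bhaagambhaag are not covered.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word: str) -> str:
    """Replace the initial consonant (if any) of a romanized word with v-."""
    i = 0
    # Skip the leading non-vowel letters, e.g. 'kh' in 'khaanaa'.
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return "v" + word[i:]


def partial_reduplicate(word: str) -> str:
    """Build the echo construction X-vX', meaning 'X etc.'."""
    return f"{word}-{echo_word(word)}"


def full_reduplicate(word: str) -> str:
    """Build the full reduplication X-X."""
    return f"{word}-{word}"
```

With these definitions `partial_reduplicate("khaanaa")` gives khaanaa-vaanaa and `full_reduplicate("garam")` gives garam-garam.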
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• The adjective agrees with the noun it modifies
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  'I have read the book which you gave me'
rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  'I have read the book which you gave me'
rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2: राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
• The experiencer argument takes dative case
  raam ko bukhaar hai
  Ram dat fever be.pres
  raam ko caand dikhaa
  Ram dat moon see-unacc.pst
  raam ko dukh hai
  Ram dat sorrow be.pres
Participatory verbs
• The second argument of participatory verbs takes the se postposition
  raam ravi se carcaa karega
  Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii
  Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa
  Ravi Mohan with meet.m.sg.fut
References
• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues
  - Compounds
  - Punctuations
For example:
usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation
  - Are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
• Decision
  - Create three tokens
  - Mark the hyphen as JOIN
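The tokenization decision for compounds can be sketched as follows. This is an illustration under stated assumptions, not the treebank's tokenizer: the `tokenize` function is hypothetical, the WORD and PUNC labels are invented for the example, and how JOIN is actually recorded in the treebank files is not specified on the slide.

```python
import re
from typing import List, Tuple

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into tokens; a hyphen between two word characters is JOIN."""
    tokens: List[Tuple[str, str]] = []
    for m in TOKEN_RE.finditer(text):
        tok = m.group()
        before = text[m.start() - 1] if m.start() > 0 else " "
        after = text[m.end()] if m.end() < len(text) else " "
        if tok == "-" and before.isalnum() and after.isalnum():
            tokens.append((tok, "JOIN"))   # compound-internal hyphen
        elif tok[0].isalnum() or tok[0] == "_":
            tokens.append((tok, "WORD"))
        else:
            tokens.append((tok, "PUNC"))   # all other punctuation
    return tokens
```

For example, `tokenize("BAI-bahana")` yields three tokens, with the hyphen marked JOIN, so the two members of the compound can each receive their own morphological analysis.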
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vahapr'>
2      laDake   <fs af='laDakaanmsg3o'>
3      ne       <fs af='nepsp'>
4      kelaa    <fs af='kelaanmsg3o'>
5      khaayaa  <fs af='khaavmsganyyaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOPpunc>
POS Tagging
• ILMT POS tagset adopted
• Total 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vahapron…'>
2     laDake  NN    <fs af='laDakAnoun…'>
3     ne      PSP   <fs af='nepsp…'>
4     kelA    NN    <fs af='kelAnoun…'>
5     khAyA   VM    <fs af='KAverb…'>
6     thA     VAUX  <fs af='kelAverb…'>
7             SYM   <fs as=&STOPpunc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vahapron…'>
1.2    laDake  NN    <fs af='laDakAnoun…'>
1.3    ne      PSP   <fs af='nepsp…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelAnoun…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KAverb…'>
3.2    thA     VAUX  <fs af='kelAverb…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOPpunc>
       ))
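How chunking changes the addresses can be sketched with a small generator for SSF-style lines. This is a simplified illustration: `to_ssf` is a hypothetical helper, the feature-structure column is omitted, tab separation is a choice made for this sketch, and the hierarchical addresses are written as 1.1, 1.2 and so on, which assumes that the flattened values 11, 12 on the slide stand for chunk.token numbering.

```python
from typing import List, Tuple

Token = Tuple[str, str]               # (token, POS tag)
Chunk = Tuple[str, List[Token]]       # (chunk tag, tokens)


def to_ssf(chunks: List[Chunk]) -> List[str]:
    """Render chunks as SSF-style lines with hierarchical addresses."""
    lines: List[str] = []
    for i, (chunk_tag, tokens) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{chunk_tag}")          # chunk opens at address i
        for j, (token, pos) in enumerate(tokens, start=1):
            lines.append(f"{i}.{j}\t{token}\t{pos}")   # token address i.j
        lines.append("\t))")                           # chunk closes
    return lines


sentence: List[Chunk] = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]

ssf_lines = to_ssf(sentence)
```

Before chunking, ne has address 3; after chunking it becomes 1.3, and address 3 now refers to the verb group. This is what the slide means by the value of ADDR_ changing.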
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
• Indian languages:
  - Rich morphology
  - Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (Contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock
    the
K1 (Karta): the doer of the action (the locus of activity)
K2 (Karma): locus of result
Sabina opened the lock with this key
opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my
K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this
lock becomes the karta
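The two analyses can be compared with a small sketch that stores each sentence as a list of labelled arcs from the verb. The representation and the helper names `karta` and `participants` are hypothetical; only the karaka labels and the participants come from the trees above.

```python
from typing import Dict, List, Optional, Tuple

Arc = Tuple[str, str]   # (karaka label, dependent of the verb 'opened')

# Yesterday Sabina opened the lock with this key at my home
with_agent: List[Arc] = [
    ("k7t", "Yesterday"), ("k1", "Sabina"), ("k2", "lock"),
    ("k3", "key"), ("k7p", "home"),
]

# Yesterday the lock opened with this key
without_agent: List[Arc] = [
    ("k7t", "yesterday"), ("k1", "lock"), ("k3", "key"),
]


def karta(arcs: List[Arc]) -> Optional[str]:
    """Return the participant labelled k1, if there is one."""
    return next((dep for label, dep in arcs if label == "k1"), None)


def participants(arcs: List[Arc]) -> Dict[str, str]:
    """Index the verb's dependents by karaka label."""
    return dict(arcs)
```

In the first sentence `karta` returns Sabina and lock is k2; in the second the same query returns lock and there is no k2 at all, which is the shift the slide describes.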
Levels of Analysis
L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
(t1) khaa, with dependents bhaaii and phala
(t2) khaa, with dependents bhaaii (meraa, baDZaa) and phala (bahuta)
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened
Semantics of the verb
• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
  - k1 - karta
  - k2 - karma
• karta and karma sometimes correspond to agent and theme
  - Not always: "The door opened"
  - The door is karta
  - The sentence has no explicit karma
[Diagram: open, with k1: boy and k2: lock]
Sub-actions: Opening of lock
Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
[Diagrams: open with k1: boy, k2: lock, k3: key; open with k1: key, k2: lock; open with k1: lock]
k1 - karta (doer), k2 - karma (affected), k3 - karana (instrument)
Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
  - The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
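The role shift described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_karakas`, the participant keys (`agent`, `instrument`, `theme`) and the `focus` parameter standing in for vivaksha are all hypothetical and are not part of the treebank tooling.

```python
from typing import Dict

def assign_karakas(expressed: Dict[str, str], focus: str) -> Dict[str, str]:
    """Map karaka labels to participants for an 'open'-type verb.

    `expressed` holds the participants the speaker chose to mention, keyed by
    the hypothetical names 'agent', 'instrument' and 'theme'. `focus` names the
    participant the speaker foregrounds (vivaksha); it becomes the karta (k1).
    """
    if focus not in expressed:
        raise ValueError(f"focused participant {focus!r} is not expressed")
    labels = {"k1": expressed[focus]}
    # The theme stays karma (k2) unless it has itself been raised to karta.
    if focus != "theme" and "theme" in expressed:
        labels["k2"] = expressed["theme"]
    # The instrument stays karana (k3) unless it has been raised to karta.
    if focus != "instrument" and "instrument" in expressed:
        labels["k3"] = expressed["instrument"]
    return labels

full = assign_karakas({"agent": "Sabina", "theme": "lock", "instrument": "key"}, "agent")
key_as_karta = assign_karakas({"instrument": "key", "theme": "lock"}, "instrument")
lock_as_karta = assign_karakas({"theme": "lock", "instrument": "key"}, "theme")
```

The three calls correspond to "Sabina opened the lock with this key", "The key opened the lock" and "The lock opened with this key".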
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'
An Example (Contd.)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m >
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any >
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3 >
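A feature structure of this kind is easy to read into a dictionary. The sketch below assumes the flattened `<fs key=value ...>` layout shown on the slide, with whitespace-separated attributes and a possibly empty `af=`; the treebank's actual storage format may differ, and `parse_fs` is a hypothetical helper name.

```python
import re
from typing import Dict

def parse_fs(fs: str) -> Dict[str, str]:
    """Parse '<fs key=value key=value ...>' into a dict of features.

    Assumes the slide's layout: attributes separated by whitespace, each of
    the form key=value, where the value may be empty (as in 'af=').
    """
    match = re.fullmatch(r"\s*<fs\s+(.*?)\s*>\s*", fs)
    if match is None:
        raise ValueError(f"not a feature structure: {fs!r}")
    features: Dict[str, str] = {}
    for item in match.group(1).split():
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {item!r}")
        features[key] = value
    return features

khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
badZaa = parse_fs("<fs af= root=badZaa cat=adj gend=m >")
```

Here `khaataa["root"]` recovers the verb root and `khaataa["TAM"]` the tense-aspect-mood marker, which is what later stages such as chunking and relation marking would consult.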
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
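The bracketed chunk notation above can be read back into a structured form with a short parser. The following is a minimal sketch, not the treebank's own tooling: `parse_chunks` and `chunk_head` are hypothetical names, and the head rule (the `VM` token in a verb group, otherwise the last token) is an assumption made for this example only.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]          # (word, POS tag)
Chunk = Tuple[str, List[Token]]  # (chunk label, tokens)

CHUNK_RE = re.compile(r"\(\(([^()]*)\)\)_(\w+)")

def parse_chunks(text: str) -> List[Chunk]:
    """Parse '((w_TAG w_TAG))_LABEL ...' into (label, [(word, tag), ...])."""
    chunks: List[Chunk] = []
    for body, label in CHUNK_RE.findall(text):
        tokens = []
        for tok in body.split():
            word, tag = tok.rsplit("_", 1)
            tokens.append((word, tag))
        chunks.append((label, tokens))
    return chunks

def chunk_head(chunk: Chunk) -> str:
    """Assumed head rule: the VM token in a VG chunk, else the last token."""
    label, tokens = chunk
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]

chunks = parse_chunks(
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
heads = [chunk_head(c) for c in chunks]
```

The chunk heads obtained this way are the nodes between which the inter-chunk dependency relations are marked.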
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations; hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd.)
• Here, modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit', and
  - khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks
Dependency Scheme (Contd.)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
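The tree above can be held as a list of labelled head-to-head edges. The sketch below is illustrative only; the triple layout `(dependent, relation, head)` and the helper names `find_root` and `render` are assumptions for this example rather than the treebank's storage format.

```python
from typing import List, Optional, Tuple

Edge = Tuple[str, str, str]  # (dependent head word, relation label, governing head word)

EDGES: List[Edge] = [
    ("bhaaii", "k1", "khaataa"),  # karta
    ("phala", "k2", "khaataa"),   # karma
    ("meraa", "r6", "bhaaii"),    # genitive
]

def find_root(edges: List[Edge]) -> str:
    """The root is the only node that never occurs as a dependent."""
    dependents = {dep for dep, _, _ in edges}
    governors = {head for _, _, head in edges}
    (root,) = governors - dependents
    return root

def render(node: str, edges: List[Edge],
           label: Optional[str] = None, depth: int = 0) -> List[str]:
    """Return an indented listing of the subtree rooted at `node`."""
    prefix = "  " * depth + (f"{label}: " if label else "")
    lines = [prefix + node]
    for dep, rel, head in edges:
        if head == node:
            lines.extend(render(dep, edges, rel, depth + 1))
    return lines

tree_lines = render(find_root(EDGES), EDGES)
print("\n".join(tree_lines))
```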
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote, for example, purpose and reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
• karta - subject/agent/doer
• karma - object/patient
• karana - instrument
• sampradaan - beneficiary
• apaadaan - source
• adhikarana - location in place/time/other
Relations other than karakas
• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod__relc - Relative clause
• rad - Address
Relations which do not fall under dependency relation
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Causative Constructions (Contd.)
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
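The move from the Possibility-I labels to the Possibility-II labels is a fixed relabelling, which can be sketched as a lookup table. This is an illustration only: `relabel_causative` is a hypothetical helper, and it assumes the caller has already established that the se-marked phrase is a mediator (as with aayaa 'maid') and not a true instrument (as with cammaca 'spoon'), which keeps k3.

```python
from typing import Dict, List, Tuple

# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL: Dict[str, str] = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Replace k1/k3/k4 by pk1/mk1/jk1; leave every other label unchanged."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]

possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
```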
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause: Alternative-II
Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta - k4a
• Ex-1: mujhko dukh hai
  'I-Dat' 'unhappy' 'is'
  'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4 - beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta - k4a (Contd.)
• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  'ram' 'Erg' 'moon' 'saw'
  'Ram saw the moon'
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  'ram-Dat' 'moon' 'appeared'
  'The moon was visible to Ram'
anubhava karta - k4a (Contd.)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran - rs
• Ex-1: raam ne kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Relation samanadhikaran - rs (Contd.)
[Figure: dependency tree for Ex-1]
Relation samanadhikaran - rs (Contd.)
[Figure: dependency tree for Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii
Possibility - II
Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Having written letters every day, he tears them up'
Participles (vmod) (Contd.)
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'
Participles (vmod) (Contd.)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha
    k2: patra
(7) Ellipsis
• How to show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  'you' 'whatever' 'will say' 'that' 'I' 'will believe'
  'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency
Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd.)
• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
  'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
• Ex: maine usase ek prashna kiyaa
  'I-erg' 'him-inst' 'one' 'question' 'did'
  'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd.)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
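The filtering step can be sketched as follows. The role labels here are written by hand to mirror the two candidate sentences; no actual semantic role labeller is run, and the record layout and the function name `answer_who` are assumptions made for this illustration.

```python
from typing import Dict, List

# Hand-labelled candidates, one record per candidate sentence (an assumption:
# a real system would obtain these labels from an SRL component).
CANDIDATES: List[Dict[str, str]] = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate: str, theme: str,
               candidates: List[Dict[str, str]]) -> List[str]:
    """Return the agents of candidates whose predicate and theme both match."""
    wanted = theme.lower()
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"].lower() == wanted]

answers = answer_who("create", "the first effective polio vaccine", CANDIDATES)
```

Both sentences contain the words of the question, so word matching cannot separate them; only the second has the vaccine as the theme of create, so only its agent is returned.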
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example (contd.)
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
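A frame file with several rolesets can be pictured as a nested mapping. The sketch below holds the two rolesets for ring in a plain dictionary and looks up role descriptions for a labelled sentence. The dictionary layout, the roleset ids written as `ring.01` and `ring.02`, and the helper `describe_args` are assumptions for this example; real PropBank frame files are distributed in their own file format.

```python
from typing import Dict

FRAMES: Dict[str, Dict[str, dict]] = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing",
                      "Arg1": "Thing rung",
                      "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity",
                      "Arg2": "Surrounded entity"},
        },
    },
}

def describe_args(lemma: str, roleset: str, labelled: Dict[str, str]) -> Dict[str, str]:
    """Map each labelled phrase to the role description from the chosen roleset."""
    roles = FRAMES[lemma][roleset]["roles"]
    return {phrase: roles[arg] for arg, phrase in labelled.items()}

bell = describe_args("ring", "ring.01", {"Arg0": "John", "Arg1": "the bell"})
lake = describe_args("ring", "ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"})
```

The same label, Arg1, means 'Thing rung' under one roleset and 'Surrounding entity' under the other, which is why roles are defined per roleset and not globally.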
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling
Frame files for Hindi
• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
• ARGA: Causer
• ARG0: Agent/experiencer
• ARG1: Theme/patient
• ARG2: Recipient
• ARG3: Instrument
PropBank Tagset: Numbered Arguments with function tags
• ARGA-MNS: Indirect causer
• ARG0-MNS: Induced causer
• ARG0-GOL: Causee with a 'recipient' role
• ARG2-ATR: Attribute
• ARG2-GOL: Goal
• ARG2-SOU: Source
• ARG2-LOC: Location
• ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
• ARGM-TMP: Temporal
• ARGM-MNR: Manner
• ARGM-LOC: Location
• ARGM-PRP: Purpose
• ARGM-CAU: Cause
• ARGM-DIS: Discourse
• ARGM-ADV: Adverb
• ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
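Once a verb has been classified, the labelling rule for its single argument is mechanical. The sketch below hard-codes the three verbs named on the slide; the table `VERB_CLASS` and the function `sole_argument_label` are hypothetical, and in practice the classification would come from the diagnostic tests, not from a fixed table.

```python
from typing import Dict

# Verb classes as given on the slide (roots in the slide's romanization).
VERB_CLASS: Dict[str, str] = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb: str) -> str:
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "Arg1" if verb_class == "unaccusative" else "Arg0"

labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
```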
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figures: PropBank annotations for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Causatives
[Figures: PropBank annotations for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
    • Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree diagram; node labels: VP-Pred, VP, NP, V; words: Atif, kitab, paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, V; words: Atif, ne, kitab, paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
आतिफ़ सोएगा
[Tree diagram; node labels: VP-Pred, VP, NP, V; words: Atif, soyegaa]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
दरवाज़ा खुलेगा
[Tree diagram; node labels: VP-Pred, VP, NP1, NP, CASE1, V; words: darvaazaa, khulegaa]
Existentials
• Existential ho 'be' is unaccusative (because agent-free), and location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram; node labels: VP-Pred, VP, NP1, NP, NP-P, CASE1, V; words: us kamre mein, cuuhe, hain]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, V; words: Ram, ne, Mohan, ko, kitaab, dii]
Putting It All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram; node labels: VP-Pred, VP, NP, NP1, NP2, NP-P, CASE1, SCR2, V; words: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram; node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, V, V-Aux, CASE1, EXTR1; words: Ram, jaantaa, hai, ki, Sita, der se, aayegi]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree diagram; node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, V, V-Aux, PRO, SCR1, SCR3; words: jo kitaab, tumne, dii, thii, vah, maine, paRhii]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree diagram; node labels: VP-Pred, VP, VP1, NP, NP-P, N, V, V', V-Aux, CASE1; words: Raam, Ravi, ko, yaad, kar, rahaa, thaa]
Causative
Urdu has Prepositions
• Unlike Hindi, Urdu has prepositions as well. Some Urdu examples with prepositions are:
(a) qabl 'before'
    qabl az_ayn (qabl azeen): before from_this, 'before this'
    qabl az_waqt: before from_time, 'before time'
(b) dar 'in/inside/amidst'
    dar_ayn asnah (dariin asnah): in_this moment
(c) az 'from/since/for'
    az raahe hamdardi: from way empathy, 'for the sake of courtesy'
    az sare nau: from beginning new
    khaarij az bahes: beyond from discussion
(d) ta 'to/until/till'
    ta waqt: until time
ezafe in Urdu
• Urdu has what is referred to as ezafe
• It normally marks a genitive, but is not restricted to the genitive alone
  - EZ N+N: daur-e-hukumat 'period-of-rule'; hukumat-e-hind 'government-of-India'
  - EZ N+Adj: nasl-e-insani 'race-of-humanity'; lamha-e-aakhar 'the last moment'
  - EZ Adj+N: qabil-e-rahem 'qualified for sympathy'
  - EZ Adj+Adj: qabil-e-qubul 'qualified-for-acceptance'
Reduplication: A morphological process
• Words belonging to various categories can be reduplicated
• These expressions are often hyphenated
• Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
  - Nouns: it adds the sense of 'every'
  - Verbs: it brings the sense of an adverbial participle
  - Adjectives and adverbs: it adds intensity
• Hindi has three types of reduplication: full, partial, and redundant
• Reduplication is highly productive in these languages
Full Reduplication
• If the word is X, then its reduplicated form is X-X
  - raam-raam 'Ram-Ram' (proper noun)
  - baccaa-baccaa 'child-child' (common noun)
  - garam-garam 'hot-hot' (adjective)
  - dhiire-dhiire 'slowly-slowly' (adverb)
  - jaa-jaa 'go-go' (verb)
  - naa-naa 'not-not' (negative particle)
  - kyaa-kyaa 'what-what' (question word)
  - jaate-jaate 'going-going' (participle)
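Because a fully reduplicated form is simply X-X, it can be generated and recognised with a string comparison. This is a minimal sketch that assumes the hyphenated romanized spelling used in the examples; `reduplicate` and `is_full_reduplication` are hypothetical helper names.

```python
def reduplicate(word: str) -> str:
    """Build the full reduplication X-X of a word X."""
    return f"{word}-{word}"

def is_full_reduplication(expr: str) -> bool:
    """True if `expr` has the shape X-X, with the same non-empty X on both sides."""
    left, sep, right = expr.partition("-")
    return bool(sep) and left != "" and left == right

examples = ["raam-raam", "garam-garam", "dhiire-dhiire", "khaanaa-vaanaa"]
flags = [is_full_reduplication(e) for e in examples]
```

The last example is an echo-word formation, not a full reduplication, so it is rejected.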
Partial Reduplication (Echo words)
• In partial reduplication, an expression X is repeated partially
• Only a part of a given word is repeated, which gives the meaning of 'X etc.'
• In Hindi/Urdu, the first consonant of X is replaced by v-. For example: khaanaa-vaanaa 'food etc.'
• vaanaa is not a valid word in Hindi
• Some more examples of partial reduplication in Hindi:
  - jaanaa 'going' → jaanaa-vaanaa 'going etc.'
  - aaloo 'potato' → aaloo-vaaloo 'potato etc.'
  - aisaa 'like this' → aisaa-vaisaa 'like this etc.'
• There are also examples which do not fall in this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
  - bhaag 'run' → bhaagambhaag 'rush'
  - jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
  - dekh 'see' → dekhaa-dekhii 'in imitation'
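The regular v- pattern can be sketched for romanized input. Two details below are assumptions read off the examples, not rules stated on the slide: an aspirated digraph such as kh counts as one consonant, and a vowel-initial word simply gets v prefixed (aaloo, aisaa). The function names are hypothetical, and the irregular forms listed above are outside the scope of this sketch.

```python
VOWELS = set("aeiou")

def echo_form(word: str) -> str:
    """Return the echo part of a romanized word, following the v- pattern."""
    if not word:
        raise ValueError("empty word")
    if word[0].lower() in VOWELS:
        return "v" + word            # vowel-initial: prefix v (aaloo -> vaaloo)
    rest = word[1:]
    if rest[:1] == "h":              # treat kh, bh, ... as a single consonant
        rest = rest[1:]
    return "v" + rest                # replace the first consonant by v

def echo_word(word: str) -> str:
    """Build the echo-word formation 'X-vX', meaning roughly 'X etc.'."""
    return f"{word}-{echo_form(word)}"

formed = [echo_word(w) for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]]
```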
Some Basic Syntax
• Hindi and Urdu are both relatively free word order SOV languages
• For case marking, Hindi primarily uses postpositions
• The verb agrees either with the subject or with the object
• An adjective agrees with the noun it modifies
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  'My sister who stays in Delhi is coming tomorrow'
101212
17
Relative Clause (Contd)
rel-cl-2
मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me.'
rel-cl-3
मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me.'
rel-cl-4
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me.'
Complex Predicate
compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi.'
compl-pred-2
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi.'
Causatives
unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'
causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep.'
causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep.'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case.
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition.
raam ravi se carcaa karega
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
Tokenization
Morphological Representation
POS tagging
Chunking
Inter-chunk dependency annotation
Intra-chunk dependencies
Tokenization
Automatic
Issues: compounds, punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR TOKEN
1 usa
2 laDake
3 ne
4 kelA
5 khAyA
6 thA
7 (sentence-final punctuation)
Tokenization Issues
Punctuation: all punctuation marks are to be tokenized
Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
Compounds internally contain a punctuation mark
They are productive
Morphological analysis is needed for the members of the compound
The issue: whether to create a single token
Decision: create three tokens and mark the hyphen as JOIN
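The decision above (a hyphenated compound becomes three tokens, with the hyphen marked as JOIN) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is romanized text, the function name `tokenize` is hypothetical, and the way JOIN is recorded here (a third column) is an assumption, not the treebank's actual SSF encoding.

```python
import re

def tokenize(sentence):
    """Split a romanized sentence into (ADDR, TOKEN, NOTE) rows.

    Assumption: tokens are runs of ASCII letters; every other
    non-space character (hyphen, full stop, ...) is its own token.
    """
    rows = []
    for word in sentence.split():
        # Separate letters from punctuation inside each whitespace-delimited word.
        for piece in re.findall(r"[A-Za-z]+|[^A-Za-z\s]", word):
            note = "JOIN" if piece == "-" else ""
            rows.append((len(rows) + 1, piece, note))
    return rows

for addr, token, note in tokenize("BAI-bahana"):
    print(addr, token, note)
```

The compound comes out as three rows (BAI, the hyphen flagged JOIN, bahana), so each member can receive its own morphological analysis.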
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix
ADDR_ TKN_ OTHR
1 usa <fs af='vaha,pr'>
2 laDake <fs af='laDakaa,n,m,sg,3,o'>
3 ne <fs af='ne,psp'>
4 kelaa <fs af='kelaa,n,m,sg,3,o'>
5 khaayaa <fs af='khaa,v,m,sg,any,yaa'>
6 thaa <fs af='thaa,v,e'>
7 <fs as=&STOP,punc>
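To make the composite af attribute concrete, here is a minimal Python sketch that unpacks an af value into named fields. It assumes the value is a comma-separated string with eight slots in the order listed on the slide; the field names in `AF_FIELDS` and the function `parse_af` are illustrative choices, not part of SSF itself.

```python
# Assumed slot order, following the slide's description of af.
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix"]

def parse_af(af_value):
    """Turn 'laDakaa,n,m,sg,3,o,,' into a dict of named features."""
    parts = af_value.split(",")
    # Pad short values so that every field is present (possibly empty).
    parts += [""] * (len(AF_FIELDS) - len(parts))
    return dict(zip(AF_FIELDS, parts))

analysis = parse_af("laDakaa,n,m,sg,3,o,,")
print(analysis["root"], analysis["cat"], analysis["case"])
```

Empty slots stay as empty strings, which keeps a postposition such as `ne,psp` and a fully specified noun in the same shape.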
POS Tagging
ILMT POS tagset adopted
Total: 26 tags
ADDR TKN CAT OTHR
1 usa PRP <fs af='vaha,pron,…'>
2 laDake NN <fs af='laDakA,noun,…'>
3 ne PSP <fs af='ne,psp,…'>
4 kelA NN <fs af='kelA,noun,…'>
5 khAyA VM <fs af='KA,verb,…'>
6 thA VAUX <fs af='thA,verb,…'>
7 SYM <fs as=&STOP,punc>
Chunking
Chunking is introduced to save effort in manual tagging.
Dependency relations are marked between the chunk heads.
Chunking restructures the tree (i.e. the value of ADDR_ will change).
ADDR_ TKN_ CAT_ OTHR
1 (( NP
1.1 usa PRP <fs af='vaha,pron,…'>
1.2 laDake NN <fs af='laDakA,noun,…'>
1.3 ne PSP <fs af='ne,psp,…'>
))
2 (( NP
2.1 kelA NN <fs af='kelA,noun,…'>
))
3 (( VG
3.1 khAyA VM <fs af='KA,verb,…'>
3.2 thA VAUX <fs af='thA,verb,…'>
))
4 (( BLK
4.1 SYM <fs as=&STOP,punc>
))
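The re-addressing that chunking causes can be illustrated with a short Python sketch. It assumes chunks are given as a tag plus a list of (token, POS) pairs and that addresses take the form chunk.position, as in the table above; `assign_addresses` is a hypothetical helper, not a treebank tool.

```python
def assign_addresses(chunks):
    """Return SSF-like rows (ADDR, TKN, CAT) for a list of chunks.

    chunks: list of (chunk_tag, [(token, pos), ...])
    """
    rows = []
    for i, (tag, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", tag))          # chunk opening line
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))  # token inside the chunk
        rows.append(("", "))", ""))                # chunk closing line
    return rows

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
for row in assign_addresses(chunks):
    print("\t".join(row))
```

The token that was address 4 before chunking (kelA) becomes 2.1, which is the change in ADDR_ the slide refers to.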
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
Some basic concepts
Some Hindi constructions: causatives, co-ordination, unaccusatives, relative clauses
Conclusions
Introduction
Treebank: one of the most important linguistic resources
Utility in various NLP tasks such as parsing, natural language understanding, etc.
Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
News articles: 350k
Tourism articles: 25-30k
Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar? Indian languages have:
Rich morphology
Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
Dated around 500 BC
Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
Based on the spoken form <Kiparsky 1993>
Focuses on language as a means of communication
Panini's Grammar (Contd)
Treats a sentence as a series of modifier-modified relations
Every sentence has a primary modified (generally a verb)
Relations between verbs and their participants are called 'karaka'
Other relations, such as reason, purpose, genitive, etc.
The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock (the)
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)
lock becomes the karta
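The two trees above can be written down as a small labeled-dependency structure. The Python sketch below is only an illustration: the `Node` class and the `karakas` helper are hypothetical names, and determiners and possessives ("the", "this", "my") are left out for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    relation: str = "root"            # label on the edge to the parent
    children: list = field(default_factory=list)

    def add(self, relation, word):
        """Attach a dependent with the given karaka (or other) label."""
        child = Node(word, relation)
        self.children.append(child)
        return child

def karakas(verb):
    """Map each relation label on the verb to the dependent's word."""
    return {child.relation: child.word for child in verb.children}

# Yesterday Sabina opened the lock with this key at my home
agentive = Node("opened")
for rel, word in [("k7t", "Yesterday"), ("k1", "Sabina"),
                  ("k2", "lock"), ("k3", "key"), ("k7p", "home")]:
    agentive.add(rel, word)

# Yesterday the lock opened with this key
agentless = Node("opened")
for rel, word in [("k7t", "yesterday"), ("k1", "lock"), ("k3", "key")]:
    agentless.add(rel, word)

print(karakas(agentive)["k1"], karakas(agentless)["k1"])
```

Comparing the two dictionaries shows the point of the slide: without an expressed agent, "lock" carries the k1 (karta) label and no k2 is present.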
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
Morph analysis
POS tagging
Identify minimal constituents (chunks/bags) and their heads
Mark the relations across chunks (head-to-head relation)
Chunk-internal dependencies are left unspecified
The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) khaa
  bhaaii
  phala
(t2) khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
Karaka Relations
Direct participants in an action/event
Syntactico-semantic
The karta and karma of a verb are determined by the verb's semantics
A verb denotes an action/event
Any action is a bundle of sub-actions:
Sabina opened the lock with the key
The key opened the lock
The lock opened
Semantics of the verb
A verbal root denotes:
  the activity
  the result
Locus of activity: karta
Locus of result: karma
Verbal Root
  activity
  result
karta-karma
The boy opened the lock
  k1 – karta
  k2 – karma
karta, karma sometimes correspond to agent, theme
Not always: The door opened
  The door is karta
  The sentence has no explicit karma
open
  k1: boy
  k2: lock
Sub-actions: Opening of lock
Opening of lock
  Inserting and turning a key (action 1)
  Key pressing and turning the lever (action 2)
  Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)
Thus
The action of opening normally requires an agentive participant. So: Sabina opened the lock
However, the speaker may decide not to express the role of the agent. Hence:
The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
Which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
Every sentence reflects the speaker's intention
Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
Morph analysis
POS tagging
Chunking
Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
Objective
The Scheme
  Morph Analysis
  POS Tagging
  Chunking
  Dependency Relations
Dependency Scheme
Relations in Dependency Scheme
Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
We are developing treebanks for Hindi/Urdu
Following the Paninian framework as the annotation scheme
We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa <fs af= root=badZaa cat=adj gend=m>
bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta <fs af= root=bahuta cat=adj gend=any>
phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
In our dependency tree:
  each node is a chunk, and
  the edge represents the relation between the connected nodes, labeled with the karaka or other relations
A chunk represents a set of adjacent words which are in dependency relations with each other
All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks:
  meraa 'my'
  bhaaii 'brother'
  phala 'fruit'
  khaataa 'eats'
badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
  Karaka relations
  Relations other than karakas
  Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
  karta – subject/agent/doer
  karma – object/patient
  karana – instrument
  sampradaan – beneficiary
  apaadaan – source
  adhikarana – location in place/time/other
Relations other than karakas
  r6 – Genitive
  rt – Purpose
  rh – Reason
  nmod_relc – Relative clause
  rad – Address
Relations which do not fall under dependency relation
  ccof – Conjunction
  pof – Complex Predicates
  fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
Possibility-I: Go by syntactic analysis
  khilvaa 'cause to eat' is the verb root
  maaz ne has karta vibhakti, so mark as k1
  aayaa se has karana vibhakti, so mark as k3
  bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd…)
Possibility-II
The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
The Paninian framework provides the relations:
  prayojaka karta 'causer' (pk1): the causer in a causative construction
  prayojya karta 'causee' (jk1): the causee in a causative construction
  madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
Do we mark the above dependency roles?
If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
For causatives, our current decision: Follow Possibility-II
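The relabeling from the surface-driven analysis (Possibility-I) to the causative roles (Possibility-II) amounts to a fixed mapping over the participant labels. The Python sketch below is only an illustration: `relabel_causative` is a hypothetical helper, and it assumes the annotator has already decided that the se-phrase is a mediator rather than a true instrument (as with cammaca se 'with the spoon', which stays k3).

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args):
    """args: list of (phrase, label) pairs from the surface analysis.

    Labels not in the mapping (e.g. k2) are left unchanged.
    """
    return [(phrase, CAUSATIVE_LABELS.get(label, label))
            for phrase, label in args]

surface = [("maaz ne", "k1"), ("aayaa se", "k3"),
           ("bacce ko", "k4"), ("khaanaa", "k2")]
for phrase, label in relabel_causative(surface):
    print(phrase, label)
```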
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
Possibility-I
  Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
[Figure: Relative Clause, Possibility-I]
Relative Clauses (nmod__relc)
Possibility-II
  The verb khadZaa hai 'is standing' is the root of the relative clause
  The modifier of vaha 'he' in the main clause is the entire relative clause
  Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
[Figure: Relative Clause, Possibility-II]
Relative Clauses (Contd…)
For relative clauses, our current decision: Follow Possibility-II
In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
'I.Dat' 'unhappy' 'is'
'I am unhappy'
Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
Here dukh 'unhappy' is the karta
Here mujhko 'to me' is a subtype of sampradan
This sampradan is different from the sampradan (k4: beneficiary)
We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa (base verb)
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa (derived intransitive verb)
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'
Issue
Possibility-I: Abstract node
Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii
[Figure: Possibility - II]
Conditionals (Contd)
Possibility-I is rejected because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
For conditionals, our current decision: Follow Possibility-II
In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
The argument occurs only once in the sentence but is semantically related to both verbs
The shared argument always attaches syntactically to the main verb
For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up'
The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'
Participles (vmod) (Contd)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis
How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
In the above example vo 'that', which is the parent node for the relative clause 'tum jo bhi kahoge', is missing
We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL_NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
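The null-element insertion can be sketched as an operation on a list of dependency edges. This Python fragment is illustrative only: edges are (dependent, relation, head) triples, `attach_relative_clause` is a hypothetical helper, and the default k2 label for the inserted node is taken from this one example rather than from a general rule.

```python
def attach_relative_clause(edges, rel_verb, main_verb,
                           head=None, head_relation="k2"):
    """Attach a relative clause to its head noun phrase.

    If the head is elided (head is None), insert NULL_NP under the
    main verb first, so that the relative clause has a parent node.
    """
    edges = list(edges)                      # do not modify the input
    if head is None:
        head = "NULL_NP"
        edges.append((head, head_relation, main_verb))
    edges.append((rel_verb, "nmod__relc", head))
    return edges

edges = [("mai", "k1", "maan luungii"),
         ("tum", "k1", "kahoge"),
         ("jo bhi", "k2", "kahoge")]
for edge in attach_relative_clause(edges, "kahoge", "maan luungii"):
    print(edge)
```

When an overt vo 'that' is present, passing it as `head` attaches the clause to it directly and no null node is created.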
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
No explicit conjunct
Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
  ccof – Conjunction
  pof – Complex Predicates
  fragof – Fragment of
(1) Conjunction (ccof)
The ccof relation doesn't reflect a dependency relation
It is used for coordinating as well as subordinating conjunctions
The dependency trees will show the conjuncts as heads
In coordination, the conjunct is the head and takes the coordinated elements as its children
In subordination, it takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
'Ram said that he will come tomorrow'
[Figures: Coordinate Conjunction (ccof); Subordinate Conjunction]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
The annotation scheme should be able to account for this relation in the dependency tree
If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
Conjunct verbs are chunked separately, but semantically they constitute a single unit
The POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
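The filtering step can be sketched in Python. The example assumes the two candidate sentences have already been labelled with agent and theme as shown above; the dictionary layout and the function `answer_who` are illustrative and do not correspond to any real SRL system's output format.

```python
# Hand-written SRL analyses of the two candidate sentences (assumed input).
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, analyses):
    """Return agents of analyses whose predicate and theme match the question."""
    return [a["agent"] for a in analyses
            if a["predicate"] == predicate
            and a["theme"].lower() == theme.lower()]

print(answer_who("create", "the first effective polio vaccine", candidates))
```

Word overlap alone would accept both sentences; requiring the question's theme to match the labelled theme removes the syringe sentence.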
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
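A frame file with several rolesets can be pictured as a nested mapping. The Python sketch below encodes the two rolesets for ring from the slide; the dictionary layout and the `describe` helper are assumptions for illustration and are not the actual PropBank frame file format.

```python
# The two rolesets of "ring", transcribed from the slide.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}

def describe(roleset_id, labeled_args):
    """Map each labelled phrase to the role description in its roleset."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {phrase: roles[label] for phrase, label in labeled_args.items()}

print(describe("ring.01", {"John": "Arg0", "the bell": "Arg1"}))
print(describe("ring.02", {"Tall aspen trees": "Arg1", "the lake": "Arg2"}))
```

The same label (Arg1) means "Thing rung" in one roleset and "Surrounding entity" in the other, which is why roles are defined per usage rather than globally.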
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments
  ARGA: Causer
  ARG0: Agent/experiencer
  ARG1: Theme/patient
  ARG2: Recipient
  ARG3: Instrument
Numbered Arguments with function tags
  ARGA-MNS: Indirect causer
  ARG0-MNS: Induced causer
  ARG0-GOL: Causee with a 'recipient' role
  ARG2-ATR: Attribute
  ARG2-GOL: Goal
  ARG2-SOU: Source
  ARG2-LOC: Location
  ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
  ARGM-TMP: Temporal
  ARGM-MNR: Manner
  ARGM-LOC: Location
  ARGM-PRP: Purpose
  ARGM-CAU: Cause
  ARGM-DIS: Discourse
  ARGM-ADV: Adverb
  ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1
आतिफ़ किताब पढ़ेगा
Atif kitaab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book.'
trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book.'
trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book.'
Unaccusative & Unergative
• Distinction between intransitive verbs
  – unaccusatives, e.g. Kula (open), Puta (explode)
  – unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
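Once a verb has been classified by the diagnostics, the labelling rule for its single argument is a simple lookup. The Python sketch below hard-codes the three verbs named on the slide; the `VERB_CLASS` table and the function name are illustrative, and the classification itself is assumed to come from the annotators' diagnostic tests rather than from this code.

```python
# Verb classes as given on the slide (WX romanization).
VERB_CLASS = {
    "Kula": "unaccusative",   # 'open'
    "Puta": "unaccusative",   # 'explode'
    "haMsa": "unergative",    # 'laugh'
}

def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"

for verb in ("Kula", "Puta", "haMsa"):
    print(verb, sole_argument_label(verb))
```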
Intransitive Unaccusative
unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open.'
unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened.'
unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open.'
Intransitive Unergative
unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'
unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept.'
unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep.'
Existential
exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room.'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds.'
dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds.'
[Figures: PropBank analyses of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan.'
ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan.'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so, sulaa): 'sleep', 'make someone sleep'
• Add -vaa
  – (sulaa, sulvaa): 'make someone sleep', 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'
causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep.'
causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep.'
[Figures: PropBank analyses of causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
compl-pred-2 राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree diagram; node labels: VP-Pred, VP, NP, V; leaves: Atif, kitab, paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree diagram; node labels: VP-Pred, VP, NP, V; leaves: Atif ne, kitab, paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree diagram; node labels: VP-Pred, VP, NP, V; leaves: Atif, soyegaa]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree diagram; node labels: VP-Pred, VP, NP (index 1), V; leaves: darvaazaa, CASE1, khulegaa]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram; node labels: VP-Pred, VP, NP (index 1), V; leaves: us kamre mein, cuuhe, CASE1, hain]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Tree diagram; node labels: VP-Pred, VP, NP, V; leaves: Ram ne, Mohan ko, kitaab, dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram; node labels: VP-Pred, VP, NP (indices 1 and 2), V; leaves: kal raat, baadaloM mein, mujhko, SCR2, caaMd, CASE1, dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram; node labels: VP-Pred, VP, NP, CP (index 1), C, V, V-Aux; leaves: Ram, EXTR1, jaantaa, hai, ki, Sita, CASE1, der se, aayegi]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree diagram; node labels: VP-Pred, VP, NP, CP, C, V, V-Aux; leaves: jo kitaab, tumne, SCR1, PRO, dii, thii, COMP, vah, maine, SCR3, paRhii]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree diagram; node labels: VP-Pred, VP, NP, N, V, V', V-Aux; leaves: Raam, Ravi ko, yaad, CASE1, kar, rahaa, thaa]
Causative
Reduplication: A morphological process
Words belonging to various categories can be reduplicated. These expressions are often hyphenated. Reduplication has various morphological functions depending on the lexical category which is reduplicated. For example:
• Nouns: it adds the sense of 'every'
• Verbs: it brings the sense of an adverbial participle
• Adjectives and adverbs: it adds intensity
Hindi has three types of reduplication: full, partial and redundant. Reduplication is highly productive in these languages.
Full Reduplication
If the word is X, then its reduplicated form is X-X:
raam-raam 'Ram-Ram' (proper noun)
baccaa-baccaa 'child-child' (common noun)
garam-garam 'hot-hot' (adjective)
dhiire-dhiire 'slowly-slowly' (adverb)
jaa-jaa 'go-go' (verb)
naa-naa 'not-not' (negative particle)
kyaa-kyaa 'what-what' (question word)
jaate-jaate 'going-going' (participle)
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially. Only a part of the given word is repeated, which gives the meaning of 'X etc.' In Hindi/Urdu the first consonant of X is replaced by v-.
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi. Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
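The two productive patterns above (X → X-X and X → X-vX') can be sketched in a few lines of Python. This is only an illustration on the romanized forms used in these slides: the function names are hypothetical, and treating every leading non-vowel letter (for example the digraph "kh") as part of the "first consonant" is an assumption about the romanization, not part of the treebank tools.

```python
VOWELS = set("aeiou")

def full_reduplication(word: str) -> str:
    """Full reduplication: X -> X-X."""
    return f"{word}-{word}"

def echo_word(word: str) -> str:
    """Partial reduplication: replace the initial consonant(s) of X by v-.

    Vowel-initial words simply receive a v- prefix in the echo part.
    """
    i = 0
    # Skip the leading consonant letters of the romanized word (assumption:
    # digraphs such as "kh" count as one initial consonant).
    while i < len(word) and word[i].lower() not in VOWELS:
        i += 1
    return f"{word}-v{word[i:]}"

if __name__ == "__main__":
    print(full_reduplication("garam"))
    for w in ("khaanaa", "jaanaa", "aaloo", "aisaa"):
        print(echo_word(w))
```

Irregular forms such as bhaagambhaag or dekhaa-dekhii are not covered by this rule and would need to be listed in a lexicon.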
Some Basic Syntax
Hindi and Urdu are both relatively free word order SOV languages.
For case marking, Hindi primarily uses postpositions. The verb agrees either with the subject or with the object. An adjective agrees with the noun it modifies.
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case:
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition:
raam ravi se carcaa karega
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
References
• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues
  - Compounds
  - Punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  - Compounds internally contain a punctuation mark
  - They are productive
  - Morphological analysis of the members of the compounds
  - The issue: whether to create a single token
  - Decision: create three tokens and mark the hyphen as JOIN
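The decision above (every punctuation mark becomes its own token; a compound such as BAI-bahana becomes three tokens with the hyphen marked JOIN) can be sketched as follows. This is a minimal illustration, not the treebank's tokenizer: the function name, the WORD/PUNC labels and the rule "a hyphen flanked by letters or digits is JOIN" are assumptions made for the example.

```python
import re

# One token per alphanumeric run, and one token per punctuation character.
TOKEN_RE = re.compile(r"[^\W_]+|[^\w\s]|_")

def tokenize(text: str):
    """Return (token, kind) pairs, where kind is WORD, JOIN or PUNC."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        piece = m.group()
        if piece.isalnum():
            kind = "WORD"
        elif (piece == "-" and m.start() > 0 and m.end() < len(text)
              and text[m.start() - 1].isalnum() and text[m.end()].isalnum()):
            # Compound-internal hyphen: keep it as a separate token, marked JOIN.
            kind = "JOIN"
        else:
            kind = "PUNC"
        tokens.append((piece, kind))
    return tokens

if __name__ == "__main__":
    print(tokenize("BAI-bahana"))
    print(tokenize("usa laDake ne kelA khAyA thA."))
```

Keeping the hyphen as a JOIN token means the two members of the compound can each receive their own morphological analysis while the compound can still be reassembled later.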
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOP,punc>
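To make the composite af attribute concrete, the sketch below splits a comma-separated af value into named fields. The field order follows the description above (root, category, gender, number, person, case, tam/vibhakti, suffix); the eight-slot comma-separated layout, the field names and the sample strings are assumptions for illustration, since the separators are not recoverable from these slides.

```python
# Assumed field order of the composite `af` attribute (illustrative names).
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")

def parse_af(af: str) -> dict:
    """Split a comma-separated af value into a field dictionary.

    Missing trailing fields are filled with empty strings.
    """
    values = af.split(",")
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))

if __name__ == "__main__":
    # Hypothetical af values modelled on the example sentence.
    print(parse_af("laDakaa,n,m,sg,3,o,,"))
    print(parse_af("ne,psp"))
```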
POS Tagging
ILMT POS tagset adopted. Total: 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
Chunking is introduced to save effort in manual tagging. Dependency relations are marked between the chunk heads. Chunking restructures the tree (i.e. the value of ADDR_ will change).
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
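The change of ADDR_ values under chunking can be illustrated with a short sketch that assigns nested addresses (chunk number, then chunk number plus token position) to a chunked sentence. The tab-separated output layout and the function name are assumptions for the example, and the feature structures are omitted.

```python
def to_ssf(chunks):
    """Render (label, [(token, pos), ...]) chunks with nested SSF-style addresses."""
    lines = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        lines.append(f"{i}\t((\t{label}")            # chunk opens at address i
        for j, (tok, pos) in enumerate(tokens, start=1):
            lines.append(f"{i}.{j}\t{tok}\t{pos}")   # tokens get address i.j
        lines.append("\t))")                          # chunk closes
    return lines

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
ssf_lines = to_ssf(chunks)

if __name__ == "__main__":
    print("\n".join(ssf_lines))
```

Before chunking, laDake had address 2; after chunking it is token 1.2 inside the first NP chunk, which is the renumbering the slide refers to.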
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar? Indian languages have:
• Rich morphology
• Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication
Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock (the)
K1 (Karta): the doer of the action (the locus of activity)
K2 (Karma): locus of result
Sabina opened the lock with this key
opened
  k1: sabina
  k2: lock (the)
  k3: key (this)
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)
lock becomes the karta
Levels of Analysis
L1: Semantic relations (karakas), e.g. raama: karta
L2: Morphosyntactic (vibhakti), e.g. raama: prathamaa
L3: Morphological representation (abstract vibhakti markers), e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4: Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example Contd.
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) khaa: bhaaii, phala
(t2) khaa: bhaaii (meraa, baDZaa), phala (bahuta)
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened
Semantics of the verb
A verbal root denotes:
• The activity
• The result
Locus of activity: karta
Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
The boy opened the lock
• k1: karta
• k2: karma
karta/karma sometimes correspond to agent/theme
• Not always: The door opened
• The door is karta
• The sentence has no explicit karma
open
  k1: boy
  k2: lock
Sub-actions: Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
k1: karta (doer); k2: karma (affected); k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  - The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer), karaNa-kartri
  - The lock opened: the karma is raised to the role of karta (doer), karma-kartri
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'
An Example (Contd.)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
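Since dependency relations are marked between chunk heads, it helps to see how a head might be picked from each chunk. The sketch below uses a deliberately simple, hypothetical rule (the last noun or pronoun in an NP, the main verb VM in a VG); the actual head-identification procedure of the treebank is not specified in these slides.

```python
# Hypothetical head rule: which POS tags may head which chunk type.
HEAD_TAGS = {"NP": ("NN", "PRP"), "VG": ("VM",)}

def chunk_head(label, tokens):
    """Return the head word of a chunk given (word, tag) pairs."""
    wanted = HEAD_TAGS.get(label, ())
    for word, tag in reversed(tokens):
        if tag in wanted:
            return word
    return tokens[-1][0]  # fallback: last word of the chunk

chunks = [
    ("NP", [("meraa", "PRP")]),
    ("NP", [("baDzaa", "JJ"), ("bhaaii", "NN")]),
    ("NP", [("bahuta", "QF"), ("phala", "NN")]),
    ("VG", [("khaataa", "VM"), ("hai", "VAUX")]),
]
heads = [chunk_head(label, tokens) for label, tokens in chunks]

if __name__ == "__main__":
    print(heads)
```

The modifiers baDzaa and bahuta and the auxiliary hai stay inside their chunks, so only the four heads take part in inter-chunk dependency annotation.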
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd.)
Here modifier-modified relations are marked between the heads of the chunks:
• meraa 'my'
• bhaaii 'brother'
• phala 'fruit'
• khaataa 'eats'
badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd.)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
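The tree for this sentence can be written down as a list of labeled (dependent, head, relation) edges between chunk heads and printed as an indented outline. The edge-list encoding and the function name are illustrative choices, not the treebank's storage format.

```python
# (dependent, head, relation) edges between chunk heads.
edges = [
    ("bhaaii", "khaataa", "k1"),
    ("phala", "khaataa", "k2"),
    ("meraa", "bhaaii", "r6"),
]

def render(head, edges, label=None, depth=0):
    """Return the subtree rooted at `head` as indented text lines."""
    prefix = "  " * depth + (f"{label}: " if label else "")
    lines = [prefix + head]
    for dep, h, rel in edges:
        if h == head:
            lines.extend(render(dep, edges, rel, depth + 1))
    return lines

if __name__ == "__main__":
    print("\n".join(render("khaataa", edges)))
```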
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas, and
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations are participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta: subject/agent/doer
• karma: object/patient
• karana: instrument
• sampradaan: beneficiary
• apaadaan: source
• adhikarana: location in place/time/other
Relations other than karakas
• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address
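For quick reference while reading annotated trees, the labels can be kept in a small lookup table. The table below combines the labels used in these slides (k1, k2, k3, k4, k7t, k7p, r6, rt, rh, nmod__relc, rad); the entry k5 for apaadaan is not shown on the slides and is added here on the assumption that it follows the same numbering.

```python
RELATIONS = {
    # karaka relations
    "k1": "karta (subject/agent/doer)",
    "k2": "karma (object/patient)",
    "k3": "karana (instrument)",
    "k4": "sampradaan (beneficiary)",
    "k5": "apaadaan (source)",  # assumed label, not given on the slides
    "k7t": "adhikarana (location in time)",
    "k7p": "adhikarana (location in place)",
    # relations other than karakas
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}

def is_karaka(label: str) -> bool:
    """Karaka labels start with 'k' followed by a digit in this table."""
    return label in RELATIONS and label[0] == "k" and label[1:2].isdigit()

if __name__ == "__main__":
    for label in ("k1", "k7t", "r6"):
        print(label, RELATIONS[label], is_karaka(label))
```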
Relations which do not fall under dependency relation
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
Causative Constructions (Contd.)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II
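The difference between the two possibilities amounts to a relabeling of three relations, which the following sketch spells out for the example sentence. The mapping table comes directly from the text above (k1 → pk1, k3 → mk1, k4 → jk1); applying it mechanically is a simplification, since an annotator still has to distinguish a mediator causer (aayaa se) from a genuine instrument (cammaca se), which keeps k3.

```python
# Possibility-I label -> Possibility-II label for causative constructions.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(dependents):
    """Map syntactic labels to causer/mediator/causee labels; others are kept."""
    return [(phrase, CAUSATIVE_RELABEL.get(rel, rel)) for phrase, rel in dependents]

possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)

if __name__ == "__main__":
    for phrase, rel in possibility_2:
        print(f"{phrase}\t{rel}")
```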
Causative Constructions (Contd.)
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause: Alternative-II
Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
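Possibility-II can be pictured as a small data structure in which the relative clause hangs off vaha through nmod__relc, while the link between jo ladZakaa and vaha is stored as a coref feature rather than as an edge. The dictionary layout is an illustrative assumption; the k1 label on jo ladZakaa is also an assumption, and the relation between vaha and the main verb is left unspecified because the slides do not state it.

```python
# node -> {"head": parent node, "rel": relation label, optional "coref": target}
tree = {
    "vaha": {"head": "hai", "rel": None},                     # label not given in the slides
    "khadZaa hai": {"head": "vaha", "rel": "nmod__relc"},     # relative clause root
    "jo ladZakaa": {"head": "khadZaa hai", "rel": "k1",       # k1 is an assumed label
                    "coref": "vaha"},
}

def relative_clause_links(tree):
    """Return (relative element, coref target, modified node) for each relative clause."""
    links = []
    for node, info in tree.items():
        if info["rel"] != "nmod__relc":
            continue
        for child, child_info in tree.items():
            if child_info["head"] == node and "coref" in child_info:
                links.append((child, child_info["coref"], info["head"]))
    return links

if __name__ == "__main__":
    print(relative_clause_links(tree))
```

In the example, the coref target of jo ladZakaa and the node modified by the relative clause are the same element, vaha.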
(3) anubhava karta: k4a
Ex-1: mujhko dukh hai
'I.Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta: k4a (Contd.)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
anubhava karta: k4a (Contd.)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran: rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Relation samanadhikaran: rs (Contd.)
[Figure: dependency tree for Ex-1]
Relation samanadhikaran: rs (Contd.)
[Figure: dependency tree for Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii
Possibility - II
Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up'
Participles (vmod) (Contd.)
The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.
Participles (vmod) (Contd.)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha
    k2: patra
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency
Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
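The null-element strategy can be sketched as a function that builds the edge list for this sentence and falls back to a NULL__NP node whenever the correlative vo is not overt. The function and its boolean parameter are hypothetical; only the resulting edges mirror the tree above.

```python
def build_tree(correlative_present: bool):
    """Return (dependent, head, relation) edges for 'tum jo bhi kahoge (vo) mai maan luungii'."""
    # If vo 'that' is elided, a NULL__NP node takes its place as the clause's parent.
    parent = "vo" if correlative_present else "NULL__NP"
    return [
        ("mai", "maan luungii", "k1"),
        (parent, "maan luungii", "k2"),
        ("kahoge", parent, "nmod__relc"),
        ("tum", "kahoge", "k1"),
        ("jo bhi", "kahoge", "k2"),
    ]

if __name__ == "__main__":
    for edge in build_tree(correlative_present=False):
        print(edge)
```

Either way the relative clause has a nominal parent, so the same nmod__relc analysis applies to both the overt and the elided variant.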
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd.)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
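Because the noun and the verb of a conjunct verb sit in separate chunks, the complex predicate has to be recovered from the pof edge. The sketch below does this over the edge list of the tree above; the edge-list encoding and the function name are illustrative assumptions.

```python
# (dependent, head, relation) edges for 'maine usase ek prashna kiyaa'.
edges = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),
    ("ek", "prashna", "nmod"),
]

def conjunct_verbs(edges):
    """Rebuild noun-verb complex predicates from pof ('Part OF') edges."""
    return [f"{dep} {head}" for dep, head, rel in edges if rel == "pof"]

def modifiers_of(word, edges):
    """Words attached to `word`, e.g. ek 'one' modifying prashna 'question'."""
    return [dep for dep, head, _ in edges if head == word]

if __name__ == "__main__":
    print(conjunct_verbs(edges))
    print(modifiers_of("prashna", edges))
```

The noun keeps its own modifier (ek) while the pof edge still records that prashna kiyaa functions as one predicate.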
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1]
• [Tall aspen trees ARG1] ring [the lake ARG2]
ring.01: Make sound of bell
Arg0: Causer of ringing
Arg1: Thing rung
Arg2: Ring for
ring.02: To surround
Arg1: Surrounding entity
Arg2: Surrounded entity
The first sentence is an instance of ring.01, the second of ring.02.
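As a rough illustration of how a frame file ties labels to a particular sense, the two rolesets for ring can be held in a dictionary and looked up per sentence. The `FRAMES` layout and the `explain` function are hypothetical; real PropBank frame files are separate lexicon files with their own format.

```python
# Hypothetical in-memory stand-in for a PropBank frame file.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "make sound of bell",
            "roles": {"ARG0": "causer of ringing",
                      "ARG1": "thing rung",
                      "ARG2": "ring for"},
        },
        "ring.02": {
            "gloss": "to surround",
            "roles": {"ARG1": "surrounding entity",
                      "ARG2": "surrounded entity"},
        },
    }
}


def explain(lemma, roleset_id, labelled_spans):
    """Map each labelled span to its role description in the chosen roleset."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    explained = {}
    for label, span in labelled_spans.items():
        if label not in roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        explained[span] = roles[label]
    return explained


bell = explain("ring", "ring.01", {"ARG0": "John", "ARG1": "the bell"})
lake = explain("ring", "ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"})
print(bell)
print(lake)
```

The same label (ARG1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is why the roleset must be chosen before the arguments are labelled.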
Hindi PropBank
• Annotating Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped (null) arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
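A tagset like this lends itself to a simple validity check on annotated labels. The sketch below assumes labels are written with a hyphen between the base argument and its function tag (for example ARG2-GOL); `parse_label` and the three constant names are hypothetical helpers, not part of any PropBank tool.

```python
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
# Function tags listed in the tagset for each numbered argument.
FUNCTION_TAGS = {
    "ARGA": {"MNS"},
    "ARG0": {"MNS", "GOL"},
    "ARG2": {"ATR", "GOL", "SOU", "LOC", "DIR"},
}
MODIFIERS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def parse_label(label):
    """Split a label such as 'ARG2-GOL' into (base, function tag or None)."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        if tag not in MODIFIERS:
            raise ValueError(f"unknown modifier tag in {label!r}")
        return base, tag
    if base not in NUMBERED:
        raise ValueError(f"unknown argument {label!r}")
    if tag and tag not in FUNCTION_TAGS.get(base, set()):
        raise ValueError(f"{tag} is not listed for {base}")
    return base, tag or None


print(parse_label("ARG2-GOL"))
print(parse_label("ARGM-TMP"))
```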
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
– unaccusatives, e.g. Kula (open), Puta (explode)
– unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
[Annotation figures for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add –aa
– (so, sulaa): sleep, make someone sleep
• Add –vaa
– (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Causatives
[Annotation figures for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
Complex predicate
compl-pred-2 राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels on the slide: VP-Pred, VP, NP, V]
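Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels on the slide: VP-Pred, VP, NP, V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Phrase structure tree for आतिफ़ सोएगा (Atif soyegaa); node labels on the slide: VP-Pred, VP, NP, V]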
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels on the slide: VP-Pred, VP, NP1, NP, CASE1, V]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
[Phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels on the slide: VP-Pred, VP, NP1, NP, CASE1, V]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Phrase structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii)]
Putting it All Together: Dative Subjects
[Phrase structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); the slide's labels include NP1/CASE1 and SCR2/NP2]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Phrase structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); the slide's labels include EXTR1/CP1 and NP1/CASE1]
Relative Clause
[Phrase structure tree for the sentence below; the slide's labels include SCR1, PRO, CP, COMP and SCR3]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
[Phrase structure tree for the sentence below; the slide's labels include VP-Pred, VP1, V-Aux, V', NP, CASE1 and N]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causative
Partial Reduplication (Echo words)
In partial reduplication an expression X is repeated partially: only a part of the word is repeated, which gives the meaning 'X etc.'
In Hindi/Urdu the first consonant of X is replaced by v- (vowel-initial words simply take an initial v-).
For example: khaanaa-vaanaa 'food etc.'
vaanaa is not a valid word in Hindi.
Some more examples of partial reduplication in Hindi are:
jaanaa 'going' → jaanaa-vaanaa 'going etc.'
aaloo 'potato' → aaloo-vaaloo 'potato etc.'
aisaa 'like this' → aisaa-vaisaa 'like this etc.'
There are also examples which do not fall into this pattern. The meaning of such words changes substantially and does not have the sense of 'etc.':
bhaag 'run' → bhaagambhaag 'rush'
jhuuTh 'lie' → jhuuTh-muuTh 'just like that (without meaning it)'
dekh 'see' → dekhaa-dekhii 'in imitation'
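The regular v- pattern can be sketched on romanized forms. This is a simplification under stated assumptions: the whole leading consonant run (such as kh) is treated as "the first consonant", vowels are the Latin letters a, e, i, o, u, and the function names are hypothetical. The irregular cases listed above are not covered.

```python
VOWELS = set("aeiouAEIOU")


def echo_word(word):
    """Form the echo of a romanized word: leading consonants are replaced by v-."""
    i = 0
    # Skip the initial consonant run, if any (e.g. 'kh' in khaanaa).
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return "v" + word[i:]


def reduplicate(word):
    """Return the 'X etc.' form, e.g. khaanaa -> khaanaa-vaanaa."""
    return f"{word}-{echo_word(word)}"


for w in ["khaanaa", "jaanaa", "aaloo", "aisaa"]:
    print(reduplicate(w))
```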
Some Basic Syntax
Hindi and Urdu are both relatively free word order SOV languages.
For case marking, Hindi primarily uses postpositions.
The verb agrees either with the subject or with the object.
The adjective agrees with the noun it modifies.
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me'
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me'
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
compl-pred-2 राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case.
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition.
raam ravi se carcaa karega
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in The Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation mark)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
– Compounds internally contain a punctuation mark
– They are productive
– Morphological analysis is needed for the members of the compounds
– The issue: whether to create a single token
– Decision: create three tokens and mark the hyphen as JOIN
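The tokenization decision for compounds can be sketched as follows. This is an illustrative approximation rather than the treebank's tokenizer: `tokenize` is a hypothetical function, and attaching the string "JOIN" as a third field is just one assumed way of marking the hyphen.

```python
import re


def tokenize(text):
    """Split text into word and punctuation tokens.

    Returns (address, token, mark) triples; a hyphen written directly
    between two word tokens is marked JOIN.
    """
    matches = list(re.finditer(r"\w+|[^\w\s]", text))
    tokens = []
    for i, m in enumerate(matches):
        tok = m.group()
        joins = (
            tok == "-"
            and 0 < i < len(matches) - 1
            # No whitespace on either side, and word characters as neighbours.
            and matches[i - 1].end() == m.start()
            and matches[i + 1].start() == m.end()
            and matches[i - 1].group()[-1].isalnum()
            and matches[i + 1].group()[0].isalnum()
        )
        tokens.append((i + 1, tok, "JOIN" if joins else None))
    return tokens


print(tokenize("BAI-bahana"))
print(tokenize("usa laDake ne kelA khAyA thA"))
```

The compound yields three tokens, as in the decision above, while ordinary words and free-standing punctuation are left unmarked.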
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense, aspect, modality) or vibhakti (case marker), and suffix.
ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaa,v,e'>
7              <fs as=&STOP,punc>
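A composite attribute of this kind can be unpacked into named fields. The sketch assumes the af value is a comma-separated string with at most eight slots in the order listed above; `AF_FIELDS` and `parse_af` are hypothetical names, and slots missing from a shortened value are filled with None.

```python
AF_FIELDS = [
    "root", "category", "gender", "number",
    "person", "case", "tam_or_vibhakti", "suffix",
]


def parse_af(af):
    """Turn an af string such as 'laDakaa,n,m,sg,3,o' into a field dictionary."""
    values = [v.strip() or None for v in af.split(",")]
    if len(values) > len(AF_FIELDS):
        raise ValueError(f"too many fields in af value {af!r}")
    # Pad shortened values so every field name is present.
    values += [None] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


print(parse_af("laDakaa,n,m,sg,3,o"))
```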
POS Tagging
ILMT POS tagset adopted. Total: 26 tags.
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
Chunking is introduced to save effort in manual tagging.
Dependency relations are marked between the chunk heads.
Chunking restructures the tree (i.e. the value of ADDR will change).
ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs as=&STOP,punc>
      ))
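The bracketed rows can be grouped back into chunks with a short reader. This is a minimal sketch, assuming whitespace-separated rows, `((` opening and `))` closing a chunk, and no feature-structure column; `read_chunks` is a hypothetical helper, not an SSF library call.

```python
def read_chunks(lines):
    """Group SSF-style rows into (chunk tag, [(token, POS), ...]) pairs."""
    chunks, current = [], None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":
            # Close the chunk that is currently open.
            if current is not None:
                chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":
            current = (fields[2], [])
        elif current is not None and len(fields) >= 3:
            current[1].append((fields[1], fields[2]))
    return chunks


SSF = """
1    ((      NP
1.1  usa     PRP
1.2  laDake  NN
1.3  ne      PSP
     ))
2    ((      NP
2.1  kelA    NN
     ))
3    ((      VG
3.1  khAyA   VM
3.2  thA     VAUX
     ))
""".splitlines()

for tag, members in read_chunks(SSF):
    print(tag, members)
```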
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
– Some basic concepts
• Some Hindi constructions
– Causatives
– Co-ordination
– Unaccusatives
– Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank
• The Corpus
– News articles: 350k
– Tourism articles: 25-30k
– Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model
• Why Paninian Grammar? Indian languages have
– Rich morphology
– Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations include reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
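Sabina opened the lock
Dependency tree: opened is the root, with dependents Sabina (k1) and lock (k2); the attaches to lock.
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result
Sabina opened the lock with this key
Dependency tree: opened is the root, with dependents Sabina (k1), lock (k2) and key (k3); the attaches to lock and this to key.
k3 (karaNa): instrument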
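Yesterday Sabina opened the lock with this key at my home
Dependency tree: opened is the root, with dependents Yesterday (k7t), Sabina (k1), lock (k2), key (k3) and home (k7p); the attaches to lock, this to key and my to home.
k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
Dependency tree: opened is the root, with dependents yesterday (k7t), lock (k1) and key (k3); the attaches to lock and this to key.
Here lock becomes the karta.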
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) Chunk-level tree: khaa is the root, with dependents bhaaii and phala.
(t2) Expanded tree: khaa is the root, with dependents bhaaii and phala; meraa and baDZaa attach to bhaaii, and bahuta attaches to phala.
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
– Sabina opened the lock with the key
– The key opened the lock
– The lock opened
Semantics of the verb
• A verbal root denotes
– The activity
– The result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
– k1 – karta
– k2 – karma
• karta and karma sometimes correspond to agent and theme
– Not always: The door opened
– The door is karta
– The sentence has no explicit karma
Dependency tree: open is the root, with dependents boy (k1) and lock (k2).
Sub-actions: Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
Dependency trees:
• open with dependents boy (k1), lock (k2) and key (k3)
• open with dependents key (k1) and lock (k2)
• open with dependent lock (k1)
k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
– The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
– The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
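The shift of karaka roles with the speaker's choice can be illustrated with a toy assignment function. This is a deliberately simplified sketch: the participant types (agent, affected, instrument), the `DEFAULT_KARAKA` table and `assign_karakas` are hypothetical, and the speaker's intention is reduced to naming which participant is presented as the karta.

```python
# Default karaka for each participant type when it is not chosen as karta.
DEFAULT_KARAKA = {"agent": "k1", "affected": "k2", "instrument": "k3"}


def assign_karakas(participants, karta):
    """Assign karaka labels to the expressed participants.

    participants: dict mapping participant type -> word
    karta: the participant type the speaker presents as the doer
    """
    if karta not in participants:
        raise ValueError(f"{karta} is not expressed in the sentence")
    return {
        word: ("k1" if kind == karta else DEFAULT_KARAKA[kind])
        for kind, word in participants.items()
    }


# Sabina opened the lock with this key
print(assign_karakas({"agent": "Sabina", "affected": "lock", "instrument": "key"}, "agent"))
# The key opened the lock
print(assign_karakas({"instrument": "key", "affected": "lock"}, "instrument"))
# The lock opened with this key
print(assign_karakas({"affected": "lock", "instrument": "key"}, "affected"))
```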
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
– Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
– 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa   <fs af= root=badZaa cat=adj gend=m>
bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta   <fs af= root=bahuta cat=adj gend=any>
phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
– each node is a chunk, and
– each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks:
– meraa 'my'
– bhaaii 'brother'
– phala 'fruit'
– khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
Dependency tree: khaataa is the root, with dependents bhaaii (k1) and phala (k2); meraa attaches to bhaaii (r6).
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas
– Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote notions such as purpose and reason.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
– khilvaa 'cause to eat' is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
– prayojaka karta 'causer' (pk1): the causer in a causative construction
– prayojya karta 'causee' (jk1): the causee in a causative construction
– madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
– Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
– The dependency of jo ladZakaa 'the boy' is on vaha 'he'
– jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contdhellip)
For relative clauses our current decision Follow Possibility-II In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the
verb khadzaa hai lsquois standingrsquo of the relclause The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo
relation The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by
the feature coref
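
The Possibility-II analysis can be sketched as a small data structure. This is only an illustration: the class `Chunk`, the helper names, and the label `k1` on jo ladZakaa are assumptions (the slide states the attachment site but not the label); the `nmod__relc` relation and the `coref` feature are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    name: str
    head: Optional[str] = None   # chunk this one attaches to (None if not shown)
    rel: Optional[str] = None    # dependency label on that attachment
    feats: dict = field(default_factory=dict)


# jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
chunks = {
    # The attachment of vaha inside the main clause is left out here.
    "vaha": Chunk("vaha"),
    # The whole relative clause, rooted in its verb, modifies vaha.
    "khadZaa_hai": Chunk("khadZaa_hai", head="vaha", rel="nmod__relc"),
    # "k1" is an assumed label; coref points at the main-clause pronoun.
    "jo_ladZakaa": Chunk("jo_ladZakaa", head="khadZaa_hai", rel="k1",
                         feats={"coref": "vaha"}),
}


def relative_clause_root(chunks, noun):
    """Return the chunk attached to `noun` by nmod__relc, if any."""
    for chunk in chunks.values():
        if chunk.head == noun and chunk.rel == "nmod__relc":
            return chunk.name
    return None


def coreferent(chunks, name):
    """Return the value of the coref feature of a chunk, if present."""
    return chunks[name].feats.get("coref")
```

Keeping coref as a feature rather than an edge means the structure stays a tree: every chunk has at most one head.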

(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
      ‘I.Dat’ ‘unhappy’ ‘is’
      ‘I am unhappy.’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
• Here dukh ‘unhappy’ is the karta.
• Here mujhko ‘to me’ is a subtype of sampradan.
• This sampradan is different from the sampradan (k4, beneficiary).
• We call it anubhava karta, represented by k4a.

anubhava karta – k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa          [base verb]
      ‘ram’ ‘Erg’ ‘moon’ ‘saw’
      ‘Ram saw the moon.’
Ex-3: raam ko (experiencer) caaMd dikhaa    [derived intransitive verb]
      ‘ram.Dat’ ‘moon’ ‘appeared’
      ‘The moon was visible to Ram.’
[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow.’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow.’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
    ‘Had she not been sick, she would definitely have come to the party.’
Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause

Possibility-I
agar-to
  paired-ccof  agar
                 ccof  na hotii
  paired-ccof  to
                 ccof  aatii
[Figure: Possibility-II]

Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause.
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.

(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is semantically realized but not syntactically.

Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakar PaadZataa hai
    ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
    ‘Every day, having written a letter, he tears it up.’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle likhakar ‘having written’.

Participles (vmod) (Contd.)
PaadZataa hai
  k1    vaha
  k7t   rojZa
  k2    patra
  vmod  likhakar
          k1  vaha   (shared, semantic only)
          k2  patra  (shared, semantic only)
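
A minimal sketch of how the shared arguments of a participle could be recovered from the syntactic tree. The triple layout `(dependent, relation, head)`, the function name, and the choice to share exactly `k1` and `k2` (as in this example) are assumptions made for illustration, not part of the annotation guidelines.

```python
# Syntactic attachments for: vaha rojZa patra likhakar PaadZataa hai
syntactic = [
    ("vaha", "k1", "PaadZataa_hai"),
    ("rojZa", "k7t", "PaadZataa_hai"),
    ("patra", "k2", "PaadZataa_hai"),
    ("likhakar", "vmod", "PaadZataa_hai"),
]


def shared_arguments(triples, participle, shared_rels=("k1", "k2")):
    """Return the main verb's arguments that the participle shares semantically.

    The participle attaches to the main verb by vmod; the arguments
    themselves stay attached to the main verb only.
    """
    main_verb = next(h for d, r, h in triples if d == participle and r == "vmod")
    return {r: d for d, r, h in triples if h == main_verb and r in shared_rels}
```

For the example sentence, `shared_arguments(syntactic, "likhakar")` yields vaha as k1 and patra as k2, matching the lower part of the tree.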

(7) Ellipsis
• How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
    ‘I will believe whatever you say.’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency.

Ellipsis (Contd.)
maan luungii
  k1  mai
  k2  NULL__NP (vo)
        nmod__relc  kahoge
                      k1  tum
                      k2  jo bhi
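
The null-element insertion can be sketched as follows. The helper `attach_with_null` and the triple layout `(dependent, relation, head)` are hypothetical; only the node name NULL__NP and the relations k1, k2 and nmod__relc come from the slide.

```python
def attach_with_null(triples, orphan, orphan_rel, main_verb, null_rel,
                     null_name="NULL__NP"):
    """Add a null node under main_verb and hang the orphaned clause on it."""
    return triples + [
        (null_name, null_rel, main_verb),   # stands in for the missing vo
        (orphan, orphan_rel, null_name),    # relative clause modifies the null node
    ]


# tum jo bhi kahoge (vo) mai maan luungii
overt = [
    ("mai", "k1", "maan_luungii"),
    ("tum", "k1", "kahoge"),
    ("jo_bhi", "k2", "kahoge"),
]
tree = attach_with_null(overt, "kahoge", "nmod__relc", "maan_luungii", "k2")
```

Without the null node, the relative clause kahoge would have no nominal head to attach to.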

Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
    ‘The children have grown up; they don’t listen to anyone.’
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof  badZe_ho_gaye
  ccof  nahii_sunate

Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• With a coordinating conjunction, the conjunction is the head and takes the coordinated elements as its children.
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.

Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
      ‘Ram ate food and Sita ate an apple.’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
      ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
      ‘Ram said that he will come tomorrow.’
[Figures: Coordinate Conjunction (ccof); Subordinate Conjunction]

(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
    ‘I asked him a question.’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’, and not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd.)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation). That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb.
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.

Conjunct Verbs (Contd.)
kiyaa
  k1   maine
  k2   usase
  pof  prashna
         nmod  ek

Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system.
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine.
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh.

Word Matches
• Who created the first effective polio vaccine?
• Both candidate sentences contain the words of the question.

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine.
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh.

Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine.
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh.

SRL gives us the right answer
• We need semantic information to prefer the right answer.
• The theme of create should be ‘the first effective polio vaccine’.
• The theme in the first sentence was ‘the first disposable syringe’.
• We can filter out the wrong answer.
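
The filtering step can be sketched in a few lines. The dictionaries below stand in for the output of a semantic role labeller; their layout and the function `answer_who` are assumptions for illustration, not an actual system.

```python
# Hand-written stand-ins for SRL output on the two candidate sentences.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, candidates):
    """Return the agents of propositions whose predicate and theme match."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]


answers = answer_who("create", "the first effective polio vaccine", candidates)
```

Word overlap alone would accept both sentences; matching on the theme role keeps only the second one.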

We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants is expressed through semantic roles.
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles.

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling.
• A PropBank is a large annotated corpus of predicate-argument information.
• A set of semantic roles is defined for each verb.
• A syntactically parsed corpus is then tagged with verb-specific semantic role information.

PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis.
• This is defined in a verb lexicon consisting of frame files.
• Each predicate will have a set of roles associated with a distinct usage.
• A polysemous predicate can have several rolesets within its frame file.

An example
• [John: ARG0] rings [the bell: ARG1]            (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2]  (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
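
As a rough sketch, a frame file with two rolesets can be modelled as a nested dictionary. The layout, the name `FRAMES` and the helper `describe` are assumptions for illustration and are not the actual PropBank frame file format; the role descriptions are those shown on the slide.

```python
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled span to the role description of the chosen roleset."""
    roles = FRAMES[roleset_id]["roles"]
    return {span: roles[label] for span, label in labelled_args.items()}


bell = describe("ring.01", {"John": "Arg0", "the bell": "Arg1"})
lake = describe("ring.02", {"Tall aspen trees": "Arg1", "the lake": "Arg2"})
```

The same label (Arg1) means different things under the two rolesets, which is why the roleset has to be chosen before the arguments are interpreted.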

Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875; 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset
Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent/experiencer       ARG0-MNS  Induced causer
ARG1  Theme/patient           ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means

Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book.’
trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book.’
trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book.’

Unaccusative & Unergative
• Distinction between intransitive verbs
  – unaccusatives, e.g. Kula (open), Puta (explode)
  – unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0.
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – e.g. animacy test, cognate object test, among others

Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open.’
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened.’
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open.’

Intransitive Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep.’
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept.’
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep.’

Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room.’
• We distinguish between existential and copula sentence types by means of different roleset IDs.

Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds.’
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds.’
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation.

Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan.’
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan.’

Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep.’
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep.’
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep.’
[Figures: PropBank annotation of causative-1 and causative-2; Causatives classes]

Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’.
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa

Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi.’
compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi.’

Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late.’

Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: PS tree (VP-Pred, VP, NP, V) over Atif kitab paRhegaa]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Figure: PS tree (VP-Pred, VP, NP, V) over Atif-ne kitab paRhii]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Figure: PS tree (VP-Pred, VP, NP, V) over Atif soyegaa]

Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Figure: PS tree over darvaazaa khulegaa, with NP1 in the higher position and CASE1 in the lower position]

Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Figure: PS tree over us kamre mein cuuhe hain, with NP1 and CASE1]

Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: PS tree over Ram-ne Mohan-ko kitaab dii]

Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: PS tree over kal raat baadaloM mein mujhko caaMd dikhaa, with NP1/CASE1 and NP2/SCR2]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: PS tree over Ram jaantaa hai ki Sita der se aayegi, with CP1 and EXTR1]

Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me.’
[Figure: PS tree with CP, COMP, PRO, NP1, SCR1 and SCR3]

Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi.’
[Figure: PS tree over Raam Ravi-ko yaad kar rahaa thaa, with VP1 and CASE1]

Causative
[Figure]

Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book.’
trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book.’
trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book.’

Intransitive Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep.’
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept.’
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep.’

Intransitive Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open.’
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened.’
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open.’

Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room.’

Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            ‘Yesterday night the moon was seen behind the clouds.’
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            ‘Yesterday night I saw the moon behind the clouds.’

Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan.’
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan.’

Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late.’

Relative Clause
rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          ‘My sister who stays in Delhi is coming tomorrow.’

Relative Clause
rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          ‘I have read the book which you gave me.’
rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          ‘I have read the book which you gave me.’
rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          ‘I have read the book which you gave me.’

Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi.’
compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              ‘Ram was remembering Ravi.’

Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             ‘Atif will sleep.’
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             ‘The maid caused the child to sleep.’
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             ‘The mother made the maid cause the child to sleep.’

Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

Experiencer verbs
The experiencer argument takes dative case.
  raam ko bukhaar hai     Ram dat fever be.pres
  raam ko caand dikhaa    Ram dat moon see-unacc.pst
  raam ko dukh hai        Ram dat sorrow be.pres

Participatory verbs
The second argument of participatory verbs takes the se postposition.
  raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut

Representing Tokens, Morph Analysis, POS and Chunks in The Hindi/Urdu Treebanks

Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past

Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)

Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens; mark the hyphen as JOIN

Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.

ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaa,v,e'>
7              <fs af='&STOP,punc'>
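
A small sketch of unpacking the composite af attribute into named fields. It assumes the eight values are comma-separated in the order listed above (root, category, gender, number, person, case, tam or vibhakti, suffix) and that missing trailing values are simply empty; the field names and the function `parse_af` are illustrative.

```python
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix")


def parse_af(af):
    """Split a composite af value into a dict keyed by field name."""
    values = af.split(",")
    # Pad short values so every field is present, possibly empty.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


noun = parse_af("laDakaa,n,m,sg,3,o,,")
postposition = parse_af("ne,psp")
```

Positional encoding keeps the annotation compact; the cost is that empty slots must be preserved to keep later fields aligned.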

POS Tagging
• ILMT POS tagset adopted
• 26 tags in total

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs af='&STOP,punc'>

Chunking
• Chunking is introduced to save effort in manual tagging.
• Dependency relations are marked between the chunk heads.
• Chunking restructures the tree (i.e. the value of ADDR will change).

ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs af='&STOP,punc'>
      ))

Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012

Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
  – Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)

Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar? Indian languages have
• Rich morphology
• Relatively flexible word order
For example:
a) baccaa phala khaataa hai    ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala

Panini’s Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini’s Grammar (Contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti

Sabina opened the lock
opened
  k1  Sabina
  k2  lock (the)
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key
opened
  k1  Sabina
  k2  lock (the)
  k3  key (this)
k3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home
opened
  k7t  Yesterday
  k1   Sabina
  k2   lock (the)
  k3   key (this)
  k7p  home (my)
k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key
opened
  k7t  yesterday
  k1   lock (the)
  k3   key (this)
• lock becomes the karta

Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
(t1) tree over chunk heads
khaa
  bhaaii
  phala
(t2) expanded tree
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta

Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb’s semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb
• A verbal root denotes
  – the activity
  – the result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root branching into activity and result]

karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
open
  k1  boy
  k2  lock

Sub-actions: Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)

Sub-actions: Opening of lock
open: k1 boy, k2 lock, k3 key
open: k1 key, k2 lock
open: k1 lock
k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
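
The role shift described above can be sketched as a simple rule for the verb ‘open’. The function `assign_karakas` and its keyword arguments are hypothetical and cover only the three sentences discussed here; they are not a general karaka assignment procedure.

```python
def assign_karakas(agent=None, instrument=None, affected=None):
    """Assign k1/k2/k3 depending on which participants the speaker expresses."""
    roles = {}
    if agent is not None:
        # Sabina opened the lock with the key.
        roles["k1"] = agent
        if affected is not None:
            roles["k2"] = affected
        if instrument is not None:
            roles["k3"] = instrument
    elif instrument is not None:
        # The key opened the lock: the karaNa is raised to karta.
        roles["k1"] = instrument
        if affected is not None:
            roles["k2"] = affected
    elif affected is not None:
        # The lock opened: the karma is raised to karta.
        roles["k1"] = affected
    return roles


full = assign_karakas(agent="Sabina", instrument="key", affected="lock")
no_agent = assign_karakas(instrument="key", affected="lock")
only_lock = assign_karakas(affected="lock")
```

In every case some participant ends up as k1, which reflects the point that the karta is determined by what the speaker chooses to express rather than by a fixed thematic role.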

Speaker’s Intention (vivakshaa)
• Every sentence reflects the speaker’s intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
Objective The Scheme
$ Morph Analysis $ POS Tagging $ Chunking $ Dependency Relations
Dependency Scheme Relations in Dependency Scheme Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
We are developing treebanks for HindiUrdu
Following Paninian framework as the annotation scheme
We show how the scheme handles some phenomena such as complex verbs causatives relative clauses conjunctions etc in Hindi
101212
14
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m >
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any >
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3 >
An Example (Contd)
POS Tagging
• meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
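As a minimal sketch, the bracketed chunk notation above can be read with a regular expression to recover the chunk heads between which dependency relations are later marked. The head rule used here (last NN or PRP in an NP, the VM in a VG) is an assumption for this example only; the slides do not spell out the rule.

```python
import re

# One chunk looks like ((word_TAG word_TAG))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

# Assumed head tags per chunk type (illustrative, not the official rule).
HEAD_TAGS = {"NP": ("NN", "PRP"), "VG": ("VM",)}


def chunk_heads(text):
    """Return a list of (chunk_label, head_word) pairs."""
    heads = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tok.rsplit("_", 1) for tok in body.split()]
        candidates = [w for w, tag in tokens if tag in HEAD_TAGS.get(label, ())]
        # Fall back to the last word if no token carries an expected head tag.
        heads.append((label, candidates[-1] if candidates else tokens[-1][0]))
    return heads


chunked = ("((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
print(chunk_heads(chunked))
```

For the example sentence this yields meraa, bhaaii, phala and khaataa as heads, while badZaa, bahuta and hai stay inside their chunks.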
An Example (Contd)
Dependency Relation
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
In our dependency tree
• each node is a chunk, and
• each edge represents the relation between the connected nodes, labeled with a karaka or other relation
A chunk represents a set of adjacent words which are in dependency relations with each other.
All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks:
• meraa 'my'
• bhaaii 'brother'
• phala 'fruit', and
• khaataa 'eats'
badZaa 'big' and bahuta 'much' are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
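A dependency tree of this kind is just a set of labeled head-to-dependent edges between chunk heads. The sketch below stores the tree for the example sentence as triples and prints it with indentation; the names `EDGES` and `render` are illustrative and not taken from any treebank tool.

```python
# (head, relation, dependent) triples for
# "meraa badZaa bhaaii bahuta phala khaataa hai"
EDGES = [
    ("khaataa", "k1", "bhaaii"),
    ("khaataa", "k2", "phala"),
    ("bhaaii", "r6", "meraa"),
]


def render(head, label=None, depth=0):
    """Return the subtree under `head` as indented text lines."""
    prefix = "  " * depth + (f"{label}: " if label else "")
    lines = [prefix + head]
    for h, rel, dep in EDGES:
        if h == head:
            lines.extend(render(dep, rel, depth + 1))
    return lines


print("\n".join(render("khaataa")))
```

Because only chunk heads appear as nodes, the chunk-internal words badZaa, bahuta and hai are absent from the edge list.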
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas, and
– Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly
Karaka relations hold between a verb and the participants directly involved in the action it denotes.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
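The three groups of labels can be kept in a small lookup table, which is handy when validating annotated edges. This is a sketch under assumptions: only labels that appear on these slides are listed (the full scheme has more), and the table and function names are made up for the example.

```python
# Labels attested on the slides; the complete tagset is larger.
KARAKA = {"k1": "karta", "k2": "karma", "k3": "karana", "k4": "sampradaan"}
NON_KARAKA = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod_relc": "relative clause",
    "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}


def relation_type(label):
    """Return (group, gloss) for a relation label, or ('unknown', None)."""
    groups = (
        ("karaka", KARAKA),
        ("non-karaka", NON_KARAKA),
        ("non-dependency", NON_DEPENDENCY),
    )
    for name, table in groups:
        if label in table:
            return name, table[label]
    return "unknown", None


for label in ("k1", "r6", "pof", "xyz"):
    print(label, relation_type(label))
```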
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  – prayojaka karta 'causer' (pk1): the causer in a causative construction
  – prayojya karta 'causee' (jk1): the causee in a causative construction
  – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
Causative Constructions (Contd …)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Causative Constructions (Contd …)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
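The relabeling from Possibility-I to Possibility-II can be sketched as a simple mapping. The sketch assumes the annotator has already decided which phrases are causer, mediator and causee; a true instrument such as cammaca se 'with the spoon' is not in that set and keeps k3. The function and parameter names are hypothetical.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_MAP = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args, causal_participants):
    """Relabel only the phrases marked as participants in the causal chain.

    args:                list of (phrase, Possibility-I label) pairs
    causal_participants: set of phrases judged to be causer, mediator or causee
    """
    return [
        (phrase, CAUSATIVE_MAP.get(label, label))
        if phrase in causal_participants else (phrase, label)
        for phrase, label in args
    ]


sentence = [("maaz ne", "k1"), ("aayaa se", "k3"),
            ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(sentence, {"maaz ne", "aayaa se", "bacce ko"}))
```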
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
'I-Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram-Dat' 'moon' 'appeared'
'The moon was visible to Ram'
anubhava karta – k4a (Contd…)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd…)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would definitely have come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
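Possibility-I
[Figure: dependency tree headed by the abstract node agar-to, with agar and to attached by paired-ccof and the verbs naa hotii and aatii attached by ccof]
Possibility-II
[Figure: dependency tree for Possibility-II]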
Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up'
Participles (vmod) (Contd)
The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
[Figure: dependency tree]
Subordinate Conjunction
[Figure: dependency tree]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• This means that a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now be part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
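A frame file can be pictured as a dictionary from roleset IDs to role descriptions. The sketch below is a simplified in-memory stand-in (real PropBank frame files are XML documents with more fields); the names `FRAMES` and `gloss_args` are assumptions made for this illustration.

```python
# Simplified stand-in for the 'ring' frame file with its two rolesets.
FRAMES = {
    "ring.01": {
        "name": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"},
    },
    "ring.02": {
        "name": "To surround",
        "roles": {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
    },
}


def gloss_args(roleset, labelled_args):
    """Attach the roleset-specific description to each (phrase, label) pair."""
    roles = FRAMES[roleset]["roles"]
    return [(phrase, label, roles[label]) for phrase, label in labelled_args]


print(gloss_args("ring.01", [("John", "Arg0"), ("the bell", "Arg1")]))
print(gloss_args("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")]))
```

The same label (Arg1) means 'thing rung' under ring.01 and 'surrounding entity' under ring.02, which is the point of defining roles verb by verb.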
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
• ARGA: Causer
• ARG0: Agent/experiencer
• ARG1: Theme/patient
• ARG2: Recipient
• ARG3: Instrument
PropBank Tagset: Numbered Arguments with function tags
• ARGA-MNS: Indirect causer
• ARG0-MNS: Induced causer
• ARG0-GOL: Causee with a 'recipient' role
• ARG2-ATR: Attribute
• ARG2-GOL: Goal
• ARG2-SOU: Source
• ARG2-LOC: Location
• ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
• ARGM-TMP: Temporal
• ARGM-MNR: Manner
• ARGM-LOC: Location
• ARGM-PRP: Purpose
• ARGM-CAU: Cause
• ARGM-DIS: Discourse
• ARGM-ADV: Adverb
• ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
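As a small sketch of the rule just stated, the label of the single argument follows from the verb class. The class table here contains only the three verbs named on the slide, and in practice the class comes from the diagnostic tests and the frame file rather than from a hard-coded dictionary; the names are hypothetical.

```python
# Verb classes as given on the slide (illustrative subset).
VERB_CLASS = {"Kula": "unaccusative", "Puta": "unaccusative", "haMsa": "unergative"}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in ("Kula", "Puta", "haMsa"):
    print(verb, sole_argument_label(verb))
```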
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2 दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Intransitive Unergative
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2 आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1 उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figures: PropBank annotation for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1 राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
[Figures: PropBank annotation for causative-1 and causative-2]
[Figure: Causatives classes]
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2 राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: phrase structure tree for Atif kitab paRhegaa, with nodes VP-Pred, VP, NP and V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Figure: phrase structure tree for Atif ne kitab paRhii, with nodes VP-Pred, VP, NP and V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
आतिफ़ सोएगा
[Figure: phrase structure tree for Atif soyegaa, with nodes VP-Pred, VP, NP and V]
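Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Figure: phrase structure tree for darvaazaa khulegaa; the NP darvaazaa is coindexed with a CASE element (index 1) in the lower position]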
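Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Figure: phrase structure tree for us kamre mein cuuhe hain; the NP cuuhe is coindexed with a CASE element (index 1), and us kamre mein is an adjunct]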
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: phrase structure tree for Ram ne Mohan ko kitaab dii, with Mohan-ko adjoined to VP-Pred]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: phrase structure tree for kal raat baadaloM mein mujhko caaMd dikhaa; caaMd is coindexed with a CASE element (index 1) and mujhko with a SCR element (index 2)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: phrase structure tree for Ram jaantaa hai ki Sita der se aayegi; the CP headed by ki is coindexed with an EXTR element (index 1)]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Figure: phrase structure tree for the sentence, with CP, PRO and SCR elements]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Figure: phrase structure tree for the sentence; the N yaad is associated with a CASE element (index 1)]
Causative
Intransitive Unaccusative
unacc-1 दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2 दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
Existential
exist-1 उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
  kal raat baadaloM meM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
Ditransitive
ditrans-1 राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Complement Clause
compl-cl-1 राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Relative Clause
rel-cl-1 मेरी बहन जो दिल्ली में रहती है कल आ रही है
  merii bahan jo dillii meM rahtii hai kal aa rahii hai
  my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
  'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2 मैंने वह किताब जो तुमने दी थी पढ़ ली
  maiMne vah kitaab jo tumne dii thii paRh lii
  I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
  'I have read the book which you gave me'
rel-cl-3 मैंने वह किताब पढ़ ली जो तुमने दी थी
  maiMne vah kitaab paRh lii jo tumne dii thii
  I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
  'I have read the book which you gave me'
rel-cl-4 जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maiMne paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
Complex Predicate
compl-pred-1 राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2 राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
Causatives
unerg-1 आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case
  raam ko bukhaar hai: Ram dat fever be.pres
  raam ko caand dikhaa: Ram dat moon see-unacc.pst
  raam ko dukh hai: Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the se postposition
  raam ravi se carcaa karega: Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii: Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa: Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: Compounds, Punctuations
For example:
  usa ladake ne kelaa khaayaa thaa
  that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
• Decision
  – Create three tokens
  – Mark the hyphen as JOIN
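The decision for hyphenated compounds can be sketched as follows. This is a minimal illustration, not the treebank tokenizer: the function name and the three-column row layout (address, token, mark) are assumptions, with JOIN placed in the third column only to make the marking visible.

```python
import re


def tokenize(text):
    """Split on whitespace and hyphens; each hyphen becomes its own token marked JOIN."""
    rows = []
    for piece in re.findall(r"[^\s-]+|-", text):
        mark = "JOIN" if piece == "-" else ""
        rows.append((len(rows) + 1, piece, mark))
    return rows


for row in tokenize("BAI-bahana"):
    print(row)
```

A compound such as BAI-bahana thus yields three tokens, so each member can receive its own morphological analysis.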
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOPpunc>
POS Tagging
• ILMT POS tagset adopted
• Total 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOPpunc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOPpunc>
       ))
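A chunked SSF table of this shape can be read back into a list of chunks with a few lines of Python. The parser below is a minimal sketch that assumes whitespace-separated columns, one token per line, no nested chunks, and a token present on every token line; it ignores the feature-structure column. It is not the official SSF reader.

```python
def parse_ssf(text):
    """Return a list of (chunk_label, [(token, pos_tag), ...]) from SSF-like text."""
    chunks, current = [], None
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":                      # chunk closes
            chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":  # chunk opens: ADDR (( LABEL
            current = (fields[2], [])
        elif current is not None and len(fields) >= 3:
            current[1].append((fields[1], fields[2]))  # ADDR TOKEN TAG [fs]
    return chunks


SAMPLE = """
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""
for label, tokens in parse_ssf(SAMPLE):
    print(label, tokens)
```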
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions
Introducon(
TreebankLOneofthemostimportantlinguis3cresources U3lityinvariousNLPtaskssuchasparsingnaturallanguageunderstandingetc Linguis3cinforma3onencodedatdifferentlevelssuchasmorphologicalsyntac3csyntac3coLseman3c(dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
• Indian languages have rich morphology
• Relatively flexible word order. For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock (the)
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key
opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)
K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)
Here the lock becomes the karta.
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
(t1) chunk-level tree:
khaa
  bhaaii
  phala
(t2) expanded tree:
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb
• A verbal root denotes:
  – the activity
  – the result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta, karma sometimes correspond to agent, theme
  – Not always: 'The door opened'
  – The door is karta
  – The sentence has no explicit karma
open
  k1: boy
  k2: lock

Sub-actions: Opening of lock
Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
  – The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
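The three 'open' sentences can be summarised as a small promotion rule: whichever expressed participant comes first in the order agent, instrument, theme is labelled karta. The sketch below is only an illustration of that pattern; the promotion order and the function name assign_karakas are assumptions drawn from these examples, not a general rule of the annotation scheme.

```python
# Assumed promotion order, read off the three 'open' examples above.
PROMOTION_ORDER = ["agent", "instrument", "theme"]

def assign_karakas(expressed):
    """Map expressed participants (role -> word) to karaka labels (word -> label)."""
    roles = [role for role in PROMOTION_ORDER if role in expressed]
    if not roles:
        return {}
    # The first expressed participant in the order is raised to karta (k1)
    labels = {expressed[roles[0]]: "k1"}
    for role in roles[1:]:
        # The others keep their usual labels: karma (k2) or karana (k3)
        labels[expressed[role]] = "k2" if role == "theme" else "k3"
    return labels

full = assign_karakas({"agent": "Sabina", "theme": "lock", "instrument": "key"})
no_agent = assign_karakas({"theme": "lock", "instrument": "key"})
theme_only = assign_karakas({"theme": "lock"})
print(full, no_agent, theme_only)
```

Which participants are expressed at all is the speaker's choice (vivaksha); the code only shows how the labels follow once that choice is made.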
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena in Hindi such as complex verbs, causatives, relative clauses, and conjunctions
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'

An Example (Contd)
Morph Analysis
• meraa    <fs af= root=meraa, cat=pron, gend=any, num=sg, pers=1, case=o>
• badZaa   <fs af= root=badZaa, cat=adj, gend=m>
• bhaaii   <fs af= root=bhaaii, cat=n, gend=m, num=sg, pers=3, case=d>
• bahuta   <fs af= root=bahuta, cat=adj, gend=any>
• phala    <fs af= root=phala, cat=n, gend=m, num=any, pers=3, case=d>
• khaataa  <fs af= root=khaa, cat=v, gend=m, num=sg, pers=3, TAM=taa>
• hai      <fs af= root=hai, cat=v, gend=any, num=any, pers=3>
An Example (Contd)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree:
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit'
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
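A chunk-level dependency tree of this kind is easy to hold as a mapping from each chunk head to its parent and relation label. The sketch below is one possible encoding, not the treebank's own storage format; the "root" label and the helper names are assumptions made for illustration.

```python
# dependent chunk head -> (parent chunk head, relation label)
# "root" is a placeholder label for the top node (an assumption, not a scheme tag).
tree = {
    "khaataa": (None, "root"),
    "bhaaii": ("khaataa", "k1"),
    "phala": ("khaataa", "k2"),
    "meraa": ("bhaaii", "r6"),
}

def children(tree, head):
    """Return (relation, dependent) pairs attached to the given head, sorted by label."""
    return sorted((rel, dep) for dep, (parent, rel) in tree.items() if parent == head)

def show(tree, head, depth=0):
    """Render the subtree below head as indented 'relation: dependent' lines."""
    lines = []
    for rel, dep in children(tree, head):
        lines.append("  " * depth + f"{rel}: {dep}")
        lines.extend(show(tree, dep, depth + 1))
    return lines

rendered = show(tree, "khaataa")
print("\n".join(rendered))
```

Only chunk heads appear as nodes; badZaa and bahuta stay inside their chunks, as stated on the previous slide.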
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas
  – Relations which do not fall under dependency relations directly, but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other

Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

Causative Constructions (Contd …)
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations:
  – prayojaka karta 'causer' (pk1): the causer in a causative construction
  – prayojya karta 'causee' (jk1): the causee in a causative construction
  – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction

Causative Constructions (Contd …)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
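The difference between the two possibilities for the mediated causative amounts to a relabeling of three arguments. The sketch below applies that relabeling to the 'maid' example; the function name is hypothetical, and the mapping is meant only for the mediated construction shown here (in the 'spoon' example cammaca se remains a true k3).

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_mediated_causative(args):
    """args: list of (phrase, label) pairs under the Possibility-I analysis."""
    # Labels without a causative counterpart (e.g. k2) are left unchanged
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]

possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_mediated_causative(possibility_1)
print(possibility_2)
```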
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I

Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause: Alternative-II

Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadzaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
'I-Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram-Dat' 'moon' 'appeared'
'The moon was visible to Ram'
anubhava karta – k4a (Contd…)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd…)
[Figure: dependency tree for Ex-1]
Relation samanadhikaran – rs (Contd)
[Figure: dependency tree for Ex-2]

(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would definitely have come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II
Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up'
Participles (vmod) (Contd)
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (contd)
Paadzataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha
    k2: patra
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
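The tree above keeps the noun and the verb in separate chunks while the pof edge records that they form one predicate. A minimal sketch of how such a tree could be queried is given below; the edge-list encoding and both helper names are assumptions made for illustration.

```python
# (dependent, relation, head) triples for 'maine usase ek prashna kiyaa'
edges = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]

def complex_predicate(edges, verb):
    """Join the verb with its pof dependents to recover the semantic unit."""
    parts = [dep for dep, rel, head in edges if head == verb and rel == "pof"]
    return " ".join(parts + [verb])

def noun_modifiers(edges, noun):
    """Modifiers that attach to the noun itself rather than to the noun-verb unit."""
    return [dep for dep, rel, head in edges if head == noun and rel == "nmod"]

unit = complex_predicate(edges, "kiyaa")
mods = noun_modifiers(edges, "prashna")
print(unit, mods)
```

Because ek hangs off prashna and not off kiyaa, the modification relation that a single verb chunk would have hidden stays visible.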
Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
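The filtering step can be sketched in a few lines once the candidate sentences carry role labels. The dictionaries below are hand-written stand-ins for the output of a semantic role labeller (an assumption; no real SRL system is called), and the function name answer_who is hypothetical.

```python
# Hand-labelled stand-ins for SRL output on the two candidate sentences.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

# 'Who created the first effective polio vaccine?' asks for the agent of
# 'create' whose theme is the vaccine.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

def answer_who(question, candidates):
    """Keep only candidates whose predicate and theme match the question."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == question["predicate"] and c["theme"] == question["theme"]
    ]

answers = answer_who(question, candidates)
print(answers)
```

Word matching alone would accept both sentences, since both contain 'created' and 'the first effective polio vaccine'; the role labels are what separate them.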
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
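A frame file with several rolesets maps naturally onto a nested dictionary. The sketch below encodes the two rolesets of ring shown above and looks up what each labelled argument means; the dictionary layout and the function name describe are illustrative assumptions, not the actual PropBank frame file format.

```python
# Rolesets for 'ring', copied from the slide; the nesting is an assumed encoding.
FRAMES = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"ARG0": "causer of ringing", "ARG1": "thing rung", "ARG2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"ARG1": "surrounding entity", "ARG2": "surrounded entity"},
    },
}

def describe(roleset, labelled_args):
    """Translate numbered argument labels into the roleset-specific descriptions."""
    roles = FRAMES[roleset]["roles"]
    return {text: roles[label] for text, label in labelled_args}

bell = describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
lake = describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
print(bell)
print(lake)
```

The same label (ARG1) means 'thing rung' under one roleset and 'surrounding entity' under the other, which is why numbered arguments are always read relative to a roleset.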
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments        Numbered Arguments with function tags
ARGA  Causer              ARGA-MNS  Indirect causer
ARG0  Agent/experiencer   ARG0-MNS  Induced causer
ARG1  Theme/patient       ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient           ARG2-ATR  Attribute
ARG3  Instrument          ARG2-GOL  Goal
                          ARG2-SOU  Source
                          ARG2-LOC  Location
                          ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'
trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'
trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
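Once a verb has been classified by the diagnostic tests, the label of its single argument follows mechanically. The sketch below hard-codes the three verbs named on the slide; the lookup table and the function name are illustrative assumptions, and the diagnostic tests themselves are not implemented.

```python
# Verb classes as given on the slide (result of the diagnostic tests).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"

labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
print(labels)
```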
Intransitive: Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'
Intransitive: Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative:
• Add –aa
  – (so → sulaa) 'sleep' → 'make someone sleep'
• Add –vaa
  – (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Causatives
[Figures: PropBank annotation of causative-1 and causative-2]
Causatives: classes

Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

Complex predicate
compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree figure: VP-Pred over VP; leaves NP Atif, NP kitab, V paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree figure: VP-Pred over VP; leaves NP Atif with ne, NP kitab, V paRhii]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree figure: VP-Pred over VP; leaves NP Atif, V soyegaa]
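Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree figure: VP-Pred over VP; NP darvaazaa in the higher position, coindexed with trace CASE1 in the lower position; V khulegaa]

Existentials
• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct
उस कमरे में चूहे हैं
[Tree figure: VP-Pred over VP; adjunct NP us kamre mein; NP cuuhe coindexed with trace CASE1; V hain]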
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position
राम ने मोहन को किताब दी
[Tree figure: NP Mohan ko adjoined to VP-Pred; below it NP Ram ne, NP kitaab, V dii]

Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree figure: adjuncts NP kal raat and NP baadaloM mein; NP mujhko scrambled, coindexed with trace SCR2; NP caaMd coindexed with trace CASE1; V dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree figure: main clause with NP Ram, V jaantaa, V-Aux hai; extraposed CP (trace EXTR1) headed by C ki; embedded clause with NP Sita coindexed with trace CASE1, NP der se, V aayegi]

Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree figure: relative clause CP with NP jo kitaab, NP tumne, V dii, V-Aux thii, empty C (COMP), PRO and trace SCR1; main clause with NP vah, NP maine, V paRhii and trace SCR3]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree figure: NP Raam, NP Ravi ko; V' containing N yaad and V kar, with trace CASE1; V-Aux rahaa, V-Aux thaa]

Causative
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'

Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'
Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late'

Relative Clause
rel-cl-1  मेरी बहन जो दिल्ली में रहती है कल आ रही है
          merii bahan jo dillii meM rahtii hai kal aa rahii hai
          my sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
          'My sister who stays in Delhi is coming tomorrow'
Relative Clause
rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          'I have read the book which you gave me'
rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          'I have read the book which you gave me'
rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          'I have read the book which you gave me'

Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'
compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              'Ram was remembering Ravi'
Causatives Unerg-1 आतफ़ सोएगा
Atif soyegaa Atif sleepmsgfut Atif will sleep
causative-1 आया आतफ़ को स9लाया
aayaa ne Atif ko sulaayaa maid erg Atif acc sleepcauspst lsquoThe maid caused the child to sleeprsquo
causative-2 माE आया K आतफ़ को स9लवाया
maaN ne aayaa se Atif ko sulvaayaa mother erg maid by Atif acc sleepcauspst lsquoThe mother made the maid to cause the child to sleeprsquo
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
The experiencer argument takes dative case.
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
Participatory verbs
The second argument of participatory verbs takes the postposition se.
raam ravi se carcaa karega
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
• Decision
  – Create three tokens
  – Mark the hyphen as JOIN
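The tokenization decision above can be sketched in a few lines of Python. This is only an illustration, not the treebank's actual tokenizer: the function names and the three-column row layout are assumptions, and the input is assumed to be romanized text in which a hyphen only appears inside compounds.

```python
import re

def tokenize(sentence):
    """Split a romanized sentence into tokens, isolating punctuation and hyphens."""
    tokens = []
    for word in sentence.split():
        # A run of word characters is one token; any other character is its own token.
        tokens.extend(re.findall(r"\w+|[^\w\s]", word))
    return tokens

def ssf_token_rows(tokens):
    """Number tokens from 1 and flag compound-internal hyphens as JOIN (hypothetical column)."""
    return [(addr, tok, "JOIN" if tok == "-" else "")
            for addr, tok in enumerate(tokens, start=1)]

rows = ssf_token_rows(tokenize("BAI-bahana"))
```

With this sketch a hyphenated compound yields three rows, the middle one carrying the JOIN mark, which mirrors the "create three tokens" decision.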
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='thaa,v,e'>
7               <fs af='&STOP,punc'>
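A minimal sketch of how such a composite af value could be unpacked, assuming the fields are comma-separated and appear in the order listed above. The field names in `AF_FIELDS` and the function name are illustrative choices, not part of the SSF specification quoted here; missing trailing fields are simply left empty.

```python
# Assumed field order, following the slide's description of af.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")

def parse_af(af_value):
    """Map a comma-separated af string onto named fields, padding missing ones with ''."""
    values = [v.strip() for v in af_value.split(",")]
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))

noun = parse_af("laDakaa,n,m,sg,3,o")
```

Here `noun["root"]` is the lemma and `noun["case"]` the oblique marker; a real analyser would also have to preserve empty fields between commas, which this sketch does by construction.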
POS Tagging
• ILMT POS tagset adopted
• Total: 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='thA,verb,…'>
7             SYM   <fs af='&STOP,punc'>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='thA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs af='&STOP,punc'>
       ))
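The re-addressing that chunking causes can be sketched as follows. This is an illustration under assumptions: chunks are given as (label, tokens) pairs, token addresses take the dotted form chunk.position (for example 1.1), and the function name is hypothetical.

```python
def chunked_rows(chunks):
    """Emit (address, token, category) rows with chunk-level and dotted token-level addresses."""
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))          # chunk opening row
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))   # token inside the chunk
        rows.append(("", "))", ""))                 # chunk closing row
    return rows

CHUNKS = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
rows = chunked_rows(CHUNKS)
```

The token that was address 5 in the flat table (khAyA) now sits at 3.1, which is the sense in which the value of ADDR_ changes.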
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic and syntactico-semantic (dependency)
Hindi Dependency Treebank
• The Corpus
  – News articles: 350k
  – Tourism articles: 25-30k
  – Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model
• Why Paninian Grammar? Indian languages have
  – Rich morphology
  – Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky 1993>
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified element (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
[Dependency tree: opened → Sabina (k1); opened → lock (k2); lock → the]
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result
Sabina opened the lock with this key
[Dependency tree: opened → Sabina (k1); opened → lock (k2); opened → key (k3); lock → the; key → this]
k3 (karaNa): instrument
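Yesterday Sabina opened the lock with this key at my home
[Dependency tree: opened → Yesterday (k7t); opened → Sabina (k1); opened → lock (k2); opened → key (k3); opened → home (k7p); lock → the; key → this; home → my]
k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
[Dependency tree: opened → yesterday (k7t); opened → lock (k1); opened → key (k3); lock → the; key → this]
Here lock becomes the karta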
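The contrast between the two sentences can be made concrete by writing each tree as a list of labeled edges. This is a small sketch, not the treebank's storage format; the (head, dependent, label) triple layout and the function name are assumptions for illustration.

```python
# Each edge is (head, dependent, karaka label), following the trees above.
WITH_AGENT = [
    ("opened", "yesterday", "k7t"),
    ("opened", "Sabina", "k1"),
    ("opened", "lock", "k2"),
    ("opened", "key", "k3"),
]
WITHOUT_AGENT = [
    ("opened", "yesterday", "k7t"),
    ("opened", "lock", "k1"),
    ("opened", "key", "k3"),
]

def relation_of(edges, dependent):
    """Return the label on the edge that attaches `dependent`, or None if it is absent."""
    for _head, dep, label in edges:
        if dep == dependent:
            return label
    return None
```

Looking up "lock" gives k2 in the first tree and k1 in the second, which is the shift to karta described above.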
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Dependency trees: (t1) khaa → bhaaii, phala; (t2) khaa → bhaaii, phala; bhaaii → meraa, baDZaa; phala → bahuta]
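The bracketed chunk notation used in this example is easy to read back into a data structure. The sketch below assumes that tokens inside a chunk are separated by whitespace and written as word_TAG; the regular expression and the function name are illustrative, not part of any treebank tool.

```python
import re

# One chunk looks like ((word_TAG word_TAG ...))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, tokens))
    return chunks

TEXT = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
parsed = parse_chunks(TEXT)
```

The result keeps the chunk order of the sentence, so head-to-head relations can later be attached to the three chunks by position.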
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened
Semantics of the verb
• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal root → activity, result]
karta-karma
• The boy opened the lock
  – k1: karta
  – k2: karma
• karta and karma sometimes correspond to agent and theme
• Not always: 'The door opened'
  – The door is karta
  – The sentence has no explicit karma
[Dependency tree: open → boy (k1); open → lock (k2)]
Sub-actions: Opening of lock
• Action 1: inserting and turning a key
• Action 2: key pressing and turning the lever
• Action 3: latch moving and lock opening
Sub-actions: Opening of lock
[Dependency trees: open → boy (k1), lock (k2), key (k3); open → key (k1), lock (k2); open → lock (k1)]
k1: karta (doer); k2: karma (affected); k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri.
  – The lock opened. The karma is raised to the role of karta (doer): karma-kartri.
• Thus karta and the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly:
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• 'key' is assigned karta in (b) and karana in (a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses and conjunctions in Hindi
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations; hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks:
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit'
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
[Dependency tree: khaataa → bhaaii (k1); khaataa → phala (k2); bhaaii → meraa (r6)]
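The tree for this sentence can be written down and printed with a few lines of Python. This is a sketch only: the edge-list representation and the `render` helper are assumptions made for illustration, with chunk heads standing in for whole chunks.

```python
# (head, dependent, relation) edges between chunk heads, as in the tree above.
EDGES = [
    ("khaataa", "bhaaii", "k1"),
    ("khaataa", "phala", "k2"),
    ("bhaaii", "meraa", "r6"),
]

def render(edges, root, depth=0, label="root"):
    """Return the tree under `root` as indented lines, one node per line."""
    lines = ["  " * depth + f"{root} ({label})"]
    for head, dep, rel in edges:
        if head == root:
            lines.extend(render(edges, dep, depth + 1, rel))
    return lines

tree_lines = render(EDGES, "khaataa")
```

Joining `tree_lines` with newlines gives a readable outline in which the genitive modifier meraa hangs under bhaaii rather than under the verb.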
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark participants directly involved in the action denoted by the verb
• Relations other than karakas denote, for example, purpose and reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
• karta: subject/agent/doer
• karma: object/patient
• karana: instrument
• sampradaan: beneficiary
• apaadaan: source
• adhikarana: location in place/time/other
Relations other than karakas
• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address
Relations which do not fall under dependency relation
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
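The three-way split of relation labels can be summarized as a small lookup. This is an illustrative sketch, not an official tagset definition: the sets contain only labels named on these slides, the rule that any label starting with "k" is a karaka is an assumption, and the causative labels pk1, jk1 and mk1 introduced later are deliberately not covered.

```python
NON_DEPENDENCY = {"ccof", "pof", "fragof"}
NON_KARAKA = {"r6", "rt", "rh", "nmod__relc", "rad"}

def relation_type(label):
    """Classify a relation label into one of the three types described above."""
    if label in NON_DEPENDENCY:
        return "non-dependency"
    if label in NON_KARAKA:
        return "non-karaka"
    if label.startswith("k"):   # assumption: k1, k2, k3, k4, k7t, k7p, ...
        return "karaka"
    return "unknown"
```

A validator for annotated trees could use such a function to reject labels that fall outside the scheme.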
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  – prayojaka karta 'causer' (pk1): the causer in a causative construction
  – prayojya karta 'causee' (jk1): the causee in a causative construction
  – madhyastha karta 'mediator-causer' (mk1): the mediator-causer in a causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
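The relabeling from Possibility-I to Possibility-II amounts to a fixed mapping over three labels. The sketch below is an illustration with hypothetical names; it assumes the annotator has already decided that the se-marked phrase is a mediator (as with aayaa se) and not an instrument (as with cammaca se), since that decision cannot be made from the label alone.

```python
# Possibility-I label -> Possibility-II label, as stated above.
CAUSATIVE_MAP = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args):
    """Replace k1/k3/k4 with pk1/mk1/jk1; all other labels are kept unchanged."""
    return [(phrase, CAUSATIVE_MAP.get(label, label)) for phrase, label in args]

syntactic = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
paninian = relabel_causative(syntactic)
```

The karma khaanaa keeps k2 under both analyses, which matches the second example above.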
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd…)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta: k4a
Ex-1: mujhko dukh hai
'I.Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan k4 (beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta: k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran: rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would definitely have come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
[Dependency tree: agar-to → agar (paired-ccof); agar-to → to (paired-ccof); agar → naa hotii (ccof); to → aatii (ccof)]
Possibility-II
Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is semantically realized but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Having written letters every day, he tears them up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'
[Dependency tree: PaadZataa hai → vaha (k1); PaadZataa hai → rojZa (k7t); PaadZataa hai → patra (k2); PaadZataa hai → likhakara (vmod); shared arguments of likhakara: vaha (k1), patra (k2)]
(7) Ellipsis
• How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency
Ellipsis (Contd)
[Dependency tree: maan luungii → mai (k1); maan luungii → NULL__NP (vo) (k2); NULL__NP → kahoge (nmod__relc); kahoge → tum (k1); kahoge → jo bhi (k2)]
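Inserting the null element can be sketched as adding two edges: one that attaches the null node to the main verb and one that hangs the orphaned relative clause under it. The function name and the edge-triple layout are assumptions for illustration; only the labels come from the example above.

```python
def attach_via_null(edges, head, rel, orphan, orphan_rel, null_node="NULL__NP"):
    """Return a new edge list in which `orphan` hangs under an inserted null node."""
    return edges + [(head, null_node, rel), (null_node, orphan, orphan_rel)]

# Edges that can be annotated without the missing head vo 'that'.
partial = [
    ("maan luungii", "mai", "k1"),
    ("kahoge", "tum", "k1"),
    ("kahoge", "jo bhi", "k2"),
]
full = attach_via_null(partial, "maan luungii", "k2", "kahoge", "nmod__relc")
```

After the call every node except the root maan luungii has exactly one head, so the structure is a tree again.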
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
[Dependency tree: NULL_CCP (aur) → badZe_ho_gaye (ccof); NULL_CCP (aur) → nahii_sunate (ccof)]
Non-dependency Relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
'Ram said that he will come tomorrow'
[Figures: dependency trees for the coordinate and subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF ('Part OF' relation): the noun or adjective in the conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
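Because the noun and the verb are chunked separately, the conjunct verb has to be recovered from the pof edge. A minimal sketch, assuming edges are (head, dependent, relation) triples and using a hypothetical helper name:

```python
# Inter-chunk edges for 'maine usase ek prashna kiyaa', plus the intra-chunk modifier.
EDGES = [
    ("kiyaa", "maine", "k1"),
    ("kiyaa", "usase", "k2"),
    ("kiyaa", "prashna", "pof"),
    ("prashna", "ek", "nmod"),
]

def conjunct_verbs(edges):
    """Return each noun-verb (or adjective-verb) pair linked by a pof edge as one unit."""
    return [f"{dep} {head}" for head, dep, rel in edges if rel == "pof"]

units = conjunct_verbs(EDGES)
```

The modifier ek stays attached to prashna alone, while `units` still reports prashna kiyaa as a single semantic unit.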
Conjunct Verbs (Contd)
[Dependency tree: kiyaa → maine (k1); kiyaa → usase (k2); kiyaa → prashna (pof); prashna → ek (nmod)]
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
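The filtering step can be sketched with the two candidate sentences written as hand-made role annotations. The dictionaries below are hypothetical stand-ins for the output of a semantic role labeller, and the function name is an assumption; the point is only that matching on the theme, not on shared words, selects the right agent.

```python
# Hypothetical, hand-written role annotations for the two candidate sentences.
CANDIDATES = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def who(predicate, theme, candidates):
    """Return the agents of `predicate` whose theme matches the theme asked about."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]

answers = who("create", "the first effective polio vaccine", CANDIDATES)
```

Both sentences contain the words of the question, so a word-match system cannot separate them; the role-based lookup keeps only the sentence whose theme is the vaccine.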
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• [John ARG0] rings [the bell ARG1]
• [Tall aspen trees ARG1] ring [the lake ARG2]
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
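A frame file with two rolesets can be mimicked with a nested dictionary. This is a sketch for illustration only: real PropBank frame files are XML, and the dotted roleset identifiers, the `FRAMES` layout and the `describe` helper are assumptions made here.

```python
# Rolesets for 'ring', transcribed from the example above.
FRAMES = {
    "ring.01": {"gloss": "make sound of bell",
                "roles": {"Arg0": "causer of ringing",
                          "Arg1": "thing rung",
                          "Arg2": "ring for"}},
    "ring.02": {"gloss": "to surround",
                "roles": {"Arg1": "surrounding entity",
                          "Arg2": "surrounded entity"}},
}

def describe(roleset_id, labelled_args):
    """Attach role descriptions to (text, label) pairs; reject labels the roleset lacks."""
    roles = FRAMES[roleset_id]["roles"]
    described = []
    for text, label in labelled_args:
        if label not in roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        described.append((text, label, roles[label]))
    return described

lake = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
```

Choosing the roleset first and only then assigning numbered arguments is what lets the same label (Arg1) mean "thing rung" in one sense and "surrounding entity" in the other.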
Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875; 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments          Numbered Arguments with function tags
ARGA: Causer                ARGA-MNS: Indirect causer
ARG0: Agent/experiencer     ARG0-MNS: Induced causer
ARG1: Theme/patient         ARG0-GOL: Causee with a 'recipient' role
ARG2: Recipient             ARG2-ATR: Attribute
ARG3: Instrument            ARG2-GOL: Goal
                            ARG2-SOU: Source
                            ARG2-LOC: Location
                            ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP: Temporal
ARGM-MNR: Manner
ARGM-LOC: Location
ARGM-PRP: Purpose
ARGM-CAU: Cause
ARGM-DIS: Discourse
ARGM-ADV: Adverb
ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1
आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2
आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'
trans-3
आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1
दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2
दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'
unacc-3
दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Intransitive: Unergative
unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'
unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
Existential
exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'
dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'
[Figures: PropBank annotations for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) 'sleep' → 'make someone sleep'
• Add -vaa
  – (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Causatives
[Figures: PropBank annotations for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'
Complex predicate
compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram: आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram: आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In an unergative, there simply is no object
[Tree diagram: आतिफ़ सोएगा (Atif soyegaa)]
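Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Tree diagram: दरवाज़ा खुलेगा (darvaazaa khulegaa); the NP darvaazaa carries index 1, matching the lower element labelled CASE1]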
Existentials
• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct
[Tree diagram: उस कमरे में चूहे हैं (us kamre mein cuuhe hain)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position
[Tree diagram: राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)]
Putting it All Together: Dative Subjects
[Tree diagram: कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Tree diagram: राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi)]
Relative Clause
[Tree diagram for the sentence below]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me.'
Complex Predicate
[Tree diagram for the sentence below]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi.'
Causative
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late.'
Relative Clause
rel-cl-1: मेरी बहन जो दिल्ली में रहती है कल आ रही है
merii bahan jo dillii meM rahtii hai kal aa rahii hai
My sister who Delhi in stay.hab.f.sg be.sg.pres tomorrow come prog.f.sg be.sg.pres
'My sister who stays in Delhi is coming tomorrow.'
Relative Clause
rel-cl-2: मैंने वह किताब जो तुमने दी थी पढ़ ली
maiMne vah kitaab jo tumne dii thii paRh lii
I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
'I have read the book which you gave me.'
rel-cl-3: मैंने वह किताब पढ़ ली जो तुमने दी थी
maiMne vah kitaab paRh lii jo tumne dii thii
I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
'I have read the book which you gave me.'
rel-cl-4: जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maiMne paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me.'
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi.'
compl-pred-2: राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi.'
Causatives
Unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep.'
causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep.'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep.'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
• The experiencer argument takes dative case
raam ko bukhaar hai
Ram dat fever be.pres
raam ko caand dikhaa
Ram dat moon see-unacc.pst
raam ko dukh hai
Ram dat sorrow be.pres
Participatory verbs
• The second argument of the participatory verbs takes the se postposition
raam ravi se carcaa karega
Ram Ravi to discussion do.m.sg.fut
siitaa raam se shaadi karegii
Sita Ram with marriage do.f.sg.fut
ravi mohan se milegaa
Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues
  – Compounds
  – Punctuation
For example:
usa   ladake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past
Tokenization
• Represented in SSF
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens; mark the hyphen as JOIN
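The tokenization decisions above can be sketched in a few lines of Python. This is only an illustration, not the treebank's actual tokenizer: the function names `tokenize` and `to_ssf_rows` are hypothetical, and treating every non-alphanumeric character as a separate token is an assumption made for the sketch.

```python
import re

def tokenize(sentence):
    """Split a romanized sentence into tokens; punctuation and hyphens become separate tokens."""
    tokens = []
    for piece in sentence.split():
        # A run of word characters is one token; any other character stands alone.
        tokens.extend(re.findall(r"\w+|[^\w\s]", piece))
    return tokens

def to_ssf_rows(tokens):
    """Number tokens from 1 (the ADDR column) and flag hyphens as JOIN."""
    return [(addr, tok, "JOIN" if tok == "-" else "")
            for addr, tok in enumerate(tokens, start=1)]

# A hyphenated compound yields three tokens, with the hyphen marked JOIN.
for row in to_ssf_rows(tokenize("BAI-bahana")):
    print(*row, sep="\t")
```

The same two functions applied to "usa laDake ne kelA khAyA thA" give the six word rows of the SSF table, numbered 1 to 6.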
Morph Analysis and its Representation
• af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs as=&STOP,punc>
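As a rough illustration of how the composite af attribute can be unpacked, the sketch below splits a comma-separated af value into named fields. The field order follows the description of af above; the short field names, the function name `parse_af`, and the sample value are assumptions made for the example, not part of the SSF specification quoted here.

```python
# Field order follows the description of af; the short names are assumptions for this sketch.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam_or_vibhakti", "suffix")

def parse_af(af):
    """Turn a comma-separated af value into a field dictionary; missing trailing fields become ''."""
    values = [v.strip() for v in af.split(",")]
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))

# Hypothetical af value for laDake 'boy' (oblique), written with explicit commas.
analysis = parse_af("laDakA,n,m,sg,3,o")
print(analysis["root"], analysis["case"])
```

Padding with empty strings keeps every analysis the same shape, so later steps can look up a field without checking whether it was present.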
POS Tagging
• ILMT POS tagsets adopted
• Total: 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
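The way chunking changes the ADDR_ values can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `chunk_rows` is hypothetical, and the nested addresses are written with a dot (1.1, 1.2), which is assumed to be the intended reading of the addresses in the chunked table.

```python
def chunk_rows(chunks):
    """Return SSF-style rows for a list of (chunk_label, [(token, pos), ...]) pairs."""
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))           # chunk opener carries the chunk label
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))    # tokens get nested addresses: 1.1, 1.2, ...
        rows.append(("", "))", ""))                  # chunk closer
    return rows

chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
for addr, token, cat in chunk_rows(chunks):
    print(addr, token, cat, sep="\t")
```

Before chunking, thA had address 6; after chunking it is the second token of the third chunk, so its address becomes 3.2.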
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
• The corpus
  – News articles: 350k
  – Tourism articles: 25-30k
  – Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model
• Why Paninian grammar? Indian languages have
  – Rich morphology
  – Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock
    the
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result
Sabina opened the lock with this key
opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my
K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this
lock becomes the karta
Levels of Analysis
• L1 – Semantic relations: karakas, e.g. raama: karta
• L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) khaa
       bhaaii
       phala
(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened
Semantics of the verb
• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
  – k1: karta
  – k2: karma
• karta, karma sometimes correspond to agent, theme
• Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
open
  k1: boy
  k2: lock
Sub-actions: Opening of lock
Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
k1: karta (doer); k2: karma (affected); k3: karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta, or the other karaka roles, can shift depending on what the speaker wants to express (vivaksha)
  – Which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'
An Example (Contd.)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
An Example (Contd.)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd.)
• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd.)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
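The dependency tree for this example can be written down as a list of labelled edges between chunk heads. The sketch below is one possible encoding, not the treebank's storage format: the tuple layout and the helper names `find_root` and `show` are assumptions made for illustration.

```python
# Each edge is (head, relation, dependent); nodes are chunk heads.
edges = [
    ("khaataa", "k1", "bhaaii"),   # karta
    ("khaataa", "k2", "phala"),    # karma
    ("bhaaii", "r6", "meraa"),     # genitive
]

def find_root(edges):
    """The root is the one node that never appears as a dependent."""
    heads = {h for h, _, _ in edges}
    dependents = {d for _, _, d in edges}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()

def show(node, edges, label="root", depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + f"{label}: {node}"]
    for head, rel, dep in edges:
        if head == node:
            lines.extend(show(dep, edges, rel, depth + 1))
    return lines

print("\n".join(show(find_root(edges), edges)))
```

Because only chunk heads appear as nodes, badZaa and bahuta are absent from the edge list; their attachment is chunk-internal.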
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six
  – karta: subject/agent/doer
  – karma: object/patient
  – karana: instrument
  – sampradaan: beneficiary
  – apaadaan: source
  – adhikarana: location in place/time/other
Relations other than karakas
• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address
Relations which do not fall under dependency relation
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
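The three-way split of relation labels can be captured in a small lookup. This is a sketch under stated assumptions: it lists only labels that are spelled out in these slides (k4 for sampradaan and k7t/k7p for adhikarana are taken from other slides in this deck; no label is given here for apaadaan, so it is omitted), and the names `RELATION_TYPES` and `relation_type` are hypothetical.

```python
# Labels grouped by the three relation types described above.
# Only labels that appear in the slides are listed; this is not the full tagset.
RELATION_TYPES = {
    "karaka": {"k1", "k2", "k3", "k4", "k7t", "k7p"},
    "non-karaka": {"r6", "rt", "rh", "nmod__relc", "rad"},
    "non-dependency": {"ccof", "pof", "fragof"},
}

def relation_type(label):
    """Return the relation type of a label, or 'unknown' if it is not listed."""
    for kind, labels in RELATION_TYPES.items():
        if label in labels:
            return kind
    return "unknown"

for label in ("k1", "r6", "pof"):
    print(label, relation_type(label))
```

A lookup like this is useful mainly as a sanity check on annotated edges, for example to flag a label that was mistyped.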
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  – prayojaka karta 'causer' (pk1): the causer in a causative construction
  – prayojya karta 'causee' (jk1): the causee in a causative construction
  – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
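The relabelling from Possibility-I to Possibility-II can be sketched as a simple mapping. This is an illustration only: the function `apply_possibility_2` and the idea of passing in the set of causal participants by hand are assumptions, since deciding which se-marked phrase is a mediator (aayaa 'maid') rather than an instrument (cammaca 'spoon') is an annotator's judgement that the code does not attempt.

```python
# Possibility-I label -> Possibility-II label, as stated in the text.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def apply_possibility_2(arguments, causal_participants):
    """Relabel only the phrases named as participants in the causation chain.

    arguments: list of (phrase, label) pairs labelled under Possibility-I.
    causal_participants: set of phrases judged to be causer, mediator or causee.
    """
    return [
        (phrase, CAUSATIVE_RELABEL.get(label, label) if phrase in causal_participants else label)
        for phrase, label in arguments
    ]

sentence = [("maaz ne", "k1"), ("aayaa se", "k3"), ("bacce ko", "k4"), ("khaanaa", "k2")]
print(apply_possibility_2(sentence, {"maaz ne", "aayaa se", "bacce ko"}))
```

In the spoon example, cammaca se is an instrument rather than a mediator, so it would be left out of `causal_participants` and keep its k3 label.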
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the rel. clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause: Alternative-II
Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the rel. clause attaches to the verb khadzaa hai 'is standing' of the rel. clause
• The rel. clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta: k4a
Ex-1: mujhko dukh hai
'I-Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta: k4a (Contd.)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram-Dat' 'moon' 'appeared'
'The moon was visible to Ram'
anubhava karta: k4a (Contd.)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran: rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
Relation samanadhikaran: rs (Contd.)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii
Possibility-II
Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up'
Participles (vmod) (Contd.)
• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with another participle verb, likhakar 'having written'
Participles (vmod) (Contd.)
Paadzataa hai
  k1: vaha
  k7t: rojZa
  k2: pawra
  vmod: likhakar
    k1: vaha
    k2: pawra
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example, vo 'that', which becomes the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency
Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd.)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The adjective ek 'one' modifies the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd.)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• This means that the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd.)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
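The filtering step can be made concrete with a small sketch. It assumes the semantic role labels are already available as dictionaries (the role assignments are the ones shown on the slide); the function name `answer_agent` and the exact-match comparison of themes are simplifications chosen for illustration, not how a real question answering system would match arguments.

```python
# The question asks for the agent of 'create' with a known theme.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Role-labelled candidate sentences, as annotated in the example above.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson", "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk", "theme": "the first effective polio vaccine"},
]

def answer_agent(question, candidates):
    """Return agents of candidate propositions whose predicate and theme match the question."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == question["predicate"]
        and c["theme"].lower() == question["theme"].lower()
    ]

print(answer_agent(question, candidates))
```

Both candidate sentences contain the words of the question, so word matching alone cannot separate them; only the theme comparison rules out the first.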
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
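A frame file with several rolesets can be pictured as a small lookup structure. The sketch below is only an in-memory illustration of the ring example; it is not the PropBank frame file format, and the names `Roleset`, `FRAME_FILE_RING` and `describe` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Roleset:
    roleset_id: str
    gloss: str
    roles: dict  # argument label -> role description

# Two rolesets for the polysemous predicate 'ring', as in the example above.
FRAME_FILE_RING = {
    "ring.01": Roleset("ring.01", "Make sound of bell",
                       {"ARG0": "Causer of ringing", "ARG1": "Thing rung", "ARG2": "Ring for"}),
    "ring.02": Roleset("ring.02", "To surround",
                       {"ARG1": "Surrounding entity", "ARG2": "Surrounded entity"}),
}

def describe(roleset_id, labelled_args):
    """Pair each labelled argument with the role description from its roleset."""
    roleset = FRAME_FILE_RING[roleset_id]
    return [(text, label, roleset.roles[label]) for text, label in labelled_args]

for row in describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]):
    print(row)
```

The same label means different things in different rolesets: ARG1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why roles are defined per roleset.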
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
Numbered arguments
  ARGA: Causer
  ARG0: Agent/experiencer
  ARG1: Theme/patient
  ARG2: Recipient
  ARG3: Instrument
Numbered arguments with function tags
  ARGA-MNS: Indirect causer
  ARG0-MNS: Induced causer
  ARG0-GOL: Causee with a 'recipient' role
  ARG2-ATR: Attribute
  ARG2-GOL: Goal
  ARG2-SOU: Source
  ARG2-LOC: Location
  ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
  ARGM-TMP: Temporal
  ARGM-MNR: Manner
  ARGM-LOC: Location
  ARGM-PRP: Purpose
  ARGM-CAU: Cause
  ARGM-DIS: Discourse
  ARGM-ADV: Adverb
  ARGM-MNS: Means
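The structure of these labels (a numbered or modifier argument, optionally followed by a function tag) can be checked mechanically. The sketch below assumes the hyphenated spelling of the labels (for example ARG2-LOC, ARGM-TMP); the regular expression and the function name `parse_label` are illustrative assumptions, not part of any PropBank tooling.

```python
import re

# ARG + (A, 0-3, or M), optionally followed by a hyphen and a three-letter function tag.
LABEL_RE = re.compile(r"^ARG(?P<core>[0-3A]|M)(?:-(?P<tag>[A-Z]{3}))?$")

def parse_label(label):
    """Split a label into its argument part and optional function tag."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a recognised label: {label!r}")
    core, tag = match.group("core"), match.group("tag")
    if core == "M" and tag is None:
        raise ValueError("a modifier argument needs a function tag, e.g. ARGM-TMP")
    return {
        "kind": "modifier" if core == "M" else "numbered",
        "argument": "ARG" + core,
        "function_tag": tag,
    }

for label in ("ARGA", "ARG2-LOC", "ARGM-TMP"):
    print(label, parse_label(label))
```

The check is purely about shape: it accepts any three-letter tag and does not verify that the tag is one of those listed for that argument.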
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book.'
trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book.'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book.'
Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
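The labelling rule for intransitives can be stated in a few lines. This is a minimal sketch: the verb classes are taken from the examples above, while the lookup table `VERB_CLASS` and the function name are assumptions; in practice the class would come from the diagnostic tests, not from a hard-coded list.

```python
# Verb classes from the examples above; a real lexicon would be built
# from diagnostic tests (animacy, cognate object, ...), not hard-coded.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"
```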
Intransitive: Unaccusative

unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         The door will open
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         The door opened
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         The door will have to open
Intransitive: Unergative

unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         Atif will sleep
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         Atif slept
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         Atif will have to sleep

Existential

exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'

• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject

unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            Yesterday night the moon was seen behind the clouds
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            Yesterday night I saw the moon behind the clouds

[Figures: annotated trees for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive

ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           Ram will give a book to Mohan
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           Ram gave a book to Mohan

Causatives

• Hindi has two ways of forming the causative
• Add -aa
  – (so, sulaa): sleep, make someone sleep
• Add -vaa
  – (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             Atif will sleep
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

[Figures: annotated trees for causative-1 and causative-2]
Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa, 'trust(n) do(v)': trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'
compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause

compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            Ram knows that Sita will arrive late
Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

[Figure: phrase structure tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]
Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Figure: phrase structure tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object

[Figure: phrase structure tree for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]
Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

[Figure: phrase structure tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, CASE1, V]

Existentials

• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct

[Figure: phrase structure tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, NP-P, CASE1, V]
Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position

[Figure: phrase structure tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects

[Figure: phrase structure tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, CASE1, SCR2, V]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki

[Figure: phrase structure tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, EXTR1, CASE1, V, V-Aux]

Relative Clause

[Figure: phrase structure tree for the sentence below; node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux]

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate

[Figure: phrase structure tree for the sentence below; node labels: VP-Pred, VP, VP1, NP, NP-P, N, CASE1, V, V', V-Aux]

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
Ram was remembering Ravi

Causative
Relative Clause

rel-cl-2  मैंने वह किताब जो तुमने दी थी पढ़ ली
          maiMne vah kitaab jo tumne dii thii paRh lii
          I.erg that book.f which you.erg give.f.sg.pst be.f.sg.pst read refl.f.sg.pst
          I have read the book which you gave me
rel-cl-3  मैंने वह किताब पढ़ ली जो तुमने दी थी
          maiMne vah kitaab paRh lii jo tumne dii thii
          I.erg that book.f read refl.f.sg.pst which you.erg give.f.sg.pst be.f.sg.pst
          I have read the book which you gave me
rel-cl-4  जो किताब तुमने दी थी वह मैंने पढ़ ली
          jo kitaab tumne dii thii vah maiMne paRh lii
          which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
          I have read the book which you gave me

Complex Predicate

compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              Ram was waiting for Ravi
compl-pred-2  राम रवि को याद कर रहा था
              raam ravi ko yaad kar rahaa thaa
              Ram Ravi acc remember do prog.m.sg be.m.sg.pst
              Ram was remembering Ravi
Causatives

unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             Atif will sleep
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'
Lexical Semantics

Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:

Experiencer verbs: the experiencer argument takes dative case
  raam ko bukhaar hai      Ram dat fever be.pres
  raam ko caand dikhaa     Ram dat moon see-unacc.pst
  raam ko dukh hai         Ram dat sorrow be.pres

Participatory verbs: the second argument of the participatory verbs takes the se postposition
  raam ravi se carcaa karega       Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii    Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa            Ravi Mohan with meet.m.sg.fut
References

• Agnihotri, Rama K. 2007. Hindi: An Essential Grammar. Routledge, London and New York.
• Kachru, Yamuna. 2006. Hindi. London Oriental and African Language Library.
• McGregor, R. S. 1995. Outline of Hindi Grammar. Oxford University Press.
• Sharma, A. 1975. A Basic Grammar of Modern Hindi. New Delhi.
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline

• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies

Tokenization

• Automatic
• Issues
  – Compounds
  – Punctuations

For example:
  usa    ladake  ne   kelaa   khaayaa   thaa
  that   boy     erg  banana  eat-perf  past
Tokenization

Represented in SSF:

ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation)
Tokenization Issues

• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
  – Decision: create three tokens; mark the hyphen as JOIN
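The decision above (three tokens per hyphenated compound, with the hyphen marked as JOIN) could be sketched as follows. This is only an illustration under stated assumptions: the function name `tokenize`, the row layout `(ADDR, TOKEN, mark)` and the regular expression are choices made here, not the treebank's actual tokenizer or its way of recording JOIN.

```python
import re

# Words, or any single non-space punctuation character.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return rows (ADDR, TOKEN, mark); mark is 'JOIN' for a compound hyphen."""
    matches = list(TOKEN.finditer(text))
    rows = []
    for i, m in enumerate(matches):
        mark = None
        if m.group() == "-" and 0 < i < len(matches) - 1:
            prev, nxt = matches[i - 1], matches[i + 1]
            # A hyphen glued to a word on both sides is compound-internal.
            if (prev.end() == m.start() and m.end() == nxt.start()
                    and prev.group()[-1].isalnum()
                    and nxt.group()[0].isalnum()):
                mark = "JOIN"
        rows.append((i + 1, m.group(), mark))
    return rows


compound = tokenize("bAlikA-vixyAlaya")
sentence = tokenize("usa ladake ne kelaa khaayaa thaa.")
```

Keeping the members as separate tokens lets each one receive its own morphological analysis, while the JOIN mark preserves the information that they form one compound.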
Morph Analysis and its Representation

af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense, aspect, modality) / vibhakti (case marker), suffix

ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaa,v,e'>
7              <fs as=&STOP,punc>
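To make the composite attribute concrete, the sketch below splits an af value into named fields. It assumes that the fields are comma-separated and appear in the order listed above; the field names in `AF_FIELDS` and the function `parse_af` are hypothetical labels chosen for this illustration.

```python
# Field order follows the description of af above; the names are assumptions.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix")


def parse_af(value):
    """Split a comma-separated af value into a field dictionary.

    Shorter values (e.g. 'ne,psp') simply fill the leading fields.
    """
    return dict(zip(AF_FIELDS, value.split(",")))


noun = parse_af("laDakaa,n,m,sg,3,o")
postposition = parse_af("ne,psp")
```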
POS Tagging

ILMT POS tagset adopted; total 26 tags

ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking

• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)

ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs as=&STOP,punc>
      ))
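The renumbering that chunking causes (token 2 becomes 1.2, token 5 becomes 3.1, and so on) can be sketched as below. The input layout and the function name `chunk_rows` are assumptions made for this illustration; the feature-structure column is left out for brevity.

```python
def chunk_rows(chunks):
    """Build SSF-style rows (ADDR, TKN, CAT) from (label, tokens) chunks.

    Each chunk opens with '((' at address i, its tokens get i.j addresses,
    and it closes with '))'.
    """
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))
        rows.append(("", "))", ""))
    return rows


rows = chunk_rows([
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
])
```

After chunking, dependency relations only need to be marked between the three chunk heads rather than between all six tokens.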
Paninian Grammatical Model and Hindi/Urdu Treebanks

Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>

COLING-2012
Outline

• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions

Introduction

• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k

Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar? Indian languages have:
• Rich morphology
• Relatively flexible word order

For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar

• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)

• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock

opened
  k1: Sabina
  k2: lock (the)

k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)

k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)

k7t (kaladhikaraNa): time
k7p (deshadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)

Here lock becomes the karta
Levels of Analysis

Level 1 – Semantic relations (karakas), e.g. raama: karta
Level 2 – Morphosyntactic (vibhakti), e.g. raama: prathamaa
Level 3 – Morphological representation (abstract) vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
Level 4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) khaa
       bhaaii
       phala

(t2) khaa
       bhaaii
         meraa
         baDZaa
       phala
         bahuta
Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Diagram: Verbal Root, branching into "activity" and "result"]
karta-karma

• The boy opened the lock
  – k1: karta
  – k2: karma
• karta, karma sometimes correspond to agent, theme
  – Not always
• The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock

Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.
An Example

meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'

An Example (Contd.)

Morph Analysis

meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa   <fs af= root=badZaa cat=adj gend=m>
bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta   <fs af= root=bahuta cat=adj gend=any>
phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd.)

POS Tagging

meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

Chunking

((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

An Example (Contd.)

Dependency Relation
Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
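The tree above can be stored as a list of labelled head-dependent edges between chunk heads. The sketch below is one possible encoding; the edge-list layout and the helper names `children`, `root` and `render` are assumptions made here for illustration.

```python
# (dependent, head, relation) edges between chunk heads.
EDGES = [
    ("bhaaii", "khaataa", "k1"),
    ("phala", "khaataa", "k2"),
    ("meraa", "bhaaii", "r6"),
]


def children(head):
    """Dependents of a head, with their relation labels, in edge order."""
    return [(dep, rel) for dep, h, rel in EDGES if h == head]


def root():
    """The single node that is a head but never a dependent."""
    heads = {h for _, h, _ in EDGES}
    dependents = {dep for dep, _, _ in EDGES}
    (only,) = heads - dependents
    return only


def render(head, depth=0):
    """Indented lines for the subtree below head."""
    lines = []
    for dep, rel in children(head):
        lines.append("  " * (depth + 1) + f"{rel}: {dep}")
        lines.extend(render(dep, depth + 1))
    return lines


tree = [root()] + render(root())
```

Printing `"\n".join(tree)` reproduces the indented tree shown above, with the verb chunk head as the root.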
Relations in Dependency Scheme

There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas, and
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly

Karaka relations are participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations

Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
Some Hindi Constructions

(1) Causative Constructions

maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

Issue
• Possibility-I: go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)

Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  – prayojaka karta 'causer' (pk1): the causer in a causative construction
  – prayojya karta 'causee' (jk1): the causee in a causative construction
  – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'

As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.

For causatives, our current decision: follow Possibility-II
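The relabelling from Possibility-I to Possibility-II can be sketched as a simple mapping. This is a hedged illustration only: the lexicon `CAUSATIVE_TO_BASE`, the function `relabel` and its `keep` parameter are hypothetical. In particular, a true instrument such as cammaca se 'with the spoon' stays k3, and deciding which se-phrase is a mediator is left to the annotator (passed in via `keep`).

```python
# Hypothetical lexicon linking a causative root to its base verb.
CAUSATIVE_TO_BASE = {"khilvaa": "khaa"}

# Possibility-I label -> Possibility-II label, as described above.
POSSIBILITY_TWO = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel(verb_root, args, keep=()):
    """Return (root, args) under Possibility-II.

    args is a list of (phrase, label) pairs; phrases listed in keep
    (e.g. a real instrument) retain their original label.
    """
    if verb_root not in CAUSATIVE_TO_BASE:
        return verb_root, list(args)
    relabelled = [
        (phrase, label if phrase in keep else POSSIBILITY_TWO.get(label, label))
        for phrase, label in args
    ]
    return CAUSATIVE_TO_BASE[verb_root], relabelled


root_verb, labelled = relabel("khilvaa", [
    ("maaz ne", "k1"), ("aayaa se", "k3"),
    ("bacce ko", "k4"), ("khaanaa", "k2"),
])
```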
(2) Relative Clauses (nmod__relc)

Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'

Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Possibility-II

Relative Clauses (Contd.)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a

Ex-1: mujhko dukh hai
      'I.Dat' 'unhappy' 'is'
      'I am unhappy'

• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)

Ex-2: raam ne (agent) caaMd dekhaa            (base verb)
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa      (derived intransitive verb)
      'ram.Dat' 'moon' 'appeared'
      'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs

Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'

• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would have definitely come to the party'

Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause
Possibility-I

agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II

Conditionals (Contd.)

• Possibility-I is not possible because agar-to is the head of the tree, which is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb this argument is semantically realized, but not syntactically

Participles (vmod) (Contd.)

Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Every day he writes letters and then tears them up'

The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis

How to show dependencies when the head is missing?

Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'

• In the above example vo 'that' is missing, which becomes the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd.)

maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd.)

Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'

• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)

Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'

Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'

[Figures: dependency trees for the coordinate conjunction (ccof) and the subordinate conjunction]
(2) Conjunct Verbs

Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'

• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd.)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• It means a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• It captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd.)

kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder
Contents

1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
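The filtering step can be sketched in a few lines of Python. This is only an illustration: the role labels are typed in by hand from the two sentences above, and the function name `answer_who_created` is hypothetical; a real system would obtain the labelled spans from a semantic role labeller.

```python
# Hand-labelled candidates taken from the two sentences on the slide.
# A real system would get these spans from an SRL component (assumption).
QUESTION_THEME = "the first effective polio vaccine"

candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who_created(theme, labelled_sentences):
    """Return the agents of 'create' whose theme matches the question's theme."""
    return [
        s["agent"]
        for s in labelled_sentences
        if s["predicate"] == "create" and s["theme"].lower() == theme.lower()
    ]


answers = answer_who_created(QUESTION_THEME, candidates)
print(answers)
```

Word matching alone cannot separate the two candidates, since both contain every content word of the question; comparing the theme of create does.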
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1]   (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
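As a rough illustration of how a frame file ties role labels to a specific roleset, the two rolesets for ring can be held in a small lookup table. The class `Roleset`, the dictionary `RING_FRAME` and the function `describe` are hypothetical names for this sketch; actual PropBank frame files are distributed in their own file format, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roleset:
    roleset_id: str
    gloss: str
    roles: dict  # argument label -> role description


# Hypothetical in-memory version of the frame file for "ring".
RING_FRAME = {
    "ring.01": Roleset("ring.01", "Make sound of bell",
                       {"ARG0": "Causer of ringing",
                        "ARG1": "Thing rung",
                        "ARG2": "Ring for"}),
    "ring.02": Roleset("ring.02", "To surround",
                       {"ARG1": "Surrounding entity",
                        "ARG2": "Surrounded entity"}),
}


def describe(roleset_id, labelled_args):
    """Attach the roleset-specific description to each labelled argument."""
    roleset = RING_FRAME[roleset_id]
    return [(text, label, roleset.roles[label]) for text, label in labelled_args]


john = describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
trees = describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
for row in john + trees:
    print(row)
```

The same label, ARG1, means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is why the roleset ID has to be chosen before the arguments are interpreted.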
Hindi PropBank
• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered arguments           Numbered arguments with function tags
ARGA  Causer                 ARGA-MNS  Indirect causer
ARG0  Agent, experiencer     ARG0-MNS  Induced causer
ARG1  Theme, patient         ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient              ARG2-ATR  Attribute
ARG3  Instrument             ARG2-GOL  Goal
                             ARG2-SOU  Source
                             ARG2-LOC  Location
                             ARG2-DIR  Direction
PropBank Tagset: modifier arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1  आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'
trans-2  आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif.erg book.f read.f.sg.pst
         'Atif read the book'
trans-3  आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif.dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1  दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'
unacc-2  दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl.erg open.pst
         'The door opened'
unacc-3  दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl.dat open.inf compel.fut
         'The door will have to open'
Intransitive: Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif.erg sleep.m.sg.pst
         'Atif slept'
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif.dat sleep.inf compel.fut
         'Atif will have to sleep'
Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds'
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds'
[Figures: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan.dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram.erg Mohan.dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid.erg Atif.acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother.erg maid.by Atif.acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'
[Figures: PropBank annotation of causative-1 and causative-2]
Causatives: classes
[Figure: causative classes]
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'
compl-pred-2  राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi.inst fight sit.perf
              'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1    राम जानता है कि सीता देर से आएगी
              raam jaantaa hai ki siitaa der se aayegii
              Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
              'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree diagram: VP-Pred over VP; NP Atif, NP kitab, V paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree diagram: VP-Pred over VP; NP-P Atif-ne, NP kitab, V paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree diagram: VP-Pred over VP; NP Atif, V soyegaa]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree diagram: VP-Pred over VP; NP1 darvaazaa in the higher position, trace CASE1 in the lower position, V khulegaa]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram: NP-P us kamre mein adjoined to VP; VP-Pred over VP; NP1 cuuhe, trace CASE1, V hain]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed, but not structural, position
राम ने मोहन को किताब दी
[Tree diagram: NP-P Mohan-ko adjoined to VP-Pred; VP-Pred over VP; NP-P Ram-ne, NP kitaab, V dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram: NP kal raat and NP-P baadaloM mein adjoined to VP; NP2 mujhko scrambled out of the dative position (trace SCR2); VP-Pred with NP1 caaMd raised from the lower position (trace CASE1); V dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram: matrix VP-Pred with NP Ram, V jaantaa, V-Aux hai, and an extraposed CP1 (trace EXTR1); CP1 headed by C ki over an embedded VP-Pred with NP1 Sita (trace CASE1), NP-P der se, V aayegi]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree diagram: relative CP (C COMP) with NP1 jo kitaab, NP-P tumne (trace SCR1), PRO, V dii, V-Aux thii; main clause VP-Pred with NP vah (trace SCR3), NP-P maine, V paRhii]
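Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree diagram: VP-Pred over VP; NP Raam, NP-P1 Ravi-ko (trace CASE1 inside the NP headed by N yaad), V' combining the NP yaad with V kar, V-Aux rahaa, V-Aux thaa]
Causative
[Tree diagram not recoverable from the extracted text]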
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'
Lexical Semantics
Semantic properties of certain verb types seem to affect the case selection for certain arguments. For example:
Experiencer verbs
  The experiencer argument takes dative case.
  raam ko bukhaar hai      Ram dat fever be.pres
  raam ko caand dikhaa     Ram dat moon see-unacc.pst
  raam ko dukh hai         Ram dat sorrow be.pres
Participatory verbs
  The second argument of the participatory verbs takes the se postposition.
  raam ravi se carcaa karega      Ram Ravi to discussion do.m.sg.fut
  siitaa raam se shaadi karegii   Sita Ram with marriage do.f.sg.fut
  ravi mohan se milegaa           Ravi Mohan with meet.m.sg.fut
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
• Tokenization
• Morphological representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: compounds, punctuation
For example:
  usa   ladake  ne   kelaa   khaayaa   thaa
  that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF:
  ADDR  TOKEN
  1     usa
  2     laDake
  3     ne
  4     kelA
  5     khAyA
  6     thA
  7     (punctuation token)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
  – Compounds internally contain a punctuation mark
  – Are productive
  – Morphological analysis of the members of the compounds
  – The issue: whether to create a single token
• Decision
  – Create three tokens
  – Mark the hyphen as JOIN
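The tokenization decision above can be illustrated with a small Python sketch. It is not the treebank's tokenizer: the function name `tokenize` and the choice to store the JOIN mark as a second tuple field are assumptions made for this example only.

```python
import re


def tokenize(sentence):
    """Split on whitespace, separate every punctuation mark into its own
    token, and flag the hyphen inside a compound as JOIN."""
    tokens = []
    for piece in sentence.split():
        # \w+ picks up word material; [^\w\s] picks up single punctuation marks
        for tok in re.findall(r"\w+|[^\w\s]", piece):
            tokens.append((tok, "JOIN" if tok == "-" else ""))
    return tokens


compound = tokenize("BAI-bahana")
sentence = tokenize("usa laDake ne kelA khAyA thA .")

# Print the sentence in an SSF-like ADDR / TOKEN layout.
for addr, (tok, mark) in enumerate(sentence, start=1):
    print(addr, tok, mark)
```

The compound comes out as three tokens (BAI, the hyphen marked JOIN, bahana), so each member can still receive its own morphological analysis.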
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.
  ADDR  TKN      OTHR
  1     usa      <fs af='vaha,pr'>
  2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
  3     ne       <fs af='ne,psp'>
  4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
  5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
  6     thaa     <fs af='kelaave'>
  7              <fs as=&STOP,punc>
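A minimal sketch of reading the af attribute, assuming its value is a comma-separated list in the field order named above (the commas are not visible in the extracted slide text, so the separator is an assumption). The names `AF_FIELDS` and `parse_af` are hypothetical.

```python
# Field order as listed on the slide; the key names are chosen for this sketch.
AF_FIELDS = ("root", "category", "gender", "number", "person",
             "case", "tam_or_vibhakti", "suffix")


def parse_af(af_value):
    """Map a comma-separated af value onto named fields.
    Fields missing at the end of the value are simply absent from the result."""
    return dict(zip(AF_FIELDS, af_value.split(",")))


laDake = parse_af("laDakaa,n,m,sg,3,o")
print(laDake)
```

Here the token laDake is analysed as root laDakaa, a masculine singular third-person noun in the oblique case, which is what licenses the following postposition ne.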
POS Tagging
• ILMT POS tagset adopted
• Total 26 tags
  ADDR  TKN     CAT   OTHR
  1     usa     PRP   <fs af='vaha,pron,…'>
  2     laDake  NN    <fs af='laDakA,noun,…'>
  3     ne      PSP   <fs af='ne,psp,…'>
  4     kelA    NN    <fs af='kelA,noun,…'>
  5     khAyA   VM    <fs af='KA,verb,…'>
  6     thA     VAUX  <fs af='kelA,verb,…'>
  7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)
  ADDR  TKN     CAT   OTHR
  1     ((      NP
  1.1   usa     PRP   <fs af='vaha,pron,…'>
  1.2   laDake  NN    <fs af='laDakA,noun,…'>
  1.3   ne      PSP   <fs af='ne,psp,…'>
        ))
  2     ((      NP
  2.1   kelA    NN    <fs af='kelA,noun,…'>
        ))
  3     ((      VG
  3.1   khAyA   VM    <fs af='KA,verb,…'>
  3.2   thA     VAUX  <fs af='kelA,verb,…'>
        ))
  4     ((      BLK
  4.1           SYM   <fs as=&STOP,punc>
        ))
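The renumbering that chunking causes can be sketched as follows. The function `to_ssf` and its row layout are hypothetical; the sketch assumes that nested addresses are written as chunk.position (1.1, 1.2, ...), and it omits the feature-structure column for brevity.

```python
def to_ssf(chunks):
    """chunks: list of (chunk_tag, [(token, pos), ...]).
    Returns SSF-style rows of (address, token, category)."""
    rows = []
    for i, (tag, members) in enumerate(chunks, start=1):
        rows.append((str(i), "((", tag))            # chunk opening row
        for j, (token, pos) in enumerate(members, start=1):
            rows.append((f"{i}.{j}", token, pos))   # token row inside the chunk
        rows.append(("", "))", ""))                 # chunk closing row
    return rows


chunks = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
]
rows = to_ssf(chunks)
for row in rows:
    print("\t".join(row))
```

The token kelA, which had address 4 in the flat representation, now has address 2.1: its position is given relative to the chunk that contains it.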
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank
• The corpus
  – News articles: 350k
  – Tourism articles: 25-30k
  – Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model
• Why Paninian grammar? Indian languages have
  – Rich morphology
  – Relatively flexible word order
For example:
  a) baccaa phala khaataa hai    'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
  opened
    k1: Sabina
    k2: lock (the)
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key
  opened
    k1: Sabina
    k2: lock (the)
    k3: key (this)
K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home
  opened
    k7t: Yesterday
    k1:  Sabina
    k2:  lock (the)
    k3:  key (this)
    k7p: home (my)
K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key
  opened
    k7t: yesterday
    k1:  lock (the)
    k3:  key (this)
Here lock becomes the karta.
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Tree (t1), chunk heads only: khaa with dependents bhaaii and phala]
[Tree (t2), fully expanded: khaa with dependents bhaaii (meraa, baDZaa) and phala (bahuta)]
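The step from POS-tagged words to chunks can be approximated with a toy rule. This is only a sketch under strong assumptions and not the chunker used for the treebank: verbal tags (VM, VAUX) form a VG chunk, everything else forms NP chunks, and an NP is closed once its noun head (NN), optionally followed by a postposition (PSP), has been seen. The function name `chunk` is hypothetical.

```python
def chunk(tagged):
    """Group (word, pos) pairs into NP and VG chunks with a toy rule."""
    chunks, current, kind = [], [], None
    for word, pos in tagged:
        new_kind = "VG" if pos in ("VM", "VAUX") else "NP"
        if current:
            last_pos = current[-1][1]
            # An NP ends after its postposition, or after its noun head
            # unless a postposition follows immediately.
            np_closed = kind == "NP" and (
                last_pos == "PSP" or (last_pos == "NN" and pos != "PSP"))
            if new_kind != kind or np_closed:
                chunks.append((kind, current))
                current = []
        current.append((word, pos))
        kind = new_kind
    if current:
        chunks.append((kind, current))
    return chunks


tagged = [("meraa", "PRP"), ("baDzaa", "JJ"), ("bhaaii", "NN"),
          ("bahuta", "QF"), ("phala", "NN"),
          ("khaataa", "VM"), ("hai", "VAUX")]
result = chunk(tagged)
for tag, members in result:
    print("((" + " ".join(f"{w}_{p}" for w, p in members) + "))_" + tag)
```

On the example sentence this reproduces the three chunks shown on the slide; real text needs a much richer rule set or a trained chunker.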
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened
Semantics of the verb
• A verbal root denotes
  – the activity
  – the result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal root → activity, result]
karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
[Tree: open with dependents boy (k1) and lock (k2)]
Sub-actions: Opening of lock
  Inserting and turning a key (action 1)
  Key pressing and turning the lever (action 2)
  Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
[Trees: open with boy (k1), lock (k2), key (k3); open with key (k1), lock (k2); open with lock (k1)]
k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  – The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
  – The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta and the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
  meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
  – meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa   <fs af= root=badZaa cat=adj gend=m>
  – bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta   <fs af= root=bahuta cat=adj gend=any>
  – phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
[Figure: dependency tree for the example sentence]
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit' and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
  khaataa
    k1: bhaaii
      r6: meraa
    k2: phala
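One simple way to hold such a tree in a program is as a list of labelled edges between chunk heads. The triple layout (dependent, relation, head) and the helper names `children` and `root` are assumptions for this sketch, not the treebank's storage format.

```python
# (dependent, relation, head) triples for the tree on the slide.
edges = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def children(head, edges):
    """Return (dependent, relation) pairs attached to the given head."""
    return [(dep, rel) for dep, rel, h in edges if h == head]


def root(edges):
    """The root is the only head that is not itself a dependent."""
    heads = {h for _, _, h in edges}
    dependents = {dep for dep, _, _ in edges}
    (only_root,) = heads - dependents
    return only_root


print(root(edges))
print(children("khaataa", edges))
print(children("bhaaii", edges))
```

The verb khaataa is the root (the primary modified), its karta and karma attach with k1 and k2, and the genitive meraa attaches to bhaaii with r6.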
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly, but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six:
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other
Relations other than karakas
  – r6 – Genitive
  – rt – Purpose
  – rh – Reason
  – nmod__relc – Relative clause
  – rad – Address
Relations which do not fall under dependency relation
  – ccof – Conjunction
  – pof – Complex Predicates
  – fragof – Fragment of
Dependency Relation Types
[Figure: hierarchy of dependency relation types]
Some Hindi Constructions
(1) Causative Constructions
  maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'
Issue
• Possibility-I: go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    prayojaka karta 'causer' (pk1): the causer in a causative construction
    prayojya karta 'causee' (jk1): the causee in a causative construction
    madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
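The difference between the two possibilities amounts to a relabelling of three arguments, which can be sketched as a lookup. The names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical. The mapping is meant only for causative sentences with an intermediary agent, such as the maid example; a true instrument such as cammaca se 'with the spoon' keeps its k3 label and should not be passed through it.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """args: list of (phrase, label) pairs under Possibility-I.
    Labels outside the mapping (e.g. k2) are left unchanged."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in args]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```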
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'
Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause: Possibility-I
[Figure: dependency tree under Possibility-I]
Relative Clauses (nmod__relc)
• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause: Possibility-II
[Figure: dependency tree under Possibility-II]
Relative Clauses (Contd)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
      'I-Dat' 'unhappy' 'is'
      'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan proper (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa            [base verb]
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa      [derived intransitive verb]
      'ram-Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would have definitely come to the party'
Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause
Possibility-I
  agar-to
    paired-ccof: agar
      ccof: naa hotii
    paired-ccof: to
      ccof: aatii
Possibility-II
[Figure: dependency tree under Possibility-II]
Conditionals (Contd)
• Possibility-I is not possible because agar-to is the head of the tree, and it is an abstract node, i.e. not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Having written letters every day, he tears them up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'
Participles (vmod) (Contd)
  PaadZataa hai
    k1:   vaha
    k7t:  rojZa
    k2:   patra
    vmod: likhakara
      k1: vaha
      k2: patra
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd)
  maan luungii
    k1: mai
    k2: NULL__NP (vo)
      nmod__relc: kahoge
        k1: tum
        k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
  NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate
Non-dependency Relations
  – ccof – Conjunction
  – pof – Complex Predicates
  – fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordination, the conjunct is the head and takes the coordinated elements as its children
• In subordination, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd)
Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
        'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
        'Ram ate food and Sita ate an apple'
Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
        'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
        'Ram said that he will come tomorrow'
[Figures: dependency trees for the coordinate and the subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree.
If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd)
To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb.
This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit.
Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
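The conjunct verb tree can be sketched in Python as follows. This is only an illustration: the `Node` class and the `pof_parts` helper are hypothetical names, not part of any treebank toolkit. Each word stores the label of the edge to its parent, so the pof link identifies the nominal part of the conjunct verb while ek still attaches to prashna.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a dependency tree (hypothetical helper for illustration)."""
    form: str
    relation: str = "root"  # label on the edge to the parent
    children: list = field(default_factory=list)

    def add(self, form, relation):
        """Attach a new child with the given edge label and return it."""
        child = Node(form, relation)
        self.children.append(child)
        return child


def pof_parts(verb):
    """Forms linked to the verb by pof, i.e. the nominal part of a conjunct verb."""
    return [child.form for child in verb.children if child.relation == "pof"]


kiyaa = Node("kiyaa")
kiyaa.add("maine", "k1")
kiyaa.add("usase", "k2")
prashna = kiyaa.add("prashna", "pof")
prashna.add("ek", "nmod")  # ek modifies the noun, not the noun-verb sequence

print(pof_parts(kiyaa))
```

Because prashna is a node of its own, the nmod edge from ek has somewhere to attach, which would not be possible if prashna kiyaa were a single verb chunk.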
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
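The filtering step can be sketched in a few lines of Python. The role analyses below are hand-written to mirror the two bracketed sentences; they are not the output of a real labeller, and `answer_who` is a hypothetical helper name.

```python
# Hand-written role analyses for the two candidate sentences (illustrative only).
CANDIDATES = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, analyses):
    """Return the agents of analyses whose predicate and theme match the question."""
    wanted = theme.lower()
    return [
        analysis["agent"]
        for analysis in analyses
        if analysis["predicate"] == predicate and analysis["theme"].lower() == wanted
    ]


print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Both sentences contain the words of the question, so word matching alone cannot separate them; only the sentence whose theme is the vaccine survives the role-based filter.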
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
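A frame file can be pictured as a small lookup table. The sketch below encodes the two rolesets of ring (written here as ring.01 and ring.02) as a Python dictionary; the layout and the `describe` helper are assumptions made for illustration and do not reproduce PropBank's actual file format.

```python
# Rolesets for "ring", copied from the slide; the dictionary layout is illustrative.
FRAMESETS = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {
            "ARG0": "causer of ringing",
            "ARG1": "thing rung",
            "ARG2": "ring for",
        },
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {
            "ARG1": "surrounding entity",
            "ARG2": "surrounded entity",
        },
    },
}


def describe(roleset, labelled_args):
    """Map each labelled span to the verb-specific meaning of its numbered argument."""
    roles = FRAMESETS[roleset]["roles"]
    return {span: roles[label] for span, label in labelled_args.items()}


print(describe("ring.01", {"John": "ARG0", "the bell": "ARG1"}))
print(describe("ring.02", {"Tall aspen trees": "ARG1", "the lake": "ARG2"}))
```

The same label, ARG1, means "thing rung" under one roleset and "surrounding entity" under the other, which is why the roleset has to be chosen before the numbered arguments can be interpreted.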
Hindi PropBank
• Annotating Hindi PropBank involves three steps:
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
– unaccusatives, e.g. Kula (open), Puta (explode)
– unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’
[Figures: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add -aa
– (so → sulaa) sleep → make someone sleep
• Add -vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
Causatives
[Figures: annotated trees for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
Complex predicate
compl-pred-2: राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
[Tree diagram for आतिफ़ किताब पढ़ेगा; node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram for आतिफ़ ने किताब पढ़ी; node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
[Tree diagram for आतिफ़ सोएगा; node labels: VP-Pred, VP, NP (Atif), V (soyegaa)]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
[Tree diagram for दरवाज़ा खुलेगा; node labels: VP-Pred, VP, NP1 (darvaazaa), NP (CASE1), V (khulegaa)]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct
[Tree diagram for उस कमरे में चूहे हैं; node labels: VP-Pred, VP, NP-P (us kamre mein), NP1 (cuuhe), NP (CASE1), V (hain)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram for राम ने मोहन को किताब दी; node labels: VP-Pred, VP, NP-P (Ram-ne), NP-P (Mohan-ko), NP (kitaab), V (dii)]
Putting it All Together: Dative Subjects
[Tree diagram for कल रात बादलों में मुझको चाँद दिखा; node labels: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko), NP (SCR2), NP1 (caaMd), NP (CASE1), V (dikhaa)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Tree diagram for राम जानता है कि सीता देर से आएगी; node labels include VP-Pred, VP, NP (Ram), V (jaantaa), V-Aux (hai), EXTR1, CP1, C (ki), NP1 (Sita), NP-P (der se), NP (CASE1), V (aayegi)]
Relative Clause
[Tree diagram; node labels include VP-Pred, VP, CP, C (COMP), NP1 (jo kitaab), NP-P (tumne), PRO, SCR1, V (dii), V-Aux (thii), NP (vah), NP-P (maine), SCR3, V (paRhii)]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
[Tree diagram; node labels include VP-Pred, VP, NP (Raam), NP-P (Ravi-ko), V', NP1, N (yaad), NP (CASE1), V (kar), V-Aux (rahaa), V-Aux (thaa)]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causative
Representing Tokens Morph Analysis
POS and Chunks in The HindiUrdu Treebanks
Outline
• Tokenization
• Morphological Representation
• POS tagging
• Chunking
• Inter-chunk dependency annotation
• Intra-chunk dependencies
Tokenization
• Automatic
• Issues: Compounds, Punctuations
For example:
usa   laDake  ne   kelaa   khaayaa   thaa
that  boy     erg  banana  eat-perf  past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7
Tokenization Issues
• Punctuations: all punctuations to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
– Compounds internally contain a punctuation
– Are productive
– Morphological analysis of the members of the compounds
– The issue: whether to create a single token
– Decision: create three tokens and mark the hyphen as JOIN
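The tokenization decisions can be sketched as a small Python function. It is a simplified illustration, not the treebank's tokenizer: it assumes punctuation is already separated by spaces, it uses "." as a stand-in for the sentence-final token, and the third tuple field used to carry the JOIN mark is a made-up representation.

```python
import re


def tokenize(sentence):
    """Split on whitespace and break hyphenated compounds into three tokens.

    Returns (address, token, mark) tuples; mark is "JOIN" for a compound-internal
    hyphen and "" otherwise.
    """
    rows = []
    for word in sentence.split():
        # Only split when the hyphen sits between other characters.
        parts = re.split(r"(-)", word) if "-" in word.strip("-") else [word]
        for part in parts:
            if not part:
                continue
            mark = "JOIN" if part == "-" and len(parts) > 1 else ""
            rows.append((len(rows) + 1, part, mark))
    return rows


for row in tokenize("usa laDake ne kelA khAyA thA ."):
    print(row)
print(tokenize("BAI-bahana"))
```

Splitting the compound keeps both members available for morphological analysis, while the JOIN mark records that the three tokens form one compound.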
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense, aspect, modality) / vibhakti (case marker), suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaave'>
7               <fs as=&STOP,punc>
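A composite af value can be unpacked into named fields as sketched below. The sketch makes two assumptions that the extracted slide text does not confirm: that the fields are comma-separated, and that a complete value has exactly the eight fields listed above, in that order. The field names and the sample value (including its last two fields) are hypothetical.

```python
# Field order follows the description of af on the slide; names are illustrative.
AF_FIELDS = ("root", "cat", "gend", "num", "pers", "case", "tam", "suffix")


def parse_af(value):
    """Split a comma-separated af value into a dictionary of named fields."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != len(AF_FIELDS):
        raise ValueError(f"expected {len(AF_FIELDS)} fields, got {len(parts)}")
    return dict(zip(AF_FIELDS, parts))


analysis = parse_af("laDakaa,n,m,sg,3,o,0,0")
print(analysis["root"], analysis["cat"], analysis["case"])
```

Rejecting values with the wrong number of fields is a deliberate choice here, so that abbreviated values such as those shown on the slide are not silently misaligned.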
POS Tagging
• ILMT POS tagset adopted
• Total 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
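How chunking changes the addresses can be sketched as follows. The function takes chunks that have already been identified and emits rows in the spirit of the table above; the dotted address form (1.1, 1.2, ...) is an assumption, since the separators are missing from the extracted slide, and "." again stands in for the punctuation token.

```python
def ssf_chunk_rows(chunks):
    """Turn (label, [(token, pos), ...]) chunks into (address, token, category) rows."""
    rows = []
    for i, (label, tokens) in enumerate(chunks, start=1):
        rows.append((str(i), "((", label))            # chunk opening row
        for j, (token, pos) in enumerate(tokens, start=1):
            rows.append((f"{i}.{j}", token, pos))     # token inside the chunk
        rows.append(("", "))", ""))                   # chunk closing row
    return rows


CHUNKS = [
    ("NP", [("usa", "PRP"), ("laDake", "NN"), ("ne", "PSP")]),
    ("NP", [("kelA", "NN")]),
    ("VG", [("khAyA", "VM"), ("thA", "VAUX")]),
    ("BLK", [(".", "SYM")]),
]

for address, token, category in ssf_chunk_rows(CHUNKS):
    print(f"{address:<5}{token:<8}{category}")
```

A token that had address 5 in the flat representation (khAyA) now has address 3.1, which is the restructuring the slide refers to.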
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
– Causatives
– Co-ordination
– Unaccusatives
– Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
• Indian languages
– Rich morphology
– Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock
    the
K1 (Karta): the doer of the action (the locus of activity)
K2 (Karma): locus of result
Sabina opened the lock with this key
opened
  k1: sabina
  k2: lock
    the
  k3: key
    this
K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my
K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this
lock becomes the karta
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sans), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example Contd
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) khaa, with children bhaaii and phala
(t2) khaa, with children bhaaii (over meraa, baDZaa) and phala (over bahuta)
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
Sabina opened the lock with the key
The key opened the lock
The lock opened
Semantics of the verb
A verbal root denotes:
• The activity
• The result
Locus of activity: karta
Locus of result: karma
[Figure: Verbal Root, with branches activity and result]
karta-karma
The boy opened the lock
• k1 – karta
• k2 – karma
karta, karma sometimes correspond to agent, theme
• Not always
The door opened
• The door is karta
• The sentence has no explicit karma
open
  k1: boy
  k2: lock
Sub-actions: Opening of lock
Opening of lock
– Inserting and turning a key (action 1)
– Key pressing and turning the lever (action 2)
– Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
– The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
– The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
– Which sub-action the speaker wants to focus on
Speaker’s Intention (vivakshaa)
• Every sentence reflects the speaker’s intention
• Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
• ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions etc.
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
‘My elder brother eats lots of fruits’
An Example (Contd)
Morph Analysis
meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa   <fs af= root=badZaa cat=adj gend=m>
bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta   <fs af= root=bahuta cat=adj gend=any>
phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai      <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
– each node is a chunk, and
– each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks:
– meraa ‘my’
– bhaaii ‘brother’
– phala ‘fruit’ and
– khaataa ‘eats’
badZaa ‘big’ and bahut ‘much’ are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
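The inter-chunk tree can also be stored as a flat list of labelled edges between chunk heads. The sketch below is one possible encoding, with hypothetical helper names; it prints the tree as an indented outline.

```python
# (head, relation, dependent) triples between chunk heads, read off the tree.
EDGES = [
    ("khaataa", "k1", "bhaaii"),
    ("khaataa", "k2", "phala"),
    ("bhaaii", "r6", "meraa"),
]


def children(head, edges):
    """Return (relation, dependent) pairs for one head, in listed order."""
    return [(rel, dep) for h, rel, dep in edges if h == head]


def render(head, edges, depth=0, relation="root"):
    """Render the subtree under head as indented 'relation: form' lines."""
    lines = ["  " * depth + f"{relation}: {head}"]
    for rel, dep in children(head, edges):
        lines.extend(render(dep, edges, depth + 1, rel))
    return lines


print("\n".join(render("khaataa", EDGES)))
```

Only chunk heads appear in the edge list; badZaa and bahuta stay inside their chunks and get no inter-chunk edge, in line with the scheme.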
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas, and
– Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold with the participants directly involved in the action denoted by the verb
Relations other than karakas denote, for example, purpose and reason
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
– karta – subject/agent/doer
– karma – object/patient
– karana – instrument
– sampradaan – beneficiary
– apaadaan – source
– adhikarana – location in place/time/other
Relations other than karakas
r6 – Genitive
rt – Purpose
rh – Reason
nmod_relc – Relative clause
rad – Address
Relations which do not fall under dependency relation
ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
Possibility-I: Go by syntactic analysis
– khilvaa ‘cause to eat’ is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Causative Constructions (Contd …)
Possibility-II
– The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
– The Paninian framework provides the relations:
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
Causative Constructions (Contd …)
Possibility-II
– Do we mark the above dependency roles?
– If we mark these relations then the root will be khaa ‘eat’
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
For causatives, our current decision: follow Possibility-II
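The correspondence between the two analyses can be written down as a small mapping. The sketch below relabels a Possibility-I analysis of the maid example into Possibility-II labels; `relabel_causative` is a hypothetical helper, and it assumes the annotator has already decided that the se-phrase is a mediator causer rather than a true instrument such as cammaca se ‘with the spoon’, which keeps k3.

```python
# Possibility-I label -> Possibility-II label, as stated for causatives.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Replace k1/k3/k4 with pk1/mk1/jk1; leave every other label unchanged."""
    return [(phrase, CAUSATIVE_LABELS.get(label, label)) for phrase, label in arguments]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]

print(relabel_causative(possibility_1))
```

The karma khaanaa keeps k2 under both analyses; only the participants in the chain of causation receive the specialised labels.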
Causative Constructions (Contd …)
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
Possibility-I
– Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
– The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
– jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
– The verb khadZaa hai ‘is standing’ is the root of the relative clause
– The modifier of vaha ‘he’ in the main clause is the entire relative clause
– Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause: Alternative-II
Relative Clauses (Contd…)
For relative clauses, our current decision: follow Possibility-II
– In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
– The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
– The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
– Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
– Here dukh ‘unhappy’ is the karta
– Here mujhko ‘to me’ is a subtype of sampradan
– This sampradan is different from the sampradan (k4, beneficiary)
– We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
‘ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
anubhava karta – k4a (Contd…)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
– In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
– In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd…)
Ex-1
Relation samanadhikaran – rs (Contd): Ex-2
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would have definitely come to the party’
Issue
– Possibility-I: Abstract node
– Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii
Possibility - II
Conditionals (Contd)
– Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
– For conditionals, our current decision: follow Possibility-II
– In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
– Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
– In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
– The argument occurs only once in the sentence but is semantically related to both the verbs
– The shared argument syntactically always attaches to the main verb
– For the other verb this argument is semantically realized but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Every day, having written a letter, he tears it up’
Participles (vmod) (Contd)
The arguments vaha ‘he’ and pawra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’
Participles (vmod) (contd)
Paadzataa hai
  k1: vaha
  k7t: rojZa
  k2: pawra
  vmod: likhakar
    k1: vaha
    k2: pawra
(7) Ellipsis
How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
– In the above example vo ‘that’, which is the parent node for the relative clause ‘tum jo bhi kahoge’, is missing
– We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
“The children have grown up; they don't listen to anyone”
No explicit conjunct. Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
ccof ndash Conjunction pof ndash Complex Predicates fragof -- Fragment of
101212
35
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation. It is used for coordinating as well as subordinating conjunctions. The dependency trees show the conjunctions as heads.
In coordination, the conjunction is the head and takes the coordinated elements as its children.
In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd…)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree. If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd)
To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation). That is, the noun or adjective in a conjunct verb sequence has a POF relation with the verb.
This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit. Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd)
kiyaa
    k1: maine
    k2: usase
    pof: prashna
        nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
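The filtering step can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Proposition` class, the role names as dictionary keys, and the `answer_who` helper are hypothetical and not part of any PropBank tool; the two candidate sentences are assumed to have been role-labelled already.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    """One role-labelled predicate instance (hypothetical structure)."""
    predicate: str
    roles: dict  # role label -> text span


# The two candidate sentences, already labelled with agent and theme.
candidates = [
    Proposition("create", {"agent": "Becton Dickinson",
                           "theme": "the first disposable syringe"}),
    Proposition("create", {"agent": "Jonas Salk",
                           "theme": "the first effective polio vaccine"}),
]


def answer_who(predicate, theme, propositions):
    """Return agents of propositions whose predicate and theme both match."""
    return [p.roles["agent"] for p in propositions
            if p.predicate == predicate
            and p.roles.get("theme") == theme
            and "agent" in p.roles]


answers = answer_who("create", "the first effective polio vaccine", candidates)
print(answers)
```

Word matching alone would accept both sentences; requiring the theme to match is what removes the syringe sentence.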
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
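A frame file with several rolesets can be pictured as a nested lookup table. The sketch below is an assumption-laden illustration: real PropBank frame files are XML, and the `FRAMES` dictionary layout and the `explain` helper are hypothetical names chosen here; only the roleset contents come from the slide.

```python
# Hypothetical in-memory view of the frame file for "ring".
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "make sound of bell",
            "roles": {"Arg0": "causer of ringing",
                      "Arg1": "thing rung",
                      "Arg2": "ring for"},
        },
        "ring.02": {
            "gloss": "to surround",
            "roles": {"Arg1": "surrounding entity",
                      "Arg2": "surrounded entity"},
        },
    }
}


def explain(roleset_id, labelled_args):
    """Map each labelled span to the role description of the chosen roleset."""
    lemma = roleset_id.split(".")[0]
    roles = FRAMES[lemma][roleset_id]["roles"]
    return {text: roles[arg] for arg, text in labelled_args.items()}


bell = explain("ring.01", {"Arg0": "John", "Arg1": "the bell"})
lake = explain("ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"})
print(bell)
print(lake)
```

The same label (Arg1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is why roles are only interpretable relative to a roleset.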
Hindi PropBank
• Annotating Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA        Causer
ARG0        Agent, experiencer
ARG1        Theme, patient
ARG2        Recipient
ARG3        Instrument
Numbered Arguments with function tags
ARGA-MNS    Indirect causer
ARG0-MNS    Induced causer
ARG0-GOL    Causee with a ‘recipient’ role
ARG2-ATR    Attribute
ARG2-GOL    Goal
ARG2-SOU    Source
ARG2-LOC    Location
ARG2-DIR    Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP    Temporal
ARGM-MNR    Manner
ARGM-LOC    Location
ARGM-PRP    Purpose
ARGM-CAU    Cause
ARGM-DIS    Discourse
ARGM-ADV    Adverb
ARGM-MNS    Means
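Labels in this tagset combine a base argument with an optional function tag. A minimal sketch of splitting them, assuming the hyphenated spelling `ARG2-GOL`; the helper names `split_label` and `is_modifier` are hypothetical:

```python
def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); plain 'ARG1' gives ('ARG1', None)."""
    base, _, function = label.partition("-")
    return base, (function or None)


def is_modifier(label):
    """Modifier arguments are the ones whose base label is ARGM."""
    return split_label(label)[0] == "ARGM"


labels = ["ARG0", "ARG2-GOL", "ARGA-MNS", "ARGM-TMP"]
parsed = [split_label(label) for label in labels]
print(parsed)
```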
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
    Atif kitab paRhegaa
    Atif book.f read.m.sg.fut
    ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
    Atif ne kitaab paRhii
    Atif.erg book.f read.f.sg.pst
    ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
    Atif ko kitaab paRhnii paRii
    Atif.dat book.f read.f.inf compel.f.pst
    ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
    darvaazaa khulegaa
    door.m.sg.d open.m.sg.fut
    ‘The door will open’
unacc-2: दरवाज़े ने खुला
    darvaaze ne khulaa
    door.m.sg.obl.erg open.pst
    ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
    darvaaze ko khulnaa paRegaa
    door.m.sg.obl.dat open.inf compel.fut
    ‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
    Atif ne soyaa
    Atif.erg sleep.m.sg.pst
    ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
    Atif ko sonaa paRegaa
    Atif.dat sleep.inf compel.fut
    ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
    us kamre meM cuuhe haiM
    that room in rats be.pres.pl
    ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
    kal raat baadaloM meiM caaMd dikhaa
    yesterday night clouds in moon see(unacc).pst
    ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
    kal raat baadaloM meiM mujhko caaMd dikhaa
    yesterday night clouds in me.dat moon see(unacc).pst
    ‘Yesterday night I saw the moon behind the clouds’
[Figures: unacc-4, dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
    raam mohan ko kitaab degaa
    Ram Mohan.dat book.f give.m.sg.fut
    ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
    raam ne mohan ko kitaab dii
    Ram.erg Mohan.dat book.f give.f.sg.pst
    ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add -aa
– (so → sulaa) sleep → make someone sleep
• Add -vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
    aayaa ne Atif ko sulaayaa
    maid.erg Atif.acc sleep.caus.pst
    ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
    maaN ne aayaa se Atif ko sulvaayaa
    mother.erg maid.by Atif.acc sleep.caus.pst
    ‘The mother made the maid cause the child to sleep’
Causatives
[Figures: causative-1, causative-2]
Causatives classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust(n) do(v)’: ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
    raam ravi kii pratiikshaa kar rahaa thaa
    Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
    ‘Ram was waiting for Ravi’
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
    raam ravi se ladZa baithaa
    Ram Ravi.inst fight sit.perf
    ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
    raam jaantaa hai ki siitaa der se aayegii
    Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
    ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
[Tree for आतिफ़ किताब पढ़ेगा: nodes VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree for आतिफ़ ने किताब पढ़ी: nodes VP-Pred, VP, NP (Atif-ne), NP (kitab), V (paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
[Tree for आतिफ़ सोएगा: nodes VP-Pred, VP, NP (Atif), V (soyegaa)]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
[Tree for दरवाज़ा खुलेगा: nodes VP-Pred, VP, NP1 (darvaazaa), NP (CASE1), V (khulegaa)]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct
[Tree for उस कमरे में चूहे हैं: nodes VP-Pred, VP, NP1 (cuuhe), NP (CASE1), V (hain), NP (us kamre mein)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree for राम ने मोहन को किताब दी: nodes VP-Pred, VP, NP (Ram-ne), NP (Mohan-ko), NP (kitaab), V (dii)]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree: nodes VP-Pred, VP, NP1 (caaMd), NP (CASE1), V (dikhaa), NP (baadaloM mein), NP (kal raat), NP2 (mujhko), NP (SCR2)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree: nodes VP-Pred, VP, NP (Ram), NP (EXTR1), CP1, C (ki), NP1 (Sita), NP (CASE1), der se, V (aayegi), V (jaantaa), V-Aux (hai)]
Relative Clause
[Tree: nodes VP-Pred, VP, CP, C (COMP), NP1 (jo kitaab), NP (SCR1), NP (tumne), NP (PRO), V (dii), V-Aux (thii), NP (maine), V (paRhii), NP (SCR3), NP (vah)]
जो किताब तुमने दी थी वह मैंने पढ़ ली
    jo kitaab tumne dii thii vah maine paRh lii
    which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
    ‘I have read the book which you gave me’
Complex Predicate
[Tree: nodes VP-Pred, VP, NP (Raam), NP (Ravi-ko), V’, NP1, N (yaad), NP (CASE1), V (kar), V-Aux (rahaa), V-Aux (thaa)]
राम रवि को याद कर रहा था
    raam ravi ko yaad kar rahaa thaa
    Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
    ‘Ram was remembering Ravi’
Causative
Representing Tokens, Morph Analysis, POS and Chunks in the Hindi/Urdu Treebanks
Outline
Tokenization
Morphological Representation
POS tagging
Chunking
Inter-chunk dependency annotation
Intra-chunk dependencies
Tokenization
Automatic
Issues: compounds, punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (punctuation)
Tokenization Issues
Punctuation: all punctuation marks are to be tokenized
Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
– Compounds internally contain a punctuation mark
– Are productive
– Morphological analysis of the members of the compounds
– The issue: whether to create a single token
– Decision: create three tokens; mark the hyphen as JOIN
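The tokenization decision above can be illustrated with a short Python sketch. It is not the treebank's tokenizer: the regular expression, the three-column output, and the way JOIN is attached to the hyphen are assumptions made for illustration, and for simplicity every hyphen is flagged, not only compound-internal ones.

```python
import re


def tokenize(sentence):
    """Split on whitespace and make every punctuation mark its own token."""
    rows = []
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    for addr, token in enumerate(tokens, start=1):
        # Assumed convention: flag a hyphen as JOIN, leave other tokens unmarked.
        tag = "JOIN" if token == "-" else ""
        rows.append((addr, token, tag))
    return rows


compound = tokenize("bAlikA-vixyAlaya")
sentence = tokenize("usa laDake ne kelA khAyA thA .")
for row in compound:
    print(row)
```

The compound comes out as three tokens, so each member can receive its own morphological analysis while the JOIN mark records that they belong together.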
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense, aspect, modality) / vibhakti (case marker), and suffix
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs af='&STOP,punc'>
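A composite af value can be unpacked into named fields. The sketch below assumes the value is a comma-separated list inside `af='...'` and assumes the field order root, category, gender, number, person, case, tam/vibhakti, suffix; the field names and the `parse_af` helper are hypothetical.

```python
import re

# Assumed field order of the composite attribute.
AF_FIELDS = ["root", "cat", "gend", "num", "pers", "case",
             "tam_or_vibhakti", "suffix"]


def parse_af(fs):
    """Extract the af value from a feature structure and name its fields."""
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        raise ValueError("no af attribute found")
    values = match.group(1).split(",")
    # Pad short values so every field is present, possibly empty.
    values += [""] * (len(AF_FIELDS) - len(values))
    return dict(zip(AF_FIELDS, values))


features = parse_af("<fs af='laDakaa,n,m,sg,3,o'>")
print(features)
```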
POS Tagging
ILMT POS tagset adopted; total 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs af='&STOP,punc'>
Chunking
Chunking is introduced to save effort in manual tagging. Dependency relations are marked between the chunk heads. Chunking restructures the tree (i.e. the value of ADDR_ will change).
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs af='&STOP,punc'>
       ))
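Reading the bracketed chunk layout back into a data structure might look like the sketch below. It assumes one token per line, dotted addresses such as 1.1, and a "." as the final punctuation token; feature structures are omitted, and the `read_chunks` helper is a hypothetical name, not part of any SSF library.

```python
SSF = """
1    ((      NP
1.1  usa     PRP
1.2  laDake  NN
1.3  ne      PSP
     ))
2    ((      NP
2.1  kelA    NN
     ))
3    ((      VG
3.1  khAyA   VM
3.2  thA     VAUX
     ))
4    ((      BLK
4.1  .       SYM
     ))
"""


def read_chunks(ssf_text):
    """Group token lines into chunks delimited by '((' and '))'."""
    chunks, current = [], None
    for line in ssf_text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":                      # chunk closes
            chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":  # chunk opens
            current = {"tag": fields[2], "tokens": []}
        elif current is not None:                  # token line: addr, token, POS
            current["tokens"].append((fields[1], fields[2]))
    return chunks


chunks = read_chunks(SSF)
print([chunk["tag"] for chunk in chunks])
```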
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
Some basic concepts
Some Hindi constructions: causatives, co-ordination, unaccusatives, relative clauses
Conclusions
Introduction
Treebank: one of the most important linguistic resources
Utility in various NLP tasks such as parsing, natural language understanding, etc.
Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
News articles: 350k
Tourism articles: 25-30k
Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
Indian languages: rich morphology, relatively flexible word order
For example:
a) baccaa phala khaataa hai
   ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
Dated around 500 BC
Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
Based on spoken form (Kiparsky 1993)
Focuses on language as a means of communication
Panini's Grammar (contd)
Treats a sentence as a series of modifier-modified relations
Every sentence has a primary modified (generally a verb)
Relations between verbs and their participants are called ‘karaka’
Other relations, such as reason, purpose, genitive, etc.
The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
    k1: Sabina
    k2: lock
        the
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result
Sabina opened the lock with this key
opened
    k1: Sabina
    k2: lock
        the
    k3: key
        this
k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
    k7t: Yesterday
    k1: Sabina
    k2: lock
        the
    k3: key
        this
    k7p: home
        my
k7t (kaalaadhikaraNa): time
k7p (deshaadhikaraNa): place
Yesterday the lock opened with this key
opened
    k7t: yesterday
    k1: lock
        the
    k3: key
        this
lock becomes the karta
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
Morph analysis
POS tagging
Identify minimal constituents (chunks/bags) and their heads
Mark the relations across chunks (head to head relation)
Chunk-internal dependencies are left unspecified
The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
(t1) khaa
    bhaaii
    phala
(t2) khaa
    bhaaii
        meraa
        baDZaa
    phala
        bahuta
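The step from tagged chunks to chunk heads can be sketched as follows. The bracket notation is taken from the example; the head rule (last NN or PRP in an NP, the VM in a VG, otherwise the last token) is a simplifying assumption for illustration, and the helper names are hypothetical.

```python
import re

CHUNK = re.compile(r"\(\((.*?)\)\)_(\w+)")

# Assumed head rule: which POS tags may head which chunk type.
HEAD_TAGS = {"NP": ("NN", "PRP"), "VG": ("VM",)}


def read_chunks(text):
    """Parse '((w_POS w_POS))_LABEL' groups into (label, [(word, pos), ...])."""
    chunks = []
    for body, label in CHUNK.findall(text):
        tokens = [tuple(item.rsplit("_", 1)) for item in body.split()]
        chunks.append((label, tokens))
    return chunks


def head(label, tokens):
    """Pick the rightmost token whose tag may head this chunk type."""
    for word, pos in reversed(tokens):
        if pos in HEAD_TAGS.get(label, ()):
            return word
    return tokens[-1][0]


text = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
        "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
heads = [head(label, tokens) for label, tokens in read_chunks(text)]
print(heads)
```

Dependency relations are then marked only between these heads; the remaining chunk-internal words are attached in a later, automatic expansion.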
Karaka Relations
Direct participants in an action/event
Syntactico-semantic
The karta and karma of a verb are determined by the verb's semantics
Verb denotes an action/event
Any action is a bundle of sub-actions:
– Sabina opened the lock with the key
– The key opened the lock
– The lock opened
Semantics of the verb
A verbal root denotes:
– The activity
– The result
Locus of activity: karta
Locus of result: karma
[Figure: Verbal Root, branching into activity and result]
karta-karma
The boy opened the lock
– k1 – karta
– k2 – karma
karta and karma sometimes correspond to agent and theme
– Not always: The door opened
– The door is karta
– The sentence has no explicit karma
open
    k1: boy
    k2: lock
Sub-actions: Opening of lock
Opening of lock:
– Inserting and turning a key (action 1)
– Key pressing and turning the lever (action 2)
– Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
    k1: boy
    k2: lock
    k3: key
open
    k1: key
    k2: lock
open
    k1: lock
k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)
Thus:
The action of opening normally requires an agentive participant. So: Sabina opened the lock.
However, the speaker may decide not to express the role of the agent. Hence:
– The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
– The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker’s Intention (vivakshaa)
Every sentence reflects the speaker’s intention
– Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
– ‘key’ is assigned karta in (b) and karana in (a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
Morph analysis
POS tagging
Chunking
Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
Objective
The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
Dependency Scheme
Relations in Dependency Scheme
Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
We are developing treebanks for Hindi/Urdu
We follow the Paninian framework as the annotation scheme
We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
Ex: meraa badZaa bhaaii bahuta phala khaataa hai
‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
‘My elder brother eats lots of fruits’
An Example (Contd)
Morph Analysis
– meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
– badZaa <fs af= root=badZaa cat=adj gend=m>
– bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
– bahuta <fs af= root=bahuta cat=adj gend=any>
– phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
– khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
– hai <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd)
POS Tagging
– meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
– ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
In our dependency tree:
– each node is a chunk, and
– each edge represents the relation between the connected nodes, labeled with the karaka or other relations
A chunk is a set of adjacent words which are in dependency relations with each other.
All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks:
– meraa ‘my’
– bhaaii ‘brother’
– phala ‘fruit’ and
– khaataa ‘eats’
badZaa ‘big’ and bahut ‘much’ are part of the chunks.
Dependency Scheme (Contd)
khaataa
    k1: bhaaii
        r6: meraa
    k2: phala
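The inter-chunk tree for this sentence can be stored as a list of labelled head-dependent edges. The sketch below is an illustration only; the `EDGES` layout and the `root` and `render` helpers are hypothetical names, not the treebank's storage format.

```python
# (head, dependent, relation) triples between chunk heads.
EDGES = [
    ("khaataa", "bhaaii", "k1"),
    ("khaataa", "phala", "k2"),
    ("bhaaii", "meraa", "r6"),
]


def root(edges):
    """The root is the only head that is never a dependent."""
    heads = {h for h, _, _ in edges}
    dependents = {d for _, d, _ in edges}
    (only,) = heads - dependents
    return only


def render(edges, node, label=None, depth=0):
    """Return the subtree under node as indented 'relation: word' lines."""
    prefix = "    " * depth
    lines = [prefix + (f"{label}: {node}" if label else node)]
    for h, d, rel in edges:
        if h == node:
            lines.extend(render(edges, d, rel, depth + 1))
    return lines


tree_lines = render(EDGES, root(EDGES))
print("\n".join(tree_lines))
```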
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas, and
– Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
– karta – subject/agent/doer
– karma – object/patient
– karana – instrument
– sampradaan – beneficiary
– apaadaan – source
– adhikarana – location in place/time/other
Relations other than karakas
r6 – Genitive
rt – Purpose
rh – Reason
nmod_relc – Relative clause
rad – Address
Relations which do not fall under dependency relation
ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
– Possibility-I: Go by syntactic analysis
    – khilvaa ‘cause to eat’ is the verb root
    – maaz ne has karta vibhakti, so mark as k1
    – aayaa se has karana vibhakti, so mark as k3
    – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Possibility-II
– The verb khilvaa ‘cause to eat’ is a causative verb, morphologically related to the base verb khaa ‘eat’
– The Paninian framework provides the relations:
    – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    – prayojya karta ‘causee’ (jk1): the causee in a causative construction
    – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
– Do we mark the above dependency roles?
– If we mark these relations, then the root will be khaa ‘eat’
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
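The difference between the two labelings can be made concrete with a small relabeling sketch. It assumes each argument carries its Possibility-I label plus a flag saying whether it takes part in the causal chain; that flag stands in for the annotator's judgement and, like the `relabel` helper, is hypothetical.

```python
# Possibility-I label -> Possibility-II label for causal participants.
CAUSATIVE_LABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel(args):
    """args: list of (phrase, surface_label, is_causal_participant)."""
    return [(phrase, CAUSATIVE_LABEL.get(label, label) if causal else label)
            for phrase, label, causal in args]


maid_sentence = relabel([
    ("maaz ne", "k1", True),     # causer
    ("aayaa se", "k3", True),    # mediator-causer
    ("bacce ko", "k4", True),    # causee
    ("khaanaa", "k2", False),    # object, unchanged
])
spoon_phrase = relabel([("cammaca se", "k3", False)])  # a true instrument stays k3
print(maid_sentence)
```

The same se-marked phrase ends up as mk1 or k3 depending on whether it names a mediator or an instrument, which is why the vibhakti alone does not decide the label.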
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
– Possibility-I
    – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
    – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
    – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
– The verb khadZaa hai ‘is standing’ is the root of the relative clause
– The modifier of vaha ‘he’ in the main clause is the entire relative clause
– Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause: Alternative-II
Relative Clauses (Contd…)
For relative clauses, our current decision: follow Possibility-II.
In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause.
The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation.
The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref.
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
Here dukh ‘unhappy’ is the karta.
Here mujhko ‘to me’ is a subtype of sampradan.
This sampradan is different from the sampradan (k4, beneficiary).
We call it anubhava karta, represented by k4a.
anubhava karta – k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
‘ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
anubhava karta – k4a (Contd…)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
Relation samanadhikaran – rs (Contd…): Ex-1
Relation samanadhikaran – rs (Contd): Ex-2
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would have definitely come to the party’
Issue
– Possibility-I: Abstract node
– Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
    paired-ccof: agar
        ccof: naa hotii
    paired-ccof: to
        ccof: aatii
Possibility - II
Conditionals (Contd)
Possibility-I is rejected because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node.
For conditionals, our current decision: follow Possibility-II.
In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause. Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
The argument occurs only once in the sentence but is semantically related to both verbs. The shared argument always attaches syntactically to the main verb. For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd )
Ex vaha rojZa patra likhakara PaadZataa hai
rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo
lsquoHaving letters written everyday he tearsrsquo
101212
32
Participles (vmod) (Contd )
The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa
lsquotearsrsquo is shared with another participle verb likhakar lsquohaving
writtenrsquo
Participles (vmod) (contd)
Paadzataa hai k1 k7t k2 vmod
vaha rojZa pawra likhakar k1 k2
vaha pawra
101212
33
(7) Ellipsis
How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
Dependency tree: maan luungii takes mai (k1) and NULL__NP (vo) (k2); kahoge attaches to NULL__NP as nmod__relc and takes tum (k1) and jo bhi (k2).
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don't listen to anyone’
No explicit conjunct. Insert a NULL element to show the dependencies (if it is essential).
Dependency tree: NULL_CCP (aur) takes badZe_ho_gaye and nahii_sunate, each as ccof.
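The null-element insertion for a missing correlative can be sketched in a few lines. This is only an illustration under assumptions: the edge-list representation, the function name `attach_relative_clause` and its parameters are hypothetical and are not part of the treebank tooling; only the labels (NULL__NP, nmod__relc, k1, k2) come from the slides.

```python
# Hypothetical sketch: edges are (dependent, relation, head) triples.
def attach_relative_clause(edges, main_verb, relc_head,
                           correlative=None, relation="k2"):
    """Attach a relative clause to its correlative, inserting NULL__NP
    when the correlative (e.g. vo 'that') is not overt."""
    edges = list(edges)
    if correlative is None:
        # The head is missing: insert a null element in its place.
        correlative = "NULL__NP"
        edges.append((correlative, relation, main_verb))
    edges.append((relc_head, "nmod__relc", correlative))
    return edges


# tum jo bhi kahoge (vo) mai maan luungii
partial = [
    ("mai", "k1", "maan luungii"),
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]
full = attach_relative_clause(partial, "maan luungii", "kahoge")
for edge in full:
    print(edge)
```

The same pattern would apply to the missing conjunction case, with a NULL conjunction node taking both clause heads as ccof children.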
Non-dependency Relations
ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation.
It is used for coordinating as well as subordinating conjunctions.
The dependency trees show the conjunctions as heads.
In coordination, the conjunction is the head and takes the coordinated elements as its children.
In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘Ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘Sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
‘Ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
[Dependency trees for the coordinate and subordinate conjunction examples not recovered]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree.
If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb.
This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit. Linking the noun and the verb with the POF relation captures the fact that the sequence is a conjunct verb.
Conjunct Verbs (Contd.)
Dependency tree: kiyaa takes maine (k1), usase (k2) and prashna (pof); ek attaches to prashna as nmod.
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
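The filtering step can be illustrated with a toy sketch. It assumes the two candidate sentences have already been labelled with agent and theme roles as on the slide; the `Proposition` class and the `answer` function are hypothetical names introduced here for illustration, not part of any SRL system described in the tutorial.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    predicate: str
    roles: dict  # role name -> filler text


# Role-labelled versions of the two candidate sentences from the slide.
candidates = [
    Proposition("create", {"agent": "Becton Dickinson",
                           "theme": "the first disposable syringe"}),
    Proposition("create", {"agent": "Jonas Salk",
                           "theme": "the first effective polio vaccine"}),
]


def answer(predicate, known_role, known_filler, wanted_role, props):
    """Return fillers of wanted_role from propositions whose predicate
    and known role match the question."""
    return [
        p.roles[wanted_role]
        for p in props
        if p.predicate == predicate
        and p.roles.get(known_role, "").lower() == known_filler.lower()
        and wanted_role in p.roles
    ]


# "Who created the first effective polio vaccine?"
result = answer("create", "theme", "the first effective polio vaccine",
                "agent", candidates)
print(result)
```

Both candidates match on words alone; only the second has the right theme, so only its agent is returned.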
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
Annotated with rolesets and argument labels:
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
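A frame file with several rolesets can be pictured as a small lookup table. The sketch below is an assumption-laden illustration: the dictionary layout and the `describe` helper are hypothetical and do not reproduce the actual PropBank frame file format; only the roleset glosses and role descriptions come from the slide.

```python
# Hypothetical in-memory view of the frame file for "ring".
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Expand each (label, text) pair with the role description that the
    chosen roleset assigns to that label."""
    roles = FRAME_FILE[roleset_id]["roles"]
    lines = []
    for label, text in labelled_args:
        key = label.capitalize()  # "ARG0" -> "Arg0"
        if key not in roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        lines.append(f"{text}: {label} ({roles[key]})")
    return lines


bell = describe("ring.01", [("ARG0", "John"), ("ARG1", "the bell")])
lake = describe("ring.02", [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")])
print("\n".join(bell + lake))
```

Because roles are defined per roleset, ARG1 means "thing rung" in one sentence and "surrounding entity" in the other.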
Hindi PropBank
• Annotating the Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments           Numbered Arguments with function tags
ARGA   Causer                ARGA-MNS   Indirect causer
ARG0   Agent, experiencer    ARG0-MNS   Induced causer
ARG1   Theme, patient        ARG0-GOL   Causee with a ‘recipient’ role
ARG2   Recipient             ARG2-ATR   Attribute
ARG3   Instrument            ARG2-GOL   Goal
                             ARG2-SOU   Source
                             ARG2-LOC   Location
                             ARG2-DIR   Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’
unacc-2: darvaaze ne khulaa
door.m.sg.obl erg open.pst
‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
unerg-2: Atif ne soyaa
Atif erg sleep.m.sg.pst
‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Yesterday night I saw the moon behind the clouds’
[Annotated trees for unacc-4 and dat-subj-1 not recovered]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
– (so → sulaa) sleep → make someone sleep
• Add –vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’
[Annotated trees for Causative-1 and Causative-2 not recovered]
Causative classes
[Slide content not recovered]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’
Complex predicate
compl-pred-2: राम रवी से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa) not recovered; node labels include VP-Pred, VP, NP and V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii) not recovered; node labels include VP-Pred, VP, NP and V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Tree diagram for आतिफ़ सोएगा (Atif soyegaa) not recovered; node labels include VP-Pred, VP, NP and V]
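Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa) not recovered; node labels include VP-Pred, VP, NP1, CASE1 and V]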
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain) not recovered; node labels include VP-Pred, VP, NP1, CASE1 and V]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position
[Tree diagram for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii) not recovered; node labels include VP-Pred, VP, NP and V]
Putting it All Together: Dative Subjects
[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa) not recovered; node labels include VP-Pred, VP, NP1, CASE1, NP2, SCR2 and V]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi) not recovered; node labels include VP-Pred, VP, NP, NP1, CASE1, CP1, EXTR1, C, V and V-Aux]
Relative Clause
[Tree diagram not recovered; node labels include VP-Pred, VP, NP, NP1, PRO, CP, C, COMP, SCR1, SCR3, V and V-Aux]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
[Tree diagram not recovered; node labels include VP-Pred, VP, VP1, NP, CASE1, N, V, V′ and V-Aux]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causative
[Slide content not recovered]
Outline
Tokenization
Morphological Representation
POS tagging
Chunking
Inter-chunk dependency annotation
Intra-chunk dependencies
Tokenization
Automatic
Issues: compounds, punctuation
For example:
usa ladake ne kelaa khaayaa thaa
that boy erg banana eat-perf past
Tokenization
Represented in SSF:
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (sentence-final punctuation; token not recovered)
Tokenization Issues
Punctuation: all punctuation marks are to be tokenized
Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
Compounds internally contain a punctuation mark
Compounds are productive
Morphological analysis of the members of the compounds is needed
The issue: whether to create a single token
Decision: create three tokens and mark the hyphen as JOIN
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense, aspect, modality) / vibhakti (case marker), suffix
ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaave'>
7              <fs as=&STOP,punc>
POS Tagging
ILMT POS tagset adopted; total 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
Chunking is introduced to save effort in manual tagging
Dependency relations are marked between the chunk heads
Chunking restructures the tree (i.e. the value of ADDR will change)
ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs as=&STOP,punc>
      ))
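Reading such a chunked representation back into a data structure is straightforward. The sketch below is a minimal illustration, assuming a simplified SSF layout with whitespace-separated ADDR, TKN and CAT columns, `((` opening a chunk and `))` closing it, and the OTHR feature-structure column omitted; `parse_ssf` is a hypothetical helper, not an official SSF reader.

```python
# Simplified SSF text (OTHR column omitted for the sketch).
SSF_TEXT = """
1    ((      NP
1.1  usa     PRP
1.2  laDake  NN
1.3  ne      PSP
     ))
2    ((      NP
2.1  kelA    NN
     ))
3    ((      VG
3.1  khAyA   VM
3.2  thA     VAUX
     ))
"""


def parse_ssf(text):
    """Group token lines into chunks; returns a list of dicts with the
    chunk address, chunk tag and (token, POS) pairs."""
    chunks = []
    current = None
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":                        # chunk closes
            chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":  # chunk opens
            current = {"addr": fields[0], "tag": fields[2], "tokens": []}
        else:                                         # token line inside a chunk
            _addr, token, pos = fields[:3]
            current["tokens"].append((token, pos))
    return chunks


chunks = parse_ssf(SSF_TEXT)
for chunk in chunks:
    print(chunk["addr"], chunk["tag"], chunk["tokens"])
```

Dependency relations would then be marked between chunk heads, for example between the head of each NP chunk and the head of the VG chunk.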
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
Some basic concepts
Some Hindi constructions: Causatives, Co-ordination, Unaccusatives, Relative clauses
Conclusions
Introduction
Treebank: one of the most important linguistic resources
Utility in various NLP tasks such as parsing, natural language understanding, etc.
Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
News articles: 350k
Tourism articles: 25-30k
Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
Indian languages have rich morphology and relatively flexible word order
For example:
a) baccaa phala khaataa hai
‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
Dated around 500 BC
Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
Based on spoken form (Kiparsky 1993)
Focuses on language as a means of communication
Panini's Grammar (contd.)
Treats a sentence as a series of modifier-modified relations
Every sentence has a primary modified (generally a verb)
Relations between verbs and their participants are called ‘karaka’
Other relations: reason, purpose, genitive, etc.
The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
Dependency tree: opened takes Sabina (k1) and lock (k2); the modifies lock.
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result
Sabina opened the lock with this key
Dependency tree: opened takes Sabina (k1), lock (k2) and key (k3); the modifies lock and this modifies key.
k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
Dependency tree: opened takes Yesterday (k7t), Sabina (k1), lock (k2), key (k3) and home (k7p); the modifies lock, this modifies key and my modifies home.
k7t (kaladhikaraNa): time
k7p (deshadhikaraNa): place
Yesterday the lock opened with this key
Dependency tree: opened takes yesterday (k7t), lock (k1) and key (k3); the modifies lock and this modifies key.
Here lock becomes the karta.
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
Morph analysis
POS tagging
Identify minimal constituents (chunks/bags) and their heads
Mark the relations across chunks (head to head relation)
Chunk-internal dependencies are left unspecified
The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
(t1) Chunk-level tree: khaa takes bhaaii and phala.
(t2) Fully expanded tree: khaa takes bhaaii and phala; meraa and baDZaa attach to bhaaii, and bahuta attaches to phala.
Karaka Relations
Direct participants in an action/event
Syntactico-semantic
The karta and karma of a verb are determined by the verb's semantics
A verb denotes an action/event
Any action is a bundle of sub-actions:
Sabina opened the lock with the key
The key opened the lock
The lock opened
Semantics of the verb
A verbal root denotes
• The activity
• The result
Locus of activity: karta
Locus of result: karma
[Diagram: Verbal Root branching into activity and result]
karta-karma
The boy opened the lock
• k1 – karta
• k2 – karma
Dependency tree: open takes boy (k1) and lock (k2).
karta and karma sometimes correspond to agent and theme
• Not always: The door opened
• The door is the karta
• The sentence has no explicit karma
Sub-actions: Opening of lock
Opening of lock consists of:
Inserting and turning a key (action 1)
Key pressing and turning the lever (action 2)
Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
Dependency trees: open takes boy (k1), lock (k2) and key (k3); open takes key (k1) and lock (k2); open takes lock (k1).
k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)
Thus:
The action of opening normally requires an agentive participant. So: Sabina opened the lock.
However, the speaker may decide not to express the role of the agent. Hence:
The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri).
The lock opened. The karma is raised to the role of karta (doer; karma-kartri).
Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)
Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
• ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
Morph analysis
POS tagging
Chunking
Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
Objective
The Scheme
• Morph Analysis
• POS Tagging
• Chunking
• Dependency Relations
Dependency Scheme
Relations in Dependency Scheme
Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
We are developing treebanks for Hindi/Urdu
We follow the Paninian framework as the annotation scheme
We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses and conjunctions in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
‘My elder brother eats lots of fruits’
An Example (Contd.)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
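The attribute-value pairs in such a morph analysis map naturally onto a dictionary. The following is a rough sketch, assuming each feature structure is written as `<fs af= key=value ...>` with whitespace-separated pairs as in the example above; `parse_fs` is a hypothetical helper and not part of the treebank tools.

```python
def parse_fs(fs):
    """Turn '<fs af= root=khaa cat=v ...>' into a dict of features.
    Items without '=' (such as 'fs') and empty values (such as 'af=')
    are skipped."""
    body = fs.strip().lstrip("<").rstrip(">")
    feats = {}
    for item in body.split():
        if "=" in item:
            key, value = item.split("=", 1)
            if value:
                feats[key] = value
    return feats


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
meraa = parse_fs("<fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>")
print(khaataa)
print(meraa["root"], meraa["case"])
```

With the features in a dictionary, agreement checks such as comparing gend and num between a noun and a verb reduce to simple key lookups.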
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
An Example (Contd.)
Dependency Relation
[Figure not recovered]
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
In our dependency tree
• each node is a chunk, and
• each edge represents the relation between the connected nodes, labeled with the karaka or other relation
A chunk is a set of adjacent words which are in dependency relations with each other.
All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.
Dependency Scheme (Contd.)
Here modifier-modified relations are marked between the heads of the chunks:
• meraa ‘my’
• bhaaii ‘brother’
• phala ‘fruit’
• khaataa ‘eats’
badZaa ‘big’ and bahut ‘much’ are part of the chunks.
Dependency Scheme (Contd.)
Dependency tree: khaataa takes bhaaii (k1) and phala (k2); meraa attaches to bhaaii as r6.
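The chunk-level dependency tree for this sentence can be written down as a small data structure. This is only a sketch under assumptions: the `Node` class and the `edges` generator are hypothetical names for illustration; the chunk heads and the labels k1, k2 and r6 are those given on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    head: str                    # head word of the chunk
    relation: str = "root"       # label on the edge to the parent
    children: list = field(default_factory=list)

    def add(self, head, relation):
        child = Node(head, relation)
        self.children.append(child)
        return child


def edges(node):
    """Yield (dependent, relation, head) triples in depth-first order."""
    for child in node.children:
        yield (child.head, child.relation, node.head)
        yield from edges(child)


# meraa badZaa bhaaii bahuta phala khaataa hai
root = Node("khaataa")
bhaaii = root.add("bhaaii", "k1")   # karta
root.add("phala", "k2")             # karma
bhaaii.add("meraa", "r6")           # genitive

tree_edges = list(edges(root))
for edge in tree_edges:
    print(edge)
```

Only chunk heads appear as nodes; badZaa and bahuta stay inside their chunks and would be attached when the tree is expanded.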
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas
– Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
r6 – Genitive
rt – Purpose
rh – Reason
nmod_relc – Relative clause
rad – Address
Relations which do not fall under dependency relation
ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
Dependency Relation Types
[Figure not recovered]
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
• Possibility-I: go by syntactic analysis
– khilvaa ‘cause to eat’ is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
• Possibility-II
– The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
– The Paninian framework provides the relations:
  prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  prayojya karta ‘causee’ (jk1): the causee in a causative construction
  madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
– Do we mark the above dependency roles?
– If we mark these relations, then the root will be khaa ‘eat’
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
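The relabelling from the syntactic analysis (Possibility-I) to the causative roles (Possibility-II) amounts to a fixed mapping over labels. The sketch below is illustrative only and covers just the configuration in the second example, where the se-phrase is a mediator causer; a true instrument such as cammaca se keeps k3 and should not be passed through this mapping. The names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map (phrase, label) pairs to causative roles; labels without a
    causative counterpart (e.g. k2) are left unchanged."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in args]


# maaz ne aayaa se bacce ko khaanaa khilavaayaa
syntactic = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
causative = relabel_causative(syntactic)
for phrase, label in causative:
    print(phrase, label)
```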
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
• Possibility-I
– Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
– jo ladZakaa ‘the boy’ depends on vaha ‘he’
– jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
[Dependency tree for Possibility-I not recovered]
Relative Clauses (nmod__relc)
• Possibility-II
– The verb khadZaa hai ‘is standing’ is the root of the relative clause
– The modifier of vaha ‘he’ in the main clause is the entire relative clause
– Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
[Dependency tree for Possibility-II not recovered]
Relative Clauses (Contd.)
For relative clauses, our current decision: follow Possibility-II.
In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause.
The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation.
The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref.
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
Here dukh ‘unhappy’ is the karta.
Here mujhko ‘to me’ is a subtype of sampradan.
This sampradan is different from the ordinary sampradan (k4, beneficiary).
We call it anubhava karta, represented by k4a.
anubhava karta ndash k4a (Contd )
Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon
was visible to mersquo Verb
101212
26
anubhava karta ndash k4a (Contdhellip)
Ex-2
Ex-3
anubhava karta ndash k4a (Contdhellip)
101212
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: abstract node
• Possibility-II: one clause depends on the other clause
Possibility-I
agar-to
    agar (paired-ccof)
        na hotii (ccof)
    to (paired-ccof)
        aatii (ccof)
Possibility-II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision is to follow Possibility-II.
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause. The agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Every day, having written a letter, he tears it up’
Participles (vmod) (Contd.)
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’.
Participles (vmod) (Contd.)
PaadZataa hai
    vaha (k1)
    rojZa (k7t)
    patra (k2)
    likhakara (vmod)
        vaha (k1)
        patra (k2)
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
• In the above example vo ‘that’ is missing; it would be the parent node of the relative clause ‘tum jo bhi kahoge’.
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
maan luungii
    mai (k1)
    NULL__NP (vo) (k2)
        kahoge (nmod__relc)
            tum (k1)
            jo bhi (k2)
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don’t listen to anyone’
• There is no explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
    badZe_ho_gaye (ccof)
    nahii_sunate (ccof)
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjuncts as heads.
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children.
• In subordinating conjuncts, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘Ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘Sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
‘Ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
[Figures: dependency trees for the coordinate and the subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
• Within the conjunct verb sequence prashna kiyaa ‘questioned’, the adjective ek ‘one’ modifies the noun prashna ‘question’ and not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit. Linking the noun and the verb with the POF relation captures the fact that the sequence is a conjunct verb.
Conjunct Verbs (Contd.)
kiyaa
    maine (k1)
    usase (k2)
    prashna (pof)
        ek (nmod)
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
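The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two candidate sentences are labelled by hand (a real system would obtain the labels from a semantic role labeller), and the names CANDIDATES and answer_who are hypothetical.

```python
# Hand-labelled toy data for the two candidate sentences from the slides.
# Assumption: the agent/theme labels come from some SRL system.
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, candidates):
    """Return the agents of clauses whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]


answers = answer_who("create", "the first effective polio vaccine", CANDIDATES)
print(answers)
```

Word matching alone cannot separate the two candidates, since both mention the vaccine; the theme label is what removes the Becton Dickinson sentence.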
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1]
• [Tall aspen trees ARG1] ring [the lake ARG2]
ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
The first sentence uses ring.01; the second uses ring.02.
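One way to picture a frame file is as a small lookup table from roleset IDs to role descriptions. The sketch below is not the actual PropBank file format; the Roleset class, the RING_FRAMES dictionary and the check_annotation helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Roleset:
    """One sense of a predicate together with its numbered roles."""
    roleset_id: str
    gloss: str
    roles: dict  # role label -> description


# The two rolesets for "ring" shown on the slide.
RING_FRAMES = {
    "ring.01": Roleset("ring.01", "Make sound of bell",
                       {"Arg0": "Causer of ringing",
                        "Arg1": "Thing rung",
                        "Arg2": "Ring for"}),
    "ring.02": Roleset("ring.02", "To surround",
                       {"Arg1": "Surrounding entity",
                        "Arg2": "Surrounded entity"}),
}


def check_annotation(roleset_id, labelled_args, frames=RING_FRAMES):
    """Return the labels in an annotation that the chosen roleset does not define."""
    allowed = frames[roleset_id].roles
    return sorted(label for label in labelled_args if label not in allowed)


print(check_annotation("ring.01", {"Arg0": "John", "Arg1": "the bell"}))
print(check_annotation("ring.02", {"Arg0": "Tall aspen trees", "Arg2": "the lake"}))
```

The second call reports Arg0 because ring.02 defines no Arg0, which is why "Tall aspen trees" is labelled Arg1 in the example above.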
Hindi PropBank
• Annotating Hindi PropBank involves three steps:
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files:
– Frames for simple verbs [385 frames; 703 predicates]
– Frames for nominals in complex predicates [1875; 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA        Causer
ARG0        Agent, experiencer
ARG1        Theme, patient
ARG2        Recipient
ARG3        Instrument
PropBank Tagset: Numbered Arguments with function tags
ARGA-MNS    Indirect causer
ARG0-MNS    Induced causer
ARG0-GOL    Causee with a ‘recipient’ role
ARG2-ATR    Attribute
ARG2-GOL    Goal
ARG2-SOU    Source
ARG2-LOC    Location
ARG2-DIR    Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP    Temporal
ARGM-MNR    Manner
ARGM-LOC    Location
ARGM-PRP    Purpose
ARGM-CAU    Cause
ARGM-DIS    Discourse
ARGM-ADV    Adverb
ARGM-MNS    Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs:
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’
unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
‘The door will have to open’
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Yesterday night I saw the moon behind the clouds’
[Figures: annotated trees for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative:
• Add -aa
– (so, sulaa): sleep, make someone sleep
• Add -vaa
– (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’
Causatives
[Figures: annotated trees for causative-1 and causative-2]
Causatives classes
[Figure: causative classes]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.inst fight sit.perf
‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siita der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
• Language-universal principles
• Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: PS tree rooted in VP-Pred, with NP nodes for Atif and kitab and V paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Figure: PS tree rooted in VP-Pred, with NP nodes for Atif ne and kitab and V paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Figure: PS tree rooted in VP-Pred, with NP Atif and V soyegaa]
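Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Figure: PS tree rooted in VP-Pred; NP darvaazaa carries index 1 and is linked to a lower CASE element with the same index; V khulegaa]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Figure: PS tree rooted in VP-Pred; NP cuuhe carries index 1 and is linked to a lower CASE element with the same index; V hain; the location us kamre mein is adjoined]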
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: PS tree with NP Ram ne, NP kitaab and V dii; NP Mohan ko is adjoined to VP-Pred]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: PS tree with NP caaMd (index 1) linked to a lower CASE element, V dikhaa, adjuncts baadaloM mein and kal raat, and NP mujhko (index 2) linked to an SCR element]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: PS tree; the ki-clause (CP, index 1) is linked to an EXTR element in the matrix clause; terminal nodes include Ram, jaantaa, hai, ki, Sita, der se, aayegi]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
[Figure: PS tree; the relative clause (CP) with jo kitaab, tumne, dii, thii and a PRO subject modifies the NP vah; the matrix clause contains maine and paRhii, with SCR elements marking scrambled positions]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
[Figure: PS tree rooted in VP-Pred, with NP Raam, NP Ravi ko, the noun yaad under V’, V kar and the auxiliaries rahaa and thaa]
Causative
[Figure: PS tree for a causative]
Tokenization
• Represented in SSF
ADDR  TOKEN
1     usa
2     laDake
3     ne
4     kelA
5     khAyA
6     thA
7     (punctuation mark)
Tokenization Issues
• Punctuation: all punctuation marks are to be tokenized
• Compounds: BAI-bahana (brother-sister), bAlikA-vixyAlaya (girl-school)
– Compounds internally contain a punctuation mark
– They are productive
– Morphological analysis of the members of the compounds
– The issue: whether to create a single token
– Decision: create three tokens and mark the hyphen as JOIN
Morph Analysis and its Representation
• af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality), vibhakti (case marker), suffix
ADDR  TKN      OTHR
1     usa      <fs af='vaha,pr'>
2     laDake   <fs af='laDakaa,n,m,sg,3,o'>
3     ne       <fs af='ne,psp'>
4     kelaa    <fs af='kelaa,n,m,sg,3,o'>
5     khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6     thaa     <fs af='kelaa,v,e'>
7              <fs as=&STOP,punc>
POS Tagging
• ILMT POS tagset adopted
• Total 26 tags
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs as=&STOP,punc>
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR will change)
ADDR  TKN     CAT   OTHR
1     ((      NP
1.1   usa     PRP   <fs af='vaha,pron,…'>
1.2   laDake  NN    <fs af='laDakA,noun,…'>
1.3   ne      PSP   <fs af='ne,psp,…'>
      ))
2     ((      NP
2.1   kelA    NN    <fs af='kelA,noun,…'>
      ))
3     ((      VG
3.1   khAyA   VM    <fs af='KA,verb,…'>
3.2   thA     VAUX  <fs af='kelA,verb,…'>
      ))
4     ((      BLK
4.1           SYM   <fs as=&STOP,punc>
      ))
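To make the chunked layout concrete, here is a minimal Python sketch that reads a simplified version of these lines into a list of chunks. It is an assumption-laden illustration and not an official SSF reader: the feature-structure column is left out of the sample, columns are assumed to be whitespace-separated, and the names SAMPLE and read_chunks are hypothetical.

```python
# Simplified sample: address, token (or "((" to open a chunk), category.
# Assumption: the OTHR column with feature structures is omitted here.
SAMPLE = """
1     ((      NP
1.1   usa     PRP
1.2   laDake  NN
1.3   ne      PSP
      ))
2     ((      NP
2.1   kelA    NN
      ))
3     ((      VG
3.1   khAyA   VM
3.2   thA     VAUX
      ))
"""


def read_chunks(ssf_text):
    """Group token lines into chunks delimited by '((' and '))'."""
    chunks = []
    current = None
    for line in ssf_text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "))":
            # A closing bracket ends the chunk that is currently open.
            if current is not None:
                chunks.append(current)
            current = None
        elif len(fields) >= 3 and fields[1] == "((":
            # An opening line carries the chunk address and the chunk tag.
            current = {"addr": fields[0], "tag": fields[2], "tokens": []}
        elif current is not None and len(fields) >= 3:
            # A token line inside a chunk: keep the token and its POS tag.
            current["tokens"].append((fields[1], fields[2]))
    return chunks


chunks = read_chunks(SAMPLE)
for chunk in chunks:
    print(chunk["addr"], chunk["tag"], chunk["tokens"])
```

Each chunk keeps its own address, which mirrors the point above that token addresses change from 1, 2, 3 to 1.1, 1.2, 1.3 once chunks are introduced.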
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
– Causatives
– Co-ordination
– Unaccusatives
– Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
• The Corpus
– News articles: 350k
– Tourism articles: 25-30k
– Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model
• Why Paninian Grammar? Indian languages have
– Rich morphology
– Relatively flexible word order. For example:
a) baccaa phala khaataa hai
‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini’s Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form <Kiparsky 1993>
• Focuses on language as a means of communication
Panini’s Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
    Sabina (k1)
    lock (k2)
        the
• K1 (karta): the doer of the action (the locus of activity)
• K2 (karma): locus of result
Sabina opened the lock with this key
opened
    Sabina (k1)
    lock (k2)
        the
    key (k3)
        this
• K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
    Yesterday (k7t)
    Sabina (k1)
    lock (k2)
        the
    key (k3)
        this
    home (k7p)
        my
• K7t (kaladhikaraNa): time
• K7p (deshadhikaraNa): place
Yesterday the lock opened with this key
opened
    yesterday (k7t)
    lock (k1)
        the
    key (k3)
        this
• lock becomes the karta
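The labelled trees above can be held in a very small data structure. The following Python sketch is illustrative only; the Node class and the karakas helper are hypothetical names, and the determiners (the, this) are left out because the slides give no label for those edges.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A word together with the label of the edge to its parent."""
    word: str
    relation: str = "root"
    children: list = field(default_factory=list)

    def add(self, word, relation):
        child = Node(word, relation)
        self.children.append(child)
        return child


def karakas(verb):
    """Map each k-label on the verb's direct dependents to its word."""
    return {c.relation: c.word for c in verb.children
            if c.relation.startswith("k")}


# 'Sabina opened the lock with this key'
full = Node("opened")
full.add("Sabina", "k1")
full.add("lock", "k2")
full.add("key", "k3")

# 'Yesterday the lock opened with this key': no agent is expressed
agentless = Node("opened")
agentless.add("yesterday", "k7t")
agentless.add("lock", "k1")
agentless.add("key", "k3")

print(karakas(full))
print(karakas(agentless))
```

Comparing the two dictionaries shows the point of the last slide: when the agent is not expressed, lock carries the k1 label and no k2 is present.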
Levels of Analysis
• L1 – Semantic relations: karakas, e.g. raama: karta
• L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) khaa
    bhaaii
    phala
(t2) khaa
    bhaaii
        meraa
        baDZaa
    phala
        bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb’s semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
– Sabina opened the lock with the key
– The key opened the lock
– The lock opened
Semantics of the verb
• A verbal root denotes
– The activity
– The result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
– k1 – karta
– k2 – karma
• karta/karma sometimes correspond to agent/theme
– Not always: The door opened
– The door is karta
– The sentence has no explicit karma
open
    boy (k1)
    lock (k2)
Sub-actions: Opening of lock
• Action 1: inserting and turning a key
• Action 2: key pressing and turning the lever
• Action 3: latch moving and lock opening
Sub-actions: Opening of lock
open
    boy (k1)
    lock (k2)
    key (k3)
open
    key (k1)
    lock (k2)
open
    lock (k1)
• k1 – karta (doer)
• k2 – karma (affected)
• k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
– The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
– The lock opened: the karma is raised to the role of karta (doer; karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
• Which sub-action does the speaker want to focus on?
Speaker’s Intention (vivakshaa)
• Every sentence reflects the speaker’s intention
– Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
– ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
‘My elder brother eats lots of fruits’
An Example (Contd.)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
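The bracketed chunk notation is regular enough to be read back with a short script. The sketch below assumes exactly the inline form shown on the slide, ((word_TAG ...))_CHUNKTAG; the names CHUNK_RE and parse_chunks are hypothetical, and this is not the format the treebank itself is distributed in.

```python
import re

# Matches one chunk of the form ((tok_TAG tok_TAG))_CHUNKTAG.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

EXAMPLE = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")


def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, chunk_tag in CHUNK_RE.findall(text):
        # Split each token on its last underscore into word and POS tag.
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((chunk_tag, tokens))
    return chunks


parsed = parse_chunks(EXAMPLE)
for chunk_tag, tokens in parsed:
    print(chunk_tag, tokens)
```

Once the chunks are available in this form, the inter-chunk dependency relations can be stated between one chosen head token per chunk.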
An Example (Contd.)
Dependency Relation
[Figure: dependency tree for the example sentence]
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
– each node is a chunk, and
– each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd.)
• Here modifier-modified relations are marked between the heads of the chunks:
– meraa ‘my’
– bhaaii ‘brother’
– phala ‘fruit’
– khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks
Dependency Scheme (Contd.)
khaataa
    bhaaii (k1)
        meraa (r6)
    phala (k2)
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas
– Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason and the like
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six:
– karta – subject/agent/doer
– karma – object/patient
– karana – instrument
– sampradaan – beneficiary
– apaadaan – source
– adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
[Figure: dependency relation types]
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
• Possibility-I: go by syntactic analysis
– khilvaa ‘cause to eat’ is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4
[Figure: dependency tree for Possibility-I]
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations
– prayojaka karta ‘causer’ (pk1): the causer in a causative construction
– prayojya karta ‘causee’ (jk1): the causee in a causative construction
– madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision is to follow Possibility-II
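The difference between the two possibilities amounts to a relabelling of three arguments. The Python sketch below shows that mapping on the example sentence; CAUSATIVE_RELABEL and relabel_causative are hypothetical names, and the mapping is only meant for arguments an annotator has already identified as causer, mediator and causee. A true instrument such as cammaca se ‘with the spoon’ keeps k3 and must not be passed through it blindly.

```python
# Possibility-I labels (by vibhakti) -> Possibility-II labels (causative roles).
CAUSATIVE_RELABEL = {
    "k1": "pk1",  # prayojaka karta: causer
    "k3": "mk1",  # madhyastha karta: mediator-causer
    "k4": "jk1",  # prayojya karta: causee
}


def relabel_causative(args):
    """Relabel (phrase, label) pairs; labels without a mapping are kept."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in args]


# maaz ne aayaa se bacce ko khaanaa khilvaayaa, labelled by vibhakti
by_vibhakti = [("maaz ne", "k1"), ("aayaa se", "k3"),
               ("bacce ko", "k4"), ("khaanaa", "k2")]

relabelled = relabel_causative(by_vibhakti)
print(relabelled)
```

The karma khaanaa keeps its k2 label under both possibilities; only the three participants in the causal chain change.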
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
• Possibility-I
– Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
– jo ladZakaa ‘the boy’ depends on vaha ‘he’
– jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
[Figure: dependency tree for Possibility-I]
Relative Clauses (nmod__relc)
• Possibility-II
– The verb khadZaa hai ‘is standing’ is the root of the relative clause
– The modifier of vaha ‘he’ in the main clause is the entire relative clause
– Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
[Figure: dependency tree for Possibility-II]
Relative Clauses (Contd.)
• For relative clauses, our current decision is to follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
• The ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
• Here dukh ‘unhappy’ is the karta.
• Here mujhko ‘to me’ is a subtype of sampradan.
• This sampradan is different from the beneficiary sampradan (k4).
• We call it anubhava karta, represented by k4a.
anubhava karta – k4a (Contd.)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
‘Ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
‘Ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. the karma.
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
agar-to
  paired-ccof → agar
    ccof → naa hotii
  paired-ccof → to
    ccof → aatii
Possibility-II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd.)
Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
For conditionals, our current decision: follow Possibility-II.
In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause.
Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
The argument occurs only once in the sentence but is semantically related to both verbs.
The shared argument always attaches syntactically to the main verb.
For the other verb this argument is semantically realized but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Having written a letter every day, he tears it up’
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’.
Participles (vmod) (Contd.)
PaadZataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakara
    k1 → vaha (shared)
    k2 → patra (shared)
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
    nmod__relc → kahoge
      k1 → tum
      k2 → jo bhi
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don't listen to anyone’
No explicit conjunct: insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate
Non-dependency Relations
ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation.
It is used for coordinating as well as subordinating conjunctions.
The dependency trees show the conjunctions as heads.
In coordination, the conjunction is the head and takes the coordinated elements as its children.
In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
[Figures: dependency trees for the coordinate and subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is being modified by the adjective ek ‘one’, and not the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree.
If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation). That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb.
This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit.
The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation.
Conjunct Verbs (Contd.)
kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
    nmod → ek
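The conjunct verb analysis (separate NP and VG chunks, a pof link between their heads, and ek handled inside the NP chunk) can be sketched in Python. This is only an illustration under assumptions: the tuple layout, the name `expand_intra_chunk`, the rule that the last token of a chunk is its head, and the uniform `nmod` label for chunk-internal modifiers are hypothetical and not the treebank's actual tooling.

```python
# Inter-chunk relations from the slide: chunk heads attach to the verb kiyaa.
INTER_CHUNK = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),  # noun part of the conjunct verb
]

# Chunks after splitting "ek prashna kiyaa" into an NP and a VG.
CHUNKS = [
    ("NP", ["maine"]),
    ("NP", ["usase"]),
    ("NP", ["ek", "prashna"]),
    ("VG", ["kiyaa"]),
]


def expand_intra_chunk(chunks, label="nmod"):
    """Attach every non-head token to the last token of its chunk.

    Assumption: the chunk head is the final token, and all chunk-internal
    modifiers receive the same label.
    """
    edges = []
    for _tag, tokens in chunks:
        head = tokens[-1]
        edges.extend((tok, head, label) for tok in tokens[:-1])
    return edges


full_tree = INTER_CHUNK + expand_intra_chunk(CHUNKS)
for child, head, rel in full_tree:
    print(f"{child} --{rel}--> {head}")
```

Because ek and prashna share one NP chunk, the ek-prashna link comes from chunk expansion, while prashna-kiyaa stays an explicit inter-chunk pof edge.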
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
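The filtering step can be sketched as follows. This is a minimal illustration, assuming the two candidate sentences have already been labelled with agent and theme; the `Proposition` class and the `answer` function are hypothetical names introduced here, not part of any PropBank tool.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    predicate: str
    roles: dict  # semantic role label -> text span


def answer(question_predicate, question_theme, propositions):
    """Return the agents of propositions whose predicate and theme match the question."""
    return [
        p.roles["agent"]
        for p in propositions
        if p.predicate == question_predicate
        and p.roles.get("theme") == question_theme
        and "agent" in p.roles
    ]


candidates = [
    Proposition("create", {"agent": "Becton Dickinson",
                           "theme": "the first disposable syringe"}),
    Proposition("create", {"agent": "Jonas Salk",
                           "theme": "the first effective polio vaccine"}),
]

print(answer("create", "the first effective polio vaccine", candidates))
```

Both candidates contain the words of the question, but only the second has the vaccine as the theme of create, so only its agent is returned.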
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example (Contd.)
• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
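A frame file with several rolesets can be pictured as a small lookup table. The sketch below is an assumption-laden illustration: the dictionary layout and the `describe` function are hypothetical, and only the roleset contents are taken from the slide.

```python
# Rolesets for the polysemous predicate "ring", as listed on the slide.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Expand numbered arguments into their verb-specific role descriptions."""
    roles = FRAMES[roleset_id]["roles"]
    lines = []
    for label, span in labelled_args:
        if label not in roles:
            raise KeyError(f"{label} is not defined for {roleset_id}")
        lines.append(f"{span}: {label} ({roles[label]})")
    return lines


print(describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
print(describe("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same label (Arg1) means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset ID has to be chosen before the arguments are interpreted.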
Hindi PropBank
• Annotating Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments        Numbered Arguments with function tags
ARGA  Causer              ARGA-MNS  Indirect causer
ARG0  Agent/experiencer   ARG0-MNS  Induced causer
ARG1  Theme/patient       ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient           ARG2-ATR  Attribute
ARG3  Instrument          ARG2-GOL  Goal
                          ARG2-SOU  Source
                          ARG2-LOC  Location
                          ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
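The labelling rule for intransitives can be written as a one-line lookup. The sketch assumes the verb class has already been decided by the diagnostic tests; the `VERB_CLASS` lexicon and the function name are hypothetical, and only the three example verbs come from the slide.

```python
# Verb classes for the slide's example verbs (the lexicon itself is illustrative).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    return "ARG1" if verb_class == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```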
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
– (so, sulaa) ‘sleep’, ‘make someone sleep’
• Add –vaa
– (sulaa, sulvaa) ‘make someone sleep’, ‘cause someone to fall asleep’
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
[Figures: PropBank annotation of causative-1 and causative-2; table of causative classes]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
  • Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: PS tree for Atif kitab paRhegaa; node labels VP-Pred, VP, NP, V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Figure: PS tree for Atif-ne kitab paRhii; node labels VP-Pred, VP, NP-P, NP, V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Figure: PS tree for Atif soyegaa; node labels VP-Pred, VP, NP, V]
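Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Figure: PS tree for darvaazaa khulegaa; the NP darvaazaa is co-indexed (1) with a CASE trace; node labels VP-Pred, VP, NP, CASE, V]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Figure: PS tree for us kamre mein cuuhe hain; the NP cuuhe is co-indexed (1) with a CASE trace, and the NP-P us kamre mein is adjoined to VP]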
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: PS tree for Ram-ne Mohan-ko kitaab dii; node labels VP-Pred, VP, NP-P, NP, V]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: PS tree for kal raat baadaloM mein mujhko caaMd dikhaa; node labels VP-Pred, VP, NP, NP-P, V, with a CASE trace (index 1) for caaMd and an SCR trace (index 2) for mujhko]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: PS tree for Ram jaantaa hai ki Sita der se aayegi; node labels VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C, with EXTR and CASE traces]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
[Figure: PS tree for the relative clause example; node labels VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C (COMP), PRO, with SCR traces]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
[Figure: PS tree for Raam Ravi-ko yaad kar rahaa thaa; node labels VP-Pred, VP, NP, NP-P, V, V’, V-Aux, N, with a CASE trace (index 1)]
Causative
Morph Analysis and its Representation
af defines the composite attribute consisting of root, category, gender, number, person, case, tam (tense/aspect/modality) or vibhakti (case marker), and suffix.
ADDR_  TKN_     OTHR
1      usa      <fs af='vaha,pr'>
2      laDake   <fs af='laDakaa,n,m,sg,3,o'>
3      ne       <fs af='ne,psp'>
4      kelaa    <fs af='kelaa,n,m,sg,3,o'>
5      khaayaa  <fs af='khaa,v,m,sg,any,yaa'>
6      thaa     <fs af='kelaa,v,e'>
7               <fs af='&STOP,punc'>
POS Tagging
ILMT POS tagset adopted. Total: 26 tags.
ADDR  TKN     CAT   OTHR
1     usa     PRP   <fs af='vaha,pron,…'>
2     laDake  NN    <fs af='laDakA,noun,…'>
3     ne      PSP   <fs af='ne,psp,…'>
4     kelA    NN    <fs af='kelA,noun,…'>
5     khAyA   VM    <fs af='KA,verb,…'>
6     thA     VAUX  <fs af='kelA,verb,…'>
7             SYM   <fs af='&STOP,punc'>
Chunking
Chunking is introduced to save effort in manual tagging.
Dependency relations are marked between the chunk heads.
Chunking restructures the tree (i.e. the value of ADDR_ will change).
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs af='&STOP,punc'>
       ))
Rambow
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
– Causatives
– Co-ordination
– Unaccusatives
– Relative clauses
Conclusions
Introduction
Treebank: one of the most important linguistic resources
Utility in various NLP tasks such as parsing, natural language understanding, etc.
Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar? Indian languages have
• Rich morphology
• Relatively flexible word order
For example:
a) baccaa phala khaataa hai
   ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
Dated around 500 BC
Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
Based on spoken form (Kiparsky 1993)
Focuses on language as a means of communication
Panini's Grammar (Contd.)
Treats a sentence as a series of modifier-modified relations
Every sentence has a primary modified (generally a verb)
Relations between verbs and their participants are called ‘karaka’
Other relations: reason, purpose, genitive, etc.
The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1 → Sabina
  k2 → lock (the)
k1 (karta): the doer of the action (the locus of activity)
k2 (karma): locus of result
Sabina opened the lock with this key
opened
  k1 → Sabina
  k2 → lock (the)
  k3 → key (this)
k3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t → Yesterday
  k1 → Sabina
  k2 → lock (the)
  k3 → key (this)
  k7p → home (my)
k7t (kaladhikaraNa): time
k7p (deshadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t → yesterday
  k1 → lock (the)
  k3 → key (this)
lock becomes the karta
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
Morph analysis
POS tagging
Identify minimal constituents (chunks/bags) and their heads
Mark the relations across chunks (head-to-head relation)
Chunk-internal dependencies are left unspecified
The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd.)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
(t1) chunk-level tree: khaa → bhaaii, phala
(t2) fully expanded tree: khaa → bhaaii (→ meraa, baDZaa), phala (→ bahuta)
Karaka Relations
Direct participants in an action/event
Syntactico-semantic
The karta and karma of a verb are determined by the verb's semantics
A verb denotes an action/event
Any action is a bundle of sub-actions:
• Sabina opened the lock with the key
• The key opened the lock
• The lock opened
Semantics of the verb
A verbal root denotes
• The activity
• The result
Locus of activity: karta
Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
The boy opened the lock
• k1 – karta
• k2 – karma
karta/karma sometimes correspond to agent/theme
• Not always: The door opened
• The door is karta
• The sentence has no explicit karma
open
  k1 → boy
  k2 → lock
Sub-actions: Opening of lock
• (action 1) Inserting and turning a key
• (action 2) Key pressing and turning the lever
• (action 3) Latch moving and lock opening
Sub-actions: Opening of lock
open
  k1 → boy
  k2 → lock
  k3 → key
open
  k1 → key
  k2 → lock
open
  k1 → lock
k1 – karta (doer); k2 – karma (affected); k3 – karana (instrument)
Thus
The action of opening normally requires an agentive participant. So: Sabina opened the lock.
However, the speaker may decide not to express the role of the agent. Hence:
• The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker’s Intention (vivakshaa)
Every sentence reflects the speaker’s intention
• Participants are assigned various relations accordingly
(a) I opened the lock with this key
(b) I am sure this key will open the lock
• ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
Morph analysis
POS tagging
Chunking
Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
Objective
The Scheme
• Morph Analysis
• POS Tagging
• Chunking
• Dependency Relations
Dependency Scheme
Relations in Dependency Scheme
Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
We are developing treebanks for Hindi/Urdu
Following the Paninian framework as the annotation scheme
We show how the scheme handles some phenomena, such as complex verbs, causatives, relative clauses, conjunctions, etc., in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
‘My elder brother eats lots of fruits’
An Example (Contd.)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m >
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any >
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3 >
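The morph analyses on this slide are written as feature structures of the form `<fs af= root=... cat=... >`. A small parser for that notation might look like the sketch below; `parse_fs` is a hypothetical helper written for this slide's simplified notation and is not an SSF library function.

```python
import re


def parse_fs(fs):
    """Parse a slide-style feature structure into a dict of attribute -> value.

    Assumes whitespace-separated name=value pairs inside "<fs ... >";
    an attribute with nothing after "=" (such as "af=") maps to "".
    """
    text = fs.strip()
    if not (text.startswith("<fs") and text.endswith(">")):
        raise ValueError(f"not a feature structure: {fs!r}")
    inner = text[3:-1]
    return {name: value for name, value in re.findall(r"(\w+)=(\S*)", inner)}


analysis = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(analysis["root"], analysis["cat"], analysis["TAM"])
```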
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
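The bracketed chunk notation can be read back into a data structure with a short parser. This is a sketch for the `((token_TAG ...))_LABEL` notation used on the slide only; the function name `parse_chunks` and the returned tuple layout are assumptions.

```python
import re

# One chunk: "((" tokens "))" "_" chunk label
CHUNK_RE = re.compile(r"\(\(\s*(.*?)\s*\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) tuples."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


sentence = (
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
for label, tokens in parse_chunks(sentence):
    print(label, tokens)
```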
An Example (Contd.)
Dependency Relation
[Figure: dependency tree for the example sentence]
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
In our dependency tree:
• each node is a chunk, and
• each edge represents the relation between the connected nodes, labeled with the karaka or other relations
A chunk represents a set of adjacent words which are in dependency relations with each other.
All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.
Dependency Scheme (Contd.)
Here modifier-modified relations are marked between the heads of the chunks:
• meraa ‘my’
• bhaaii ‘brother’
• phala ‘fruit’ and
• khaataa ‘eats’
badZaa ‘big’ and bahut ‘much’ are part of the chunks
Dependency Scheme (Contd.)
khaataa
  k1 → bhaaii
    r6 → meraa
  k2 → phala
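The chunk-head dependency tree for the example sentence can be stored as labelled edges and printed as an indented tree. The edge-list representation and the `render` function are illustrative assumptions; only the three relations (k1, k2, r6) come from the slide.

```python
from collections import defaultdict

# (dependent, head, relation) for the chunk heads of the example sentence.
EDGES = [
    ("bhaaii", "khaataa", "k1"),
    ("phala", "khaataa", "k2"),
    ("meraa", "bhaaii", "r6"),
]


def render(edges, root):
    """Return the tree as indented lines, one node per line with its relation."""
    children = defaultdict(list)
    for child, head, rel in edges:
        children[head].append((child, rel))

    lines = []

    def walk(node, rel, depth):
        lines.append("  " * depth + f"{node} ({rel})")
        for child, child_rel in children[node]:
            walk(child, child_rel, depth + 1)

    walk(root, "root", 0)
    return lines


print("\n".join(render(EDGES, "khaataa")))
```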
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas, and
• Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly
Karaka relations mark participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, etc.
Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
r6 – Genitive
rt – Purpose
rh – Reason
nmod__relc – Relative clause
rad – Address
Relations which do not fall under dependency relation
ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
[Figure: Dependency Relation Types]
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
• Possibility-I: Go by syntactic analysis
– khilvaa ‘cause to eat’ is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
• Possibility-II
– The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
– The Paninian framework provides the relations:
  prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  prayojya karta ‘causee’ (jk1): the causee in a causative construction
  madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
– Do we mark the above dependency roles?
– If we mark these relations, then the root will be khaa ‘eat’
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
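The move from the Possibility-I labels to the Possibility-II labels can be sketched as a relabelling step. This is only an illustration: the function name and the `causal_chain` argument are hypothetical. The set of phrases that take part in the causal chain has to be supplied by the annotator, since an instrument such as cammaca se ‘with the spoon’ carries the same vibhakti as aayaa se but stays k3.

```python
# Possibility-I label -> Possibility-II label for causal-chain participants.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args, causal_chain):
    """Relabel only the arguments that belong to the causal chain.

    args: list of (phrase, label) pairs under the syntactic analysis.
    causal_chain: set of phrases identified as causer, mediator or causee.
    """
    relabelled = []
    for phrase, label in args:
        if phrase in causal_chain and label in CAUSATIVE_RELABEL:
            label = CAUSATIVE_RELABEL[label]
        relabelled.append((phrase, label))
    return relabelled


syntactic = [("maaz ne", "k1"), ("aayaa se", "k3"),
             ("bacce ko", "k4"), ("khaanaa", "k2")]
chain = {"maaz ne", "aayaa se", "bacce ko"}
print(relabel_causative(syntactic, chain))
```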
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
• Possibility-I
– Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
– The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
– jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
[Figure: Relative Clause Possibility-I]
Relative Clauses (nmod__relc)
• Possibility-II
– The verb khadZaa hai ‘is standing’ is the root of the relative clause
– The modifier of vaha ‘he’ in the main clause is the entire relative clause
– Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contd.)
For relative clauses, our current decision: follow Possibility-II.
In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause.
The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation.
The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref.
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
Here dukh ‘unhappy’ is the karta.
Here mujhko ‘to me’ is a subtype of sampradan.
This sampradan is different from the beneficiary sampradan (k4).
We call it anubhava karta, represented by k4a.
anubhava karta – k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
‘ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
agar-to
  paired-ccof → agar
    ccof → naa hotii
  paired-ccof → to
    ccof → aatii
Possibility-II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause.
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause.
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb this argument is semantically realized but not syntactically.
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Having written a letter, he tears it up every day'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'.
Participles (vmod) (Contd)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)
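The shared-argument analysis above can be sketched as a small edge list. This is only an illustration: the (head, relation, dependent) triple encoding and the function names are assumptions, not part of the treebank format. The shared arguments are stored once, on the main verb, and looked up for the participle.

```python
# Hypothetical triple encoding of the participle example: (head, relation, dependent).
EDGES = [
    ("PaadZataa hai", "k1", "vaha"),
    ("PaadZataa hai", "k7t", "rojZa"),
    ("PaadZataa hai", "k2", "patra"),
    ("PaadZataa hai", "vmod", "likhakara"),
]

def arguments(head, edges):
    """Return {relation: dependent} for the edges leaving `head`."""
    return {rel: dep for h, rel, dep in edges if h == head}

def shared_arguments(participle, edges, shared=("k1", "k2")):
    """Arguments a vmod participle shares semantically with its main verb."""
    for head, rel, dep in edges:
        if rel == "vmod" and dep == participle:
            main_args = arguments(head, edges)
            return {r: main_args[r] for r in shared if r in main_args}
    return {}

print(shared_arguments("likhakara", EDGES))
```

Keeping the shared arguments on the main verb only mirrors the slide's point that they are syntactically attached once and semantically related to both verbs.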
(7) Ellipsis
• How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing.
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency.
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunction.
• Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• A coordinating conjunction is the head and takes the coordinated elements as its children.
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd…)
Coordinate Conjunction
- Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
Subordinate Conjunction
- Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof) and Subordinate Conjunction: dependency trees (figures not reproduced)
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
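The conjunct-verb tree above can be sketched in code to show how the pof link lets a tool recover the single semantic unit while ek still modifies only the noun. This is a minimal sketch; the dictionary encoding and both helper names are assumptions made for illustration.

```python
# Hypothetical encoding of the conjunct-verb tree: dependent -> (head, relation).
TREE = {
    "maine": ("kiyaa", "k1"),
    "usase": ("kiyaa", "k2"),
    "prashna": ("kiyaa", "pof"),
    "ek": ("prashna", "nmod"),
}

def complex_predicate(verb, tree):
    """Join a verb with its pof dependents to recover the conjunct verb."""
    parts = [dep for dep, (head, rel) in tree.items() if head == verb and rel == "pof"]
    return " ".join(parts + [verb])

def modifiers(word, tree):
    """Words attached directly to `word`, whatever the relation label."""
    return [dep for dep, (head, _) in tree.items() if head == word]

print(complex_predicate("kiyaa", TREE))  # the noun-verb unit
print(modifiers("prashna", TREE))        # what modifies the noun alone
```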
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
• Both candidate sentences contain the question's words 'created' and 'the first effective polio vaccine', so word matching alone cannot choose between them
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
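The filtering step can be sketched as follows. This is a toy illustration under stated assumptions: the role labels are entered by hand from the slide (no labeller is run), and the dictionary layout and the function name `answer_who` are hypothetical.

```python
# Hand-labelled candidates taken from the slide; the dict layout is an assumption.
CANDIDATES = [
    {"text": "Becton Dickinson created the first disposable syringe ...",
     "predicate": "create",
     "roles": {"agent": "Becton Dickinson",
               "theme": "the first disposable syringe"}},
    {"text": "The first effective polio vaccine was created in 1952 by Jonas Salk ...",
     "predicate": "create",
     "roles": {"agent": "Jonas Salk",
               "theme": "the first effective polio vaccine"}},
]

def answer_who(predicate, theme, candidates):
    """Return agents of `predicate` whose theme matches the question's theme."""
    wanted = theme.lower()
    return [c["roles"]["agent"] for c in candidates
            if c["predicate"] == predicate
            and c["roles"].get("theme", "").lower() == wanted]

print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

The first candidate is rejected because its theme is the syringe, even though it shares every content word with the question.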
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
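The two rolesets for ring can be sketched as a small lookup table. The Python layout below is only an illustration and is not the frame file format used by PropBank; the names `FRAMESETS` and `describe` are hypothetical.

```python
# Rolesets for "ring" as given on the slide; the dict layout is an assumption.
FRAMESETS = {
    "ring.01": {"gloss": "make sound of bell",
                "roles": {"Arg0": "causer of ringing",
                          "Arg1": "thing rung",
                          "Arg2": "ring for"}},
    "ring.02": {"gloss": "to surround",
                "roles": {"Arg1": "surrounding entity",
                          "Arg2": "surrounded entity"}},
}

def describe(roleset_id, labelled_args):
    """Pair each labelled argument with its role description in the roleset."""
    roles = FRAMESETS[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args.items()}

print(describe("ring.01", {"Arg0": "John", "Arg1": "the bell"}))
print(describe("ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"}))
```

The same label (Arg1) means "thing rung" in one roleset and "surrounding entity" in the other, which is why roles are interpreted per roleset rather than globally.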
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
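Labels in this tagset combine a base argument with an optional function tag. A minimal sketch of splitting them, assuming the hyphenated spelling (for example ARG2-LOC); the helper names are hypothetical.

```python
def split_label(label):
    """Split 'ARG2-LOC' into ('ARG2', 'LOC'); plain 'ARG1' gives ('ARG1', None)."""
    base, _, function_tag = label.partition("-")
    return base, (function_tag or None)

def is_modifier(label):
    """ARGM-* labels mark modifiers rather than numbered arguments."""
    return split_label(label)[0] == "ARGM"

for label in ["ARG1", "ARG2-LOC", "ARGA-MNS", "ARGM-TMP"]:
    print(label, split_label(label), is_modifier(label))
```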
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Annotated trees for unacc-4 and dat-subj-1 not reproduced]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so, sulaa) sleep, make someone sleep
• Add -vaa
  – (sulaa, sulvaa) make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
[Annotated trees for causative-1 and causative-2 not reproduced]
Causatives: classes (figure not reproduced)
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
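Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Phrase structure tree not reproduced; node labels: VP-Pred, VP, NP, V; leaves: Atif, kitab, paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Phrase structure tree not reproduced; node labels: VP-Pred, VP, NP-P, NP, V; leaves: Atif, ne, kitab, paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
आतिफ़ सोएगा
[Phrase structure tree not reproduced; node labels: VP-Pred, VP, NP, V; leaves: Atif, soyegaa]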
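Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Phrase structure tree not reproduced; node labels: VP-Pred, VP, NP1, NP, V; leaves: darvaazaa, CASE1, khulegaa]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Phrase structure tree not reproduced; node labels: VP-Pred, VP, NP1, NP, NP-P, V; leaves: us kamre mein, cuuhe, CASE1, hain]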
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Phrase structure tree not reproduced; node labels: VP-Pred, VP, NP-P, NP, V; leaves: Ram, ne, Mohan, ko, kitaab, dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Phrase structure tree not reproduced; node labels: VP-Pred, VP, NP1, NP, NP-P, NP2, SCR2, V; leaves: kal raat, baadaloM mein, mujhko, caaMd, CASE1, dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Phrase structure tree not reproduced; node labels include VP-Pred, VP, NP, NP-P, NP1, CP1, C, V, V-Aux, EXTR1, CASE1; leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
[Phrase structure tree not reproduced; node labels include VP-Pred, VP, NP, NP-P, NP1, CP, C, COMP, V, V-Aux, PRO, SCR1, SCR3; leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii]
Complex Predicate
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
[Phrase structure tree not reproduced; node labels include VP-Pred, VP, NP, NP-P, V', V, V-Aux, N, CASE1; leaves: Raam, Ravi, ko, yaad, kar, rahaa, thaa]
Causative (phrase structure tree figure not reproduced)
Chunking
• Chunking is introduced to save effort in manual tagging
• Dependency relations are marked between the chunk heads
• Chunking restructures the tree (i.e. the value of ADDR_ will change)
ADDR_  TKN_    CAT_  OTHR
1      ((      NP
1.1    usa     PRP   <fs af='vaha,pron,…'>
1.2    laDake  NN    <fs af='laDakA,noun,…'>
1.3    ne      PSP   <fs af='ne,psp,…'>
       ))
2      ((      NP
2.1    kelA    NN    <fs af='kelA,noun,…'>
       ))
3      ((      VG
3.1    khAyA   VM    <fs af='KA,verb,…'>
3.2    thA     VAUX  <fs af='kelA,verb,…'>
       ))
4      ((      BLK
4.1            SYM   <fs as=&STOP,punc>
       ))
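The chunked layout above (an address, then a token or an opening bracket, then a category) can be read with a few lines of code. This is a minimal sketch under simplifying assumptions: whitespace-separated columns, no feature-structure column, and no nested chunks; `SAMPLE` and `read_chunks` are hypothetical names, not part of any treebank tool.

```python
# Simplified version of the chunked sentence, without the feature-structure column.
SAMPLE = """\
1 (( NP
1.1 usa PRP
1.2 laDake NN
1.3 ne PSP
))
2 (( NP
2.1 kelA NN
))
3 (( VG
3.1 khAyA VM
3.2 thA VAUX
))
"""

def read_chunks(text):
    """Group token lines into (chunk_tag, [(token, pos), ...]) pairs."""
    chunks, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) >= 3 and fields[1] == "((":         # chunk opens: ADDR (( TAG
            current = (fields[2], [])
        elif fields[0] == "))" and current is not None:    # chunk closes
            chunks.append(current)
            current = None
        elif current is not None and len(fields) >= 3:     # token line: ADDR TKN CAT
            current[1].append((fields[1], fields[2]))
    return chunks

for tag, tokens in read_chunks(SAMPLE):
    print(tag, tokens)
```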
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework
• The Grammatical Model used in the Hindi/Urdu treebanks
  - Some basic concepts
• Some Hindi constructions
  - Causatives
  - Co-ordination
  - Unaccusatives
  - Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
• Indian languages
  - Rich morphology
  - Relatively flexible word order
• For example:
  a) baccaa phala khaataa hai
     'child' 'fruit' 'eat_hab' 'pres'
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock (the)
• K1 (karta): the doer of the action (the locus of activity)
• K2 (karma): locus of result
Sabina opened the lock with this key
opened
  k1: Sabina
  k2: lock (the)
  k3: key (this)
• K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock (the)
  k3: key (this)
  k7p: home (my)
• K7t (kaladhikaraNa): time
• K7p (deshadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock (the)
  k3: key (this)
• lock becomes the karta
Levels of Analysis
• L1 – Semantic relations: karakas, e.g. raama: karta
• L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
Example Contd
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
(t1) khaa
  bhaaii
  phala
(t2) khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened
Semantics of the verb
• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal Root branches into activity and result]
karta-karma
• The boy opened the lock
  - k1 – karta
  - k2 – karma
• karta, karma sometimes correspond to agent, theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma
open
  k1: boy
  k2: lock
Sub-actions - Opening of lock
Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions - Opening of lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
• k1 – karta (doer)
• k2 – karma (affected)
• k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  - The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened
  - The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  - Which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions, etc.
An Example
Example
- meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'
An Example (Contd)
Morph Analysis
- meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
- badZaa <fs af= root=badZaa cat=adj gend=m >
- bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
- bahuta <fs af= root=bahuta cat=adj gend=any >
- phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
- khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
- hai <fs af= root=hai cat=v gend=any num=any pers=3 >
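The morph analysis above attaches a bundle of key=value features to each word. A minimal sketch of reading such a bundle into a dictionary, assuming the simplified `key=value` layout shown on the slide rather than the full feature-structure syntax; `parse_fs` is a hypothetical helper name.

```python
import re

def parse_fs(fs):
    """Read key=value pairs out of a '<fs ... >' string into a dict.

    A key with no value (the bare 'af=' on the slide) is skipped.
    """
    return dict(re.findall(r"(\w+)=(\w+)", fs))

features = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(features["root"], features["cat"], features["TAM"])
```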
An Example (Contd)
POS Tagging
- meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
- ((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation (figure not reproduced)
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold with participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six
  - karta – subject/agent/doer
  - karma – object/patient
  - karana – instrument
  - sampradaan – beneficiary
  - apaadaan – source
  - adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types (figure not reproduced)
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
- Possibility-I: Go by syntactic analysis
  * khilvaa 'cause to eat' is the verb root
  * maaz ne has karta vibhakti, so mark as k1
  * aayaa se has karana vibhakti, so mark as k3
  * bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd…)
Possibility-II
- The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
- The Paninian framework provides the relations
  * prayojaka karta 'causer' (pk1): the causer in a causative construction
  * prayojya karta 'causee' (jk1): the causee in a causative construction
  * madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
- Do we mark the above dependency roles?
- If we mark these relations, then the root will be khaa 'eat'
Causative Constructions (Contd…)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
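The relabelling from the surface analysis (Possibility-I) to the causative analysis (Possibility-II) can be sketched as a simple mapping. This is an illustration only, and it applies to the example with the maid as mediator, not to the spoon example where se marks an instrument; the names `CAUSATIVE_LABELS` and `to_possibility_two` are hypothetical.

```python
# Surface (vibhakti-based) labels mapped to causative labels, as on the slide.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def to_possibility_two(surface):
    """Relabel a {phrase: surface_label} analysis with causer/causee labels.

    Labels without a causative counterpart (such as k2) are kept unchanged.
    """
    return {phrase: CAUSATIVE_LABELS.get(label, label)
            for phrase, label in surface.items()}

possibility_one = {"maaz ne": "k1", "aayaa se": "k3",
                   "bacce ko": "k4", "khaanaa": "k2"}
print(to_possibility_two(possibility_one))
```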
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
- Possibility-I
  * Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  * The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  * jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
Relative Clause Possibility-I (dependency tree figure not reproduced)
Relative Clauses (nmod__relc)
Possibility-II
- The verb khadZaa hai 'is standing' is the root of the relative clause
- The modifier of vaha 'he' in the main clause is the entire relative clause
- Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
Relative Clauses (Contdhellip)
For relative clauses our current decision Follow Possibility-II In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the
verb khadzaa hai lsquois standingrsquo of the relclause The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo
relation The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by
the feature coref
101212
25
(3) anubhava karta ndash k4a
Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta Here dukh lsquounhappyrsquo is the karta Here mujhko lsquoto mersquo is a subtype of sampradan This sampradan is different from the sampradan (k4mdashbeneficiary) We call it as anubhava karta represented by k4a
anubhava karta ndash k4a (Contd )
Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon
was visible to mersquo Verb
101212
26
anubhava karta ndash k4a (Contdhellip)
Ex-2
Ex-3
anubhava karta ndash k4a (Contdhellip)
101212
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo
lsquoHad she been not sick she would have definitely come to the partyrsquo
Issue
$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause
101212
29
Possibility - I
agar-to paired-ccof paired-ccof
agar to ccof ccof
naa hotii aatii
Possibility - II
101212
30
Conditionals (Contd) Possibility-I is not possible because agar-to is the head of the tree
which is an abstract node ie it is not a lexical node For conditionals our current decision Follow Possibility-II In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo
clause Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo
clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
    ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
    ‘Having written a letter every day, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
PaadZataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakar
      k1 → vaha (shared)
      k2 → patra (shared)
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
    ‘I will believe whatever you say’
• In the above example vo ‘that’, which is the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
      nmod__relc → kahoge
          k1 → tum
          k2 → jo bhi
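The null-element treatment above can be sketched in Python. This is only an illustrative sketch under stated assumptions: the `(dependent, relation, head)` tuples and the helper `attach_relative_clause` are hypothetical names, not part of the treebank tooling, and the k2 label given to the inserted node is taken from this one example.

```python
def attach_relative_clause(edges, rel_clause_head, main_verb, correlative=None):
    """Attach a relative clause to its correlative via nmod__relc.

    If the correlative (e.g. vo 'that') is elided, insert a NULL__NP
    placeholder under the main verb so the clause still has a parent.
    The k2 label for the placeholder is specific to this example (assumption).
    """
    edges = list(edges)
    if correlative is None:
        correlative = "NULL__NP"
        edges.append((correlative, "k2", main_verb))
    edges.append((rel_clause_head, "nmod__relc", correlative))
    return edges


# Edges that exist before the missing head is handled.
base = [
    ("mai", "k1", "maan luungii"),
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]
tree = attach_relative_clause(base, "kahoge", "maan luungii")
for dependent, relation, head in tree:
    print(f"{dependent} --{relation}--> {head}")
```

Passing an explicit `correlative` (for a sentence where vo is present) skips the insertion, which mirrors the "only if it is essential" policy on these slides.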
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
    ‘The children have grown up; they don't listen to anyone’
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunct (the conjunction word, e.g. aur or ki) as the head.
• In coordination, the conjunct is the head and takes the coordinated elements as its children.
• In subordination, it takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
      ‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
      ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
      ‘Ram said that he will come tomorrow’
[Figure: dependency trees for the coordinate and the subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
    ‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
• The dependency relation of prashna with kiyaa is POF (‘Part OF’): the noun or adjective in a conjunct verb sequence has a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, since both are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd.)
kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
      nmod → ek
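A minimal Python sketch of the chunking decision above, assuming a simple tuple encoding: the chunk identifiers (`NP1`, `NP2`, ...) and the helper `conjunct_verbs` are hypothetical and chosen only for illustration. It shows how keeping prashna in its own NP chunk lets ek attach to it, while pof still records that prashna kiyaa is one semantic unit.

```python
# Chunks after splitting the conjunct verb: [ek prashna]NP [kiyaa]VG.
CHUNKS = {
    "NP1": ["maine"],
    "NP2": ["usase"],
    "NP3": ["ek", "prashna"],
    "VG": ["kiyaa"],
}

# (dependent, relation, head) triples between chunk heads.
INTER_CHUNK = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
]

# Relations inside a single chunk.
INTRA_CHUNK = [("ek", "nmod", "prashna")]


def conjunct_verbs(edges):
    """Recover noun-verb units from pof edges."""
    return [f"{dep} {head}" for dep, rel, head in edges if rel == "pof"]


print(conjunct_verbs(INTER_CHUNK))
```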
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
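The filtering step can be illustrated with a short Python sketch. It assumes the role labels have already been assigned (here they are copied by hand from the two candidate sentences); the `Proposition` class and the `who` function are hypothetical names, not an actual SRL system.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    predicate: str
    agent: str
    theme: str


# Hand-labelled roles from the two candidate answers.
candidates = [
    Proposition("create", agent="Becton Dickinson",
                theme="the first disposable syringe"),
    Proposition("create", agent="Jonas Salk",
                theme="the first effective polio vaccine"),
]


def who(predicate, theme, props):
    """Return the agents of propositions whose predicate and theme both match."""
    return [p.agent for p in props
            if p.predicate == predicate and p.theme == theme]


answers = who("create", "the first effective polio vaccine", candidates)
print(answers)
```

Both sentences match the question on surface words, but only one has the queried phrase as the theme of create, so the other candidate is dropped.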
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1]   (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
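One way to picture a frame file is as a lookup table from roleset IDs to role descriptions. The Python sketch below assumes that view; the dictionary layout, the `ring.01`/`ring.02` key spelling and the `describe` helper are illustrative choices, not the actual PropBank frame file format.

```python
# A toy "frame file" for the polysemous predicate ring.
FRAME_FILE = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"Arg0": "causer of ringing",
                  "Arg1": "thing rung",
                  "Arg2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"Arg1": "surrounding entity",
                  "Arg2": "surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled phrase to the role description of its roleset."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}


print(describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
print(describe("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same label (Arg1) means "thing rung" in one roleset and "surrounding entity" in the other, which is the point of defining roles verb by verb.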
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
  ARGA   Causer
  ARG0   Agent / experiencer
  ARG1   Theme / patient
  ARG2   Recipient
  ARG3   Instrument
PropBank Tagset: Numbered Arguments with function tags
  ARGA-MNS   Indirect causer
  ARG0-MNS   Induced causer
  ARG0-GOL   Causee with a ‘recipient’ role
  ARG2-ATR   Attribute
  ARG2-GOL   Goal
  ARG2-SOU   Source
  ARG2-LOC   Location
  ARG2-DIR   Direction
PropBank Tagset: Modifier Arguments
  ARGM-TMP   Temporal
  ARGM-MNR   Manner
  ARGM-LOC   Location
  ARGM-PRP   Purpose
  ARGM-CAU   Cause
  ARGM-DIS   Discourse
  ARGM-ADV   Adverb
  ARGM-MNS   Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif.erg book.f read.f.sg.pst
         ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif.dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’
unacc-2: दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl.erg open.pst
         ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl.dat open.inf compel.fut
         ‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
         Atif ne soyaa
         Atif.erg sleep.m.sg.pst
         ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif.dat sleep.inf compel.fut
         ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
         kal raat baadaloM meiM caaMd dikhaa
         yesterday night clouds in moon see(unacc).pst
         ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
         kal raat baadaloM meiM mujhko caaMd dikhaa
         yesterday night clouds in me.dat moon see(unacc).pst
         ‘Yesterday night I saw the moon behind the clouds’
[Figure: PropBank annotation for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan.dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram.erg Mohan.dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
         aayaa ne Atif ko sulaayaa
         maid.erg Atif.acc sleep.caus.pst
         ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
         maaN ne aayaa se Atif ko sulvaayaa
         mother.erg maid.by Atif.acc sleep.caus.pst
         ‘The mother made the maid cause the child to sleep’
[Figure: PropBank annotation for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
         raam ravi kii pratiikshaa kar rahaa thaa
         Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
         ‘Ram was waiting for Ravi’
compl-pred-2: राम रवि से लड़ बैठा
         raam ravi se ladZa baithaa
         Ram Ravi.inst fight sit.perf
         ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
         raam jaantaa hai ki siita der se aayegii
         Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
         ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP-Pred, VP, NP, NP, V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels VP-Pred, VP, NP-P, NP, V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels VP-Pred, VP, NP, V]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels VP-Pred, VP, NP1, NP, CASE1, V]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels VP-Pred, VP, NP1, NP, CASE1, V, NP-P]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels VP-Pred, VP, NP-P, NP, V]
Putting it All Together: Dative Subjects
[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels VP-Pred, VP, NP1, NP, CASE1, V, NP-P, SCR2, NP2]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels VP-Pred, VP, NP, EXTR1, V, V-Aux, CP1, C, NP1, CASE1, NP-P]
Relative Clause
[Figure: PS tree for the sentence below; node labels VP-Pred, VP, NP-P, NP, NP1, SCR1, SCR3, PRO, CP, C, COMP, V, V-Aux]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
[Figure: PS tree for the sentence below; node labels VP-Pred, VP, NP, NP-P1, V, V-Aux, V', CASE1, N]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causative
Paninian Grammatical Model and Hindi/Urdu Treebanks
Dipti Misra Sharma, IIIT Hyderabad
<dipti@iiit.ac.in>
COLING-2012
Outline
• Paninian Grammatical framework: the Grammatical Model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
  – Causatives
  – Co-ordination
  – Unaccusatives
  – Relative clauses
• Conclusions
Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information encoded at different levels, such as morphological, syntactic, syntactico-semantic (dependency)
Hindi Dependency Treebank
• The Corpus
  – News articles: 350k
  – Tourism articles: 25-30k
  – Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model
Why Paninian Grammar?
• Indian languages
  – Rich morphology
  – Relatively flexible word order
• For example
  a) baccaa phala khaataa hai
     ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form (Kiparsky 1993)
• Focuses on language as a means of communication
Panini's Grammar (Contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1 → Sabina
  k2 → lock (the)
• K1 (karta): the doer of the action (the locus of activity)
• K2 (karma): locus of result
Sabina opened the lock with this key
opened
  k1 → Sabina
  k2 → lock (the)
  k3 → key (this)
• K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home
opened
  k7t → Yesterday
  k1 → Sabina
  k2 → lock (the)
  k3 → key (this)
  k7p → home (my)
• K7t (kaladhikaraNa): time
• K7p (deshadhikaraNa): place
Yesterday the lock opened with this key
opened
  k7t → yesterday
  k1 → lock (the)
  k3 → key (this)
• lock becomes the karta
Levels of Analysis
• L1 – Semantic relations: karakas, e.g. raama: karta
• L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG
Example (Contd.)
(t1) khaa
       → bhaaii
       → phala
(t2) khaa
       → bhaaii
           → meraa
           → baDZaa
       → phala
           → bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened
Semantics of the verb
• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
open
  k1 → boy
  k2 → lock
Sub-actions: Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
The boy opened the lock with the key
open
  k1 → boy
  k2 → lock
  k3 → key
The key opened the lock
open
  k1 → key
  k2 → lock
The lock opened
open
  k1 → lock
k1 – karta (doer); k2 – karma (affected); k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
  – The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – ‘key’ gets assigned karta (in b) or karana (in a) based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
  ‘My elder brother eats lots of fruits’
An Example (Contd.)
Morph Analysis
• meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa   <fs af= root=badZaa cat=adj gend=m>
• bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta   <fs af= root=bahuta cat=adj gend=any>
• phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai      <fs af= root=hai cat=v gend=any num=any pers=3>
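As a rough illustration, a feature structure of this shape can be read into a dictionary with a few lines of Python. The sketch assumes the structures are written as `<fs key=value ...>` strings with whitespace-separated pairs, as on this slide; `parse_fs` is a hypothetical helper, not the treebank's own reader.

```python
def parse_fs(fs):
    """Parse '<fs key=value ...>' into a dict; empty values become ''."""
    body = fs.strip().lstrip("<").rstrip(">")
    pairs = (token.split("=", 1) for token in body.split() if "=" in token)
    return {key: value for key, value in pairs}


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(khaataa["root"], khaataa["cat"], khaataa["TAM"])
```

The leading `fs` token carries no `=` and is skipped, so only attribute-value pairs end up in the result.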
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
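The bracketed chunk notation above is regular enough to parse with a small regular expression. The following Python sketch assumes exactly the `((word_TAG ...))_LABEL` shape shown on the slide; `parse_chunks` is a hypothetical helper written for this illustration.

```python
import re

# One chunk: ((token token ...))_LABEL
CHUNK = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK.findall(text):
        tokens = [tuple(token.rsplit("_", 1)) for token in body.split()]
        chunks.append((label, tokens))
    return chunks


sentence = (
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
for label, tokens in parse_chunks(sentence):
    print(label, tokens)
```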
An Example (Contd.)
Dependency Relation
[Figure: dependency tree for the example sentence]
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd.)
• Here modifier-modified relations are marked between the heads of the chunks
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’ and
  – khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks
Dependency Scheme (Contd.)
khaataa
  k1 → bhaaii
      r6 → meraa
  k2 → phala
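The inter-chunk tree above can be written down as a list of labelled edges. The Python sketch below uses `(dependent, relation, head)` triples; that encoding and the helper names `root` and `children` are assumptions made for illustration only.

```python
# (dependent, relation, head) triples between chunk heads.
EDGES = [
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]


def root(edges):
    """Return the single node that never appears as a dependent."""
    heads = {head for _, _, head in edges}
    dependents = {dep for dep, _, _ in edges}
    (only,) = heads - dependents
    return only


def children(edges, head):
    """Return (relation, dependent) pairs attached to `head`."""
    return [(rel, dep) for dep, rel, h in edges if h == head]


print(root(EDGES))
print(children(EDGES, "khaataa"))
```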
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
[Figure: dependency relation types]
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa ‘cause to eat’ is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    ‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    ‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
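The relabelling in Possibility-II amounts to a fixed mapping from the surface-based labels to the causative-specific ones. The Python sketch below applies that mapping mechanically; the `relabel` helper is hypothetical, and it assumes the se-marked phrase is a mediator (aayaa ‘maid’) and not an instrument (cammaca ‘spoon’), a distinction an annotator still has to make.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel(args):
    """Replace k1/k3/k4 with pk1/mk1/jk1; leave other labels untouched."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in args]


surface = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel(surface))
```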
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
    ‘The boy who is standing there is my brother’
Issue
• Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause: Possibility-I
[Figure: dependency tree for Possibility-I]
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause: Possibility-II
[Figure: dependency tree for Possibility-II]
Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
      ‘I-Dat’ ‘unhappy’ ‘is’
      ‘I am unhappy’
• The ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
• Here dukh ‘unhappy’ is the karta.
• Here mujhko ‘to me’ is a subtype of sampradan, different from the beneficiary sampradan (k4).
• We call it anubhava karta, represented by k4a.
anubhava karta – k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa   [base verb]
      ‘Ram’ ‘Erg’ ‘moon’ ‘saw’
      ‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa   [derived intransitive verb]
      ‘Ram-Dat’ ‘moon’ ‘appeared’
      ‘The moon was visible to Ram’
[Figure: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
[Figure: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
    ‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
  paired-ccof → agar
      ccof → naa hotii
  paired-ccof → to
      ccof → aatii
Possibility - II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause.
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
    ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
    ‘Having written a letter every day, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
PaadZataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakar
      k1 → vaha (shared)
      k2 → patra (shared)
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
    ‘I will believe whatever you say’
• In the above example vo ‘that’, which is the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
      nmod__relc → kahoge
          k1 → tum
          k2 → jo bhi
101212
34
Ellipsis (Contd)
Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo No explicit conjunct Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate
Non-dependency Relations
ccof ndash Conjunction pof ndash Complex Predicates fragof -- Fragment of
101212
35
(1) Conjunction (ccof)
ccof relation doesnrsquot reflects a dependency relation It is used for coordinating as well as subordinating conjunctions The dependency trees will show the conjuncts as heads In coordinating conjuncts the conjunct is the head and takes the coordinating
elements as its children In subordinating conjunct it would take the clause to which it is syntactically
attached (the subordinate clause) as its child
Conjunction (ccof) (Contdhellip)
Coordinate Conjunction
$ Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa lsquoramrsquo lsquoErgrsquo lsquofoodrsquo lsquoatersquo lsquoandrsquo lsquositarsquo Ergrsquo lsquoapplersquo lsquoatersquo
lsquoRam ate food and Sita ate an applersquo
Subordinate Conjunction
$ Ex raam ne kahaa ki vo kal aayegaa lsquoramrsquo lsquoErgrsquo lsquosaidrsquo thatrsquo lsquohersquo lsquotomorrowrsquo lsquocome-Futrsquo lsquoRam said that he will come tomorrowrsquo
101212
36
Coordinate Conjunction (ccof)
Subordinate Conjunction
101212
37
(2) Conjunct Verbs
Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa
lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence
The annotation scheme should be able to account for this relation in the
dependency tree If prashna kiyaa is grouped as a single verb chunk it will not be possible to
mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation) It means noun or an adjective in the conjunct verb sequence will have a POF relation with
the verb This way the relation between ek and prashna becomes an intra-chunk relation as they will
now become part of a single NP chunk Conjunct verbs are chunked separately but semantically they constitute a single unit It captures the fact that the noun-verb sequence is a conjunct verb by linking them with
POF relation
101212
38
Conjunct Verbs (Contd)
kiyaa k1 k2 pof
maine usase prashna
nmod
ek
101212
1
Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis
Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
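The filtering step can be sketched in a few lines of Python. This is only an illustration: the role annotations are written by hand for the two candidate sentences, and the function name `answer_who` is hypothetical, not part of any SRL system.

```python
# Hand-written role annotations for the two candidate sentences
# (an assumption for illustration, not the output of a real SRL system).
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, annotated):
    """Return the agents of `predicate` whose theme matches the question's theme."""
    return [a["agent"] for a in annotated
            if a["predicate"] == predicate and a["theme"] == theme]


answers = answer_who("create", "the first effective polio vaccine", candidates)
print(answers)
```

Word matching alone would accept both sentences; requiring the theme of create to match the question leaves only the second one.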
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
Arg0: Causer of ringing
Arg1: Thing rung
Arg2: Ring for

ring.02: To surround
Arg1: Surrounding entity
Arg2: Surrounded entity
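As a rough illustration of how a frame file groups rolesets, the two senses of ring can be held in a small Python structure. The class name `Roleset`, the dictionary `RING_FRAME` and the helper `describe` are hypothetical; actual PropBank frame files are distributed in their own file format, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class Roleset:
    roleset_id: str
    gloss: str
    roles: dict  # argument label -> role description


# One frame file for the polysemous predicate "ring", with two rolesets.
RING_FRAME = {
    "ring.01": Roleset("ring.01", "Make sound of bell",
                       {"Arg0": "Causer of ringing",
                        "Arg1": "Thing rung",
                        "Arg2": "Ring for"}),
    "ring.02": Roleset("ring.02", "To surround",
                       {"Arg1": "Surrounding entity",
                        "Arg2": "Surrounded entity"}),
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to its verb-specific role description."""
    roles = RING_FRAME[roleset_id].roles
    return {text: roles[label] for text, label in labelled_args}


john = describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
trees = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
print(john)
print(trees)
```

The same label (Arg1) resolves to a different description in each roleset, which is the point of defining roles verb by verb.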
Hindi PropBank
• Annotating Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling

Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
• ARGA: Causer
• ARG0: Agent, experiencer
• ARG1: Theme, patient
• ARG2: Recipient
• ARG3: Instrument

PropBank Tagset: Numbered Arguments with function tags
• ARGA-MNS: Indirect causer
• ARG0-MNS: Induced causer
• ARG0-GOL: Causee with a 'recipient' role
• ARG2-ATR: Attribute
• ARG2-GOL: Goal
• ARG2-SOU: Source
• ARG2-LOC: Location
• ARG2-DIR: Direction

PropBank Tagset: Modifier Arguments
• ARGM-TMP: Temporal
• ARGM-MNR: Manner
• ARGM-LOC: Location
• ARGM-PRP: Purpose
• ARGM-CAU: Cause
• ARGM-DIS: Discourse
• ARGM-ADV: Adverb
• ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'

Existential
exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night the moon was seen behind the clouds'

dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night I saw the moon behind the clouds'

[Figures: annotated trees for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
'Ram will give a book to Mohan'

ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative
• Add -aa
– (so → sulaa) sleep → make someone sleep
• Add -vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
'The maid caused the child to sleep'

causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
'The mother made the maid cause the child to sleep'

[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes
[Figure: causative classes]
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)': trust
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
'Ram was waiting for Ravi'

compl-pred-2: राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
'Ram regrettably fought with Ravi'

Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP, V and leaves Atif, kitab, paRhegaa]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP-P, NP, V and leaves Atif-ne, kitab, paRhii]
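Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP, V and leaves Atif, soyegaa]

Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP (index 1), V, a CASE trace (index 1), and leaves darvaazaa, khulegaa]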
Existentials
• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct
उस कमरे में चूहे हैं
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP-P, NP (index 1), V, a CASE trace (index 1), and leaves us kamre mein, cuuhe, hain]

Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP-P, NP, V and leaves Ram-ne, Mohan-ko, kitaab, dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP-P, NP (indices 1 and 2), V, a CASE trace (index 1), an SCR trace (index 2), and leaves kal raat, baadaloM mein, mujhko, caaMd, dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP, NP-P, V, V-Aux, CP (index 1), C, an EXTR trace (index 1), a CASE trace (index 1), and leaves Ram, jaantaa, hai, ki, Sita, der se, aayegi]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C (COMP), PRO, SCR traces (indices 1 and 3), and leaves jo kitaab, tumne, dii, thii, vah, maine, paRhii]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Figure: phrase-structure tree with nodes VP-Pred, VP, NP, NP-P (index 1), V', V, V-Aux, N, a CASE trace (index 1), and leaves Raam, Ravi-ko, yaad, kar, rahaa, thaa]

Causative
[Figure: causative tree]
Outline
• Paninian Grammatical framework: the grammatical model used in the Hindi/Urdu treebanks
• Some basic concepts
• Some Hindi constructions
– Causatives
– Co-ordination
– Unaccusatives
– Relative clauses
• Conclusions

Introduction
• Treebank: one of the most important linguistic resources
• Utility in various NLP tasks such as parsing, natural language understanding, etc.
• Linguistic information is encoded at different levels, such as morphological, syntactic, and syntactico-semantic (dependency)
Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian Grammatical model

Why Paninian Grammar?
• Indian languages: rich morphology, relatively flexible word order
• For example:
a) baccaa phala khaataa hai
   'child' 'fruit' 'eat_hab' 'pres'
b) phala baccaa khaataa hai
c) phala khaataa hai baccaa
d) baccaa khaataa hai phala
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky, 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock

opened
  k1: Sabina
  k2: lock
    the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this

K3 (karaNa): instrument
Yesterday Sabina opened the lock with this key at my home

opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my

K7t (kaalaadhikaraNa): time
K7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this

Here lock becomes the karta.
Levels of Analysis
• L1 – Semantic relations: karakas, e.g. raama: karta
• L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
(t1) Chunk heads only:
khaa
  bhaaii
  phala

(t2) Expanded:
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
– Sabina opened the lock with the key
– The key opened the lock
– The lock opened

Semantics of the verb
• A verbal root denotes
– The activity
– The result
• Locus of activity: karta
• Locus of result: karma
[Figure: verbal root, branching into activity and result]
karta-karma
• The boy opened the lock
– k1 – karta
– k2 – karma
• karta and karma sometimes correspond to agent and theme
– Not always: The door opened
– The door is karta
– The sentence has no explicit karma

open
  k1: boy
  k2: lock

Sub-actions: Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock

open
  k1: boy
  k2: lock
  k3: key

open
  k1: key
  k2: lock

open
  k1: lock

k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri)
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
– Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
– 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, and conjunctions
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits'
An Example (Contd.)

Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m >
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any >
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3 >
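A morph-analysis entry of this shape is easy to read into a dictionary. The sketch below assumes the entries are written as `<fs key=value ...>` with angle brackets (restoring the brackets lost in the slide text); the function name `parse_fs` is hypothetical and not part of any treebank tool.

```python
import re


def parse_fs(fs):
    """Parse a feature structure such as '<fs af= root=khaa cat=v>' into a dict."""
    body = fs.strip()
    if body.startswith("<fs"):
        body = body[3:]
    body = body.rstrip(">").strip()
    # Each feature is key=value; a value may be empty (as with 'af=' here).
    return {key: value for key, value in re.findall(r"(\w+)=(\S*)", body)}


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(khaataa["root"], khaataa["cat"], khaataa["TAM"])
```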
An Example (Contd.)

POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
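The bracketed chunk notation can be read back into chunks and heads with a short Python sketch. The head rule used here (the VM token for a VG chunk, otherwise the last token) is a simplifying assumption for this example only, and the function names are hypothetical.

```python
import re


def parse_chunks(text):
    """Split '((w_TAG ...))_LABEL ...' into a list of (label, [(word, tag), ...])."""
    chunks = []
    for body, label in re.findall(r"\(\((.*?)\)\)_(\w+)", text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def head_of(label, tokens):
    """Pick a chunk head: the main verb (VM) in a VG, else the last token (assumed rule)."""
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]


chunked = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
chunks = parse_chunks(chunked)
heads = [head_of(label, tokens) for label, tokens in chunks]
print(heads)
```

Dependency relations are then marked between these heads, one per chunk.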
An Example (Contd.)

Dependency Relation
[Figure: dependency tree for the example sentence]

Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations; hence it provides a framework for dependency analysis
• In our dependency tree
– each node is a chunk, and
– each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)
• Here modifier-modified relations are marked between the heads of the chunks:
– meraa 'my'
– bhaaii 'brother'
– phala 'fruit'
– khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd.)

khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
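The labelled tree for the example can be stored as a list of (head, relation, dependent) edges and printed as an indented outline. This is a minimal sketch; the edge list is transcribed from the example above, and the `render` helper is a hypothetical name.

```python
from collections import defaultdict

# (head, relation, dependent) edges between chunk heads.
edges = [
    ("khaataa", "k1", "bhaaii"),
    ("khaataa", "k2", "phala"),
    ("bhaaii", "r6", "meraa"),
]


def render(root, edges):
    """Return the tree as indented text, one 'relation: word' line per node."""
    children = defaultdict(list)
    for head, label, dep in edges:
        children[head].append((label, dep))

    lines = []

    def walk(node, depth, label):
        prefix = "  " * depth + (f"{label}: " if label else "")
        lines.append(prefix + node)
        for child_label, child in children[node]:
            walk(child, depth + 1, child_label)

    walk(root, 0, None)
    return "\n".join(lines)


tree_text = render("khaataa", edges)
print(tree_text)
```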
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas
– Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote, for example, purpose and reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six:
– karta – subject/agent/doer
– karma – object/patient
– karana – instrument
– sampradaan – beneficiary
– apaadaan – source
– adhikarana – location in place/time/other

Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
[Figure: dependency relation types]
Some Hindi Constructions

(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

Issue
• Possibility-I: Go by syntactic analysis
– khilvaa 'cause to eat' is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)

Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations:
– prayojaka karta 'causer' (pk1): the causer in a causative construction
– prayojya karta 'causee' (jk1): the causee in a causative construction
– madhyastha karta 'mediator causer' (mk1): the mediator-causer in a causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'

Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'

As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.

For causatives, our current decision: follow Possibility-II.
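The relabelling from Possibility-I to Possibility-II for the mediated causative can be written as a small mapping. This sketch covers only the second example; deciding whether a se-marked phrase is a mediator causer or a plain instrument (as with cammaca 'spoon' in the first example) is not handled here, and the names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical.

```python
# Syntactic label (Possibility-I) -> causative label (Possibility-II).
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Replace k1/k3/k4 by pk1/mk1/jk1; other labels (e.g. k2) stay unchanged."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


syntactic = [("maaz ne", "k1"), ("aayaa se", "k3"),
             ("bacce ko", "k4"), ("khaanaa", "k2")]
causative = relabel_causative(syntactic)
print(causative)
```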
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'

Issue
• Possibility-I
– Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
– The dependency of jo ladZakaa 'the boy' is on vaha 'he'
– jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
[Figure: relative clause, Possibility-I]
• Possibility-II
– The verb khadZaa hai 'is standing' is the root of the relative clause
– The modifier of vaha 'he' in the main clause is the entire relative clause
– Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
[Figure: relative clause, Possibility-II]

Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
'I.Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would have definitely come to the party'

Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause

Possibility-I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II
[Figure: dependency tree for Possibility-II]

Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause: the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically

Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Having written letters every day, he tears them'

The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'.

PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis
How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
  'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
  'Ram said that he will come tomorrow'
[Figures: dependency trees for the coordinate and subordinate conjunction examples]
101212
37
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    ‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
    ‘I asked him a question’
The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree.
If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd.)
To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
The dependency relation of prashna with kiyaa is pof (‘Part OF’ relation): a noun or an adjective in the conjunct verb sequence has a pof relation with the verb.
This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit. The pof relation captures the fact that the noun-verb sequence is a conjunct verb.

Conjunct Verbs (Contd.)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
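The conjunct verb tree can be written down as a small data structure. The following is a minimal sketch, not the treebank's own tooling: the `Node` class and the `edges` helper are hypothetical names introduced here, and only the words and labels (k1, k2, pof, nmod) come from the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One word in a dependency tree (hypothetical helper class)."""
    word: str
    relation: Optional[str] = None  # label on the edge to the parent
    children: List["Node"] = field(default_factory=list)

    def add(self, relation: str, word: str) -> "Node":
        """Attach a new dependent with the given relation label."""
        child = Node(word, relation)
        self.children.append(child)
        return child


def edges(node):
    """Return (head, relation, dependent) triples in depth-first order."""
    result = []
    for child in node.children:
        result.append((node.word, child.relation, child.word))
        result.extend(edges(child))
    return result


# maine usase ek prashna kiyaa
root = Node("kiyaa")
root.add("k1", "maine")
root.add("k2", "usase")
prashna = root.add("pof", "prashna")  # noun of the conjunct verb
prashna.add("nmod", "ek")             # ek modifies the noun only

for head, rel, dep in edges(root):
    print(f"{dep} --{rel}--> {head}")
```

Because prashna is its own node, ek can attach to it directly, which would not be possible if prashna kiyaa were a single verb chunk.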
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
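The filtering step can be illustrated with a short sketch. It assumes the role labels have already been produced (here they are typed in by hand from the bracketed examples; no SRL system is run), and the function name `answer_who` is hypothetical.

```python
# Hand-written role annotations for the two candidate sentences.
candidates = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, annotated):
    """Return the agents of `predicate` whose theme matches the question."""
    return [
        c["agent"]
        for c in annotated
        if c["predicate"] == predicate and c["theme"] == theme
    ]


# "Who created the first effective polio vaccine?"
answers = answer_who("create", "the first effective polio vaccine", candidates)
print(answers)
```

Both sentences contain all the question words, so word matching cannot separate them; comparing the theme of create is what rules out the first candidate.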
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
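A frame file with two rolesets can be pictured as a lookup table. This is only an illustrative sketch: the dictionary layout, the `FRAME_FILE` name and the `describe` function are assumptions made here and are not the actual PropBank frame file format. The role descriptions are copied from the ring example.

```python
# Rolesets copied from the example; the layout is illustrative only.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "Arg0": "Causer of ringing",
            "Arg1": "Thing rung",
            "Arg2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "Arg1": "Surrounding entity",
            "Arg2": "Surrounded entity",
        },
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to its role description in the roleset."""
    roles = FRAME_FILE[roleset_id]["roles"]
    unknown = [label for label in labelled_args if label not in roles]
    if unknown:
        raise ValueError(f"{roleset_id} has no role(s) {unknown}")
    return {text: roles[label] for label, text in labelled_args.items()}


bell = describe("ring.01", {"Arg0": "John", "Arg1": "the bell"})
lake = describe("ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"})
print(bell)
print(lake)
```

The same label (Arg1) means different things in the two rolesets, which is why the roleset must be chosen before the arguments are interpreted.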
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi
• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
Numbered arguments          Numbered arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         ‘The door will open’
unacc-2: दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         ‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         ‘Atif will have to sleep’

Existential
exist-1: उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
         kal raat baadaloM meiM caaMd dikhaa
         yesterday night clouds in moon see(unacc).pst
         ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
         kal raat baadaloM meiM mujhko caaMd dikhaa
         yesterday night clouds in me.dat moon see(unacc).pst
         ‘Yesterday night I saw the moon behind the clouds’
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1: राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           ‘Ram gave a book to Mohan’

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
         aayaa ne Atif ko sulaayaa
         maid erg Atif acc sleep.caus.pst
         ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
         maaN ne aayaa se Atif ko sulvaayaa
         mother erg maid by Atif acc sleep.caus.pst
         ‘The mother made the maid cause the child to sleep’
[Figures: PropBank annotation of causative-1 and causative-2; causative classes]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              ‘Ram was waiting for Ravi’
compl-pred-2: राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              ‘Ram regrettably fought with Ravi’

Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University
rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition

Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS

Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP, VP-Pred, NP, V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP, VP-Pred, NP-P, NP, V]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP, VP-Pred, NP, V]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP, VP-Pred, NP1, NP, CASE1, V]

Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP, VP-Pred, NP-P, NP1, NP, CASE1, V]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP, VP-Pred, NP-P, NP, V]

Putting It All Together: Dative Subjects
[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP, VP-Pred, NP, NP-P, NP1, NP2, CASE1, SCR2, V]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP, VP-Pred, NP, NP-P, NP1, CP1, C, V, V-Aux, CASE1, EXTR1]

Relative Clause
[Tree diagram for the example below; node labels: VP, VP-Pred, NP, NP-P, NP1, CP, C, COMP, V, V-Aux, PRO, SCR1, SCR3]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
[Tree diagram for the example below; node labels: VP, VP-Pred, NP, NP-P, V, V’, V-Aux, N, CASE1]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

Causative
[Tree diagram not recovered]
Hindi Dependency Treebank

The Corpus
• News articles: 350k
• Tourism articles: 25-30k
• Conversational data: 25-20k
• Dependency grammar framework: Paninian grammatical model

Why Paninian Grammar?
• Indian languages:
  - Rich morphology
  - Relatively flexible word order
• For example:
  a) baccaa phala khaataa hai
     ‘child’ ‘fruit’ ‘eat_hab’ ‘pres’
  b) phala baccaa khaataa hai
  c) phala khaataa hai baccaa
  d) baccaa khaataa hai phala

Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on the spoken form <Kiparsky 1993>
• Focuses on language as a means of communication

Panini's Grammar (contd.)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called ‘karaka’
• Other relations, such as reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
opened
  k1: Sabina
  k2: lock
    the
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key
opened
  k1: Sabina
  k2: lock
    the
  k3: key
    this
K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home
opened
  k7t: Yesterday
  k1: Sabina
  k2: lock
    the
  k3: key
    this
  k7p: home
    my
K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key
opened
  k7t: yesterday
  k1: lock
    the
  k3: key
    this
lock becomes the karta
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd.)
(t1) relations between chunk heads:
khaa
  bhaaii
  phala
(t2) fully expanded tree:
khaa
  bhaaii
    meraa
    baDZaa
  phala
    bahuta
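The bracketed chunk notation can be read mechanically. The sketch below parses it into chunks and picks a head for each; the function names and the head rule (the NN token of an NP, the VM token of a VG, otherwise the last token) are assumptions made for illustration, not the treebank's documented procedure.

```python
import re

# Matches one chunk written as ((word_TAG word_TAG ...))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

# Assumed head tag per chunk type (illustrative only).
HEAD_TAGS = {"NP": "NN", "VG": "VM"}


def head_of(label, tokens):
    """Pick the head word of a chunk using the assumed head tag."""
    wanted = HEAD_TAGS.get(label)
    for word, tag in tokens:
        if tag == wanted:
            return word
    return tokens[-1][0]  # fallback: last word of the chunk


def parse_chunks(text):
    """Parse '((w1_TAG w2_TAG))_LABEL' groups into a list of chunk dicts."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append(
            {"label": label, "tokens": tokens, "head": head_of(label, tokens)}
        )
    return chunks


sentence = (
    "((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP "
    "((khaataa_VM hai_VAUX))_VG"
)
chunks = parse_chunks(sentence)
print([c["head"] for c in chunks])
```

The heads found this way are the nodes of tree (t1); the remaining tokens of each chunk are the ones added when the tree is expanded to (t2).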
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  - Sabina opened the lock with the key
  - The key opened the lock
  - The lock opened

Semantics of the verb
• A verbal root denotes:
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root, branching into activity and result]

karta-karma
• The boy opened the lock
  - k1 – karta
  - k2 – karma
• karta and karma sometimes correspond to agent and theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma
open
  k1: boy
  k2: lock

Sub-actions: Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
k1 – karta (doer)
k2 – karma (affected)
k3 – karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
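The shift of the karta role can be sketched in a few lines. This is a toy model under stated assumptions: the participant types ("agent", "affected", "instrument"), the function name `assign_karakas`, and the rule that the agent must be unexpressed when another participant is presented as karta are all introduced here for illustration. Only the k1, k2 and k3 labels and the three sentences come from the slides.

```python
# Label each participant receives when it is not chosen as karta.
DEFAULT_LABEL = {"agent": "k1", "affected": "k2", "instrument": "k3"}


def assign_karakas(expressed, karta):
    """Label the expressed participants given the speaker's choice of karta.

    `expressed` maps a participant type to the word used in the sentence;
    `karta` is the participant type the speaker presents as the doer.
    """
    if karta not in expressed:
        raise ValueError(f"{karta!r} is not expressed in the sentence")
    if karta != "agent" and "agent" in expressed:
        # Assumption of this sketch: another participant is raised to
        # karta only when the agent is left unexpressed.
        raise ValueError("agent is expressed; it must be the karta")
    return {
        word: "k1" if ptype == karta else DEFAULT_LABEL[ptype]
        for ptype, word in expressed.items()
    }


# Sabina opened the lock with the key
full = assign_karakas(
    {"agent": "Sabina", "affected": "lock", "instrument": "key"}, "agent"
)
# The key opened the lock
key_as_karta = assign_karakas({"instrument": "key", "affected": "lock"}, "instrument")
# The lock opened
lock_as_karta = assign_karakas({"affected": "lock"}, "affected")
print(full, key_as_karta, lock_as_karta)
```

The `karta` argument stands in for vivaksha: the same event yields different labelings depending on which participant the speaker chooses to present as the doer.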
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)

Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
‘My elder brother eats lots of fruits’

An Example (Contd.)
Morph Analysis
meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
badZaa   <fs af= root=badZaa cat=adj gend=m>
bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
bahuta   <fs af= root=bahuta cat=adj gend=any>
phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
hai      <fs af= root=hai cat=v gend=any num=any pers=3>
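The morph analysis lines are attribute-value lists, so they can be read into dictionaries. The sketch below assumes the angle-bracket form `<fs key=value ...>` for each analysis and treats an attribute with nothing after the equals sign (as with af above) as an empty string; `parse_fs` is a hypothetical helper name, not part of any treebank tool.

```python
import re


def parse_fs(fs):
    """Parse '<fs key=value ...>' into a dict; a bare 'key=' maps to ''."""
    inner = fs.strip()
    if not (inner.startswith("<fs") and inner.endswith(">")):
        raise ValueError(f"not a feature structure: {fs!r}")
    inner = inner[3:-1]  # drop the leading '<fs' and the trailing '>'
    return {key: value for key, value in re.findall(r"(\w+)=(\S*)", inner)}


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
bhaaii = parse_fs("<fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>")

# The verb agrees with the direct-case noun in gender and number.
agrees = all(khaataa[f] == bhaaii[f] for f in ("gend", "num", "pers"))
print(khaataa["root"], khaataa["TAM"], agrees)
```

With the features in a dictionary, checks such as agreement between khaataa and bhaaii reduce to comparing the gend, num and pers values.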
An Example (Contd.)
POS Tagging
meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

An Example (Contd.)
Dependency Relation
[Figure: dependency tree of the example]

Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations; hence it provides a framework for dependency analysis
• In our dependency tree:
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with the karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd.)
• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa ‘my’
  - bhaaii ‘brother’
  - phala ‘fruit’
  - khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks
Dependency Scheme (Contd.)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala

Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other

Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod_relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
[Figure: dependency relation types]
Some Hindi Constructions
(1) Causative Constructions
Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
    ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
    ‘Mother caused the maid to feed the child’
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa ‘cause to eat’ is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd.)
• Possibility-II
  - The verb khilvaa ‘cause to eat’ is a causative verb, morphologically related to the base verb khaa ‘eat’
  - The Paninian framework provides the relations:
    prayojaka karta ‘causer’ (pk1): the causer in a causative construction
    prayojya karta ‘causee’ (jk1): the causee in a causative construction
    madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations, then the root will be khaa ‘eat’

Causative Constructions (Contd.)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    ‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    ‘Mother made the maid feed the child’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
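The relabeling from Possibility-I to Possibility-II is a fixed mapping, which the following sketch applies to the maid example. The function name `relabel_causative` and the list-of-pairs input are assumptions for illustration. The sketch relabels blindly, so it should only be given a se-phrase that is a mediator-causer (aayaa se), not a true instrument such as cammaca se ‘with the spoon’, which stays k3.

```python
# Mapping taken from the text: k1 -> pk1, k3 -> mk1, k4 -> jk1.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Rewrite Possibility-I labels as Possibility-II labels.

    `args` is a list of (phrase, label) pairs. Labels outside the mapping
    are kept. The caller decides whether a k3 phrase is a mediator-causer;
    this function does not check that.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```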
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
    ‘The boy who is standing there is my brother’
Issue
• Possibility-I
  - Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  - The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
  - jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)
• Possibility-II
  - The verb khadZaa hai ‘is standing’ is the root of the relative clause
  - The modifier of vaha ‘he’ in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
      ‘I-Dat’ ‘unhappy’ ‘is’
      ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan k4 (beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa   [base verb]
      ‘ram’ ‘Erg’ ‘moon’ ‘saw’
      ‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa   [derived intransitive verb]
      ‘ram-Dat’ ‘moon’ ‘appeared’
      ‘The moon was visible to Ram’
[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
    ‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause

Possibility-I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II
[Figure: dependency tree for Possibility-II]

Conditionals (Contd.)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is semantically realized but not syntactically

Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
    ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
    ‘Every day, having written a letter, he tears it up’

Participles (vmod) (Contd.)
The arguments vaha ‘he’ and pawra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.

Participles (vmod) (Contd.)
Paadzataa hai
  k1: vaha
  k7t: rojZa
  k2: pawra
  vmod: likhakar
    k1: vaha
    k2: pawra
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
    ‘I will believe whatever you say’
In the above example vo ‘that’, which is the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.

Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
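Inserting the null element can be sketched over a plain list of edges. The edge-triple representation and the function `attach_relative_clause` are assumptions made for this illustration; the labels (k1, k2, nmod__relc) and the NULL_NP node follow the example.

```python
def attach_relative_clause(edges, rc_root, main_verb, relation, head=None):
    """Attach a relative clause to its head noun.

    `edges` holds (head, relation, dependent) triples. If `head` is None
    the head noun is elided, so a NULL_NP node is inserted under
    `main_verb` with `relation` and the clause is attached to it.
    """
    edges = list(edges)  # do not modify the caller's list
    if head is None:
        head = "NULL_NP"
        edges.append((main_verb, relation, head))
    edges.append((head, "nmod__relc", rc_root))
    return edges


# tum jo bhi kahoge (vo) mai maan luungii, with vo left out
partial = [
    ("maan luungii", "k1", "mai"),
    ("kahoge", "k1", "tum"),
    ("kahoge", "k2", "jo bhi"),
]
result = attach_relative_clause(
    partial, rc_root="kahoge", main_verb="maan luungii", relation="k2"
)
for head, rel, dep in result:
    print(f"{dep} --{rel}--> {head}")
```

Without the inserted node the relative clause rooted at kahoge would have no parent; with it, the tree has the same shape as the version in which vo is overt.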
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
    “The children have grown up; they don't listen to anyone.”
No explicit conjunct. Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
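The filtering step described above can be sketched in a few lines of Python. This is only an illustration, assuming the two candidate sentences have already been role-labelled; the `Proposition` class and the `answer_who` helper are hypothetical names, not part of any PropBank tool.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    """One role-labelled clause: a predicate plus its role fillers."""
    predicate: str
    roles: dict  # role name -> filler text


# The two candidate answers from the slide, with roles already assigned.
candidates = [
    Proposition("create", {"agent": "Becton Dickinson",
                           "theme": "first disposable syringe"}),
    Proposition("create", {"agent": "Jonas Salk",
                           "theme": "first effective polio vaccine"}),
]


def answer_who(predicate, theme, props):
    """Return the agents of propositions whose predicate and theme match."""
    return [p.roles["agent"] for p in props
            if p.predicate == predicate
            and p.roles.get("theme") == theme
            and "agent" in p.roles]


answers = answer_who("create", "first effective polio vaccine", candidates)
print(answers)
```

Word overlap alone would accept both sentences; requiring the theme of create to match the question is what removes the Becton Dickinson sentence.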
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John: ARG0] rings [the bell: ARG1]
• [Tall aspen trees: ARG1] ring [the lake: ARG2]

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
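As a rough illustration of how a frame file constrains annotation, the two ring rolesets can be held in a small lookup table. The dictionary layout and the `describe` helper below are assumptions made for this sketch; actual PropBank frame files are XML documents with more fields.

```python
# Hypothetical in-memory view of the frame file for "ring".
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"ARG0": "Causer of ringing",
                      "ARG1": "Thing rung",
                      "ARG2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"ARG1": "Surrounding entity",
                      "ARG2": "Surrounded entity"},
        },
    }
}


def describe(lemma, roleset_id, labelled_args):
    """Attach role descriptions to (label, text) pairs, rejecting labels
    that the chosen roleset does not define."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    described = []
    for label, text in labelled_args:
        if label not in roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        described.append((text, label, roles[label]))
    return described


bell = describe("ring", "ring.01", [("ARG0", "John"), ("ARG1", "the bell")])
lake = describe("ring", "ring.02",
                [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")])
```

The same label (ARG1) means "thing rung" in one roleset and "surrounding entity" in the other, which is why the roleset ID has to be chosen before the arguments are labelled.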
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
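Labels in this tagset have a regular shape: a base (ARGA, ARG0 to ARG3, or ARGM) optionally followed by a hyphen and a three-letter function tag. A minimal sketch of splitting a label into its parts follows; the regular expression and the `parse_label` name are assumptions for illustration and do not validate that a given base and tag combination is actually licensed by the tagset.

```python
import re

# Base is A, a digit, or M; an optional three-letter function tag follows a hyphen.
LABEL_RE = re.compile(r"^ARG(?P<base>[A0-9M])(?:-(?P<tag>[A-Z]{3}))?$")


def parse_label(label):
    """Split a PropBank-style label into (kind, base label, function tag)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a recognised label: {label!r}")
    base = "ARG" + match.group("base")
    kind = "modifier" if base == "ARGM" else "numbered"
    return kind, base, match.group("tag")


print(parse_label("ARG2-GOL"))
print(parse_label("ARGM-TMP"))
```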
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'
trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
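The labelling rule for intransitives can be written down directly once each verb has been classified. In the sketch below the two verb sets contain only the examples given above, and the classification is assumed to have been done beforehand by the diagnostic tests; `sole_argument_label` is a hypothetical helper name.

```python
# Verb roots from the slide, in the notation used there.
UNACCUSATIVE = {"Kula", "Puta"}   # open, explode
UNERGATIVE = {"haMsa"}            # laugh


def sole_argument_label(verb_root):
    """Return the PropBank label for the single argument of an intransitive."""
    if verb_root in UNACCUSATIVE:
        return "ARG1"   # theme-like argument
    if verb_root in UNERGATIVE:
        return "ARG0"   # agent-like argument
    raise KeyError(f"verb {verb_root!r} has not been classified")


print(sole_argument_label("Kula"), sole_argument_label("haMsa"))
```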
Intransitive: Unaccusative
unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'
unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'
Intransitive: Unergative
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'
unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential
exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night, the moon was seen behind the clouds'
dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night, I saw the moon behind the clouds'
[Figures: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
[Figures: annotated trees for causative-1 and causative-2]

Causatives: classes

Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se laDZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause
compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    · Language-universal principles
    · Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree diagram: root VP-VPred; terminals Atif, kitab, paRhegaa]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree diagram: root VP-VPred; terminals Atif ne, kitab, paRhii]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree diagram: root VP-VPred; terminals Atif, soyegaa]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree diagram: root VP-VPred; NP darvaazaa coindexed (1) with a CASE trace in the lower position; V khulegaa]

Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram: root VP-VPred; NP cuuhe coindexed (1) with a CASE trace; V hain; adjoined us kamre mein]
Ditransitive
• The recipient is introduced as adjoined to the VP-VPred: a fixed, but not structural, position
राम ने मोहन को किताब दी
[Tree diagram: root VP-VPred; terminals Ram ne, Mohan ko (adjoined to VP-VPred), kitaab, dii]

Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram: root VP-VPred; NP caaMd coindexed (1) with a CASE trace; V dikhaa; adjuncts kal raat and baadaloM mein; mujhko coindexed (2) with an SCR trace]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-VPred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram: main clause Ram jaantaa hai with an EXTR trace (1) coindexed with the extraposed CP; CP headed by C ki over the clause Sita der se aayegi]

Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Tree diagram: relative CP jo kitaab tumne dii thii, with SCR traces and a PRO, attached to the NP vah of the main clause maine paRh lii]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Tree diagram: root VP-VPred; N yaad combines with V kar under V'; auxiliaries rahaa and thaa as V-VAux; terminals Raam, Ravi ko]

Causative
Panini's Grammar
• Dated around 500 BC
• Seeks to provide a complete, maximally concise and theoretically consistent analysis of Sanskrit grammatical structure
• Based on spoken form (Kiparsky 1993)
• Focuses on language as a means of communication

Panini's Grammar (contd)
• Treats a sentence as a series of modifier-modified relations
• Every sentence has a primary modified (generally a verb)
• Relations between verbs and their participants are called 'karaka'
• Other relations: reason, purpose, genitive, etc.
• The relations are expressed through explicit markers called vibhakti
Sabina opened the lock
  opened
    k1: Sabina
    k2: lock (the)
K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key
  opened
    k1: Sabina
    k2: lock (the)
    k3: key (this)
K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home
  opened
    k7t: Yesterday
    k1: Sabina
    k2: lock (the)
    k3: key (this)
    k7p: home (my)
K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key
  opened
    k7t: yesterday
    k1: lock (the)
    k3: key (this)
Here the lock becomes the karta.
Levels of Analysis
L1 – Semantic relations: karakas, e.g. raama: karta
L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks, bags) and their heads
• Mark the relations across chunks (head to head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG

Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
(t1) chunk heads only: khaa → bhaaii, phala
(t2) fully expanded: khaa → bhaaii (meraa, baDZaa), phala (bahuta)
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb
• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma
[Diagram: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
  – k1: karta
  – k2: karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma
  open
    k1: boy
    k2: lock

Sub-actions: Opening of lock
[Diagram: opening of the lock decomposed into three sub-actions (action 1, action 2, action 3), covering inserting and turning the key, pressing the lever, and the latch moving and the lock opening]
Sub-actions: Opening of lock
  open
    k1: boy
    k2: lock
    k3: key
  open
    k1: key
    k2: lock
  open
    k1: lock
k1: karta (doer); k2: karma (affected); k3: karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
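The shift of karta under vivaksha can be made concrete with a small sketch. It assumes the speaker's choice is passed in explicitly as `focus`, and that a patient or instrument not chosen as karta keeps its default label (k2 or k3), as in the lock examples; the function name and the role names "agent", "patient", "instrument" are illustrative assumptions, not part of the annotation scheme.

```python
# Default labels for participants that are not chosen as karta.
DEFAULT_LABEL = {"patient": "k2", "instrument": "k3"}


def assign_karakas(participants, focus):
    """Label each expressed participant; the one in focus becomes k1 (karta).

    participants: dict mapping a role name to the word expressing it
    focus: the role name the speaker presents as the locus of activity
    """
    if focus not in participants:
        raise ValueError(f"focus {focus!r} is not an expressed participant")
    labels = {}
    for role, word in participants.items():
        if role == focus:
            labels[word] = "k1"
        elif role in DEFAULT_LABEL:
            labels[word] = DEFAULT_LABEL[role]
        else:
            raise ValueError(f"no default label for non-karta role {role!r}")
    return labels


# Sabina opened the lock with this key
full = assign_karakas(
    {"agent": "Sabina", "patient": "lock", "instrument": "key"}, "agent")
# The key opened the lock
key_karta = assign_karakas({"instrument": "key", "patient": "lock"}, "instrument")
# The lock opened with this key
lock_karta = assign_karakas({"patient": "lock", "instrument": "key"}, "patient")
```

The karta is not computed from a fixed hierarchy here, because the same set of participants (lock and key) can surface with either one as karta.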
Speaker's Intention (vivakshaa)
• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head to head relation)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena, such as complex verbs, causatives, relative clauses, conjunctions, etc., in Hindi
An Example
meraa badZaa bhaaii bahuta phala khaataa hai
'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
'My elder brother eats lots of fruits'

An Example (Contd)
Morph Analysis
  – meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa  <fs af= root=badZaa cat=adj gend=m>
  – bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta  <fs af= root=bahuta cat=adj gend=any>
  – phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai     <fs af= root=hai cat=v gend=any num=any pers=3>
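A feature structure of this kind is a flat list of attribute=value pairs, so it can be read into a dictionary with a few lines of code. The sketch below assumes the angle-bracketed form `<fs ...>` and space-separated pairs; the `af` attribute, whose value is not shown in the slide, is kept as an empty string. `parse_fs` is a hypothetical helper, not part of any treebank toolkit.

```python
import re


def parse_fs(fs):
    """Parse '<fs key=value ...>' into a dict; empty values become ''."""
    body = fs.strip()
    if not (body.startswith("<fs") and body.endswith(">")):
        raise ValueError(f"not a feature structure: {fs!r}")
    body = body[len("<fs"):-1]
    # Each pair is an attribute name, '=', and a (possibly empty) value.
    return dict(re.findall(r"(\w+)=(\S*)", body))


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(khaataa["root"], khaataa["TAM"])
```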
An Example (Contd)
POS Tagging
  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
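The bracketed chunk notation above is regular enough to be read back mechanically. The following sketch parses it into (label, tokens) pairs and picks a head per chunk; the head rule used here (the VM token in a verb group, otherwise the last token) is an assumption that happens to fit this example, not a statement of the treebank's actual head-finding procedure.

```python
import re

# One chunk looks like ((word_TAG word_TAG ...))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk label, [(word, POS tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def head_of(chunk):
    """Assumed head rule: main verb (VM) in a VG, else the final token."""
    label, tokens = chunk
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]


chunked = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
chunks = parse_chunks(chunked)
heads = [head_of(c) for c in chunks]
print(heads)
```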
An Example (Contd)
Dependency Relation
[Figure: dependency tree for the example sentence]
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis.
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
  khaataa
    k1: bhaaii
      r6: meraa
    k2: phala
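One simple way to hold such a tree in a program is as nodes with labelled child edges. The sketch below builds the tree for the example sentence over chunk heads and lists its edges as (head, relation, dependent) triples; the `Node` class and `edges` function are illustrative assumptions rather than the treebank's storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A chunk head with labelled edges to its dependents."""
    word: str
    children: list = field(default_factory=list)  # list of (label, Node)

    def attach(self, label, child):
        self.children.append((label, child))
        return child


def edges(node):
    """List (head, relation, dependent) triples in depth-first order."""
    triples = []
    for label, child in node.children:
        triples.append((node.word, label, child.word))
        triples.extend(edges(child))
    return triples


root = Node("khaataa")
bhaaii = root.attach("k1", Node("bhaaii"))   # karta
root.attach("k2", Node("phala"))             # karma
bhaaii.attach("r6", Node("meraa"))           # genitive

for triple in edges(root):
    print(triple)
```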
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six
  – karta: subject/agent/doer
  – karma: object/patient
  – karana: instrument
  – sampradaan: beneficiary
  – apaadaan: source
  – adhikarana: location in place/time/other

Relations other than karakas
• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod__relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types
[Figure: dependency relation types]
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
• Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd)
• Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd)
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  'Mother fed the child with the spoon'
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
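The difference between the two possibilities amounts to a relabelling of three relations. The sketch below applies that mapping to the maid example; `relabel_causative` is a hypothetical helper, and it deliberately does not try to decide whether a se-marked phrase is a mediator (aayaa se) or a plain instrument (cammaca se), since that needs information beyond the vibhakti.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(labelled_args):
    """Map surface-based labels to causer/mediator/causee labels.

    labelled_args: list of (phrase, label) pairs for one causative clause
    in which the k3 phrase is known to be a mediator, not an instrument.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in labelled_args]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```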
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
• Issue
  – Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)
• Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta: k4a
Ex-1: mujhko dukh hai
'I-Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta: k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram-Dat' 'moon' 'appeared'
'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran: rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would definitely have come to the party'
• Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I
  agar-to
    paired-ccof: agar
      ccof: naa hotii
    paired-ccof: to
      ccof: aatii

Possibility-II
[Figure: dependency tree for Possibility-II]

Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd)
  PaadZataa hai
    k1: vaha
    k7t: rojZa
    k2: patra
    vmod: likhakar
      k1: vaha (shared)
      k2: patra (shared)
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)
  maan luungii
    k1: mai
    k2: NULL__NP (vo)
      nmod__relc: kahoge
        k1: tum
        k2: jo bhi

Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
  NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate

Non-dependency Relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd)
• Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'
[Figures: dependency trees for the coordinate conjunction (ccof) and the subordinate conjunction]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• This means that the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The scheme captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd)
  kiyaa
    k1: maine
    k2: usase
    pof: prashna
      nmod: ek
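Because the noun and the verb of a conjunct verb are chunked separately, a consumer of the treebank has to reassemble the unit from the pof edge. A minimal sketch, assuming the tree is available as (head, relation, dependent) triples; the `conjunct_verbs` helper is a hypothetical name.

```python
def conjunct_verbs(edges):
    """Return 'noun verb' strings for every pof edge in the tree."""
    return [f"{dependent} {head}"
            for head, relation, dependent in edges
            if relation == "pof"]


# Edges of the tree for 'maine usase ek prashna kiyaa'.
tree = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
    ("prashna", "nmod", "ek"),
]

units = conjunct_verbs(tree)
print(units)
```

The modifier ek stays attached to prashna, so it is not swallowed into the reassembled predicate.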
101212
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBank Tagset: Numbered Arguments

Numbered arguments
  ARGA   Causer
  ARG0   Agent, experiencer
  ARG1   Theme, patient
  ARG2   Recipient
  ARG3   Instrument

Numbered arguments with function tags
  ARGA-MNS   Indirect causer
  ARG0-MNS   Induced causer
  ARG0-GOL   Causee with a 'recipient' role
  ARG2-ATR   Attribute
  ARG2-GOL   Goal
  ARG2-SOU   Source
  ARG2-LOC   Location
  ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments
  ARGM-TMP   Temporal
  ARGM-MNR   Manner
  ARGM-LOC   Location
  ARGM-PRP   Purpose
  ARGM-CAU   Cause
  ARGM-DIS   Discourse
  ARGM-ADV   Adverb
  ARGM-MNS   Means
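A label from this tagset splits into a base and an optional function tag. The sketch below assumes labels are written with a hyphen (for example ARG2-LOC); the function name `parse_label` is hypothetical. It is deliberately lenient: it checks that the base and the tag are known, but not that the particular combination appears in the table above.

```python
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
FUNCTION_TAGS = {"MNS", "GOL", "ATR", "SOU", "LOC", "DIR"}
MODIFIER_TAGS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def parse_label(label):
    """Split a label such as 'ARG2-LOC' into (kind, base, tag)."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        if tag not in MODIFIER_TAGS:
            raise ValueError(f"unknown modifier tag in {label!r}")
        return ("modifier", base, tag)
    if base not in NUMBERED:
        raise ValueError(f"unknown argument {label!r}")
    if tag and tag not in FUNCTION_TAGS:
        raise ValueError(f"unknown function tag in {label!r}")
    return ("numbered", base, tag or None)


print(parse_label("ARG2-LOC"))
print(parse_label("ARGM-TMP"))
print(parse_label("ARG1"))
```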
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
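The labelling rule for intransitives can be written down as a lookup. This is a sketch only: the verb classes are the three examples from the slide, and in practice the class would come from the diagnostic tests, not from a hard-coded table. `VERB_CLASS` and `sole_argument_label` are hypothetical names.

```python
# Verb classes for the example verbs on the slide (WX-style romanisation).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```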
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figure: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
[Figure: annotated trees for causative-1 and causative-2]

Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2: राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that syntax of a language can be explained by
    - Language-universal principles
    - Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS
• PS represents relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: PS tree for "Atif kitab paRhegaa"; node labels VP-Pred, VP, NP, V]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Figure: PS tree for "Atif-ne kitab paRhii"; node labels VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
आतिफ़ सोएगा
[Figure: PS tree for "Atif soyegaa"; node labels VP-Pred, VP, NP, V]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
दरवाज़ा खुलेगा
[Figure: PS tree for "darvaazaa khulegaa"; node labels VP-Pred, VP, NP (index 1), CASE (index 1), V]

Existentials
• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct
उस कमरे में चूहे हैं
[Figure: PS tree for "us kamre mein cuuhe hain"; node labels VP-Pred, VP, NP-P, NP (index 1), CASE (index 1), V]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: PS tree for "Ram-ne Mohan-ko kitaab dii"; node labels VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: PS tree for "kal raat baadaloM mein mujhko caaMd dikhaa"; node labels VP-Pred, VP, NP-P, NP, V, CASE (index 1), SCR (index 2)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: PS tree for "Ram jaantaa hai ki Sita der se aayegi"; node labels VP-Pred, VP, NP, NP-P, CP (index 1), C (ki), V, V-Aux, CASE (index 1), EXTR (index 1)]

Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
[Figure: PS tree for the relative clause example; node labels VP-Pred, VP, NP, NP-P, CP, C, COMP, V, V-Aux, PRO, SCR (indices 1 and 3)]
Complex Predicate
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
[Figure: PS tree for "Raam Ravi-ko yaad kar rahaa thaa"; node labels VP-Pred, VP, NP, NP-P, N, V, V', V-Aux, CASE (index 1)]

Causative
Sabina opened the lock

  opened
    k1: Sabina
    k2: lock
          the

K1 (karta): the doer of the action (the locus of activity)
K2 (karma): locus of result

Sabina opened the lock with this key

  opened
    k1: Sabina
    k2: lock
          the
    k3: key
          this

K3 (karaNa): instrument

Yesterday Sabina opened the lock with this key at my home

  opened
    k7t: Yesterday
    k1:  Sabina
    k2:  lock
           the
    k3:  key
           this
    k7p: home
           my

K7t (kaladhikaraNa): time
K7p (deshadhikaraNa): place

Yesterday the lock opened with this key

  opened
    k7t: yesterday
    k1:  lock
           the
    k3:  key
           this

lock becomes the karta
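The labelled trees above can be stored as a list of head-dependent edges. The following is a minimal sketch under that assumption; `Edge`, `EDGES` and `dependents` are names invented for this illustration, and the unlabelled modifiers (the, this, my) are left out because the slides give them no relation label.

```python
from collections import namedtuple

Edge = namedtuple("Edge", "head dependent label")

# "Yesterday Sabina opened the lock with this key at my home"
EDGES = [
    Edge("opened", "Yesterday", "k7t"),  # time
    Edge("opened", "Sabina", "k1"),      # karta
    Edge("opened", "lock", "k2"),        # karma
    Edge("opened", "key", "k3"),         # karaNa (instrument)
    Edge("opened", "home", "k7p"),       # place
]


def dependents(head, edges):
    """Map each relation label on `head` to its dependent word."""
    return {e.label: e.dependent for e in edges if e.head == head}


print(dependents("opened", EDGES))
```

In "Yesterday the lock opened with this key" the same structure would simply carry `Edge("opened", "lock", "k1")` and no k2 edge, which is how the shift of karta shows up in the data.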
Levels of Analysis
L1 - Semantic relations: karakas, e.g. raama: karta
L2 - Morphosyntactic: vibhakti, e.g. raama: prathamaa
L3 - Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 - Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example Contd
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

(t1) chunk-level tree
  khaa
    bhaaii
    phala

(t2) fully expanded tree
  khaa
    bhaaii
      meraa
      baDZaa
    phala
      bahuta
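The bracketed chunk notation is regular enough to read back with a short script. The sketch below assumes tokens are separated by spaces inside each chunk; `parse_chunks` and `chunk_head` are hypothetical helpers, and the head rule (main verb VM for a verb group, last token otherwise) is a simplifying assumption for this example, not the treebank's head-finding procedure.

```python
import re

# Matches one chunk of the form ((tok_TAG tok_TAG ...))_CHUNKTAG
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos), ...])."""
    chunks = []
    for body, tag in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((tag, tokens))
    return chunks


def chunk_head(tag, tokens):
    """Assumed head rule: VM inside a VG, otherwise the last token."""
    if tag == "VG":
        for word, pos in tokens:
            if pos == "VM":
                return word
    return tokens[-1][0]


SENTENCE = ("((meraa_PRP baDzaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP "
            "((khaataa_VM hai_VAUX))_VG")

for tag, tokens in parse_chunks(SENTENCE):
    print(tag, chunk_head(tag, tokens), tokens)
```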
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• Verb denotes an action/event
• Any action is a bundle of sub-actions
  Sabina opened the lock with the key
  The key opened the lock
  The lock opened

Semantics of the verb
• A verbal root denotes
  - The activity
  - The result
• Locus of activity: karta
• Locus of result: karma
[Figure: Verbal Root branching into activity and result]
karta-karma
• The boy opened the lock
  - k1: karta
  - k2: karma
• karta, karma sometimes correspond to agent, theme
  - Not always: The door opened
  - The door is karta
  - The sentence has no explicit karma

  open
    k1: boy
    k2: lock

Sub-actions: Opening of lock
Opening of lock
  (action 1) Inserting and turning a key
  (action 2) key pressing and turning the lever
  (action 3) latch moving and lock opening
Sub-actions: Opening of lock

  open
    k1: boy
    k2: lock
    k3: key

  open
    k1: key
    k2: lock

  open
    k1: lock

k1: karta (doer)
k2: karma (affected)
k3: karana (instrument)

Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock
  - The karaNa (instrument) is raised to the role of karta (doer): karaNa-kartri
• The lock opened
  - The karma is raised to the role of karta (doer): karma-kartri
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  - Which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
• Every sentence reflects speaker's intention
  - Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  - 'key' gets assigned karta (in b), karana (in a) based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses, conjunctions etc.
An Example
• Example
  - meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

An Example (Contd)
• Morph Analysis
  - meraa    <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  - badZaa   <fs af= root=badZaa cat=adj gend=m>
  - bhaaii   <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  - bahuta   <fs af= root=bahuta cat=adj gend=any>
  - phala    <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  - khaataa  <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  - hai      <fs af= root=hai cat=v gend=any num=any pers=3>
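Each morph analysis is a flat list of attribute=value pairs, so it can be read into a dictionary. The sketch below assumes the angle brackets lost in extraction are restored (`<fs ... >`) and that values contain no spaces; `parse_fs` is a hypothetical helper, not part of any treebank tool.

```python
def parse_fs(fs):
    """Parse '<fs a=x b=y ...>' into a dict of features."""
    items = fs.strip().lstrip("<").rstrip(">").split()
    if not items or items[0] != "fs":
        raise ValueError(f"not a feature structure: {fs!r}")
    features = {}
    for item in items[1:]:
        key, _, value = item.partition("=")
        features[key] = value  # an empty value (as in 'af=') stays ''
    return features


analysis = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(analysis["root"], analysis["cat"], analysis["TAM"])
```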
An Example (Contd)
• POS Tagging
  - meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
• Chunking
  - ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG

An Example (Contd)
• Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - the edge represents the relation between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit', and
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)

  khaataa
    k1: bhaaii
          r6: meraa
    k2: phala

Relations in Dependency Scheme
• There are 3 types of relations in Dependency Scheme
  - Karaka relations
  - Relations other than karakas, and
  - Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six
  - karta: subject/agent/doer
  - karma: object/patient
  - karana: instrument
  - sampradaan: beneficiary
  - apaadaan: source
  - adhikarana: location in place/time/other

Relations other than karakas
• r6: Genitive
• rt: Purpose
• rh: Reason
• nmod_relc: Relative clause
• rad: Address

Relations which do not fall under dependency relation
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of

Dependency Relation Types
Some Hindi Constructions

(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'

Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd)
• Possibility-II
  - The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  - The Paninian framework provides the relations
    - prayojaka karta 'causer' (pk1): the causer in a causative construction
    - prayojya karta 'causee' (jk1): the causee in a causative construction
    - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  - Do we mark the above dependency roles?
  - If we mark these relations then the root will be khaa 'eat'

Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
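The move from the Possibility-I labels to the Possibility-II labels is a fixed relabelling of the causative participants. A minimal sketch, assuming the caller has already decided that the se-marked phrase is a mediator causer (as with aayaa se) and not an instrument (as with cammaca se, which keeps k3); the names `CAUSATIVE_RELABEL` and `to_possibility_two` are hypothetical.

```python
# Possibility-I label -> Possibility-II label for causative participants.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def to_possibility_two(args):
    """Relabel (phrase, label) pairs of a causative with a mediator causer.

    Labels outside the mapping (for example k2) are kept unchanged.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


possibility_one = [("maaz ne", "k1"), ("aayaa se", "k3"),
                   ("bacce ko", "k4"), ("khaanaa", "k2")]
print(to_possibility_two(possibility_one))
```

A label-only mapping like this cannot tell a mediator from an instrument, so that decision has to be made before the function is called.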
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'

Issue
• Possibility-I
  - Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  - The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  - jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
  [Figure: Relative Clause, Possibility-I]
• Possibility-II
  - The verb khadZaa hai 'is standing' is the root of the relative clause
  - The modifier of vaha 'he' in the main clause is the entire relative clause
  - Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
  [Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta: k4a
Ex-1: mujhko dukh hai
      'I.Dat' 'unhappy' 'is'
      'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4: beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta: k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
      'ram.Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
[Figure: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran: rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figure: dependency trees for Ex-1 and Ex-2]

(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would have definitely come to the party'

Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I

  agar-to
    paired-ccof: agar
                   ccof: naa hotii
    paired-ccof: to
                   ccof: aatii

Possibility-II
[Figure: dependency tree in which the agar clause depends on the to clause]

Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb this argument is semantically realized but not syntactically

Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Having written letters every day, he tears them'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with another participle verb, likhakar 'having written'

  PaadZataa hai
    k1:   vaha
    k7t:  rojZa
    k2:   patra
    vmod: likhakar
            k1: vaha
            k2: patra
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example vo 'that', which becomes the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

  maan luungii
    k1: mai
    k2: NULL__NP (vo)
          nmod__relc: kahoge
                        k1: tum
                        k2: jo bhi

Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

  NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate

Non-dependency Relations
• ccof: Conjunction
• pof: Complex Predicates
• fragof: Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts the conjunct is the head and takes the coordinating elements as its children
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd)
• Coordinate Conjunction
  - Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
        'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
        'Ram ate food and Sita ate an apple'
• Subordinate Conjunction
  - Ex: raam ne kahaa ki vo kal aayegaa
        'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
        'Ram said that he will come tomorrow'
[Figure: dependency trees for the coordinate and the subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• It means the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

  kiyaa
    k1:  maine
    k2:  usase
    pof: prashna
           nmod: ek
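Because the noun and the verb are chunked separately, the conjunct verb has to be recovered from the pof edge. A small sketch, assuming the tree above is stored as (head, dependent, label) triples; `EDGES` and `conjunct_verbs` are names made up for this illustration.

```python
# Edges of the tree for "maine usase ek prashna kiyaa".
EDGES = [
    ("kiyaa", "maine", "k1"),
    ("kiyaa", "usase", "k2"),
    ("kiyaa", "prashna", "pof"),
    ("prashna", "ek", "nmod"),
]


def conjunct_verbs(edges):
    """Return (noun_or_adjective, verb) pairs linked by a pof relation."""
    return [(dep, head) for head, dep, label in edges if label == "pof"]


print(conjunct_verbs(EDGES))
```

The modifier ek stays attached to prashna, so the unit prashna kiyaa can be reassembled without losing the noun-internal relation.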
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer

We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS

• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS

• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2

आतिफ़ किताब पढ़ेगा
[Figure: PS tree for this sentence, with nodes VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

आतिफ़ ने किताब पढ़ी
[Figure: PS tree for this sentence, with nodes VP-Pred, VP, NP-P, NP (Atif), -ne, NP (kitab), V (paRhii)]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In the unergative there simply is no object

आतिफ़ सोएगा
[Figure: PS tree for this sentence, with nodes VP-Pred, VP, NP (Atif), V (soyegaa)]
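
Intransitive Clause: Unaccusative

• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)

दरवाज़ा खुलेगा
[Figure: PS tree for this sentence, with nodes VP-Pred, VP, NP1 (darvaazaa), trace *CASE*1 in the lower NP position, V (khulegaa)]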

Existentials

• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct

उस कमरे में चूहे हैं
[Figure: PS tree for this sentence, with nodes VP-Pred, VP, NP-P (us kamre mein) adjoined, NP1 (cuuhe), trace *CASE*1, V (hain)]

Ditransitive

• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position

राम ने मोहन को किताब दी
[Figure: PS tree for this sentence, with nodes VP-Pred, NP-P (Ram -ne), NP-P (Mohan -ko) adjoined to VP-Pred, VP, NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा
[Figure: PS tree for this sentence, with nodes VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko), trace *SCR*2, NP1 (caaMd), trace *CASE*1, V (dikhaa)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
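
Complement Clauses with ki

राम जानता है कि सीता देर से आएगी
[Figure: PS tree for this sentence. Main clause nodes: VP-Pred, VP, NP (Ram), V (jaantaa), V-Aux (hai), trace *EXTR*1. Extraposed clause: CP1, C (ki), VP-Pred, NP1 (Sita), NP-P (der se), trace *CASE*1, V (aayegi)]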

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'

[Figure: PS tree for this sentence. Relative clause: CP, C (COMP), NP1 (jo kitaab), NP-P (tumne), trace *SCR*1, PRO, V (dii), V-Aux (thii). Main clause: VP-Pred, VP, NP (vah), NP-P (maine), trace *SCR*3, V (paRhii)]

Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

[Figure: PS tree for this sentence, with nodes VP-Pred, VP, NP (Raam), NP-P1 (Ravi -ko), V', NP, N (yaad), trace *CASE*1, V (kar), V-Aux (rahaa), V-Aux (thaa)]

Causative

Yesterday Sabina opened the lock with this key at my home

opened
  k7t → Yesterday
  k1 → Sabina
  k2 → lock (the)
  k3 → key (this)
  k7p → home (my)

k7t (kaalaadhikaraNa): time; k7p (deshaadhikaraNa): place

Yesterday the lock opened with this key

opened
  k7t → yesterday
  k1 → lock (the)
  k3 → key (this)

lock becomes the karta

Levels of Analysis

• L1 – Semantic relations: karakas, e.g. raama: karta
• L2 – Morphosyntactic: vibhakti, e.g. raama: prathamaa
• L3 – Morphological representation (abstract): vibhakti markers, e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
• L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)

Our Model

• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relation)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically

For Example

meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
   ((bahuta_QF phala_NN))_NP
   ((khaataa_VM hai_VAUX))_VG

Example (Contd)

((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG

[Figure: (t1) khaa with dependents bhaaii and phala; (t2) the expanded tree, which additionally contains meraa, baDZaa and bahuta]

Karaka Relations

• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened

Semantics of the verb

• A verbal root denotes
  – The activity
  – The result
• Locus of activity: karta
• Locus of result: karma

[Figure: verbal root branching into activity and result]

karta-karma

• The boy opened the lock
  – k1 – karta
  – k2 – karma
• karta and karma sometimes correspond to agent and theme
  – Not always: The door opened
  – The door is karta
  – The sentence has no explicit karma

open
  k1 → boy
  k2 → lock

Sub-actions: Opening of lock

• Action 1: inserting and turning a key
• Action 2: key pressing and turning the lever
• Action 3: latch moving and lock opening

Sub-actions: Opening of lock

[Figures: dependency trees headed by open for 'The boy opened the lock with the key' (boy k1, lock k2, key k3), 'The key opened the lock' and 'The lock opened']

k1 – karta (doer); k2 – karma (affected); k3 – karana (instrument)

Thus

• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
  – The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer), i.e. karaNa-kartri
  – The lock opened: the karma is raised to the role of karta (doer), i.e. karma-kartri
• Thus karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha)
  – Which sub-action the speaker wants to focus on
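
The role shift described above can be sketched in a few lines of Python. This is only an illustrative toy, not part of any treebank tooling: the function name `assign_karakas`, the participant keys (`agent`, `theme`, `instrument`) and the default mapping are assumptions made for this example.

```python
# Default karaka label for each participant type when the agent is expressed.
# The participant keys and this mapping are illustrative assumptions.
DEFAULT_KARAKA = {"agent": "k1", "theme": "k2", "instrument": "k3"}


def assign_karakas(participants, karta):
    """Label the expressed participants, given the one presented as karta.

    participants: dict mapping participant type -> word, e.g. {"theme": "lock"}
    karta: the participant type the speaker chooses as karta (vivaksha)
    """
    if karta not in participants:
        raise ValueError(f"{karta!r} is not expressed in the sentence")
    if "agent" in participants and karta != "agent":
        raise ValueError("this sketch takes an expressed agent to be the karta")
    labels = {}
    for kind, word in participants.items():
        # The chosen participant is raised to karta; the rest keep their default label.
        labels[word] = "k1" if kind == karta else DEFAULT_KARAKA[kind]
    return labels


# Sabina opened the lock with the key
print(assign_karakas({"agent": "Sabina", "theme": "lock", "instrument": "key"}, "agent"))
# The key opened the lock
print(assign_karakas({"theme": "lock", "instrument": "key"}, "instrument"))
# The lock opened
print(assign_karakas({"theme": "lock"}, "theme"))
```

The `karta` argument stands in for the speaker's choice: the same set of participants yields different labellings depending on which one is put forward as karta.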

Speaker's Intention (vivakshaa)

• Every sentence reflects the speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha

The Scheme

• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview

• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective

• To evolve an adequately comprehensive tagging scheme for annotating corpora with the dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• The annotation scheme follows the Paninian framework
• We show how the scheme handles some phenomena in Hindi, such as complex verbs, causatives, relative clauses and conjunctions

An Example

meraa badZaa bhaaii bahuta phala khaataa hai
my elder brother lots fruits eat+HAB PRES
'My elder brother eats lots of fruits'

An Example (Contd)

Morph Analysis

  – meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa <fs af= root=badZaa cat=adj gend=m>
  – bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta <fs af= root=bahuta cat=adj gend=any>
  – phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai <fs af= root=hai cat=v gend=any num=any pers=3>

An Example (Contd)

POS Tagging

  – meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

Chunking

  – ((meraa_PRP))_NP
    ((baDzaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
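
The bracketed chunk notation above is regular enough to be read mechanically. The following Python sketch shows one way to do that; the function name `parse_chunks` and the regular expression are assumptions made for this illustration, and the treebank's actual storage format is not shown on the slides.

```python
import re

# A chunk looks like ((word_TAG word_TAG ...))_LABEL in the slide notation.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        # Split each token on its last underscore: 'bhaaii_NN' -> ('bhaaii', 'NN').
        tokens = [tuple(token.rsplit("_", 1)) for token in body.split()]
        chunks.append((label, tokens))
    return chunks


example = (
    "((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
    "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"
)
for label, tokens in parse_chunks(example):
    print(label, tokens)
```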
An Example (Contd)
Dependency Relation

Dependency Scheme

• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)

• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks

Dependency Scheme (Contd)

khaataa
  k1 → bhaaii
    r6 → meraa
  k2 → phala
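
As a rough illustration, the tree above can be stored as labelled head-dependent edges between chunk heads. The names `EDGES`, `build_children` and `find_root` below are assumptions made for this sketch and do not come from the treebank tools.

```python
from collections import defaultdict

# (dependent, head, relation) triples for the example tree; chunk heads stand in for chunks.
EDGES = [
    ("bhaaii", "khaataa", "k1"),
    ("phala", "khaataa", "k2"),
    ("meraa", "bhaaii", "r6"),
]


def build_children(edges):
    """Index the edges by head so each node lists its labelled dependents."""
    children = defaultdict(list)
    for dependent, head, relation in edges:
        children[head].append((relation, dependent))
    return dict(children)


def find_root(edges):
    """Return the single node that is never a dependent."""
    heads = {head for _, head, _ in edges}
    dependents = {dependent for dependent, _, _ in edges}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()


print(find_root(EDGES))
print(build_children(EDGES))
```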

Relations in Dependency Scheme

• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason, etc.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates

Basic karaka relations

• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas

• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types

Some Hindi Constructions

(1) Causative Constructions

maaz ne aayaa se bacce ko khaanaa khilvaayaa
mother Erg maid by child Acc food eat-Caus
'Mother caused the maid to feed the child'

Issue

  – Possibility-I: Go by syntactic analysis
    • khilvaa 'cause to eat' is the verb root
    • maaz ne has karta vibhakti, so mark as k1
    • aayaa se has karana vibhakti, so mark as k3
    • bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd …)

Possibility-II

  – The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    • prayojaka karta 'causer' (pk1): the causer in a causative construction
    • prayojya karta 'causee' (jk1): the causee in a causative construction
    • madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

Causative Constructions (Contd …)

Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'

Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'

• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
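
The correspondence between the two labellings can be written down as a small lookup. The Python sketch below is a hedged illustration only: `relabel_causative` is a hypothetical helper, and it assumes the annotator has already decided that the se-marked chunk is a mediator (aayaa se) rather than a true instrument (cammaca se), which the mapping itself cannot decide.

```python
# Possibility-I label -> Possibility-II label, as stated for causatives above.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map Possibility-I labels to Possibility-II labels; other labels pass through."""
    return [(chunk, CAUSATIVE_RELABEL.get(label, label)) for chunk, label in args]


# Possibility-I analysis of: maaz ne aayaa se bacce ko khaanaa khilvaayaa
possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_1))
```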

(2) Relative Clauses (nmod__relc)

Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
who boy there stand is he my brother is
'The boy who is standing there is my brother'

Issue

  – Possibility-I
    • Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    • The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    • jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'

Relative Clause: Possibility-I

Relative Clauses (nmod__relc)

Possibility-II

  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref

Relative Clause: Possibility-II

Relative Clauses (Contd …)

• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref

(3) anubhava karta – k4a

Ex-1: mujhko dukh hai
I.Dat unhappy is
'I am unhappy'

• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan k4 (beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd …)

Ex-2 (base verb): raam ne (agent) caaMd dekhaa
Ram Erg moon saw
'Ram saw the moon'

Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
Ram.Dat moon appeared
'The moon was visible to Ram'

[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs

Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'

Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'

• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals

Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
if she sick not happened then party in definitely come
'Had she not been sick, she would have definitely come to the party'

Issue

  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I

agar-to
  paired-ccof → agar
    ccof → naa hotii
  paired-ccof → to
    ccof → aatii

Possibility-II

[Figure: dependency tree for Possibility-II]

Conditionals (Contd)

• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)

• In non-adjectival participles an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is realized semantically but not syntactically

Participles (vmod) (Contd)

Ex: vaha rojZa patra likhakara PaadZataa hai
he daily letter having-written tear is
'Every day, having written a letter, he tears it up'

Participles (vmod) (Contd)

• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd)

PaadZataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakar
    k1 → vaha (shared)
    k2 → patra (shared)

(7) Ellipsis

• How to show dependencies when the head is missing?

Ex: tum jo bhi kahoge (vo) mai maan luungii
you whatever will-say that I will-believe
'I will believe whatever you say'

• In the above example vo 'that' is missing; it would be the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)

maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
    nmod__relc → kahoge
      k1 → tum
      k2 → jo bhi
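
A minimal Python sketch of the null-element idea follows. It is an assumption-laden illustration rather than the annotation tool's behaviour: `attach_relative_clause` is a hypothetical helper, the default `k2` relation simply mirrors the example above, and the node name `NULL__NP` follows the spelling used in the tree.

```python
def attach_relative_clause(rel_clause_verb, main_verb, correlative=None, relation="k2"):
    """Return (dependent, head, relation) edges linking a relative clause to the main verb.

    If the correlative pronoun (e.g. vo) is elided, a NULL__NP node takes its place
    so that the relative clause still has a parent to attach to.
    """
    head = correlative if correlative is not None else "NULL__NP"
    return [
        (head, main_verb, relation),
        (rel_clause_verb, head, "nmod__relc"),
    ]


# tum jo bhi kahoge (vo) mai maan luungii, with vo left unexpressed
print(attach_relative_clause("kahoge", "maan luungii"))
# The same sentence with vo expressed
print(attach_relative_clause("kahoge", "maan luungii", correlative="vo"))
```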

Ellipsis (Contd)

Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
children big happen is no-one Gen matter not listen
'The children have grown up; they don't listen to anyone'

• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate

Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd …)

Coordinate Conjunction

  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    Ram Erg food ate and Sita Erg apple ate
    'Ram ate food and Sita ate an apple'

Subordinate Conjunction

  – Ex: raam ne kahaa ki vo kal aayegaa
    Ram Erg said that he tomorrow come-Fut
    'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
Subordinate Conjunction

(2) Conjunct Verbs

Ex: maine usase ek prashna kiyaa
I-erg him-inst one question did
'I asked him a question'

• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in a conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
    nmod → ek

Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder

Contents

1 Motivation
2 Introducing PropBank
3 Framefile definition
4 Hindi PropBank
5 Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches

• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
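
The filtering step can be made concrete with a small Python sketch. The role labels are taken from the two candidate sentences above; the dictionary layout and the helper name `answer_who` are assumptions made for this illustration and do not describe a real question answering system.

```python
# Role-labelled candidates, hand-transcribed from the two sentences above.
CANDIDATES = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, candidates):
    """Return the agents of candidates whose predicate and theme match the question."""
    return [
        candidate["agent"]
        for candidate in candidates
        if candidate["predicate"] == predicate and candidate["theme"] == theme
    ]


# Who created the first effective polio vaccine?
print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Both sentences contain every word of the question, so word matching alone cannot separate them; only the theme label distinguishes the two candidates.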

We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Framefiles

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example

• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
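
To make the roleset idea concrete, here is a hedged Python sketch that stores the two rolesets from the example in a dictionary and looks up verb-specific role descriptions. The data layout and the helper `describe` are assumptions made for this illustration; actual PropBank frame files are distributed in their own file format, which is not reproduced here.

```python
# Rolesets for "ring", transcribed from the example above.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "ARG0": "Causer of ringing",
            "ARG1": "Thing rung",
            "ARG2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "ARG1": "Surrounding entity",
            "ARG2": "Surrounded entity",
        },
    },
}


def describe(roleset_id, labelled_args):
    """Pair each (label, text) argument with its roleset-specific description."""
    roles = FRAMES[roleset_id]["roles"]
    described = []
    for label, text in labelled_args:
        if label not in roles:
            raise KeyError(f"{label} is not defined for {roleset_id}")
        described.append((text, label, roles[label]))
    return described


print(describe("ring.01", [("ARG0", "John"), ("ARG1", "the bell")]))
print(describe("ring.02", [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")]))
```

The same numbered label means different things under different rolesets: ARG1 is the thing rung for ring.01 but the surrounding entity for ring.02.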

Hindi PropBank

• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Framefiles for Hindi

• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments

• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually

PropBank Tagset: Numbered Arguments

• ARGA: Causer
• ARG0: Agent/experiencer
• ARG1: Theme/patient
• ARG2: Recipient
• ARG3: Instrument

Numbered Arguments with function tags

• ARGA-MNS: Indirect causer
• ARG0-MNS: Induced causer
• ARG0-GOL: Causee with a 'recipient' role
• ARG2-ATR: Attribute
• ARG2-GOL: Goal
• ARG2-SOU: Source
• ARG2-LOC: Location
• ARG2-DIR: Direction

PropBank Tagset: Modifier Arguments

• ARGM-TMP: Temporal
• ARGM-MNR: Manner
• ARGM-LOC: Location
• ARGM-PRP: Purpose
• ARGM-CAU: Cause
• ARGM-DIS: Discourse
• ARGM-ADV: Adverb
• ARGM-MNS: Means
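
The labels in the tagset share one shape: a base (ARGA, ARG0 to ARG3, or ARGM) optionally followed by a hyphen and a function tag. A small Python sketch of splitting such a label is given below; the helper name `split_label` is an assumption made for this example.

```python
def split_label(label):
    """Split a PropBank label into (base, function_tag, is_modifier).

    'ARG2-GOL' -> ('ARG2', 'GOL', False); 'ARGM-TMP' -> ('ARGM', 'TMP', True).
    Labels without a function tag get None in the second position.
    """
    base, _, tag = label.partition("-")
    return base, (tag or None), base == "ARGM"


for label in ["ARGA", "ARG0-GOL", "ARG2-LOC", "ARGM-LOC"]:
    print(label, split_label(label))
```

Note that location appears twice in the tagset: ARG2-LOC is a numbered argument of the verb, whereas ARGM-LOC is a modifier.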

Linguistic phenomena

• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive

trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'

trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
'Atif read the book'

trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
'Atif had to read the book'

Unaccusative & Unergative

• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative

unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'

unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
'The door opened'

unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
'The door will have to open'

Intransitive: Unergative

unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'

unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
'Atif slept'

unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
'Atif will have to sleep'
101212
7
Levels of Analysis
L1 – Semantic relations (karakas), e.g. raama: karta
L2 – Morphosyntactic (vibhakti), e.g. raama: prathamaa
L3 – Morphological representation (abstract vibhakti markers), e.g. raama + su (Sanskrit), raama + 0 (Hindi), raama + du (Telugu)
L4 – Phonological form: raamaH (Sanskrit), raama (Hindi), raamudu (Telugu)
Our Model
• Morph analysis
• POS tagging
• Identify minimal constituents (chunks/bags) and their heads
• Mark the relations across chunks (head-to-head relations)
• Chunk-internal dependencies are left unspecified
• The trees are fully expanded automatically
For Example
meraa baDzaa bhaaii bahuta phala khaataa hai
=> meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
=> ((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
Example (Contd)
((meraa_PRP baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
[Dependency trees: (t1) khaa governs bhaaii and phala; (t2) khaa governs bhaaii and phala, bhaaii governs meraa and baDZaa, phala governs bahuta.]
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
– Sabina opened the lock with the key
– The key opened the lock
– The lock opened
Semantics of the verb
A verbal root denotes:
• The activity
• The result
Locus of activity: karta
Locus of result: karma
[Diagram: Verbal Root branches into activity and result.]
karta-karma
The boy opened the lock
• k1 – karta
• k2 – karma
karta and karma sometimes correspond to agent and theme
• Not always: The door opened
• The door is karta
• The sentence has no explicit karma
[Diagram: open governs boy (k1) and lock (k2).]
Sub-actions: Opening of lock
[Diagram: Opening of lock consists of (action1) inserting and turning a key, (action2) key pressing and turning the lever, (action3) latch moving and lock opening.]
Sub-actions: Opening of lock
[Diagrams: open governs boy (k1), lock (k2), and key (k3); open governs key (k1) and lock (k2); open governs lock (k1).]
k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)
Thus:
• The action of opening normally requires an agentive participant. So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent. Hence:
– The key opened the lock: the karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
– The lock opened: the karma is raised to the role of karta (doer: karma-kartri)
• Thus karta and the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
Speaker's Intention (vivakshaa)
Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly:
(a) I opened the lock with this key
(b) I am sure this key will open the lock
• ‘key’ gets assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
– Morph Analysis
– POS Tagging
– Chunking
– Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with the dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, and conjunctions in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
‘My elder brother eats lots of fruit’
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m >
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any >
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3 >
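The feature structures on this slide are flat attribute=value lists, so they can be read into a dictionary with very little code. The following is a minimal Python sketch, assuming the angle-bracket notation `<fs ... >`; the function name `parse_fs` is hypothetical and is not part of any treebank tool.

```python
import re

def parse_fs(fs: str) -> dict:
    """Parse a string like '<fs af= root=khaa cat=v ...>' into a feature dict."""
    body = fs.strip()
    # Strip the surrounding '<fs' and '>' if present (assumed notation).
    if body.startswith("<fs") and body.endswith(">"):
        body = body[3:-1]
    # Each feature is name=value; an empty value (as in 'af=') is kept as ''.
    return {name: value for name, value in re.findall(r"(\w+)=(\S*)", body)}

khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
```

Here `khaataa["root"]` is the lemma `khaa`, and `khaataa["TAM"]` carries the tense-aspect-modality marker that the later annotation levels can consult.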
An Example (Contd)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
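The bracketed chunk notation is regular enough to read back mechanically. The sketch below assumes the `((word_POS ...))_LABEL` layout used on this slide; `parse_chunks` is a hypothetical helper for illustration, not the treebank's own reader.

```python
import re

def parse_chunks(text: str):
    """Return [(chunk_label, [(word, pos), ...]), ...] from ((w_POS ...))_LABEL."""
    chunks = []
    for inner, label in re.findall(r"\(\(\s*(.*?)\s*\)\)_(\w+)", text):
        # Split each token on its last underscore into (word, POS tag).
        tokens = [tuple(tok.rsplit("_", 1)) for tok in inner.split()]
        chunks.append((label, tokens))
    return chunks

example = """((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG"""
chunks = parse_chunks(example)
```

Each chunk keeps its label and its tagged words in order, which is the input the head-to-head dependency marking works on.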
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations. Hence it provides a framework for dependency analysis
• In our dependency tree:
– each node is a chunk, and
– each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks:
– meraa ‘my’
– bhaaii ‘brother’
– phala ‘fruit’, and
– khaataa ‘eats’
• badZaa ‘big’ and bahut ‘much’ are part of the chunks
Dependency Scheme (Contd)
[Dependency tree: khaataa governs bhaaii (k1) and phala (k2); bhaaii governs meraa (r6).]
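The chunk-level tree for this sentence can be written down as a small labelled tree. The following Python sketch is only an illustration of the structure described here; the `Node` class and the `edges` helper are hypothetical names, and each node stands for a chunk identified by its head word.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    head: str                                      # head word of the chunk
    children: list = field(default_factory=list)   # (relation, Node) pairs

    def attach(self, relation: str, child: "Node") -> "Node":
        """Add a labelled dependent and return it for further attachment."""
        self.children.append((relation, child))
        return child

def edges(node):
    """Yield (parent, relation, child) triples in depth-first order."""
    for rel, child in node.children:
        yield (node.head, rel, child.head)
        yield from edges(child)

root = Node("khaataa")
bhaaii = root.attach("k1", Node("bhaaii"))   # karta
root.attach("k2", Node("phala"))             # karma
bhaaii.attach("r6", Node("meraa"))           # genitive
```

Listing `edges(root)` gives the three labelled inter-chunk relations; chunk-internal words such as badZaa and bahuta are deliberately absent at this level.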
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
– Karaka relations
– Relations other than karakas, and
– Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations mark the participants directly involved in the action denoted by the verb
• Relations other than karakas denote relations such as purpose and reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
• Possibility-I: Go by syntactic analysis
– khilvaa ‘cause to eat’ is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations:
– prayojaka karta ‘causer’ (pk1): the causer in a causative construction
– prayojya karta ‘causee’ (jk1): the causee in a causative construction
– madhyastha karta ‘mediator-causer’ (mk1): the mediator-causer in a causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Causative Constructions (Contd …)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II
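The difference between the two possibilities amounts to a relabelling of three relations. The sketch below illustrates that mapping in Python under a simplifying assumption: it presumes the annotator has already decided that the se-marked phrase is a mediator-causer rather than a true instrument (as with cammaca se ‘with the spoon’, which stays k3). The names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(args):
    """Map Possibility-I labels to Possibility-II labels; other labels are unchanged."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]

possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
```

The karma khaanaa keeps its k2 label in both analyses; only the causer, mediator-causer, and causee change.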
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
• Possibility-I
– Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
– jo ladZakaa ‘the boy’ depends on vaha ‘he’
– jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause: Alternative-II
Relative Clauses (Contd …)
• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4 – beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
‘ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
anubhava karta – k4a (Contd …)
[Figures: dependency trees for Ex-2 and Ex-3.]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd …)
[Figures: dependency trees for Ex-1 and Ex-2.]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
[Dependency tree: the abstract node agar-to governs agar and to (paired-ccof); agar governs naa hotii (ccof) and to governs aatii (ccof).]
Possibility-II
Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Every day, having written a letter, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’
Participles (vmod) (Contd)
[Dependency tree: PaadZataa hai governs vaha (k1), rojZa (k7t), patra (k2), and likhakar (vmod); likhakar shares vaha (k1) and patra (k2) semantically.]
(7) Ellipsis
• How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
• In the above example, vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency
Ellipsis (Contd)
[Dependency tree: maan luungii governs mai (k1) and NULL__NP (vo) (k2); NULL__NP governs kahoge (nmod__relc); kahoge governs tum (k1) and jo bhi (k2).]
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don't listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
[Dependency tree: NULL__CCP (aur) governs badZe_ho_gaye (ccof) and nahii_sunate (ccof).]
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• In coordination, the conjunction is the head and takes the coordinated elements as its children
• In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd …)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
[Figures: dependency trees for the coordinate conjunction (ccof) and the subordinate conjunction.]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
[Dependency tree: kiyaa governs maine (k1), usase (k2), and prashna (pof); prashna governs ek (nmod).]
Lexical Semantic Representation for Hindi & Urdu: principles, representation, and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson (agent)] created the [first disposable syringe (theme)] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine (theme)] was created in 1952 by [Jonas Salk (agent)] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
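The filtering step can be illustrated with a few lines of Python. This is a toy sketch, not a question answering system: the agent and theme labels are hand-assigned exactly as on the slide, and the function name `answer` and the dictionary layout are assumptions made for illustration.

```python
def answer(question_theme, candidates):
    """Return the agents of candidate sentences whose theme matches the question's theme."""
    return [c["agent"] for c in candidates if c["theme"] == question_theme]

# Role labels for the predicate 'create' in the two candidate sentences.
candidates = [
    {"agent": "Becton Dickinson", "theme": "the first disposable syringe"},
    {"agent": "Jonas Salk", "theme": "the first effective polio vaccine"},
]

result = answer("the first effective polio vaccine", candidates)
```

Both sentences contain every word of the question, so word matching cannot separate them; matching on the theme role leaves only the second sentence.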
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John (ARG0)] rings [the bell (ARG1)]: ring.01
• [Tall aspen trees (ARG1)] ring [the lake (ARG2)]: ring.02
ring.01: Make sound of bell
Arg0: Causer of ringing
Arg1: Thing rung
Arg2: Ring for
ring.02: To surround
Arg1: Surrounding entity
Arg2: Surrounded entity
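To make the roleset idea concrete, the two rolesets for ring can be held in a small lookup table. The Python sketch below is an illustration only: the dictionary layout and the `describe` helper are assumptions made here and do not reflect the actual frame file format.

```python
# Roleset definitions copied from the slide; the dict layout is illustrative.
FRAMES = {
    "ring.01": {"gloss": "Make sound of bell",
                "roles": {"Arg0": "Causer of ringing",
                          "Arg1": "Thing rung",
                          "Arg2": "Ring for"}},
    "ring.02": {"gloss": "To surround",
                "roles": {"Arg1": "Surrounding entity",
                          "Arg2": "Surrounded entity"}},
}

def describe(roleset, labelled_args):
    """Pair each labelled argument with its roleset-specific description."""
    roles = FRAMES[roleset]["roles"]
    return {text: roles[label] for text, label in labelled_args}

bell = describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
lake = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
```

The same numbered label means different things under different rolesets: Arg1 is the thing rung for ring.01 but the surrounding entity for ring.02.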
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files:
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [18751902predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
– pro: dropped null arguments recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA: Causer
ARG0: Agent/experiencer
ARG1: Theme/patient
ARG2: Recipient
ARG3: Instrument
PropBank Tagset: Numbered Arguments with function tags
ARGA-MNS: Indirect causer
ARG0-MNS: Induced causer
ARG0-GOL: Causee with a ‘recipient’ role
ARG2-ATR: Attribute
ARG2-GOL: Goal
ARG2-SOU: Source
ARG2-LOC: Location
ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP: Temporal
ARGM-MNR: Manner
ARGM-LOC: Location
ARGM-PRP: Purpose
ARGM-CAU: Cause
ARGM-DIS: Discourse
ARGM-ADV: Adverb
ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
Atif-ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif-ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs:
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. the animacy test and the cognate object test, among others
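The labelling rule for intransitives reduces to a lookup on the verb class. The Python sketch below assumes the class of each verb has already been established by the diagnostic tests; the table `VERB_CLASS` and the function `sole_argument_label` are hypothetical names, and the three verbs are the ones cited on the slide.

```python
# Verb classes as given on the slide (assumed to be the outcome of the diagnostics).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb: str) -> str:
    """Return the PropBank label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"

labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
```

A verb missing from the table raises `KeyError`, which reflects that the class has to be decided before a label can be assigned.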
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’
unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze-ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
Atif-ne soyaa
Atif.erg sleep.m.sg.pst
‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
Atif-ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Last night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Last night I saw the moon behind the clouds’
[Figures: annotated trees for unacc-4 and dat-subj-1.]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
raam mohan-ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
raam-ne mohan-ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
– (so, sulaa): sleep, make someone sleep
• Add –vaa
– (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
aayaa-ne Atif-ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
‘The maid caused Atif to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN-ne aayaa-se Atif-ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
‘The mother made the maid cause Atif to sleep’
Causatives
[Figures: annotated trees for causative-1 and causative-2.]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa:
[abhay-ne (ARG0)] [sitaa-par (ARG1)] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi-kii pratiikshaa kar rahaa thaa
Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
raam ravi-se ladZa baithaa
Ram Ravi.inst fight sit.perf
‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der-se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
• Language-universal principles
• Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa)
[Tree diagram omitted. Node labels: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa).]
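Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii)
[Tree diagram omitted. Node labels: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii).]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा (Atif soyegaa)
[Tree diagram omitted. Node labels: VP-Pred, VP, NP (Atif), V (soyegaa).]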
ForExample
meraabaDzaabhaaiibahutaphalakhaataahai=gtmeraa_PRPbaDzaa_JJbhaaii_NN
bahuta_QFphala_NNkhaataa_VMhai_VAUX=gt((meraa_PRPbaDzaa_JJbhaaii_NN))_NP
((bahuta_QFphala_NN))_NP
((khaataa_VMhai_VAUX))_VG
ExampleContd
((meraa_PRPbaDzaa_JJbhaaii_NN))_NP
((bahuta_QFphala_NN))_NP
((khaataa_VMhai_VAUX))_VG(t1)khaa (t2)khaabhaaiiphalabhaaiiphalameraabaDZaabahuta
101212
9
KarakaRela3ons Directpar3cipantsinanac3onevent Syntac3coLseman3c Thekartaandkarmaofaverbaredeterminedbytheverbsseman3cs Verbdenotesanac3onevent Anyac3onisabundleofsubLac3onsSabinaopenedthelockwiththekeyThekeyopenedthelockThelockopened
Seman3csoftheverb
Averbalrootdenotes$ Theac3vity$ Theresult
Locusofac3vitykarta Locusofresultkarma
Verbal(Root(
acvity( result(
101212
10
kartaLkarma
Theboyopenedthelock$ k1ndashkarta$ k2ndashkarma
kartakarmasome3mescorrespondtoagenttheme$ NotalwaysThedooropened$ Thedooriskarta$ Thesentencehasnoexplicitkarma
(open(
boy( lock(
k1 k2
SubLac3onsLOpeningoflock
Openingoflock
Inser3ngandkeypressing latchmovingturningakeyandturningandlockopening(ac3on1) thelever(ac3on3)
(ac3on2)
101212
11
SubLac3onsLOpeningoflock
open(
boy( lock( key(
k1k2 k3
open(
open(
lock(
lock(key(
k1
k1 k2
k1ndashkarta(doer)rlmk2ndashkarma(affected)rlmk3ndashkarana(instrument)rlm
Thus Theac3onofopeningnormallyrequiresanagen3vepar3cipantSoSabinaopenedthelockHoweverThespeakermaydecidenottoexpresstheroleoftheagentHenceThekeyopenedthelock ThekaraNa(instrument)israisedtotheroleofkarta(doerLkaraNaLkartri)ThelockopenedThekarmaisraisedtotheroleofkarta(doerLkarmaLkartri)Thuskartaortheotherkarakarolescanshildependingonwhatthespeakerwantstoexpress(vivaksha) WhichsubLac3onthespeakerwantstofocuson
101212
12
SpeakerrsquosInten3on(vivakshaa)rlm
Everysentencereflectsspeakerrsquosinten3on$ Par3cipantsareassignedvariousrela3onsaccordingly(a)Iopenedthelockwiththis+key(b)Iamsurethis+keywillopenthelock$ lsquokeyrsquogetsassignedkarta(inb)karana(ina)basedonwhatthespeakerwantstoexpress
Syntaxreflectsvivaksha
The Scheme
Morph analysis POS tagging Chunking Mark the syntactic relations (dependency relations) across
chunks (head to head relation) rlm
101212
13
Overview
Objective The Scheme
$ Morph Analysis $ POS Tagging $ Chunking $ Dependency Relations
Dependency Scheme Relations in Dependency Scheme Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
We are developing treebanks for HindiUrdu
Following Paninian framework as the annotation scheme
We show how the scheme handles some phenomena such as complex verbs causatives relative clauses conjunctions etc in Hindi
101212
14
An Example Example
$ meraa badZaa bhaaii bahuta phala khaataa hai
my lsquoelder lsquobrother lsquolots fruits lsquoeat+HABlsquo PRES
lsquoMY elder brother eats lots of fruitsrsquo
An Example (Contd)
Morph Analysis
$ meraa ltfs af= root=meraa cat=pron gend=any num=sg pers=1 case=ogt $ badZaa ltfs af= root=badZaa cat=adj gend=m gt $ bhaaii ltfs af= root=bhaaii cat=n gend=m num=sg pers=3 case=dgt $ bahuta ltfs af= root=bahuta cat=adj gend=any gt $ phala ltfs af= root=phala cat=n gend=m num=any pers=3 case=dgt $ khaataa ltfs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taagt $ hai ltfs af= root=hai cat=v gend=any num=any pers=3 gt
101212
15
An Example (Contd )
POS Tagging
$ meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF
phala_NN khaataa_VM hai_VAUX Chunking
$ ((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
101212
16
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations Hence it provides framework for dependency analysis In our dependency tree
$ each node is a chunk and $ the edge represents the relations between the connected nodes labeled with the karaka or
other relations Chunk represents a set of adjacent words which are in dependency relations with each
other All the modifier-modified relations between the heads of the chunks (inter-chunk
relations) are marked in this manner
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks
$ meraa lsquomyrsquo $ bhaaii lsquobrotherrsquo $ phala lsquofruitrsquo and $ khaataa lsquoeatsrsquo
badZaa lsquobigrsquo and bahut lsquomuchrsquo are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
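The dependency tree for the example sentence can be stored as labeled edges between chunk heads. The sketch below assumes a simple (dependent, relation, head) triple encoding, which is an illustrative choice and not the treebank's storage format; `build_children` and `render` are hypothetical helper names.

```python
from collections import defaultdict

# (dependent, relation, head) triples; head None marks the root of the tree.
EDGES = [
    ("khaataa", "root", None),
    ("bhaaii", "k1", "khaataa"),
    ("phala", "k2", "khaataa"),
    ("meraa", "r6", "bhaaii"),
]

def build_children(edges):
    """Index the edges by head and return (root, children-by-head)."""
    children = defaultdict(list)
    root = None
    for dependent, relation, head in edges:
        if head is None:
            root = dependent
        else:
            children[head].append((relation, dependent))
    return root, children

def render(node, children, depth=0, relation=None):
    """Return the tree as indented lines, one node per line."""
    label = node if relation is None else f"{relation}: {node}"
    lines = ["  " * depth + label]
    for child_relation, child in children.get(node, []):
        lines.extend(render(child, children, depth + 1, child_relation))
    return lines

root, children = build_children(EDGES)
print("\n".join(render(root, children)))
```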
Relations in Dependency Scheme
There are three types of relations in the Dependency Scheme:
– karaka relations,
– relations other than karakas, and
– relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly.
Karaka relations hold for participants directly involved in the action denoted by the verb.
Relations other than karakas denote, for example, purpose and reason.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child.’
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa ‘cause to eat’ is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb and is morphologically related to the base verb khaa ‘eat’.
• The Paninian framework provides the relations:
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
Causative Constructions (Contd …)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’.
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilvaayaa
‘Mother fed the child with the spoon.’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilvaayaa
‘Mother made the maid feed the child.’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
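The relabeling under Possibility-II (pk1, mk1, jk1 instead of k1, k3, k4) can be sketched as a lookup. This is only an illustration of the mapping stated above: the function name `relabel_causative` is hypothetical, and the sketch does not model the annotator's decision of whether a se-marked phrase is a mediator-causer or a true instrument (as with cammaca se ‘with the spoon’, which keeps k3).

```python
# Mapping from the Possibility-I labels to the causative labels of Possibility-II.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}

def relabel_causative(arguments):
    """Replace k1/k3/k4 by pk1/mk1/jk1; all other labels are kept as they are."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]

possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
possibility_2 = relabel_causative(possibility_1)
print(possibility_2)
```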
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother.’
Issue
• Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause.
• The modifier of vaha ‘he’ in the main clause is the entire relative clause.
• The relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref.
Relative Clause Possibility-II
Relative Clauses (Contd…)
For relative clauses, our current decision: follow Possibility-II.
• jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause.
• The relative clause attaches to vaha ‘he’ of the main clause by the nmod__relc relation.
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref.
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy.’
• The ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
• Here dukh ‘unhappy’ is the karta.
• Here mujhko ‘to me’ is a subtype of sampradan.
• This sampradan is different from the sampradan (k4, beneficiary).
• We call it anubhava karta, represented by k4a.
anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon.’
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
‘ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram.’
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow.’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow.’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. the karma.
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party.’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
agar-to
  paired-ccof: agar
    ccof: na hotii
  paired-ccof: to
    ccof: aatii
Possibility-II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause.
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Having written letters every day, he tears them up.’
Participles (vmod) (Contd)
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’.
Participles (vmod) (Contd)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say.’
• In the above example vo ‘that’ is missing; it would be the parent node for the relative clause ‘tum jo bhi kahoge’.
• We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
“The children have grown up; they don't listen to anyone.”
• There is no explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential).
NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjuncts as heads.
• In coordination, the conjunct is the head and takes the coordinated elements as its children.
• A subordinating conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd…)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
  ‘Ram ate food and Sita ate an apple.’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
  ‘Ram said that he will come tomorrow.’
[Figures: dependency trees for the coordinate and the subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question.’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
• The dependency relation of prashna with kiyaa is pof (‘part of’ relation); that is, the noun or adjective in a conjunct verb sequence has a pof relation with the verb.
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• The pof relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
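The filtering step described above can be sketched in a few lines. The candidate records below are hand-written from the two example sentences, assuming that a semantic role labeller has already produced the agent and theme of create; the dictionary layout and the function name `answer_who` are illustrative assumptions only.

```python
# Hand-built role labels for the two candidate sentences (illustrative data).
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Return the agents of `predicate` whose theme matches the question's theme."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]

answers = answer_who("create", "the first effective polio vaccine", CANDIDATES)
print(answers)
```

Both sentences contain the words of the question, so word matching alone cannot separate them; the theme check is what removes the Becton Dickinson sentence.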
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John]ARG0 rings [the bell]ARG1 (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2 (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
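A frame file with several rolesets can be pictured as a nested mapping. The sketch below encodes only the two rolesets of ring shown above; the dictionary layout, the `ring.01`/`ring.02` identifier spelling and the helper `describe` are assumptions for illustration and do not reproduce the actual PropBank frame file format.

```python
# Two rolesets of the polysemous predicate "ring", as listed on the slide.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"Arg0": "Causer of ringing",
                  "Arg1": "Thing rung",
                  "Arg2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"Arg1": "Surrounding entity",
                  "Arg2": "Surrounded entity"},
    },
}

def describe(roleset_id, labelled_args):
    """Map each labelled phrase to the role description of its roleset."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {phrase: roles[arg] for phrase, arg in labelled_args}

bell = describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")])
lake = describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")])
print(bell)
print(lake)
```

The same label (Arg1) receives a different description in each roleset, which is the point of defining roles verb by verb.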
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset
Numbered Arguments
• ARGA: Causer
• ARG0: Agent, experiencer
• ARG1: Theme, patient
• ARG2: Recipient
• ARG3: Instrument
Numbered Arguments with function tags
• ARGA-MNS: Indirect causer
• ARG0-MNS: Induced causer
• ARG0-GOL: Causee with a ‘recipient’ role
• ARG2-ATR: Attribute
• ARG2-GOL: Goal
• ARG2-SOU: Source
• ARG2-LOC: Location
• ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
• ARGM-TMP: Temporal
• ARGM-MNR: Manner
• ARGM-LOC: Location
• ARGM-PRP: Purpose
• ARGM-CAU: Cause
• ARGM-DIS: Discourse
• ARGM-ADV: Adverb
• ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
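The labelling rule for intransitives can be written as a small lookup. In this sketch the verb classes are hard-coded from the three examples above, standing in for the outcome of the diagnostic tests; `VERB_CLASS` and `sole_argument_label` are hypothetical names.

```python
# Verb classes taken from the slide's examples; in practice these would come
# from the diagnostic tests (animacy, cognate object, ...), not a fixed table.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb: str) -> str:
    """Return the PropBank label of the single argument of an intransitive verb."""
    return "Arg1" if VERB_CLASS[verb] == "unaccusative" else "Arg0"

labels = {verb: sole_argument_label(verb) for verb in VERB_CLASS}
print(labels)
```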
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Last night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Last night I saw the moon behind the clouds’
[Figures: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative:
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
[Figures: annotated trees for causative-1 and causative-2; table of causative classes]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust(n) do(v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[VP-Pred [NP Atif] [VP [NP kitab] [V paRhegaa]]]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii), with the same VP-Pred/VP structure as the non-ergative clause]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[VP-Pred [NP Atif] [VP [V soyegaa]]]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[VP-Pred [NP-1 darvaazaa] [VP [NP CASE-1] [V khulegaa]]]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Figure: PS tree for us kamre mein cuuhe hain, with the locative us kamre mein adjoined and cuuhe raised from the lower position (CASE-1)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: PS tree for Ram-ne Mohan-ko kitaab dii, with Mohan-ko adjoined to VP-Pred]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: PS tree for kal raat baadaloM mein mujhko caaMd dikhaa, with caaMd raised (CASE-1) and mujhko scrambled (SCR-2)]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: PS tree for Ram jaantaa hai ki Sita der se aayegi, with the ki-clause (CP-1) extraposed (EXTR-1)]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
[Figure: PS tree for the relative clause example, with scrambling traces (SCR) and an empty PRO argument]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
[Figure: PS tree for the complex predicate example, with the noun yaad and the light verb kar under V′]
Causative
Karaka Relations
• Direct participants in an action/event
• Syntactico-semantic
• The karta and karma of a verb are determined by the verb's semantics
• A verb denotes an action/event
• Any action is a bundle of sub-actions:
  – Sabina opened the lock with the key
  – The key opened the lock
  – The lock opened
Semantics of the verb
A verbal root denotes:
• The activity
• The result
Locus of activity: karta
Locus of result: karma
[Figure: Verbal root, branching into activity and result]
karta-karma
The boy opened the lock
• k1 – karta
• k2 – karma
karta and karma sometimes correspond to agent and theme
• Not always: The door opened
• The door is the karta
• The sentence has no explicit karma
open
  k1: boy
  k2: lock
Sub-actions: Opening of a lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of a lock
open
  k1: boy
  k2: lock
  k3: key
open
  k1: key
  k2: lock
open
  k1: lock
k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock.
  – The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri)
• The lock opened.
  – The karma is raised to the role of karta (doer: karma-kartri)
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on
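The role shift in the three "open" sentences can be illustrated with a toy assignment function. This is a simplification made up for the example: it assumes that, among the participants the speaker chooses to express, the karta is the first one available in the order agent, instrument, affected entity. The real choice depends on the speaker's intention (vivaksha), which this sketch does not model.

```python
# Assumed preference order for the karta among the expressed participants.
PRIORITY = ["agent", "instrument", "affected"]

def assign_karakas(expressed: dict) -> dict:
    """Map each expressed phrase to k1/k2/k3 for the verb 'open'.

    `expressed` maps a participant type ('agent', 'instrument', 'affected')
    to the phrase that realizes it in the sentence.
    """
    present = [p for p in PRIORITY if p in expressed]
    labels = {expressed[present[0]]: "k1"}       # highest-ranked one is karta
    for participant in present[1:]:
        # Remaining participants keep their basic roles: karma or karana.
        labels[expressed[participant]] = "k2" if participant == "affected" else "k3"
    return labels

full = assign_karakas({"agent": "Sabina", "instrument": "key", "affected": "lock"})
no_agent = assign_karakas({"instrument": "key", "affected": "lock"})
only_lock = assign_karakas({"affected": "lock"})
print(full, no_agent, only_lock)
```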
Speaker’s Intention (vivakshaa)
Every sentence reflects the speaker’s intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• ‘key’ is assigned karta (in b) or karana (in a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
We are developing treebanks for HindiUrdu
Following Paninian framework as the annotation scheme
We show how the scheme handles some phenomena such as complex verbs causatives relative clauses conjunctions etc in Hindi
101212
14
An Example Example
$ meraa badZaa bhaaii bahuta phala khaataa hai
my lsquoelder lsquobrother lsquolots fruits lsquoeat+HABlsquo PRES
lsquoMY elder brother eats lots of fruitsrsquo
An Example (Contd)
Morph Analysis
$ meraa ltfs af= root=meraa cat=pron gend=any num=sg pers=1 case=ogt $ badZaa ltfs af= root=badZaa cat=adj gend=m gt $ bhaaii ltfs af= root=bhaaii cat=n gend=m num=sg pers=3 case=dgt $ bahuta ltfs af= root=bahuta cat=adj gend=any gt $ phala ltfs af= root=phala cat=n gend=m num=any pers=3 case=dgt $ khaataa ltfs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taagt $ hai ltfs af= root=hai cat=v gend=any num=any pers=3 gt
101212
15
An Example (Contd )
POS Tagging
$ meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF
phala_NN khaataa_VM hai_VAUX Chunking
$ ((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
101212
16
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations Hence it provides framework for dependency analysis In our dependency tree
$ each node is a chunk and $ the edge represents the relations between the connected nodes labeled with the karaka or
other relations Chunk represents a set of adjacent words which are in dependency relations with each
other All the modifier-modified relations between the heads of the chunks (inter-chunk
relations) are marked in this manner
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks
$ meraa lsquomyrsquo $ bhaaii lsquobrotherrsquo $ phala lsquofruitrsquo and $ khaataa lsquoeatsrsquo
badZaa lsquobigrsquo and bahut lsquomuchrsquo are part of the chunks
101212
17
Dependency Scheme (Contd)
khaataa k1 k2
bhaii phala r6
meraa
Relations in Dependency Scheme
There are 3 types of relations in Dependency Scheme
amp Karaka relations amp Relations other than karakas and
amp Relations which do not fall under dependency relation directly but are required for
showing the dependencies indirectly
Karaka relations are participants directly involved in the action denoted by the verb
Relations other than karakas denote purpose reason Relations which do not fall under dependency relation directly are used for
representing co-ordination and complex predicates
101212
18
Basic karaka relations
Only six
$ karta ndash subjectagentdoer
$ karma ndash objectpatient
$ karana ndash instrument
$ sampradaan ndash beneficiary
$ apaadaan ndash source
$ adhikarana ndash location in placetimeother
Relations other than karakas
r6 ndash Genitive rt ndash Purpose rh ndash Reason nmod_relc ndash Relative clause rad ndash Address
101212
19
Relations which do not fall under dependency relation
ccof ndash Conjunction pof ndash Complex Predicates fragof ndash Fragment of
Dependency Relation Types
101212
20
Some Hindi Constructions
(1) Causative Constructions maaz ne aayaa se bacce ko khaaanaa khilvaayaa lsquomotherrsquo lsquoErgrsquo lsquomaidrsquo lsquobyrsquo lsquochildrsquo lsquoAccrsquo lsquofoodrsquo lsquoeat-Causrsquo lsquoMother caused the maid to feed the childrsquo
Issue
$ Possibility-I Go by syntactic analysis
amp khilvaa lsquocause to eatrsquo is the verb root amp maaz ne has karta vibhakti so mark as k1 amp aayaa se has karana vibhakti so mark as k3 amp bacce ko has sampradan vibhakti so mark as k4
Causative Constructions (Contd hellip)
101212
21
Causative Constructions (Contd hellip)
Possibility-II
$ The verb khilvaa lsquocause to eatrsquo is a causative verb and it is morphologically related to the base verb khaa lsquoeatrsquo
$ Paninian framework provides the relations
amp prayojaka karta causerlsquo (pk1) The causer in a causative construction amp prayojya karta causeelsquo (jk1) The causee in a causative construction amp madhyastha karta mediator causerlsquo (mk1) The mediator-causer in the causative
construction
Causative Constructions (Contd hellip)
Possibility-II
$ Do we mark the above dependency roles $ If we mark these relations then root will be khaa lsquoeatrsquo
101212
22
Ex maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa lsquoMother fed the child with the spoon Ex maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa Mother made the maid to feed the childlsquo As there is morphological relatedness between the base verb khaa lsquoeatrsquo and
causative verb khilvaa lsquocause to eatrsquo we mark pk1 mk1 jk1 instead of k1 k3 k4 respectively
For causatives our current decision Follow Possibility-II
Causative Constructions (Contd hellip)
(2) Relative Clauses (nmod__relc)
Ex jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
rsquowhorsquo rsquoboyrsquo rsquotherersquo rsquostandrsquo rsquoisrsquo rsquohersquo rsquomyrsquo rsquobrotherrsquo rsquoisrsquo lsquoThe boy who is standing there is my brotherrsquo
Issue
$ Possibility-I
amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause
amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz
khadZaa hairsquo
101212
23
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause
and vaha lsquohersquo in the main clause is captured by the feature coref
101212
24
Relative Clause Alternative-II
Relative Clauses (Contdhellip)
For relative clauses our current decision Follow Possibility-II In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the
verb khadzaa hai lsquois standingrsquo of the relclause The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo
relation The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by
the feature coref
101212
25
(3) anubhava karta - k4a
Ex-1: mujhko dukh hai
      'I-Dat' 'unhappy' 'is'
      'I am unhappy.'
• The ko vibhakti in mujhko 'to me' tells us that it is not a karta.
• Here dukh 'unhappy' is the karta.
• mujhko 'to me' is a subtype of sampradan, different from the beneficiary sampradan (k4).
• We call it anubhava karta, represented by k4a.
anubhava karta - k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa          [base verb]
      'Ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon.'
Ex-3: raam ko (experiencer) caaMd dikhaa    [derived intransitive verb]
      'Ram-Dat' 'moon' 'appeared'
      'The moon was visible to Ram.'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran - rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow.'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow.'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma.
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs.
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would definitely have come to the party.'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
[Figure: an abstract agar-to node heads the tree; agar and to attach to it as paired-ccof, and the clause verbs na hotii and aatii attach to agar and to as ccof]
Possibility - II
[Figure: dependency tree, not recoverable from the extraction]
Conditionals (Contd.)
• Possibility-I is ruled out because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision: follow Possibility-II.
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause.
• The agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause.
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• The shared argument always attaches syntactically to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Having written letters every day, he tears them up.'
Participles (vmod) (Contd.)
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'.
Participles (vmod) (Contd.)
PaadZataa hai
  k1   -> vaha
  k7t  -> rojZa
  k2   -> patra
  vmod -> likhakara
            k1 -> vaha   (shared)
            k2 -> patra  (shared)
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say.'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing.
• We insert a null element, NULL_NP, for vo 'that' to show the dependency.
Ellipsis (Contd.)
maan luungii
  k1 -> mai
  k2 -> NULL__NP (vo)
          nmod__relc -> kahoge
                          k1 -> tum
                          k2 -> jo bhi
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone.'
• There is no explicit conjunction.
• Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof -> badZe_ho_gaye
  ccof -> nahii_sunate
Non-dependency Relations
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• A coordinating conjunction is the head and takes the coordinated elements as its children.
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      'Ram' 'Erg' 'food' 'ate' 'and' 'Sita' 'Erg' 'apple' 'ate'
      'Ram ate food and Sita ate an apple.'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
      'Ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
      'Ram said that he will come tomorrow.'
[Figures: dependency trees for the coordinate and subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question.'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF ('Part OF' relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, since both are now part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking the noun and the verb with the POF relation captures the fact that the sequence is a conjunct verb.
Conjunct Verbs (Contd.)
kiyaa
  k1  -> maine
  k2  -> usase
  pof -> prashna
           nmod -> ek
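The chunking decision for conjunct verbs can be illustrated with plain tuples. This is a minimal sketch under assumptions: the triple layout `(dependent, label, head)` and both helper functions are hypothetical and only mirror the tree shown on the slide.

```python
# Chunks after splitting the conjunct verb: [ek prashna]NP [kiyaa]VG
chunks = [("NP", ["ek", "prashna"]), ("VG", ["kiyaa"])]

# Inter-chunk relations as (dependent head, label, governing head).
inter_chunk = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
]

# Intra-chunk relation inside the NP chunk.
intra_chunk = [("ek", "nmod", "prashna")]


def conjunct_verb_parts(verb, relations):
    """Return the pof-linked part(s) of a conjunct verb followed by the verb."""
    parts = [dep for dep, label, head in relations
             if head == verb and label == "pof"]
    return parts + [verb]


def modifiers_of(word, relations):
    """Return all dependents of `word` in the given relation list."""
    return [dep for dep, _label, head in relations if head == word]
```

Because prashna sits in its own NP chunk, ek can be recorded as its modifier, while the pof edge still lets a consumer recover prashna kiyaa as one semantic unit.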
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system.
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine.
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh.
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine.
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh.
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine.
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh.
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine.
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh.
SRL gives us the right answer
• We need semantic information to prefer the right answer.
• The theme of create should be 'the first effective polio vaccine'.
• The theme in the first sentence was 'the first disposable syringe'.
• We can filter out the wrong answer.
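The filtering step can be sketched as follows. The role-labelled candidates are hand-written stand-ins for the output of a semantic role labeller, and the dictionary layout and the function `answer_who` are hypothetical names chosen for this illustration.

```python
# Hand-labelled candidates; a real system would obtain these from an SRL tool.
candidates = [
    {
        "text": "Becton Dickinson created the first disposable syringe for use "
                "with the mass administration of the first effective polio vaccine",
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "text": "The first effective polio vaccine was created in 1952 by "
                "Jonas Salk at the University of Pittsburgh",
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, candidates):
    """Return agents of `predicate` whose theme matches the questioned theme."""
    wanted = theme.lower()
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"].lower() == wanted]
```

Both sentences contain every content word of the question, so only the comparison on the theme role separates the two agents.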
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles.
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles.
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling.
• A PropBank is a large annotated corpus of predicate-argument information.
• A set of semantic roles is defined for each verb.
• A syntactically parsed corpus is then tagged with verb-specific semantic role information.
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis.
• This is defined in a verb lexicon consisting of frame files.
• Each predicate will have a set of roles associated with a distinct usage.
• A polysemous predicate can have several rolesets within its frame file.
An example
• John rings the bell
  ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
  ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
  ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
An example
• [John ARG0] rings [the bell ARG1]                 (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]      (ring.02)
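A frame file can be pictured as a lookup table from rolesets to role descriptions. The sketch below is not the PropBank file format (real frame files are XML); `FRAME_FILES` and `explain` are illustrative names, and the roleset identifiers are written here as ring.01 and ring.02.

```python
# Toy in-memory frame file for the lemma "ring", following the slide.
FRAME_FILES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {
                "ARG0": "Causer of ringing",
                "ARG1": "Thing rung",
                "ARG2": "Ring for",
            },
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {
                "ARG1": "Surrounding entity",
                "ARG2": "Surrounded entity",
            },
        },
    }
}


def explain(lemma, roleset_id, labelled_args):
    """Attach the verb-specific role description to each (text, label) pair."""
    roles = FRAME_FILES[lemma][roleset_id]["roles"]
    return [(text, label, roles[label]) for text, label in labelled_args]
```

The same label means different things in different rolesets: ARG1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why the roleset must be chosen before the arguments are interpreted.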
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically.
• GAP and pro are inserted manually.
PropBank Tagset: Numbered Arguments
Numbered arguments
  ARGA       Causer
  ARG0       Agent, experiencer
  ARG1       Theme, patient
  ARG2       Recipient
  ARG3       Instrument
Numbered arguments with function tags
  ARGA-MNS   Indirect causer
  ARG0-MNS   Induced causer
  ARG0-GOL   Causee with a 'recipient' role
  ARG2-ATR   Attribute
  ARG2-GOL   Goal
  ARG2-SOU   Source
  ARG2-LOC   Location
  ARG2-DIR   Direction
PropBank Tagset: Modifier Arguments
  ARGM-TMP   Temporal
  ARGM-MNR   Manner
  ARGM-LOC   Location
  ARGM-PRP   Purpose
  ARGM-CAU   Cause
  ARGM-DIS   Discourse
  ARGM-ADV   Adverb
  ARGM-MNS   Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book.'
trans-2: आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book.'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book.'
Unaccusative & Unergative
• Distinction between intransitive verbs:
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0.
• Diagnostic tests are used to distinguish unaccusatives from unergatives.
  - E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open.'
unacc-2: दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened.'
unacc-3: दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open.'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep.'
unerg-2: आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept.'
unerg-3: आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep.'
Existential
exist-1: उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room.'
• We distinguish between existential and copula sentence types by means of different roleset IDs.
Dative Subject
unacc-4:    कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night the moon was seen behind the clouds.'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night I saw the moon behind the clouds.'
[Figures: annotated trees for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1: राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan dat book.f give.m.sg.fut
           'Ram will give a book to Mohan.'
ditrans-2: राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram erg Mohan dat book.f give.f.sg.pst
           'Ram gave a book to Mohan.'
Causatives
• Hindi has two ways of forming the causative:
• Add -aa
  - (so, sulaa): sleep, make someone sleep
• Add -vaa
  - (sulaa, sulvaa): make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers.
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1:     आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep.'
causative-1: आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid erg Atif acc sleep.caus.pst
             'The maid caused the child to sleep.'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother erg maid by Atif acc sleep.caus.pst
             'The mother made the maid cause the child to sleep.'
Causatives
[Figures: annotated trees for causative-1 and causative-2]
Causatives: classes
[Figure: table of causative classes, not recoverable from the extraction]
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'.
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi.'
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi inst fight sit.perf
              'Ram regrettably fought with Ravi.'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siita der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
            'Ram knows that Sita will arrive late.'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa), with node labels VP-Pred, VP, NP and V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif ne kitaab paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa)]
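Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); the NP darvaazaa is coindexed with a CASE trace (index 1)]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); the NP cuuhe is coindexed with a CASE trace (index 1) and us kamre mein is adjoined]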
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: PS tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)]
Putting it All Together: Dative Subjects
[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); caaMd is coindexed with a CASE trace (index 1) and mujhko with an SCR trace (index 2)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); the CP headed by ki is coindexed with an EXTR trace (index 1)]
Relative Clause
[Figure: PS tree for the example below, with SCR traces, a PRO subject, and a CP headed by COMP]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me.'
Complex Predicate
[Figure: PS tree for the example below; the noun yaad is coindexed with a CASE trace (index 1)]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi.'
Causative
karta-karma
The boy opened the lock.
• k1 - karta
• k2 - karma
karta and karma sometimes correspond to agent and theme
• Not always: The door opened.
• The door is karta.
• The sentence has no explicit karma.
open
  k1 -> boy
  k2 -> lock
Sub-actions: Opening of lock
Opening of lock
• Inserting and turning a key (action 1)
• Key pressing and turning the lever (action 2)
• Latch moving and lock opening (action 3)
Sub-actions: Opening of lock
open
  k1 -> boy
  k2 -> lock
  k3 -> key
open
  k1 -> key
  k2 -> lock
open
  k1 -> lock
• k1 - karta (doer)
• k2 - karma (affected)
• k3 - karana (instrument)
Thus
• The action of opening normally requires an agentive participant. So: Sabina opened the lock.
• However, the speaker may decide not to express the role of the agent. Hence: The key opened the lock. The karaNa (instrument) is raised to the role of karta (doer: karaNa-kartri).
• The lock opened. The karma is raised to the role of karta (doer: karma-kartri).
• Thus karta and the other karaka roles can shift depending on what the speaker wants to express (vivaksha), i.e. which sub-action the speaker wants to focus on.
Speaker's Intention (vivakshaa)
Every sentence reflects the speaker's intention
• Participants are assigned various relations accordingly
  (a) I opened the lock with this key.
  (b) I am sure this key will open the lock.
• 'key' is assigned karta in (b) and karana in (a), based on what the speaker wants to express
Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  - Morph Analysis
  - POS Tagging
  - Chunking
  - Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses and conjunctions in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
  'My elder brother eats lots of fruits.'
An Example (Contd.)
Morph Analysis
• meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa  <fs af= root=badZaa cat=adj gend=m >
• bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta  <fs af= root=bahuta cat=adj gend=any >
• phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai     <fs af= root=hai cat=v gend=any num=any pers=3 >
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
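The bracketed chunk notation above is regular enough to read back into a data structure. The following is a minimal sketch that assumes exactly the `((word_TAG ...))_LABEL` layout shown on the slide; `parse_chunks` and `chunk_heads` are hypothetical helpers, and taking the last noun or the main verb (VM) as chunk head is a simplifying assumption, not a rule stated in the tutorial.

```python
import re

# Matches one chunk: ((token_TAG token_TAG ...))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")

CHUNKED = ("((meraa_PRP))_NP ((baDzaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def chunk_heads(chunks):
    """Pick a head per chunk: VM in verb groups, otherwise the last token."""
    heads = []
    for label, tokens in chunks:
        if label == "VG":
            head = next(w for w, tag in tokens if tag == "VM")
        else:
            head = tokens[-1][0]
        heads.append(head)
    return heads
```

The heads returned for the example are the four words between which the inter-chunk dependency relations are then marked.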
An Example (Contd.)
Dependency Relation
[Figure: dependency tree for the example sentence]
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations; hence it provides a framework for dependency analysis.
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation.
• A chunk is a set of adjacent words that are in dependency relations with each other.
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner.
Dependency Scheme (Contd.)
• Here modifier-modified relations are marked between the heads of the chunks:
  - meraa 'my'
  - bhaaii 'brother'
  - phala 'fruit'
  - khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks.
Dependency Scheme (Contd.)
khaataa
  k1 -> bhaaii
          r6 -> meraa
  k2 -> phala
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
  - Karaka relations
  - Relations other than karakas
  - Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb.
• Relations other than karakas denote, for example, purpose and reason.
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta - subject/agent/doer
• karma - object/patient
• karana - instrument
• sampradaan - beneficiary
• apaadaan - source
• adhikarana - location in place/time/other
Relations other than karakas
• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod__relc - Relative clause
• rad - Address
Relations which do not fall under dependency relation
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
Dependency Relation Types
[Figure: overview of dependency relation types]
Some Hindi Constructions
(1) Causative Constructions
Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
    'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
    'Mother caused the maid to feed the child.'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
Causative Constructions (Contd.)
Possibility-II
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon.'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child.'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
• For causatives, our current decision: follow Possibility-II.
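The relabelling from the Possibility-I labels to the Possibility-II labels can be written as a simple mapping. This is a sketch, not part of the annotation tooling: `relabel_causative` is a hypothetical helper, and it assumes the caller already knows that the se-phrase is a mediator causer (as with aayaa se) rather than a true instrument (as with cammaca se), since the surface vibhakti alone does not tell them apart.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(arguments):
    """Map (phrase, label) pairs of a causative verb to causer/causee labels.

    Labels without a causative counterpart (e.g. k2) are left unchanged.
    Only apply this when the k3 phrase is an animate mediator, not an instrument.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label))
            for phrase, label in arguments]


# maaz ne aayaa se bacce ko khaanaa khilvaayaa, labelled as in Possibility-I
possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
```

For the spoon example the function should not be applied to cammaca se, which keeps its k3 label in the slide's own annotation.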
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother.'
Issue
$ Possibility-I
amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause
amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz
khadZaa hairsquo
101212
23
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause
and vaha lsquohersquo in the main clause is captured by the feature coref
101212
24
Relative Clause Alternative-II
Relative Clauses (Contdhellip)
For relative clauses our current decision Follow Possibility-II In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the
verb khadzaa hai lsquois standingrsquo of the relclause The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo
relation The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by
the feature coref
101212
25
(3) anubhava karta ndash k4a
Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta Here dukh lsquounhappyrsquo is the karta Here mujhko lsquoto mersquo is a subtype of sampradan This sampradan is different from the sampradan (k4mdashbeneficiary) We call it as anubhava karta represented by k4a
anubhava karta ndash k4a (Contd )
Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon
was visible to mersquo Verb
101212
26
anubhava karta ndash k4a (Contdhellip)
Ex-2
Ex-3
anubhava karta ndash k4a (Contdhellip)
101212
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo
lsquoHad she been not sick she would have definitely come to the partyrsquo
Issue
$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause
101212
29
Possibility - I
agar-to paired-ccof paired-ccof
agar to ccof ccof
naa hotii aatii
Possibility - II
101212
30
Conditionals (Contd) Possibility-I is not possible because agar-to is the head of the tree
which is an abstract node ie it is not a lexical node For conditionals our current decision Follow Possibility-II In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo
clause Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo
clause is the main clause
(6) Participles (vmod)
In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)
The arguments occurs only once in the sentence but is semantically related to both the verbs The shared argument syntactically always attaches with the main verb For the other verb this argument is semantically realized but not syntactically
101212
31
Participles (vmod) (Contd )
Ex vaha rojZa patra likhakara PaadZataa hai
rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo
lsquoHaving letters written everyday he tearsrsquo
101212
32
Participles (vmod) (Contd )
The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa
lsquotearsrsquo is shared with another participle verb likhakar lsquohaving
writtenrsquo
Participles (vmod) (contd)
Paadzataa hai k1 k7t k2 vmod
vaha rojZa pawra likhakar k1 k2
vaha pawra
101212
33
(7)Ellipsis
How to show dependencies when the head is missing Ex tum jo bhi kahoge (vo) mai maan luungii lsquoyou lsquowhateverrsquo lsquowill sayrsquo lsquothatrsquo lsquoIrsquo lsquowill believersquo lsquoI will believe whatever you sayrsquo In the above example vo lsquothatrsquo is missing which becomes the parent node
for relative clause lsquotum jo bhi kahogersquo We insert a null element ie NULL_NP for vo lsquothatrsquo to show the dependency
Ellipsis (Contd)
maan luungii k1 k2
mai NULL__NP (vo) nmod__relc
kahoge k1 k2
tum jo bhi
101212
34
Ellipsis (Contd)
Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo No explicit conjunct Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate
Non-dependency Relations
ccof ndash Conjunction pof ndash Complex Predicates fragof -- Fragment of
101212
35
(1) Conjunction (ccof)
ccof relation doesnrsquot reflects a dependency relation It is used for coordinating as well as subordinating conjunctions The dependency trees will show the conjuncts as heads In coordinating conjuncts the conjunct is the head and takes the coordinating
elements as its children In subordinating conjunct it would take the clause to which it is syntactically
attached (the subordinate clause) as its child
Conjunction (ccof) (Contdhellip)
Coordinate Conjunction
$ Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa lsquoramrsquo lsquoErgrsquo lsquofoodrsquo lsquoatersquo lsquoandrsquo lsquositarsquo Ergrsquo lsquoapplersquo lsquoatersquo
lsquoRam ate food and Sita ate an applersquo
Subordinate Conjunction
$ Ex raam ne kahaa ki vo kal aayegaa lsquoramrsquo lsquoErgrsquo lsquosaidrsquo thatrsquo lsquohersquo lsquotomorrowrsquo lsquocome-Futrsquo lsquoRam said that he will come tomorrowrsquo
101212
36
Coordinate Conjunction (ccof)
Subordinate Conjunction
101212
37
(2) Conjunct Verbs
Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa
lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence
The annotation scheme should be able to account for this relation in the
dependency tree If prashna kiyaa is grouped as a single verb chunk it will not be possible to
mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd.)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation): the noun or adjective in a conjunct verb sequence has a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd.)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
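The chunking decision for conjunct verbs can be sketched as data: the noun and the verb sit in separate chunks, the pof edge links them, and the adjective stays inside the NP chunk. The dictionaries and the helper below are illustrative assumptions, not the treebank's file format.

```python
# Chunks keyed by their head word: (chunk label, words in the chunk).
chunks = {
    "maine": ("NP", ["maine"]),
    "usase": ("NP", ["usase"]),
    "prashna": ("NP", ["ek", "prashna"]),  # ek stays inside the NP chunk
    "kiyaa": ("VG", ["kiyaa"]),
}

# Inter-chunk edges (child head, relation, parent head), as in the tree above.
inter_chunk = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
]

# Intra-chunk edge: the adjective modifies the noun only.
intra_chunk = [("ek", "nmod", "prashna")]


def conjunct_verbs(edges):
    """Recover noun-verb (or adjective-verb) units from pof edges."""
    return [f"{child} {head}" for child, rel, head in edges if rel == "pof"]
```

Reading the pof edges back gives the semantic unit prashna kiyaa, even though the two words are chunked separately.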
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
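The filtering step can be sketched in a few lines. The dictionaries below stand in for the output of a semantic role labeller; their layout and the function name are assumptions for illustration only.

```python
# Hand-written role analyses of the two candidate sentences.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, analyses):
    """Return the agents of analyses whose predicate and theme both match."""
    return [a["agent"] for a in analyses
            if a["predicate"] == predicate and a["theme"] == theme]
```

Both sentences contain the words of the question, but only the second has the vaccine as the theme of create, so only its agent survives the filter.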
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Framefiles
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of framefiles
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its framefile
An example
• [John: ARG0] rings [the bell: ARG1] (ring.01)
• [Tall aspen trees: ARG1] ring [the lake: ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
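A framefile with two rolesets can be pictured as a small lookup table. The sketch below is not the PropBank file format (real framefiles are XML); the dictionary layout and the function name are assumptions, and the roleset IDs are written with a dot, the usual PropBank spelling.

```python
# One framefile for the lemma "ring", holding two rolesets.
FRAMEFILE_RING = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing",
                  "ARG1": "Thing rung",
                  "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity",
                  "ARG2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to the role description in its roleset."""
    roles = FRAMEFILE_RING[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}
```

The same label can mean different things in different rolesets: ARG1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why the roleset ID has to be chosen before arguments are labelled.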
Hindi PropBank
• Annotating Hindi PropBank involves three steps:
  – Creation of framefiles
  – Empty argument insertion
  – Semantic role labelling
Framefiles for Hindi
• Two types of framefiles:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’
[Figures: annotated trees for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
[Figures: annotated trees for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne: ARG0] [sitaa par: ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, V; trace: CASE1]
Existentials
• Existential ho ‘be’ is unaccusative (because it is agent-free), and the location is an adjunct
[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP-P, NP1, NP, V; trace: CASE1]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]
Putting it All Together: Dative Subjects
[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP-P, NP, NP1, NP2, V; traces: CASE1, SCR2]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, V-Aux, CP1, C, NP-P, NP, NP1, V; traces: EXTR1, CASE1]
Relative Clause
[Figure: PS tree for the sentence below; node labels: VP-Pred, VP, V-Aux, CP, C, NP-P, NP, NP1, V; empty elements: PRO, COMP, SCR1, SCR3]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
[Figure: PS tree for the sentence below; node labels: VP-Pred, VP, VP1, V-Aux, V’, V, NP-P, NP, N; trace: CASE1]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causative
[Figure: PS tree for a causative]
Sub-actions: Opening of lock
[Figure: three dependency trees headed by ‘open’: with boy, lock and key (k1, k2, k3); with key and lock (k1, k2); with lock only (k1)]
k1 – karta (doer), k2 – karma (affected), k3 – karana (instrument)
• Thus: the action of opening normally requires an agentive participant
• So: Sabina opened the lock
• However, the speaker may decide not to express the role of the agent
• Hence: The key opened the lock
  – The karaNa (instrument) is raised to the role of karta (doer; karaNa-kartri)
• The lock opened
  – The karma is raised to the role of karta (doer; karma-kartri)
• Thus the karta or the other karaka roles can shift depending on what the speaker wants to express (vivaksha): which sub-action the speaker wants to focus on
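The shift of karta across the three sentences can be sketched as a small function. The priority order used here (agent, then instrument, then affected object) is an assumption read off the three examples, and the function and parameter names are hypothetical.

```python
def assign_karakas(agent=None, instrument=None, obj=None):
    """Assign k1/k2/k3 to the participants the speaker chose to express.

    The most agent-like expressed participant becomes karta (k1); an
    expressed object or instrument that was not raised keeps k2 or k3.
    """
    expressed = [p for p in (agent, instrument, obj) if p is not None]
    if not expressed:
        return {}
    karta = expressed[0]
    roles = {karta: "k1"}
    if obj is not None and obj != karta:
        roles[obj] = "k2"
    if instrument is not None and instrument != karta:
        roles[instrument] = "k3"
    return roles
```

With all three participants expressed the agent is k1; dropping the agent raises the key to k1, and dropping the key as well raises the lock.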
Speaker’s Intention (vivakshaa)
• Every sentence reflects the speaker’s intention
• Participants are assigned various relations accordingly:
  (a) I opened the lock with this key
  (b) I am sure this key will open the lock
• ‘key’ gets assigned karta (in b) and karana (in a), based on what the speaker wants to express
• Syntax reflects vivaksha
The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relations)
Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora with dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses, conjunctions, etc. in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
  ‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
  ‘My elder brother eats lots of fruits.’
An Example (Contd.)
Morph Analysis
• meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa  <fs af= root=badZaa cat=adj gend=m>
• bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta  <fs af= root=bahuta cat=adj gend=any>
• phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai     <fs af= root=hai cat=v gend=any num=any pers=3>
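The morphological feature structures are flat attribute=value lists, so they are easy to read into a dictionary. The sketch below assumes the angle-bracket spelling `<fs ...>` and unquoted values as printed on the slide (the value of `af` is empty there); `parse_fs` is a hypothetical helper, not part of any treebank tool.

```python
import re


def parse_fs(fs):
    """Parse a '<fs attr=value ...>' string into a dict of attributes."""
    body = fs.strip().lstrip("<").rstrip(">").strip()
    if body.startswith("fs"):
        body = body[2:]  # drop the leading 'fs' tag name
    # An attribute name, '=', then a possibly empty run of non-space characters.
    return dict(re.findall(r"(\w+)=(\S*)", body))
```

For khaataa this yields root khaa, category v, and TAM taa, which is the information the later chunking and dependency steps rely on.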
An Example (Contd.)
POS Tagging
• meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((badZaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
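The bracketed chunk notation can be read back into a list of chunks with a short regular expression. The head rule below (the VM token in a verb group, otherwise the last token) is an assumption that happens to reproduce the heads named in this example; both function names are hypothetical.

```python
import re

# Matches one chunk written as ((word_TAG word_TAG ...))_LABEL
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...])."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


def chunk_head(label, tokens):
    """Pick the head word: the main verb in a VG chunk, else the last word."""
    if label == "VG":
        for word, tag in tokens:
            if tag == "VM":
                return word
    return tokens[-1][0]
```

Applied to the four chunks of the example, the heads are meraa, bhaaii, phala and khaataa, the words between which dependency relations are then marked.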
An Example (Contd.)
Dependency Relation
[Figure: dependency tree for the example sentence]
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations, and hence provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd.)
• Here, modifier-modified relations are marked between the heads of the chunks:
  – meraa ‘my’
  – bhaaii ‘brother’
  – phala ‘fruit’
  – khaataa ‘eats’
• badZaa ‘big’ and bahuta ‘much’ are part of the chunks
Dependency Scheme (Contd.)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
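The inter-chunk tree for the example can be stored as labelled edges between chunk heads and printed as an indented outline. The edge-list encoding and the two helper functions are assumptions made for this sketch.

```python
# (child head, relation, parent head) for the example sentence.
edges = [
    ("bhaaii", "k1", "khaataa"),  # karta of the verb
    ("phala", "k2", "khaataa"),   # karma of the verb
    ("meraa", "r6", "bhaaii"),    # genitive modifier of bhaaii
]


def render(edges, head, depth=0):
    """Return indented 'relation: child' lines for everything below head."""
    lines = []
    for child, rel, parent in edges:
        if parent == head:
            lines.append("  " * (depth + 1) + f"{rel}: {child}")
            lines.extend(render(edges, child, depth + 1))
    return lines


def tree(edges, root):
    """Return the whole tree as one string, root first."""
    return "\n".join([root] + render(edges, root))
```

Calling `tree(edges, "khaataa")` reproduces the outline shown above: bhaaii and phala under the verb, with meraa under bhaaii.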
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme:
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold for participants directly involved in the action denoted by the verb
• Relations other than karakas denote notions such as purpose and reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
[Figure: overview of dependency relation types]
Some Hindi Constructions
(1) Causative Constructions
• Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
  ‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
  ‘Mother caused the maid to feed the child.’
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa ‘cause to eat’ is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations:
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator-causer’ (mk1): the mediator-causer in a causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Causative Constructions (Contd.)
• Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
  ‘Mother fed the child with the spoon.’
• Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
  ‘Mother made the maid feed the child.’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
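The move from Possibility-I labels to Possibility-II labels for the maid example amounts to a fixed relabelling. The sketch below covers only that case: it assumes the se-phrase is a mediator (aayaa se), and it does not try to tell a mediator from a true instrument such as cammaca se, which keeps k3. The table and function names are hypothetical.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def to_possibility_two(labelled_args):
    """Relabel (phrase, label) pairs; labels outside the table are kept."""
    return [(phrase, CAUSATIVE_LABELS.get(label, label))
            for phrase, label in labelled_args]


possibility_one = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
```

Only the three participant labels change; khaanaa stays k2 under both analyses.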
(2) Relative Clauses (nmod__relc)
• Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
  ‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
  ‘The boy who is standing there is my brother.’
Issue
• Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – jo ladZakaa ‘the boy’ depends on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
[Figure: Relative Clause, Possibility-I]
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
[Figure: Relative Clause, Possibility-II]
Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
• Ex-1: mujhko dukh hai
  ‘I-Dat’ ‘unhappy’ ‘is’
  ‘I am unhappy.’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd.)
• Ex-2 (base verb): raam ne (agent) caaMd dekhaa
  ‘Ram’ ‘Erg’ ‘moon’ ‘saw’
  ‘Ram saw the moon.’
• Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
  ‘Ram-Dat’ ‘moon’ ‘appeared’
  ‘The moon was visible to Ram.’
anubhava karta – k4a (Contd.)
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
• Ex-1: raam ne kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow.’
• Ex-2: raam ne yaha kahaa ki vo kal aayegaa
  ‘Ram said that he will come tomorrow.’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd.)
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
• Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
  ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
  ‘Had she not been sick, she would have definitely come to the party.’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
agar-to
  paired-ccof: agar
    ccof: na hotii
  paired-ccof: to
    ccof: aatii
Possibility-II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd.)
• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd.)
• Ex: vaha rojZa patra likhakara PaadZataa hai
  ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
  ‘Having written letters every day, he tears them up.’
Participles (vmod) (Contd.)
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’
Participles (vmod) (Contd.)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakar
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis
• How do we show dependencies when the head is missing?
• Ex: tum jo bhi kahoge (vo) mai maan luungii
  ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
  ‘I will believe whatever you say.’
• In the above example, vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency
Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd.)
• Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
  ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
  “The children have grown up; they don't listen to anyone.”
• No explicit conjunct is present
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
101212
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
Linguistic phenomena
• Simple transitive
• Unaccusative and unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex predicates

Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others

Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'

Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
[Figures: PropBank annotation of causative-1 and causative-2]

Causatives: classes
[Table not recovered]

Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS
• PS represents relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram: आतिफ़ किताब पढ़ेगा, with nodes VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram: आतिफ़ ने किताब पढ़ी, with nodes VP-Pred, VP, NP (Atif ne), NP (kitab), V (paRhii)]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
[Tree diagram: आतिफ़ सोएगा, with nodes VP-Pred, VP, NP (Atif), V (soyegaa)]

Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
[Tree diagram: दरवाज़ा खुलेगा, with nodes VP-Pred, VP, NP1 (darvaazaa), trace CASE1, V (khulegaa)]

Existentials
• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct
[Tree diagram: उस कमरे में चूहे हैं, with NP1 (cuuhe), trace CASE1, V (hain) and the adjoined NP (us kamre mein)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram: राम ने मोहन को किताब दी, with NP (Ram ne), NP (Mohan ko) adjoined to VP-Pred, NP (kitaab), V (dii)]

Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram: NP (kal raat) and NP (baadaloM mein) adjoined to VP, NP2 (mujhko) with trace SCR2, NP1 (caaMd) with trace CASE1, V (dikhaa)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram: matrix clause with NP (Ram), V (jaantaa), V-Aux (hai) and trace EXTR1; extraposed CP1 headed by C (ki) containing NP1 (Sita), trace CASE1, der se and V (aayegi)]

Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
[Tree diagram: relative CP with NP1 (jo kitaab), NP (tumne), PRO, V (dii), V-Aux (thii) and traces SCR1 and SCR3; main clause with NP (vah), NP (maine), V (paRhii)]

Complex Predicate
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
[Tree diagram: NP (Raam), NP (Ravi ko), V' containing N (yaad) and V (kar), V-Aux (rahaa), V-Aux (thaa), trace CASE1]

Causative
Speaker's Intention (vivakshaa)
• Every sentence reflects speaker's intention
  – Participants are assigned various relations accordingly
    (a) I opened the lock with this key
    (b) I am sure this key will open the lock
  – 'key' gets assigned karta (in b), karana (in a), based on what the speaker wants to express
• Syntax reflects vivakshaa

The Scheme
• Morph analysis
• POS tagging
• Chunking
• Mark the syntactic relations (dependency relations) across chunks (head-to-head relation)

Overview
• Objective
• The Scheme
  – Morph Analysis
  – POS Tagging
  – Chunking
  – Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions

Objective
• To evolve an adequately comprehensive tagging scheme for the purpose of annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• Following the Paninian framework as the annotation scheme
• We show how the scheme handles some phenomena such as complex verbs, causatives, relative clauses, conjunctions etc. in Hindi
An Example
  – meraa badZaa bhaaii bahuta phala khaataa hai
    'my' 'elder' 'brother' 'lots' 'fruits' 'eat+HAB' 'PRES'
    'My elder brother eats lots of fruits'

Morph Analysis
  – meraa   <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
  – badZaa  <fs af= root=badZaa cat=adj gend=m >
  – bhaaii  <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
  – bahuta  <fs af= root=bahuta cat=adj gend=any >
  – phala   <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
  – khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
  – hai     <fs af= root=hai cat=v gend=any num=any pers=3 >
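To make the morph layer concrete, the following sketch reads one feature structure of the form shown above into a dictionary. It assumes the angle-bracket notation `<fs attr=value ...>` with space-separated attributes and possibly empty values; `parse_fs` is a hypothetical helper written for this example and not part of any treebank toolkit.

```python
import re


def parse_fs(fs):
    """Parse '<fs af= root=khaa cat=v ...>' into {'af': '', 'root': 'khaa', ...}."""
    text = fs.strip()
    if not (text.startswith("<fs") and text.endswith(">")):
        raise ValueError("not a feature structure: %r" % fs)
    body = text[3:-1]  # drop the leading '<fs' and the trailing '>'
    # An attribute is a name, '=', and a (possibly empty) run of non-space characters.
    return dict(re.findall(r"(\w+)=(\S*)", body))


analysis = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(analysis["root"], analysis["cat"], analysis["TAM"])
```

A dictionary keeps the attribute names from the slide (root, cat, gend, num, pers, case, TAM) directly addressable, which is convenient when later layers such as chunking or dependency labelling need the case or TAM value.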
An Example (Contd)

POS Tagging
  – meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX

Chunking
  – ((meraa_PRP))_NP
    ((badZaa_JJ bhaaii_NN))_NP
    ((bahuta_QF phala_NN))_NP
    ((khaataa_VM hai_VAUX))_VG
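The bracketed chunk notation can be read back into a list of chunks with a few lines of code. This is a sketch under the assumption that chunks are written exactly as `((word_TAG ...))_LABEL`; the function name `parse_chunks` is introduced here for illustration only.

```python
import re

CHUNKED = ("((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
           "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")


def parse_chunks(text):
    """Return a list of (chunk_label, [(word, pos_tag), ...]) pairs."""
    chunks = []
    for inner, label in re.findall(r"\(\((.*?)\)\)_(\w+)", text):
        # Split each token on its last underscore so the word itself may contain '_'.
        tokens = [tuple(tok.rsplit("_", 1)) for tok in inner.split()]
        chunks.append((label, tokens))
    return chunks


for label, tokens in parse_chunks(CHUNKED):
    print(label, tokens)
```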
An Example (Contd)

Dependency Relation
[Figure: dependency tree for the example sentence]

Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations
• Hence it provides a framework for dependency analysis
• In our dependency tree
  – each node is a chunk, and
  – the edge represents the relations between the connected nodes, labeled with the karaka or other relations
• A chunk represents a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner

Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks
  – meraa 'my'
  – bhaaii 'brother'
  – phala 'fruit', and
  – khaataa 'eats'
• badZaa 'big' and bahuta 'much' are part of the chunks

Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
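The tree above can be written down as a list of labelled edges between chunk heads. The sketch below is one possible encoding, with the triple layout (dependent, head, relation) and the helper names chosen for this example rather than taken from the treebank tools.

```python
# (dependent, head, relation) triples for 'meraa badZaa bhaaii bahuta phala khaataa hai'
EDGES = [
    ("bhaaii", "khaataa", "k1"),  # karta
    ("phala", "khaataa", "k2"),   # karma
    ("meraa", "bhaaii", "r6"),    # genitive
]


def root(edges):
    """The single node that is a head but never a dependent."""
    dependents = {dep for dep, _, _ in edges}
    heads = {head for _, head, _ in edges}
    (only,) = heads - dependents
    return only


def children(edges, node):
    """Dependents of a node with their relation labels, in edge order."""
    return [(dep, rel) for dep, head, rel in edges if head == node]


def path_to_root(edges, node):
    """Follow head links upward from a node until the root is reached."""
    parent = {dep: head for dep, head, _ in edges}
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


print(root(EDGES))
print(children(EDGES, "khaataa"))
print(path_to_root(EDGES, "meraa"))
```

Only chunk heads appear as nodes; badZaa and bahuta stay inside their NP chunks and therefore have no edge of their own at this level.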
Relations in Dependency Scheme
• There are 3 types of relations in the Dependency Scheme
  – Karaka relations
  – Relations other than karakas, and
  – Relations which do not fall under dependency relation directly but are required for showing the dependencies indirectly
• Karaka relations are participants directly involved in the action denoted by the verb
• Relations other than karakas denote purpose, reason
• Relations which do not fall under dependency relation directly are used for representing co-ordination and complex predicates

Basic karaka relations
• Only six
  – karta – subject/agent/doer
  – karma – object/patient
  – karana – instrument
  – sampradaan – beneficiary
  – apaadaan – source
  – adhikarana – location in place/time/other

Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address

Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

Dependency Relation Types
[Figure: hierarchy of dependency relation types]
Some Hindi Constructions

(1) Causative Constructions
  maaz ne aayaa se bacce ko khaanaa khilvaayaa
  'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
  'Mother caused the maid to feed the child'

Issue
  – Possibility-I: Go by syntactic analysis
    · khilvaa 'cause to eat' is the verb root
    · maaz ne has karta vibhakti, so mark as k1
    · aayaa se has karana vibhakti, so mark as k3
    · bacce ko has sampradan vibhakti, so mark as k4

Causative Constructions (Contd)
Possibility-II
  – The verb khilvaa 'cause to eat' is a causative verb and it is morphologically related to the base verb khaa 'eat'
  – The Paninian framework provides the relations
    · prayojaka karta 'causer' (pk1): the causer in a causative construction
    · prayojya karta 'causee' (jk1): the causee in a causative construction
    · madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
  – Do we mark the above dependency roles?
  – If we mark these relations, then the root will be khaa 'eat'

Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: Follow Possibility-II
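The switch from the Possibility-I labels to the Possibility-II labels for the maid sentence can be pictured as a simple relabelling. The sketch below is illustrative only: `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical names, and the function covers just the case where the se-phrase is a mediator; the spoon sentence, where cammaca se is an instrument, keeps its k1 and k3 labels and is not handled here.

```python
# Possibility-I label -> Possibility-II label, as stated for the maid sentence.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Rewrite (phrase, label) pairs; labels outside the mapping (e.g. k2) are kept."""
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


possibility_1 = [("maaz ne", "k1"), ("aayaa se", "k3"),
                 ("bacce ko", "k4"), ("khaanaa", "k2")]
print(relabel_causative(possibility_1))
```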
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
    'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
    'The boy who is standing there is my brother'

Issue
  – Possibility-I
    · Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
    · The dependency of jo ladZakaa 'the boy' is on vaha 'he'
    · jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)
Possibility-II
  – The verb khadZaa hai 'is standing' is the root of the relative clause
  – The modifier of vaha 'he' in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd)
• For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
      'I.Dat' 'unhappy' 'is'
      'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa                [base verb]
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa          [derived intransitive verb]
      'ram.Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would have definitely come to the party'

Issue
  – Possibility-I: Abstract node
  – Possibility-II: One clause depends on the other clause

Possibility-I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii

Possibility-II
[Figure: the agar clause attached under the to clause]

Conditionals (Contd)
• Possibility-I is not possible because agar-to is the head of the tree, which is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Every day, having written a letter, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'

Participles (vmod) (Contd)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha (shared)
    k2: patra (shared)
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example vo 'that' is missing, which becomes the parent node for the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency

Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi

Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL__CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate

Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation doesn't reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• In a subordinating conjunct, it would take the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd)
Coordinate Conjunction
  – Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
        'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
        'Ram ate food and Sita ate an apple'
Subordinate Conjunction
  – Ex: raam ne kahaa ki vo kal aayegaa
        'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
        'Ram said that he will come tomorrow'
[Figures: dependency trees for the coordinate conjunction (ccof) and the subordinate conjunction]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
• The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is being modified by the adjective ek 'one', and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• It means a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• It captures the fact that the noun-verb sequence is a conjunct verb by linking them with the POF relation

Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
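One consequence of the POF analysis is that conjunct verbs can be recovered from the tree by looking for pof edges, while the noun keeps its own modifiers. The sketch below illustrates this with edge triples for the example sentence; the triple layout and the helper names are assumptions made for this example.

```python
# (dependent, head, relation) triples for 'maine usase ek prashna kiyaa'
CONJUNCT_EDGES = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),
    ("ek", "prashna", "nmod"),
]


def conjunct_verbs(edges):
    """Return (noun_or_adjective, verb) pairs that are linked by a pof edge."""
    return [(dep, head) for dep, head, rel in edges if rel == "pof"]


def modifiers(edges, node):
    """Dependents attached directly to a node, e.g. ek under prashna."""
    return [dep for dep, head, _ in edges if head == node]


for noun, verb in conjunct_verbs(CONJUNCT_EDGES):
    print(noun, verb, modifiers(CONJUNCT_EDGES, noun))
```

Because prashna is its own node, ek can attach to it directly, which would not be possible if prashna kiyaa were a single verb chunk.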
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer

We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut
Thedoorwillopen
unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened
unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen
101212
14
Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept
unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep
Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo
bull Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs
101212
15
Da3veSubject
unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst
YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst
YesterdaynightIsawthemoonbehindtheclouds
unacc4
datsubj1
TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on
101212
16
Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan
ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan
Causa3ves
bull Hindihastwowaysofformingthecausa3vebull Addndashaa
ndash (sosulaa)sleepmakesomeonesleepbull Addndashvaa
ndash (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep
bull WeintroducethelabelARGAtoanalyzecausersbull SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees
bull ARGALMNSforintermediatecausers
101212
17
Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
causa3veL1 आया आतफ़ को स6लाया
aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo
causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa
motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo
Causa3ves
Causa3veL1
Causa3veL2
101212
18
Causa3vesclasses
Complexpredicates
bull Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust
bull Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa
101212
19
ComplexPredicate
complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo
Complexpredicate
complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo
101212
20
ComplementClause
complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut
RamknowsthatSitawillarrivelate
101212
1
PhraseStructureRepresenta3on
OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu
PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks
bull DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow
bull Developedinconjunc3onwithDSandPBbull InspiredbyChomskyantradi3on
101212
2
BackgroundforPS
bull Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren
ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull LanguageVuniversalprinciplesbull LanguageVspecificparameters
bull PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach
BasicPrinciplesofPS
bull PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)
bull Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces
bull Transforma3onalgrammar
bull Monostratalrepresenta3onbull NotunlikeEnglishPennTreebank
101212
3
SpecificAssump3onsaboutRepresenta3onMadebyPS
bull Phrasestructurebull No3onoflexicalheadswithprojec3ons(XVbartheorysortof)andassociatedfunc3onalprojec3onsndash Nounswithpostposi3onsndash Verbswithauxiliariesandcomplemen3zers(ki)
bull Binarybranchingndash Theore3calreasonsndash TobedifferentfromDS
BasicTransi3veClause(1)
bull Therearetwoprivilegedposi3onsintheverbalprojec3oncorrespondingusuallytoDSrsquosk1andk2
VPVPred
VP
NP
NPA3f
kitab
V
paRhegaaआततफिकताबपढ़गा
101212
4
BasicTransi3veClause(2)
bull Therepresenta3onismaintainedwhenwehaveanerga3veconstruc3on
VPVPred
VP
NPVP
NPA3fVne
kitab
V
paRhii
आततफिकताबपढ़ी
Intrasi3veClauseUnerga3ve
bull PSmakesadis3nc3onbetweenunerga3veandunaccusa3ve
bull Inunerga3vetheresimplyisnoobject
VPVPred
VP
NP
A3fV
soyegaa
आततफसोएगा
101212
5
Intrasi3veClauseUnaccusa3ve
bull Argumentstartsinlowerposi3on(becauseoflexicalseman3cs)andmovestohigherposi3on(becausehigherposi3onhasnooccupant)
VPVPred
VP
NP1
NPdarvaazaa
CASE1
V
khulegaa
दरवाज़ाख89गा
Existen3als
bull Existen3alho`bersquoisunaccusa3ve(becauseagentVfree)andloca3onisanadjunct
उस कमlt=चA
VPVPred
VP
NP1
NPcuuhe
CASE1
V
hain
VP
NPVP
uskamremein
101212
6
Ditransi3ve
bull TherecipientisintroducedasadjoinedtotheVPVPredafixedbutnotstructuralposi3on
VPVPred
NPVP
NP
RamVne
kitaab
V
dii
VPVPred
NPVP
MohanVko
VP
राममोहन कोिकताबदी
PugngitAllTogetherDa3veSubjects
कलरा बादलE=म8झकोचाGददखा
VPVPred
NP1
NP
caaMd
CASE1
V
dikhaa
VP
VP
NPVP
baadaloMmein
VP
NP
kalraat
VPVPred
NP
SCR2
VPNP2
mujhko
bull Dikhaaisinterpretedseman3callyasaditrasi3vesomeonemakessomethingappeartosomeone
bull Sincetheagentisabsentthelowerargumentraisestothehigherposi3on(likeunaccusa3ve)
bull Theda3vebeneficiaryisbasegeneratedinthefixedda3veposi3on(adjoinedtoVPVPred)andthenscrambleselsewhere
101212
7
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram. Node labels include VP-Pred, VP, NP, NP-P, NP1, CASE1, EXTR1, CP1, C, V, V-Aux. Words: Ram, jaantaa, hai, ki, Sita, der se, aayegi]
Relative Clause
[Tree diagram. Node labels include VP-Pred, VP, NP, NP-P, NP1, SCR1, SCR3, PRO, CP, C, COMP, V, V-Aux. Words: jo kitaab, tumne, dii, thii, vah, maine, paRhii]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
[Tree diagram. Node labels include VP-Pred, VP, NP, NP-P, CASE1, N, V, V', V-Aux. Words: Raam, Ravi-ko, yaad, kar, rahaa, thaa]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causative
Overview
• Objective
• The Scheme
- Morph Analysis
- POS Tagging
- Chunking
- Dependency Relations
• Dependency Scheme
• Relations in Dependency Scheme
• Some Hindi Constructions
Objective
• To evolve an adequately comprehensive tagging scheme for annotating corpora for dependency relations within a sentence
• We are developing treebanks for Hindi/Urdu
• We follow the Paninian framework as the annotation scheme
• We show how the scheme handles phenomena such as complex verbs, causatives, relative clauses and conjunctions in Hindi
An Example
• meraa badZaa bhaaii bahuta phala khaataa hai
‘my’ ‘elder’ ‘brother’ ‘lots’ ‘fruits’ ‘eat+HAB’ ‘PRES’
‘My elder brother eats lots of fruits’
An Example (Contd)
Morph Analysis
• meraa <fs af= root=meraa cat=pron gend=any num=sg pers=1 case=o>
• badZaa <fs af= root=badZaa cat=adj gend=m>
• bhaaii <fs af= root=bhaaii cat=n gend=m num=sg pers=3 case=d>
• bahuta <fs af= root=bahuta cat=adj gend=any>
• phala <fs af= root=phala cat=n gend=m num=any pers=3 case=d>
• khaataa <fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>
• hai <fs af= root=hai cat=v gend=any num=any pers=3>
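The morph analysis is a flat list of attribute=value pairs per word, so it is easy to read programmatically. The following is a minimal sketch, assuming each analysis is written as `<fs attr=value ...>` as on this slide. The function name `parse_fs` is hypothetical and is not part of any treebank toolkit.

```python
def parse_fs(fs):
    """Parse a slide-style feature structure into a dict.

    Attributes with an empty value (such as 'af=') map to ''.
    """
    body = fs.strip()
    if not (body.startswith("<fs") and body.endswith(">")):
        raise ValueError("not a feature structure: %r" % fs)
    features = {}
    # Drop the leading '<fs' and the trailing '>', then split on whitespace.
    for item in body[3:-1].split():
        name, _, value = item.partition("=")
        features[name] = value
    return features


khaataa = parse_fs("<fs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taa>")
print(khaataa["root"], khaataa["cat"], khaataa["TAM"])
```

Keeping the result as a plain dict makes it simple to compare the root (khaa) with the surface form (khaataa) when checking an analysis.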
An Example (Contd)
POS Tagging
• meraa_PRP badZaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP ((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG
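The chunked line can be read back into a structure of chunk tags and tagged words. This is a minimal sketch, assuming the `((word_POS ...))_CHUNK` notation used on this slide. The names `CHUNK_RE` and `parse_chunks` are hypothetical helpers, not an existing API.

```python
import re

# One chunk: double parentheses around tagged words, then '_' and the chunk tag.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk_tag, [(word, pos), ...]) pairs."""
    chunks = []
    for words, tag in CHUNK_RE.findall(text):
        # Split each token on its last underscore: 'bhaaii_NN' -> ('bhaaii', 'NN').
        tokens = [tuple(tok.rsplit("_", 1)) for tok in words.split()]
        chunks.append((tag, tokens))
    return chunks


sentence = ("((meraa_PRP))_NP ((badZaa_JJ bhaaii_NN))_NP "
            "((bahuta_QF phala_NN))_NP ((khaataa_VM hai_VAUX))_VG")
for tag, tokens in parse_chunks(sentence):
    print(tag, tokens)
```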
An Example (Contd)
Dependency Relation
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations and hence provides a framework for dependency analysis
• In our dependency tree:
- each node is a chunk, and
- each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks: meraa ‘my’, bhaaii ‘brother’, phala ‘fruit’ and khaataa ‘eats’
• badZaa ‘big’ and bahuta ‘much’ are part of the chunks
Dependency Scheme (Contd)
khaataa
  k1: bhaaii
    r6: meraa
  k2: phala
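The tree on this slide can be stored as a mapping from each chunk head to its parent and relation label. This is a minimal sketch of one possible encoding, not the treebank's file format. The `root` label for khaataa and the function name `render` are assumptions made for illustration.

```python
# Each chunk head maps to (parent head, relation label).
# The root has parent None; its label 'root' is a placeholder, not a scheme label.
tree = {
    "khaataa": (None, "root"),
    "bhaaii": ("khaataa", "k1"),
    "phala": ("khaataa", "k2"),
    "meraa": ("bhaaii", "r6"),
}


def render(tree, head=None, depth=0):
    """Return indented lines, one per node, with children under their parent."""
    lines = []
    for node, (parent, rel) in tree.items():
        if parent == head:
            lines.append("  " * depth + f"{node} ({rel})")
            lines.extend(render(tree, node, depth + 1))
    return lines


print("\n".join(render(tree)))
```

Storing only the parent and label per head keeps the single-head property of a dependency tree explicit: a node cannot be given two parents by accident.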
Relations in Dependency Scheme
• There are three types of relations in the Dependency Scheme:
- Karaka relations
- Relations other than karakas
- Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
• Karaka relations hold between a verb and the participants directly involved in the action it denotes
• Relations other than karakas denote notions such as purpose and reason
• Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates
Basic karaka relations
• Only six:
- karta – subject/agent/doer
- karma – object/patient
- karana – instrument
- sampradaan – beneficiary
- apaadaan – source
- adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue
• Possibility-I: Go by syntactic analysis
- khilvaa ‘cause to eat’ is the verb root
- maaz ne has karta vibhakti, so mark as k1
- aayaa se has karana vibhakti, so mark as k3
- bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb and is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations:
- prayojaka karta ‘causer’ (pk1): the causer in a causative construction
- prayojya karta ‘causee’ (jk1): the causee in a causative construction
- madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
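The choice between the two labelings amounts to a fixed substitution of labels. This is a minimal sketch of that substitution, assuming the annotator has already decided that the se-marked chunk is a mediator-causer and not an instrument (such as cammaca ‘spoon’, which keeps k3). The names `CAUSATIVE_RELABEL` and `relabel_causative` are hypothetical.

```python
# Label substitution stated on the slide: k1 -> pk1, k3 -> mk1, k4 -> jk1.
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map Possibility-I labels to Possibility-II labels.

    args is a list of (chunk, label) pairs; labels outside the mapping
    (for example k2) are returned unchanged.
    """
    return [(chunk, CAUSATIVE_RELABEL.get(label, label)) for chunk, label in args]


possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
print(relabel_causative(possibility_1))
```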
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue
• Possibility-I
- Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
- The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
- jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
Relative Clause: Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
Relative Clause: Possibility-II
Relative Clauses (Contd)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
‘ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
anubhava karta – k4a (Contd)
[Dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa ‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs
Relation samanadhikaran – rs (Contd)
[Dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
agar-to
  paired-ccof: agar
    ccof: naa hotii
  paired-ccof: to
    ccof: aatii
Possibility-II
Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Every day, having written a letter, he tears it up’
Participles (vmod) (Contd)
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’
Participles (vmod) (Contd)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: patra
  vmod: likhakara
    k1: vaha
    k2: patra
(7) Ellipsis
• How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
• In the above example, vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, NULL__NP, for vo ‘that’ to show the dependency
Ellipsis (Contd)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don't listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
Coordinate Conjunction (ccof)
Subordinate Conjunction
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
• The adjective ek ‘one’ modifies the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation)
• That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
- Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
- The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
- Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
- The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
- [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
- [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
- [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
- [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
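The filtering step can be illustrated with role-labelled candidates. This is a minimal sketch, assuming the semantic roles have already been assigned by some labeller. The dictionary layout and the function name `answer_who` are hypothetical and only serve to show the comparison of themes.

```python
# The question asks for the agent of 'create' with a known theme.
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Role-labelled candidate sentences, as on the slide.
candidates = [
    {"predicate": "create", "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create", "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(question, candidates):
    """Return the agents of candidates whose predicate and theme match the question."""
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == question["predicate"]
        and c["theme"].lower() == question["theme"].lower()
    ]


print(answer_who(question, candidates))
```

Both candidates contain every word of the question, so word matching alone cannot separate them. Only the comparison of themes rules out the first one.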
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
An example (Contd)
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example (Contd)
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
An example (Contd)
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
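The two rolesets for ring can be written down as a small lexicon and used to interpret the labelled arguments. This is a minimal sketch with a hypothetical in-memory layout. Real PropBank frame files are stored in their own file format, which is not reproduced here, and the names `FRAMES` and `describe` are assumptions.

```python
# A toy frame lexicon: lemma -> roleset id -> gloss and role descriptions.
FRAMES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {"Arg0": "Causer of ringing",
                      "Arg1": "Thing rung",
                      "Arg2": "Ring for"},
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {"Arg1": "Surrounding entity",
                      "Arg2": "Surrounded entity"},
        },
    }
}


def describe(lemma, roleset_id, labeled_args):
    """Map each labelled argument to its role description in the chosen roleset."""
    roles = FRAMES[lemma][roleset_id]["roles"]
    # Annotation labels are upper case (ARG1); frame entries use 'Arg1'.
    return {text: roles[label.capitalize()] for text, label in labeled_args}


print(describe("ring", "ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring", "ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label, ARG1, means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset has to be chosen before the numbered arguments can be interpreted.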
Hindi PropBank
• Annotating the Hindi PropBank involves three steps:
- Creation of frame files
- Empty argument insertion
- Semantic role labelling
Frame files for Hindi
• Two types of frame files:
- Frames for simple verbs [385 frames, 703 predicates]
- Frames for nominals in complex predicates [1875 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
- pro: dropped null arguments recoverable from discourse context
- PRO: empty subjects of non-finite complement and adjunct clauses
- RELPRO: gaps in participial relative clauses
- GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA: Causer
ARG0: Agent/experiencer
ARG1: Theme/patient
ARG2: Recipient
ARG3: Instrument
PropBank Tagset: Numbered Arguments with function tags
ARGA-MNS: Indirect causer
ARG0-MNS: Induced causer
ARG0-GOL: Causee with a ‘recipient’ role
ARG2-ATR: Attribute
ARG2-GOL: Goal
ARG2-SOU: Source
ARG2-LOC: Location
ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP: Temporal
ARGM-MNR: Manner
ARGM-LOC: Location
ARGM-PRP: Purpose
ARGM-CAU: Cause
ARGM-DIS: Discourse
ARGM-ADV: Adverb
ARGM-MNS: Means
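Each label in the tagset is a base argument plus an optional function tag. This is a minimal sketch of splitting a label into those two parts, assuming labels are written with a hyphen, as in ARG2-GOL. The helper names `split_label` and `is_modifier` are hypothetical.

```python
def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); a plain label gets None as its function tag."""
    base, sep, function = label.partition("-")
    return base, (function if sep else None)


def is_modifier(label):
    """True for modifier arguments (ARGM-...), False for numbered arguments and ARGA."""
    return split_label(label)[0] == "ARGM"


for label in ["ARG1", "ARG2-GOL", "ARGA-MNS", "ARGM-TMP"]:
    print(label, split_label(label), is_modifier(label))
```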
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif erg book.f read.f.sg.pst
‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif dat book.f read.f.inf compel.f.pst
‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs:
- Unaccusatives, e.g. Kula (open), Puta (explode)
- Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
- E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
‘The door will open’
unacc-2: दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl erg open.pst
‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl dat open.inf compel.fut
‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
Atif ne soyaa
Atif erg sleep.m.sg.pst
‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif dat sleep.inf compel.fut
‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Last night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Last night I saw the moon behind the clouds’
[Annotated trees for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram erg Mohan dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add -aa
- (so, sulaa): ‘sleep’, ‘make someone sleep’
• Add -vaa
- (sulaa, sulvaa): ‘make someone sleep’, ‘cause someone to fall asleep’
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid erg Atif acc sleep.caus.pst
‘The maid caused Atif to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother erg maid by Atif acc sleep.caus.pst
‘The mother made the maid cause Atif to sleep’
Causatives
[Annotated trees for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’
compl-pred-2: राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi inst fight sit.perf
‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
- Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
- Motivated by claims about language acquisition in children
- Develop a theory of syntax such that the syntax of a language can be explained by language-universal principles and language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to the lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
- Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
An Example Example
$ meraa badZaa bhaaii bahuta phala khaataa hai
my lsquoelder lsquobrother lsquolots fruits lsquoeat+HABlsquo PRES
lsquoMY elder brother eats lots of fruitsrsquo
An Example (Contd)
Morph Analysis
$ meraa ltfs af= root=meraa cat=pron gend=any num=sg pers=1 case=ogt $ badZaa ltfs af= root=badZaa cat=adj gend=m gt $ bhaaii ltfs af= root=bhaaii cat=n gend=m num=sg pers=3 case=dgt $ bahuta ltfs af= root=bahuta cat=adj gend=any gt $ phala ltfs af= root=phala cat=n gend=m num=any pers=3 case=dgt $ khaataa ltfs af= root=khaa cat=v gend=m num=sg pers=3 TAM=taagt $ hai ltfs af= root=hai cat=v gend=any num=any pers=3 gt
101212
15
An Example (Contd )
POS Tagging
$ meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF
phala_NN khaataa_VM hai_VAUX Chunking
$ ((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG
An Example (Contd)
Dependency Relation
101212
16
Dependency Scheme
The Paninian approach treats a sentence as a series of modifier-modified relations Hence it provides framework for dependency analysis In our dependency tree
$ each node is a chunk and $ the edge represents the relations between the connected nodes labeled with the karaka or
other relations Chunk represents a set of adjacent words which are in dependency relations with each
other All the modifier-modified relations between the heads of the chunks (inter-chunk
relations) are marked in this manner
Dependency Scheme (Contd)
Here modifier-modified relations are marked between the heads of the chunks
$ meraa lsquomyrsquo $ bhaaii lsquobrotherrsquo $ phala lsquofruitrsquo and $ khaataa lsquoeatsrsquo
badZaa lsquobigrsquo and bahut lsquomuchrsquo are part of the chunks
101212
17
Dependency Scheme (Contd)
khaataa k1 k2
bhaii phala r6
meraa
Relations in Dependency Scheme
There are 3 types of relations in Dependency Scheme
amp Karaka relations amp Relations other than karakas and
amp Relations which do not fall under dependency relation directly but are required for
showing the dependencies indirectly
Karaka relations are participants directly involved in the action denoted by the verb
Relations other than karakas denote purpose reason Relations which do not fall under dependency relation directly are used for
representing co-ordination and complex predicates
101212
18
Basic karaka relations
Only six
$ karta ndash subjectagentdoer
$ karma ndash objectpatient
$ karana ndash instrument
$ sampradaan ndash beneficiary
$ apaadaan ndash source
$ adhikarana ndash location in placetimeother
Relations other than karakas
r6 ndash Genitive rt ndash Purpose rh ndash Reason nmod_relc ndash Relative clause rad ndash Address
101212
19
Relations which do not fall under dependency relation
ccof ndash Conjunction pof ndash Complex Predicates fragof ndash Fragment of
Dependency Relation Types
101212
20
Some Hindi Constructions
(1) Causative Constructions maaz ne aayaa se bacce ko khaaanaa khilvaayaa lsquomotherrsquo lsquoErgrsquo lsquomaidrsquo lsquobyrsquo lsquochildrsquo lsquoAccrsquo lsquofoodrsquo lsquoeat-Causrsquo lsquoMother caused the maid to feed the childrsquo
Issue
$ Possibility-I Go by syntactic analysis
amp khilvaa lsquocause to eatrsquo is the verb root amp maaz ne has karta vibhakti so mark as k1 amp aayaa se has karana vibhakti so mark as k3 amp bacce ko has sampradan vibhakti so mark as k4
Causative Constructions (Contd hellip)
101212
21
Causative Constructions (Contd hellip)
Possibility-II
$ The verb khilvaa lsquocause to eatrsquo is a causative verb and it is morphologically related to the base verb khaa lsquoeatrsquo
$ Paninian framework provides the relations
amp prayojaka karta causerlsquo (pk1) The causer in a causative construction amp prayojya karta causeelsquo (jk1) The causee in a causative construction amp madhyastha karta mediator causerlsquo (mk1) The mediator-causer in the causative
construction
Causative Constructions (Contd hellip)
Possibility-II
$ Do we mark the above dependency roles $ If we mark these relations then root will be khaa lsquoeatrsquo
101212
22
Ex maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa lsquoMother fed the child with the spoon Ex maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa Mother made the maid to feed the childlsquo As there is morphological relatedness between the base verb khaa lsquoeatrsquo and
causative verb khilvaa lsquocause to eatrsquo we mark pk1 mk1 jk1 instead of k1 k3 k4 respectively
For causatives our current decision Follow Possibility-II
Causative Constructions (Contd hellip)
(2) Relative Clauses (nmod__relc)
Ex jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
rsquowhorsquo rsquoboyrsquo rsquotherersquo rsquostandrsquo rsquoisrsquo rsquohersquo rsquomyrsquo rsquobrotherrsquo rsquoisrsquo lsquoThe boy who is standing there is my brotherrsquo
Issue
$ Possibility-I
amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause
amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz
khadZaa hairsquo
101212
23
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause
and vaha lsquohersquo in the main clause is captured by the feature coref
101212
24
Relative Clause Alternative-II
Relative Clauses (Contdhellip)
For relative clauses our current decision Follow Possibility-II In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the
verb khadzaa hai lsquois standingrsquo of the relclause The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo
relation The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by
the feature coref
(3) anubhava karta - k4a
Ex-1: mujhko dukh hai
      'I.Dat' 'unhappy' 'is'
      'I am unhappy'
• The ko vibhakti in mujhko 'to me' shows that it is not a karta
• Here dukh 'unhappy' is the karta
• mujhko 'to me' is a subtype of sampradan, different from the beneficiary sampradan (k4)
• We call it anubhava karta, represented by k4a
anubhava karta - k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
      'ram.Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran - rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would definitely have come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
[Figure: Possibility-I tree. Nodes: agar-to, agar, to, naa hotii, aatii; edge labels: paired-ccof, ccof]
[Figure: Possibility-II tree]
Conditionals (Contd)
• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Having written letters every day, he tears them up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'
Participles (vmod) (Contd.)
[Figure: dependency tree. PaadZataa hai governs vaha (k1), rojZa (k7t), patra (k2) and likhakara (vmod); likhakara in turn shares vaha (k1) and patra (k2)]
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example vo 'that', which would be the parent node of the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency
Ellipsis (Contd)
[Figure: dependency tree. maan luungii governs mai (k1) and NULL__NP (vo) (k2); kahoge attaches to NULL__NP by nmod__relc and governs tum (k1) and jo bhi (k2)]
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
[Figure: dependency tree. NULL_CCP (aur) governs badZe_ho_gaye and nahii_sunate, each by ccof]
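A short sketch of why the null element is needed in the first ellipsis example. It is only an illustration: the (dependent, relation, head) triple format and the `find_roots` helper are assumptions, not part of the treebank tooling. Without the inserted node the relative clause has no parent and the sentence falls into two disconnected trees; with it there is a single root.

```python
# Edges for "tum jo bhi kahoge (vo) mai maan luungii",
# written as (dependent, relation, head) triples.
WITH_NULL = [
    ("mai", "k1", "maan luungii"),
    ("NULL__NP", "k2", "maan luungii"),
    ("kahoge", "nmod__relc", "NULL__NP"),
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]

# The same sentence with no null element: every edge that
# mentions NULL__NP has nothing to attach to and is dropped.
WITHOUT_NULL = [e for e in WITH_NULL if "NULL__NP" not in (e[0], e[2])]


def find_roots(edges):
    """Return the nodes that act as a head but are nobody's dependent."""
    heads = {head for _, _, head in edges}
    dependents = {dep for dep, _, _ in edges}
    return heads - dependents


print(find_roots(WITH_NULL))      # one root: the tree is connected
print(find_roots(WITHOUT_NULL))   # two roots: the relative clause is stranded
```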
Non-dependency Relations
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'
[Figures: dependency trees for the coordinate and the subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it is not possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa is POF ('Part OF' relation), i.e. the noun or adjective in the conjunct verb sequence has a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit; linking the noun and the verb with the POF relation captures the fact that the sequence is a conjunct verb
Conjunct Verbs (Contd)
[Figure: dependency tree. kiyaa governs maine (k1), usase (k2) and prashna (pof); ek attaches to prashna by nmod]
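The chunking decision above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the chunk identifiers (NP1, NP2, NP3, VG) and the `conjunct_verbs` helper are hypothetical names, not treebank conventions. The point is that the noun-verb unit is recoverable from the pof edge even though noun and verb sit in separate chunks.

```python
# Chunks for "maine usase ek prashna kiyaa", keyed by an illustrative id.
CHUNKS = {
    "NP1": ["maine"],
    "NP2": ["usase"],
    "NP3": ["ek", "prashna"],   # ek stays with the noun it modifies
    "VG": ["kiyaa"],
}
CHUNK_HEADS = {"NP1": "maine", "NP2": "usase", "NP3": "prashna", "VG": "kiyaa"}

# Inter-chunk relations as (dependent chunk, relation, head chunk).
INTER_CHUNK = [("NP1", "k1", "VG"), ("NP2", "k2", "VG"), ("NP3", "pof", "VG")]
# Intra-chunk relation inside NP3.
INTRA_CHUNK = [("ek", "nmod", "prashna")]


def conjunct_verbs(inter_chunk, heads):
    """Rebuild each noun-verb unit from its pof edge."""
    return [
        f"{heads[dep]} {heads[head]}"
        for dep, rel, head in inter_chunk
        if rel == "pof"
    ]


print(conjunct_verbs(INTER_CHUNK, CHUNK_HEADS))
```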
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
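The filtering step can be sketched as follows. This is a toy illustration, not a question answering system: the candidate dictionaries stand in for the output of a semantic role labeller, and the `answer_who` function is a hypothetical name. Both candidate sentences contain the words of the question, but only one has the right theme for create.

```python
# Role-labelled candidates, as in the two sentences above.
CANDIDATES = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, candidates):
    """Return the agents of sentences whose predicate and theme match the question."""
    wanted = theme.lower()
    return [
        c["agent"]
        for c in candidates
        if c["predicate"] == predicate and c["theme"].lower() == wanted
    ]


# "Who created the first effective polio vaccine?"
print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```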
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
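A frame file can be pictured as a lookup table from roleset to role descriptions. The sketch below uses a plain dictionary purely for illustration; the `FRAME_FILES` layout and the `explain` helper are assumptions and do not reflect the actual PropBank file format. It shows how the same label (ARG1) means different things under the two rolesets of ring.

```python
# The two rolesets of "ring" from the example above.
FRAME_FILES = {
    "ring": {
        "ring.01": {
            "gloss": "Make sound of bell",
            "roles": {
                "ARG0": "Causer of ringing",
                "ARG1": "Thing rung",
                "ARG2": "Ring for",
            },
        },
        "ring.02": {
            "gloss": "To surround",
            "roles": {
                "ARG1": "Surrounding entity",
                "ARG2": "Surrounded entity",
            },
        },
    }
}


def explain(lemma, roleset_id, labelled_args):
    """Map each labelled argument to the verb-specific role description."""
    roles = FRAME_FILES[lemma][roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args.items()}


print(explain("ring", "ring.01", {"ARG0": "John", "ARG1": "the bell"}))
print(explain("ring", "ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"}))
```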
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling
Frame files for Hindi
• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
  ARGA    Causer
  ARG0    Agent, experiencer
  ARG1    Theme, patient
  ARG2    Recipient
  ARG3    Instrument

PropBank Tagset: Numbered Arguments with function tags
  ARGA-MNS    Indirect causer
  ARG0-MNS    Induced causer
  ARG0-GOL    Causee with a 'recipient' role
  ARG2-ATR    Attribute
  ARG2-GOL    Goal
  ARG2-SOU    Source
  ARG2-LOC    Location
  ARG2-DIR    Direction

PropBank Tagset: Modifier Arguments
  ARGM-TMP    Temporal
  ARGM-MNR    Manner
  ARGM-LOC    Location
  ARGM-PRP    Purpose
  ARGM-CAU    Cause
  ARGM-DIS    Discourse
  ARGM-ADV    Adverb
  ARGM-MNS    Means
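For processing annotated data it can help to split a label into its base argument and its function tag. The following is a minimal sketch, assuming the labels are written with a hyphen (for example ARG2-GOL); the `split_label` helper and the small `FUNCTION_TAGS` table are illustrative names, with the descriptions taken from the tagset listed above.

```python
# Function tags used with numbered and modifier arguments (subset of the tagset).
FUNCTION_TAGS = {
    "MNS": "Means",
    "GOL": "Goal",
    "ATR": "Attribute",
    "SOU": "Source",
    "LOC": "Location",
    "DIR": "Direction",
    "TMP": "Temporal",
    "MNR": "Manner",
}


def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); a bare label gets None as its tag."""
    base, _, function_tag = label.partition("-")
    return base, (function_tag or None)


for label in ["ARG0", "ARG2-GOL", "ARGM-TMP", "ARGA-MNS"]:
    base, tag = split_label(label)
    print(label, "->", base, tag, FUNCTION_TAGS.get(tag, ""))
```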
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
         Atif kitab paRhegaa
         Atif book.f read.m.sg.fut
         'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
         Atif ne kitaab paRhii
         Atif erg book.f read.f.sg.pst
         'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
         Atif ko kitaab paRhnii paRii
         Atif dat book.f read.f.inf compel.f.pst
         'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
         darvaazaa khulegaa
         door.m.sg.d open.m.sg.fut
         'The door will open'
unacc-2: दरवाज़े ने खुला
         darvaaze ne khulaa
         door.m.sg.obl erg open.pst
         'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
         darvaaze ko khulnaa paRegaa
         door.m.sg.obl dat open.inf compel.fut
         'The door will have to open'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
unerg-2: आतिफ़ ने सोया
         Atif ne soyaa
         Atif erg sleep.m.sg.pst
         'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif dat sleep.inf compel.fut
         'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
         kal raat baadaloM meiM caaMd dikhaa
         yesterday night clouds in moon see(unacc).pst
         'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
         kal raat baadaloM meiM mujhko caaMd dikhaa
         yesterday night clouds in me.dat moon see(unacc).pst
         'Yesterday night I saw the moon behind the clouds'
[Figures: PropBank annotations for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
         raam mohan ko kitaab degaa
         Ram Mohan dat book.f give.m.sg.fut
         'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
         raam ne mohan ko kitaab dii
         Ram erg Mohan dat book.f give.f.sg.pst
         'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so -> sulaa) sleep -> make someone sleep
• Add -vaa
  - (sulaa -> sulvaa) make someone sleep -> cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
         aayaa ne Atif ko sulaayaa
         maid erg Atif acc sleep.caus.pst
         'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
         maaN ne aayaa se Atif ko sulvaayaa
         mother erg maid by Atif acc sleep.caus.pst
         'The mother made the maid cause the child to sleep'
[Figures: PropBank annotations for causative-1 and causative-2]
[Figure: Causatives classes]
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
         raam ravi kii pratiikshaa kar rahaa thaa
         Ram Ravi gen wait do prog.m.sg be.m.sg.pst
         'Ram was waiting for Ravi'
compl-pred-2: राम रवी से लड़ बैठा
         raam ravi se ladZa baithaa
         Ram Ravi inst fight sit.perf
         'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
         raam jaantaa hai ki siita der se aayegii
         Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
         'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa)
[Figure: PS tree; node labels VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii)
[Figure: PS tree; node labels VP-Pred, VP, NP-P, NP (Atif), -ne, NP (kitab), V (paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative there simply is no object
आतिफ़ सोएगा (Atif soyegaa)
[Figure: PS tree; node labels VP-Pred, VP, NP (Atif), V (soyegaa)]
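Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा (darvaazaa khulegaa)
[Figure: PS tree; node labels VP-Pred, VP, NP-1, NP (darvaazaa), CASE-1, V (khulegaa)]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं (us kamre mein cuuhe hain)
[Figure: PS tree; node labels VP-Pred, VP, NP-1, NP (cuuhe), CASE-1, V (hain), NP-P (us kamre mein)]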
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii)
[Figure: PS tree; node labels VP-Pred, VP, NP-P (Ram -ne), NP-P (Mohan -ko), NP (kitaab), V (dii)]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa)
[Figure: PS tree; node labels VP-Pred, VP, NP-1, NP (caaMd), CASE-1, V (dikhaa), NP-P (baadaloM mein), NP (kal raat), SCR-2, NP-2 (mujhko)]
• dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi)
[Figure: PS tree; node labels VP-Pred, VP, NP (Ram), EXTR-1, V (jaantaa), V-Aux (hai), CP-1, C (ki), NP-1 (Sita), CASE-1, NP-P (der se), V (aayegi)]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
[Figure: PS tree; node labels VP-Pred, VP, NP-P (tumne), SCR-1, V (dii), PRO, NP-1 (jo kitaab), CP, C, COMP, V-Aux (thii), NP-P (maine), V (paRhii), SCR-3, NP (vah)]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
[Figure: PS tree; node labels VP-Pred, VP, NP (Raam), NP-P (Ravi -ko), V', NP, CASE-1, N (yaad), V (kar), V-Aux (rahaa), V-Aux (thaa)]
Causative
[Figure: PS tree for the causative]
An Example (Contd.)
POS Tagging
• meraa_PRP baDzaa_JJ bhaaii_NN bahuta_QF phala_NN khaataa_VM hai_VAUX
Chunking
• ((meraa_PRP))_NP
  ((baDzaa_JJ bhaaii_NN))_NP
  ((bahuta_QF phala_NN))_NP
  ((khaataa_VM hai_VAUX))_VG
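The bracketed chunk notation above is easy to read back into a data structure. The sketch below parses only the notation as printed on this slide, where each chunk is written `((word_TAG ...))_LABEL`; it is not a reader for the treebank's actual file format, and the `parse_chunks` name is an illustrative choice.

```python
import re

# Matches one chunk of the form ((word_TAG word_TAG ...))_LABEL.
CHUNK_RE = re.compile(r"\(\((.*?)\)\)_(\w+)")


def parse_chunks(text):
    """Return a list of (chunk label, [(word, POS tag), ...]) pairs."""
    chunks = []
    for body, label in CHUNK_RE.findall(text):
        # Each token is word_TAG; split on the last underscore.
        tokens = [tuple(tok.rsplit("_", 1)) for tok in body.split()]
        chunks.append((label, tokens))
    return chunks


SENTENCE = """((meraa_PRP))_NP
((baDzaa_JJ bhaaii_NN))_NP
((bahuta_QF phala_NN))_NP
((khaataa_VM hai_VAUX))_VG"""

for label, tokens in parse_chunks(SENTENCE):
    print(label, tokens)
```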
An Example (Contd)
[Figure: Dependency Relation]
Dependency Scheme
• The Paninian approach treats a sentence as a series of modifier-modified relations; hence it provides a framework for dependency analysis
• In our dependency tree
  - each node is a chunk, and
  - each edge represents the relation between the connected nodes, labeled with a karaka or other relation
• A chunk is a set of adjacent words which are in dependency relations with each other
• All the modifier-modified relations between the heads of the chunks (inter-chunk relations) are marked in this manner
Dependency Scheme (Contd)
• Here modifier-modified relations are marked between the heads of the chunks: meraa 'my', bhaaii 'brother', phala 'fruit' and khaataa 'eats'
• badZaa 'big' and bahut 'much' are part of the chunks
Dependency Scheme (Contd)
[Figure: dependency tree. khaataa governs bhaaii (k1) and phala (k2); meraa attaches to bhaaii by r6]
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations hold between the verb and the participants directly involved in the action it denotes. Relations other than karakas denote purpose, reason. Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta - subject/agent/doer
• karma - object/patient
• karana - instrument
• sampradaan - beneficiary
• apaadaan - source
• adhikarana - location in place/time/other
Relations other than karakas
• r6 - Genitive
• rt - Purpose
• rh - Reason
• nmod__relc - Relative clause
• rad - Address
Relations which do not fall under dependency relation
• ccof - Conjunction
• pof - Complex Predicates
• fragof - Fragment of
[Figure: Dependency Relation Types]
Some Hindi Constructions
(1) Causative Constructions
Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
    'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
    'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  - khilvaa 'cause to eat' is the verb root
  - maaz ne has karta vibhakti, so mark as k1
  - aayaa se has karana vibhakti, so mark as k3
  - bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd…)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb and is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  - prayojaka karta 'causer' (pk1): the causer in a causative construction
  - prayojya karta 'causee' (jk1): the causee in a causative construction
  - madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
    'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
    'Mother made the maid feed the child'
• As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
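The difference between the two possibilities can be sketched as a labelling function. This is an illustration under assumptions, not the annotation procedure itself: the `causal_role` argument is a hypothetical input standing for the annotator's judgement about a participant's place in the causation chain, and the function name is made up. Possibility-I reads the label off the vibhakti alone; Possibility-II overrides it whenever the participant is a causer, mediator or causee.

```python
# Possibility-I: the label follows the vibhakti (case marker).
VIBHAKTI_LABELS = {"ne": "k1", "se": "k3", "ko": "k4"}

# Possibility-II: participants in the causation chain get causative karaka labels.
CAUSAL_LABELS = {"causer": "pk1", "mediator": "mk1", "causee": "jk1"}


def karaka_label(vibhakti, causal_role=None):
    """Prefer the causative label when a causal role is given, else use the vibhakti."""
    if causal_role is not None:
        return CAUSAL_LABELS[causal_role]
    return VIBHAKTI_LABELS[vibhakti]


# cammaca se 'with the spoon' is an ordinary instrument, so it stays k3.
print("cammaca se:", karaka_label("se"))
# aayaa se 'by the maid' is the mediator-causer, so it becomes mk1.
print("aayaa se:", karaka_label("se", causal_role="mediator"))
print("maaz ne:", karaka_label("ne", causal_role="causer"))
print("bacce ko:", karaka_label("ko", causal_role="causee"))
```

The same marker se yields k3 or mk1 depending on the participant's role, which is why the vibhakti alone is not enough under Possibility-II.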
Causative Constructions (Contd hellip)
(2) Relative Clauses (nmod__relc)
Ex jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
rsquowhorsquo rsquoboyrsquo rsquotherersquo rsquostandrsquo rsquoisrsquo rsquohersquo rsquomyrsquo rsquobrotherrsquo rsquoisrsquo lsquoThe boy who is standing there is my brotherrsquo
Issue
$ Possibility-I
amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause
amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz
khadZaa hairsquo
101212
23
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause
and vaha lsquohersquo in the main clause is captured by the feature coref
101212
24
Relative Clause Alternative-II
Relative Clauses (Contdhellip)
For relative clauses our current decision Follow Possibility-II In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the
verb khadzaa hai lsquois standingrsquo of the relclause The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo
relation The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by
the feature coref
101212
25
(3) anubhava karta ndash k4a
Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta Here dukh lsquounhappyrsquo is the karta Here mujhko lsquoto mersquo is a subtype of sampradan This sampradan is different from the sampradan (k4mdashbeneficiary) We call it as anubhava karta represented by k4a
anubhava karta ndash k4a (Contd )
Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon
was visible to mersquo Verb
101212
26
anubhava karta ndash k4a (Contdhellip)
Ex-2
Ex-3
anubhava karta ndash k4a (Contdhellip)
101212
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo
lsquoHad she been not sick she would have definitely come to the partyrsquo
Issue
$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause
101212
29
Possibility - I
agar-to paired-ccof paired-ccof
agar to ccof ccof
naa hotii aatii
Possibility - II
101212
30
Conditionals (Contd) Possibility-I is not possible because agar-to is the head of the tree
which is an abstract node ie it is not a lexical node For conditionals our current decision Follow Possibility-II In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo
clause Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo
clause is the main clause
(6) Participles (vmod)
In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)
The arguments occurs only once in the sentence but is semantically related to both the verbs The shared argument syntactically always attaches with the main verb For the other verb this argument is semantically realized but not syntactically
101212
31
Participles (vmod) (Contd )
Ex vaha rojZa patra likhakara PaadZataa hai
rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo
lsquoHaving letters written everyday he tearsrsquo
101212
32
Participles (vmod) (Contd )
The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa
lsquotearsrsquo is shared with another participle verb likhakar lsquohaving
writtenrsquo
Participles (vmod) (contd)
Paadzataa hai k1 k7t k2 vmod
vaha rojZa pawra likhakar k1 k2
vaha pawra
101212
33
(7)Ellipsis
How to show dependencies when the head is missing Ex tum jo bhi kahoge (vo) mai maan luungii lsquoyou lsquowhateverrsquo lsquowill sayrsquo lsquothatrsquo lsquoIrsquo lsquowill believersquo lsquoI will believe whatever you sayrsquo In the above example vo lsquothatrsquo is missing which becomes the parent node
for relative clause lsquotum jo bhi kahogersquo We insert a null element ie NULL_NP for vo lsquothatrsquo to show the dependency
Ellipsis (Contd)
maan luungii k1 k2
mai NULL__NP (vo) nmod__relc
kahoge k1 k2
tum jo bhi
101212
34
Ellipsis (Contd)
Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo No explicit conjunct Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate
Non-dependency Relations
ccof ndash Conjunction pof ndash Complex Predicates fragof -- Fragment of
101212
35
(1) Conjunction (ccof)
ccof relation doesnrsquot reflects a dependency relation It is used for coordinating as well as subordinating conjunctions The dependency trees will show the conjuncts as heads In coordinating conjuncts the conjunct is the head and takes the coordinating
elements as its children In subordinating conjunct it would take the clause to which it is syntactically
attached (the subordinate clause) as its child
Conjunction (ccof) (Contdhellip)
Coordinate Conjunction
$ Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa lsquoramrsquo lsquoErgrsquo lsquofoodrsquo lsquoatersquo lsquoandrsquo lsquositarsquo Ergrsquo lsquoapplersquo lsquoatersquo
lsquoRam ate food and Sita ate an applersquo
Subordinate Conjunction
$ Ex raam ne kahaa ki vo kal aayegaa lsquoramrsquo lsquoErgrsquo lsquosaidrsquo thatrsquo lsquohersquo lsquotomorrowrsquo lsquocome-Futrsquo lsquoRam said that he will come tomorrowrsquo
101212
36
Coordinate Conjunction (ccof)
Subordinate Conjunction
101212
37
(2) Conjunct Verbs
Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa
lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence
The annotation scheme should be able to account for this relation in the
dependency tree If prashna kiyaa is grouped as a single verb chunk it will not be possible to
mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation) It means noun or an adjective in the conjunct verb sequence will have a POF relation with
the verb This way the relation between ek and prashna becomes an intra-chunk relation as they will
now become part of a single NP chunk Conjunct verbs are chunked separately but semantically they constitute a single unit It captures the fact that the noun-verb sequence is a conjunct verb by linking them with
POF relation
101212
38
Conjunct Verbs (Contd)
kiyaa k1 k2 pof
maine usase prashna
nmod
ek
101212
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
  - Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  - The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  - [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  - [Becton Dickinson (agent)] created the [first disposable syringe (theme)] for use with the mass administration of the first effective polio vaccine
  - [The first effective polio vaccine (theme)] was created in 1952 by [Jonas Salk (agent)] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
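The filtering step can be sketched in a few lines. The role annotations below are written by hand from the two candidate sentences; `CANDIDATES` and `answer_who` are hypothetical names, and a real system would obtain the roles from an automatic semantic role labeller rather than from a hard-coded list.

```python
from typing import Dict, List, Optional

# Hand-written role annotations for the two candidate sentences on the slides.
CANDIDATES: List[Dict[str, str]] = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate: str, theme: str,
               candidates: List[Dict[str, str]]) -> Optional[str]:
    """Return the agent of the first candidate whose predicate and theme both match."""
    for roles in candidates:
        if roles["predicate"] == predicate and roles["theme"] == theme:
            return roles["agent"]
    return None


# "Who created the first effective polio vaccine?"
print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Both candidates contain the words of the question, so word matching cannot separate them; only the theme role rules out the syringe sentence.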
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
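A frame file with several rolesets can be modelled as a lookup table. The sketch below is an assumption-laden illustration, not the PropBank file format: `Roleset`, `RING_FRAME` and `describe` are hypothetical names, and the role descriptions are simply transcribed from the example slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Roleset:
    """One usage of a predicate together with its numbered roles."""
    roleset_id: str
    gloss: str
    roles: Dict[str, str]


# Rolesets for "ring", transcribed from the example slide.
RING_FRAME: Dict[str, Roleset] = {
    "ring.01": Roleset("ring.01", "Make sound of bell", {
        "Arg0": "Causer of ringing",
        "Arg1": "Thing rung",
        "Arg2": "Ring for",
    }),
    "ring.02": Roleset("ring.02", "To surround", {
        "Arg1": "Surrounding entity",
        "Arg2": "Surrounded entity",
    }),
}


def describe(roleset_id: str, labelled: Dict[str, str]) -> Dict[str, str]:
    """Map each labelled phrase to the role description defined by its roleset."""
    roleset = RING_FRAME[roleset_id]
    unknown = set(labelled) - set(roleset.roles)
    if unknown:
        raise ValueError(f"{sorted(unknown)} not defined for {roleset_id}")
    return {phrase: roleset.roles[label] for label, phrase in labelled.items()}


print(describe("ring.01", {"Arg0": "John", "Arg1": "the bell"}))
print(describe("ring.02", {"Arg1": "Tall aspen trees", "Arg2": "the lake"}))
```

The same label means different things in different rolesets: Arg1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why roles are defined per usage.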
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  - Creation of frame files
  - Empty argument insertion
  - Semantic role labelling

Frame files for Hindi
• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments, recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
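The four empty-argument types and how they are inserted can be kept in one table. This is a minimal sketch; `EMPTY_ARGUMENTS` and `inserted` are hypothetical names, and the descriptions and insertion modes are copied from the slide above.

```python
from typing import Dict, List, Tuple

# name -> (description, insertion mode), as listed on the slide
EMPTY_ARGUMENTS: Dict[str, Tuple[str, str]] = {
    "pro": ("dropped null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def inserted(mode: str) -> List[str]:
    """List the empty-argument types inserted in the given mode, sorted by name."""
    return sorted(name for name, (_, m) in EMPTY_ARGUMENTS.items() if m == mode)


print(inserted("automatic"))
print(inserted("manual"))
```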
PropBank Tagset: Numbered Arguments
  ARGA    Causer
  ARG0    Agent, experiencer
  ARG1    Theme, patient
  ARG2    Recipient
  ARG3    Instrument

PropBank Tagset: Numbered Arguments with function tags
  ARGA-MNS    Indirect causer
  ARG0-MNS    Induced causer
  ARG0-GOL    Causee with a 'recipient' role
  ARG2-ATR    Attribute
  ARG2-GOL    Goal
  ARG2-SOU    Source
  ARG2-LOC    Location
  ARG2-DIR    Direction

PropBank Tagset: Modifier Arguments
  ARGM-TMP    Temporal
  ARGM-MNR    Manner
  ARGM-LOC    Location
  ARGM-PRP    Purpose
  ARGM-CAU    Cause
  ARGM-DIS    Discourse
  ARGM-ADV    Adverb
  ARGM-MNS    Means
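Labels from this tagset split into a base argument and an optional three-letter tag. The sketch below shows one way to parse and check them; `parse_label` and `Label` are hypothetical names, and the check only tests tag vocabulary (it does not verify that a particular base-plus-tag combination actually appears in the tables above).

```python
import re
from typing import NamedTuple, Optional

NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
FUNCTION_TAGS = {"MNS", "GOL", "ATR", "SOU", "LOC", "DIR"}
MODIFIER_TAGS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


class Label(NamedTuple):
    base: str              # ARGA, ARG0..ARG3, or ARGM
    tag: Optional[str]     # function or modifier tag, if any
    is_modifier: bool


def parse_label(label: str) -> Label:
    """Split a label such as 'ARG2-GOL' and validate it against the tagset."""
    match = re.fullmatch(r"(ARG[A0-9M])(?:-([A-Z]{3}))?", label)
    if match is None:
        raise ValueError(f"malformed label: {label}")
    base, tag = match.groups()
    if base == "ARGM":
        # Modifier arguments always carry a modifier tag.
        if tag not in MODIFIER_TAGS:
            raise ValueError(f"unknown modifier tag in {label}")
        return Label(base, tag, True)
    if base not in NUMBERED or (tag is not None and tag not in FUNCTION_TAGS):
        raise ValueError(f"unknown numbered argument: {label}")
    return Label(base, tag, False)


print(parse_label("ARG2-GOL"))
print(parse_label("ARGM-TMP"))
```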
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'
trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
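The labelling rule for intransitives reduces to a lookup once a verb's class is known. In this sketch the class membership is hard-coded from the three verbs named on the slide; `sole_argument_label` is a hypothetical helper, and in practice the class would come from the diagnostic tests and the frame file.

```python
# Verb classes as listed on the slide (verb roots in the slide's transliteration).
UNACCUSATIVE = {"Kula", "Puta"}   # open, explode
UNERGATIVE = {"haMsa"}            # laugh


def sole_argument_label(verb: str) -> str:
    """Return the numbered argument taken by the single argument of an intransitive verb."""
    if verb in UNACCUSATIVE:
        return "Arg1"   # theme-like argument
    if verb in UNERGATIVE:
        return "Arg0"   # agent-like argument
    raise KeyError(f"verb class unknown for {verb}")


print(sole_argument_label("Kula"))
print(sole_argument_label("haMsa"))
```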
Intransitive Unaccusative
unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'
unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'

Intransitive Unergative
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'
unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential
exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figures: annotated structures for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  - (so → sulaa) sleep → make someone sleep
• Add -vaa
  - (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'

Causatives
[Figures: annotated structures for causative-1 and causative-2]

Causatives classes
[Figure: slide content not recoverable from the extracted text]
Complex predicates
• These are cases such as bharosaa karnaa `trust(n) do(v)' 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate
compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'

Complex predicate
compl-pred-2
  राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'

Complement Clause
compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
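The binary-branching assumption is easy to state as a check over a tree. The sketch below uses a hypothetical `Node` type, and the bracketing of the sample sentence (Atif kitab paRhegaa) is an assumption made for illustration, not a transcription of the treebank's own tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A phrase-structure node; leaves carry a word, internal nodes carry children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""


def is_binary(node: Node) -> bool:
    """True if no node in the tree has more than two children."""
    return len(node.children) <= 2 and all(is_binary(c) for c in node.children)


def surface_words(node: Node) -> List[str]:
    """Read the words off the leaves from left to right."""
    if not node.children:
        return [node.word] if node.word else []
    words: List[str] = []
    for child in node.children:
        words.extend(surface_words(child))
    return words


# Assumed bracketing: [VP-Pred [NP Atif] [VP [NP kitab] [V paRhegaa]]]
tree = Node("VP-Pred", [
    Node("NP", word="Atif"),
    Node("VP", [Node("NP", word="kitab"), Node("V", word="paRhegaa")]),
])

print(is_binary(tree))
print(" ".join(surface_words(tree)))
```

A flat analysis with subject, object and verb as three sisters would fail `is_binary`, which is one concrete way the PS trees differ from the n-ary DS trees.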
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, V, and the trace CASE1]

Existentials
• Existential ho `be' is unaccusative (because agent-free) and location is an adjunct
[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, NP-P, V, and the trace CASE1]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects
[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP1, NP2, NP-P, NP, V, and the traces CASE1 and SCR2]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C (ki), V, V-Aux (hai), and the traces EXTR1 and CASE1]

Relative Clause
[Tree diagram; node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, V, V-Aux (thii), PRO, and the traces SCR1 and SCR3]
  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
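Complex Predicate
[Tree diagram; node labels: VP-Pred, VP, VP1, NP, NP-P, N (yaad), V (kar), V-Aux (rahaa, thaa), V', and the trace CASE1]
  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'

Causative
[Tree diagram: slide content not recoverable from the extracted text]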
101212
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut
Thedoorwillopen
unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened
unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen
101212
14
Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept
unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep
Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo
bull Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs
101212
15
Da3veSubject
unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst
YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst
YesterdaynightIsawthemoonbehindtheclouds
unacc4
datsubj1
TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on
101212
16
Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan
ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan
Causa3ves
bull Hindihastwowaysofformingthecausa3vebull Addndashaa
ndash (sosulaa)sleepmakesomeonesleepbull Addndashvaa
ndash (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep
bull WeintroducethelabelARGAtoanalyzecausersbull SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees
bull ARGALMNSforintermediatecausers
101212
17
Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
causa3veL1 आया आतफ़ को स6लाया
aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo
causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa
motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo
Causa3ves
Causa3veL1
Causa3veL2
101212
18
Causa3vesclasses
Complexpredicates
bull Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust
bull Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa
101212
19
ComplexPredicate
complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo
Complexpredicate
complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo
101212
20
ComplementClause
complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut
RamknowsthatSitawillarrivelate
101212
1
PhraseStructureRepresenta3on
OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu
PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks
bull DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow
bull Developedinconjunc3onwithDSandPBbull InspiredbyChomskyantradi3on
101212
2
BackgroundforPS
bull Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren
ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull LanguageVuniversalprinciplesbull LanguageVspecificparameters
bull PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach
BasicPrinciplesofPS
bull PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)
bull Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces
bull Transforma3onalgrammar
bull Monostratalrepresenta3onbull NotunlikeEnglishPennTreebank
101212
3
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram; node labels: VP-Pred, VP, NP, V; leaves: Atif, kitab, paRhegaa]
आतिफ़ किताब पढ़ेगा
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, V; leaves: Atif-ne, kitab, paRhii]
आतिफ़ ने किताब पढ़ी
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Tree diagram; node labels: VP-Pred, VP, NP, V; leaves: Atif, soyegaa]
आतिफ़ सोएगा
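Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Tree diagram; node labels: VP-Pred, VP, NP, V; leaves: darvaazaa, khulegaa; co-indexed: NP1 and CASE1]
दरवाज़ा खुलेगा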
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, V; leaves: us kamre-mein, cuuhe, hain; co-indexed: NP1 and CASE1]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, V; leaves: Ram-ne, Mohan-ko, kitaab, dii]
राम ने मोहन को किताब दी
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, V; leaves: kal raat, baadaloM-mein, mujhko, caaMd, dikhaa; co-indexed: NP1 and CASE1, NP2 and SCR2]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C; leaves: Ram, jaantaa, hai, ki, Sita, der-se, aayegi; co-indexed: CP1 and EXTR1, NP1 and CASE1]
Relative Clause
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, CP, C, COMP, V, V-Aux; leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii; empty elements: PRO, SCR1, SCR3; indexed: NP1]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, N, V, V', V-Aux; leaves: Raam, Ravi-ko, yaad, kar, rahaa, thaa; trace: CASE1]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causative
Dependency Scheme (Contd)
[Dependency tree: khaataa is the root; bhaii and phala attach to it as k1 and k2; meraa attaches as r6]
Relations in Dependency Scheme
There are 3 types of relations in the Dependency Scheme:
• Karaka relations
• Relations other than karakas
• Relations which do not fall under dependency relations directly but are required for showing the dependencies indirectly
Karaka relations mark the participants directly involved in the action denoted by the verb.
Relations other than karakas denote purpose, reason, and the like.
Relations which do not fall under dependency relations directly are used for representing co-ordination and complex predicates.
Basic karaka relations
Only six:
• karta – subject/agent/doer
• karma – object/patient
• karana – instrument
• sampradaan – beneficiary
• apaadaan – source
• adhikarana – location in place/time/other
Relations other than karakas
• r6 – Genitive
• rt – Purpose
• rh – Reason
• nmod__relc – Relative clause
• rad – Address
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
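The three groups of relation labels listed above can be written down as a small lookup table. This is only an illustrative sketch: the helper `relation_type` and the three dictionary names are hypothetical, and only labels that the slides pair with a name are included (the pairing of k2 with karma is read off the worked examples rather than stated in the list).

```python
# Labels and glosses are copied from the slides; the lookup helper is a
# hypothetical illustration, not part of any treebank tooling.
KARAKA = {"k1": "karta", "k2": "karma", "k3": "karana", "k4": "sampradaan"}
OTHER = {
    "r6": "genitive",
    "rt": "purpose",
    "rh": "reason",
    "nmod__relc": "relative clause",
    "rad": "address",
}
NON_DEPENDENCY = {
    "ccof": "conjunction",
    "pof": "complex predicate",
    "fragof": "fragment of",
}


def relation_type(label: str) -> str:
    """Return which of the three relation groups a label belongs to."""
    if label in KARAKA:
        return "karaka"
    if label in OTHER:
        return "other than karaka"
    if label in NON_DEPENDENCY:
        return "non-dependency"
    raise KeyError(f"label not covered by this sketch: {label}")


if __name__ == "__main__":
    for label in ("k1", "r6", "pof"):
        print(label, "->", relation_type(label))
```

Keeping the three groups in separate tables mirrors the slides' own split and makes it easy to extend one group without touching the others.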
[Figure: Dependency Relation Types]
Some Hindi Constructions
(1) Causative Constructions
maaz ne aayaa se bacce ko khaanaa khilvaayaa
'mother' 'Erg' 'maid' 'by' 'child' 'Acc' 'food' 'eat-Caus'
'Mother caused the maid to feed the child'
Issue
• Possibility-I: Go by syntactic analysis
  – khilvaa 'cause to eat' is the verb root
  – maaz ne has karta vibhakti, so mark it as k1
  – aayaa se has karana vibhakti, so mark it as k3
  – bacce ko has sampradan vibhakti, so mark it as k4
Causative Constructions (Contd …)
Possibility-II
• The verb khilvaa 'cause to eat' is a causative verb, and it is morphologically related to the base verb khaa 'eat'
• The Paninian framework provides the relations
  – prayojaka karta 'causer' (pk1): the causer in a causative construction
  – prayojya karta 'causee' (jk1): the causee in a causative construction
  – madhyastha karta 'mediator causer' (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa 'eat'
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
'Mother fed the child with the spoon'
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
'Mother made the maid feed the child'
As there is morphological relatedness between the base verb khaa 'eat' and the causative verb khilvaa 'cause to eat', we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: Follow Possibility-II
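The relabelling under Possibility-II (pk1, mk1, jk1 in place of k1, k3, k4) can be sketched as a simple mapping. The function name `relabel_causative` and the list-of-pairs input format are assumptions made for this illustration. The sketch does not decide whether a se-phrase is a mediator (aayaa se) or a true instrument (cammaca se); that judgment is assumed to be made before the function is called.

```python
# Mapping stated on the slide for causative constructions:
# k1 -> pk1 (causer), k3 -> mk1 (mediator-causer), k4 -> jk1 (causee).
CAUSATIVE_RELABEL = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Convert Possibility-I labels to Possibility-II labels.

    `args` is a list of (phrase, label) pairs for one causative clause.
    Labels without a causative counterpart (for example k2) are kept.
    """
    return [(phrase, CAUSATIVE_RELABEL.get(label, label)) for phrase, label in args]


if __name__ == "__main__":
    possibility_1 = [
        ("maaz ne", "k1"),
        ("aayaa se", "k3"),
        ("bacce ko", "k4"),
        ("khaanaa", "k2"),
    ]
    for phrase, label in relabel_causative(possibility_1):
        print(f"{phrase} ({label})")
```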
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
'who' 'boy' 'there' 'stand' 'is' 'he' 'my' 'brother' 'is'
'The boy who is standing there is my brother'
Issue
• Possibility-I
  – Provides a relation between vaha 'he' in the main clause and jo ladZakaa 'the boy' in the relative clause
  – The dependency of jo ladZakaa 'the boy' is on vaha 'he'
  – jo ladZakaa 'the boy' is the root of the relative clause 'jo ladZakaa vahaaz khadZaa hai'
[Figure: Relative Clause, Possibility-I]
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
[Figure: Relative Clause, Possibility-II]
Relative Clauses (Contd…)
For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
'I.Dat' 'unhappy' 'is'
'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd)
Ex-2 (base verb): raam ne (agent) caaMd dekhaa
'ram' 'Erg' 'moon' 'saw'
'Ram saw the moon'
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
'ram.Dat' 'moon' 'appeared'
'The moon was visible to Ram'
[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
'Ram said that he will come tomorrow'
• In Ex-1, the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2, the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs
[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
'Had she not been sick, she would definitely have come to the party'
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
[Dependency tree: the abstract node agar-to is the root; agar and to attach to it as paired-ccof; naa hotii attaches to agar and aatii attaches to to, each as ccof]
Possibility-II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd)
• Possibility-I is ruled out because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause
(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb, this argument is realized semantically but not syntactically
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
'he' 'daily' 'letter' 'having written' 'tear' 'is'
'Every day, having written a letter, he tears it up'
• The arguments vaha 'he' and patra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakara 'having written'
[Dependency tree: PaadZataa hai is the root; vaha (k1), rojZa (k7t), patra (k2) and likhakara (vmod) attach to it; the shared arguments vaha (k1) and patra (k2) are shown again under likhakara]
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
'you' 'whatever' 'will say' 'that' 'I' 'will believe'
'I will believe whatever you say'
• In the above example, vo 'that' is missing; it would be the parent node of the relative clause 'tum jo bhi kahoge'
• We insert a null element, i.e. NULL__NP, for vo 'that' to show the dependency
Ellipsis (Contd)
[Dependency tree: maan luungii is the root; mai (k1) and NULL__NP (vo) (k2) attach to it; kahoge attaches to NULL__NP as nmod__relc; tum (k1) and jo bhi (k2) attach to kahoge]
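The null-element analysis above can be made concrete by building the tree for 'tum jo bhi kahoge (vo) mai maan luungii' by hand. The sketch below is an illustration only: the `Node` class and the helpers `build_ellipsis_tree` and `edges` are hypothetical names, not part of the treebank's tooling, and the attachments follow the tree shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a dependency tree: a word form plus its relation to its head."""
    form: str
    relation: str = "root"
    children: list = field(default_factory=list)

    def attach(self, child_form: str, relation: str) -> "Node":
        """Create a child with the given relation and return it."""
        child = Node(child_form, relation)
        self.children.append(child)
        return child


def build_ellipsis_tree() -> Node:
    """Build the tree for the ellipsis example, inserting NULL__NP for the missing vo."""
    root = Node("maan luungii")
    root.attach("mai", "k1")
    null_np = root.attach("NULL__NP", "k2")  # stands in for the elided vo 'that'
    relative = null_np.attach("kahoge", "nmod__relc")
    relative.attach("tum", "k1")
    relative.attach("jo bhi", "k2")
    return root


def edges(node: Node) -> list:
    """Return (dependent, relation, head) triples in depth-first order."""
    result = []
    for child in node.children:
        result.append((child.form, child.relation, node.form))
        result.extend(edges(child))
    return result


if __name__ == "__main__":
    for dependent, relation, head in edges(build_ellipsis_tree()):
        print(f"{dependent} --{relation}--> {head}")
```

Without the NULL__NP node, the relative clause headed by kahoge would have no nominal to attach to, which is the problem the null element is introduced to solve.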
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
[Dependency tree: NULL__CCP (aur) is the root; badZe_ho_gaye and nahii_sunate attach to it as ccof]
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees show the conjunctions as heads
• A coordinating conjunction is the head and takes the coordinated elements as its children
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child
Conjunction (ccof) (Contd…)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
'Ram said that he will come tomorrow'
[Figures: dependency trees for the coordinate and subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
'I-erg' 'him-inst' 'one' 'question' 'did'
'I asked him a question'
• Within the conjunct verb sequence prashna kiyaa 'questioned', the adjective ek 'one' modifies only the noun prashna 'question', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb
• This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
[Dependency tree: kiyaa is the root; maine (k1), usase (k2) and prashna (pof) attach to it; ek attaches to prashna as nmod]
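The chunking and relations described above for 'maine usase ek prashna kiyaa' can be written out as plain data. This is a minimal sketch under stated assumptions: the variable names and the helper `conjunct_verbs` are hypothetical, and the relation triples simply restate the tree shown on the slide.

```python
# Chunks for "maine usase ek prashna kiyaa": the noun and the verb of the
# conjunct verb are kept in separate chunks, as the slides propose.
chunks = [
    ("NP", ["maine"]),
    ("NP", ["usase"]),
    ("NP", ["ek", "prashna"]),
    ("VG", ["kiyaa"]),
]

# (dependent, relation, head) triples between chunk heads.
inter_chunk = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
]

# Relation inside the NP chunk [ek prashna].
intra_chunk = [("ek", "nmod", "prashna")]


def conjunct_verbs(relations):
    """Recover noun-verb conjunct verb units from pof links."""
    return [f"{dep} {head}" for dep, rel, head in relations if rel == "pof"]


if __name__ == "__main__":
    print(conjunct_verbs(inter_chunk))
```

Because ek and prashna share one NP chunk, the modifier relation stays inside that chunk, while the pof link is enough to recover 'prashna kiyaa' as a single semantic unit.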
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Framefile definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
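The filtering step described above can be sketched in a few lines. The role annotations below are hand-copied from the two candidate sentences; the dictionary layout and the function `answer_who` are assumptions made for this illustration, not the output format of any actual semantic role labeller.

```python
# Hand-written role annotations for the two candidate sentences on the slides.
candidates = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, annotated):
    """Return the agents of sentences whose predicate and theme match the question."""
    return [
        sentence["agent"]
        for sentence in annotated
        if sentence["predicate"] == predicate and sentence["theme"] == theme
    ]


if __name__ == "__main__":
    print(answer_who("create", "the first effective polio vaccine", candidates))
```

Both sentences contain every word of the question, so word matching alone cannot separate them; only the theme role distinguishes the syringe sentence from the vaccine sentence.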
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Framefiles
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1]
• [Tall aspen trees ARG1] ring [the lake ARG2]
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
The first sentence uses ring.01; the second uses ring.02.
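The two rolesets of ring can be represented as a small data structure, which makes explicit how the same numbered label resolves to different meanings per roleset. This is a sketch only: the `Roleset` class, the `RING` table and the `describe` helper are hypothetical names, the identifiers are written here as ring.01 and ring.02, and actual PropBank frame files are stored in their own file format rather than as Python objects.

```python
from dataclasses import dataclass


@dataclass
class Roleset:
    """One sense of a predicate together with its numbered roles."""
    roleset_id: str
    gloss: str
    roles: dict


# Contents copied from the slide's frame for "ring".
RING = {
    "ring.01": Roleset(
        "ring.01",
        "Make sound of bell",
        {"Arg0": "Causer of ringing", "Arg1": "Thing rung", "Arg2": "Ring for"},
    ),
    "ring.02": Roleset(
        "ring.02",
        "To surround",
        {"Arg1": "Surrounding entity", "Arg2": "Surrounded entity"},
    ),
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to the role description in the chosen roleset."""
    roleset = RING[roleset_id]
    # "ARG1".capitalize() gives "Arg1", matching the keys used in the frame.
    return {text: roleset.roles[label.capitalize()] for text, label in labelled_args}


if __name__ == "__main__":
    print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
    print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

Note that ARG1 is the thing rung in ring.01 but the surrounding entity in ring.02, which is why the labels are only interpretable together with the roleset.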
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Framefiles for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
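The four empty-argument types and how each is inserted can be kept in a single table. The table layout and the helper `needs_annotator` are hypothetical; only the tags, descriptions and the automatic/manual split come from the slide.

```python
# tag -> (description, how it is inserted), as listed on the slide.
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def needs_annotator(tag: str) -> bool:
    """True if this empty argument type is inserted manually."""
    return EMPTY_ARGUMENTS[tag][1] == "manual"


if __name__ == "__main__":
    manual = sorted(tag for tag in EMPTY_ARGUMENTS if needs_annotator(tag))
    print(manual)
```

The tags are case-sensitive: pro and PRO are distinct types, so the table must not be normalised to a single case.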
PropBank Tagset: Numbered Arguments
• ARGA: Causer
• ARG0: Agent, experiencer
• ARG1: Theme, patient
• ARG2: Recipient
• ARG3: Instrument
PropBank Tagset: Numbered Arguments with function tags
• ARGA-MNS: Indirect causer
• ARG0-MNS: Induced causer
• ARG0-GOL: Causee with a 'recipient' role
• ARG2-ATR: Attribute
• ARG2-GOL: Goal
• ARG2-SOU: Source
• ARG2-LOC: Location
• ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
• ARGM-TMP: Temporal
• ARGM-MNR: Manner
• ARGM-LOC: Location
• ARGM-PRP: Purpose
• ARGM-CAU: Cause
• ARGM-DIS: Discourse
• ARGM-ADV: Adverb
• ARGM-MNS: Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1 आतिफ़ किताब पढ़ेगा
Atif kitab paRhegaa
Atif book.f read.m.sg.fut
'Atif will read the book'
trans-2 आतिफ़ ने किताब पढ़ी
Atif ne kitaab paRhii
Atif.erg book.f read.f.sg.pst
'Atif read the book'
trans-3 आतिफ़ को किताब पढ़नी पड़ी
Atif ko kitaab paRhnii paRii
Atif.dat book.f read.f.inf compel.f.pst
'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1 दरवाज़ा खुलेगा
darvaazaa khulegaa
door.m.sg.d open.m.sg.fut
'The door will open'
unacc-2 दरवाज़े ने खुला
darvaaze ne khulaa
door.m.sg.obl.erg open.pst
'The door opened'
unacc-3 दरवाज़े को खुलना पड़ेगा
darvaaze ko khulnaa paRegaa
door.m.sg.obl.dat open.inf compel.fut
'The door will have to open'
Intransitive: Unergative
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
unerg-2 आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
'Atif slept'
unerg-3 आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
'Atif will have to sleep'
Existential
exist-1 उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4 कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
'Yesterday night, the moon was seen behind the clouds'
dat-subj-1 कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
'Yesterday night, I saw the moon behind the clouds'
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1 राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
'Ram will give a book to Mohan'
ditrans-2 राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1 आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
'Atif will sleep'
causative-1 आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
'The maid caused the child to sleep'
causative-2 माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
'The mother made the maid cause the child to sleep'
Causatives
[Figures: causative-1, causative-2]
101212
18
Basic karaka relations
Only six
$ karta ndash subjectagentdoer
$ karma ndash objectpatient
$ karana ndash instrument
$ sampradaan ndash beneficiary
$ apaadaan ndash source
$ adhikarana ndash location in placetimeother
Relations other than karakas
r6 ndash Genitive rt ndash Purpose rh ndash Reason nmod_relc ndash Relative clause rad ndash Address
101212
19
Relations which do not fall under dependency relation
ccof ndash Conjunction pof ndash Complex Predicates fragof ndash Fragment of
Dependency Relation Types
101212
20
Some Hindi Constructions
(1) Causative Constructions maaz ne aayaa se bacce ko khaaanaa khilvaayaa lsquomotherrsquo lsquoErgrsquo lsquomaidrsquo lsquobyrsquo lsquochildrsquo lsquoAccrsquo lsquofoodrsquo lsquoeat-Causrsquo lsquoMother caused the maid to feed the childrsquo
Issue
$ Possibility-I Go by syntactic analysis
amp khilvaa lsquocause to eatrsquo is the verb root amp maaz ne has karta vibhakti so mark as k1 amp aayaa se has karana vibhakti so mark as k3 amp bacce ko has sampradan vibhakti so mark as k4
Causative Constructions (Contd hellip)
101212
21
Causative Constructions (Contd hellip)
Possibility-II
$ The verb khilvaa lsquocause to eatrsquo is a causative verb and it is morphologically related to the base verb khaa lsquoeatrsquo
$ Paninian framework provides the relations
amp prayojaka karta causerlsquo (pk1) The causer in a causative construction amp prayojya karta causeelsquo (jk1) The causee in a causative construction amp madhyastha karta mediator causerlsquo (mk1) The mediator-causer in the causative
construction
Causative Constructions (Contd hellip)
Possibility-II
$ Do we mark the above dependency roles $ If we mark these relations then root will be khaa lsquoeatrsquo
101212
22
Ex maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa lsquoMother fed the child with the spoon Ex maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa Mother made the maid to feed the childlsquo As there is morphological relatedness between the base verb khaa lsquoeatrsquo and
causative verb khilvaa lsquocause to eatrsquo we mark pk1 mk1 jk1 instead of k1 k3 k4 respectively
For causatives our current decision Follow Possibility-II
Causative Constructions (Contd hellip)
(2) Relative Clauses (nmod__relc)
Ex jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
rsquowhorsquo rsquoboyrsquo rsquotherersquo rsquostandrsquo rsquoisrsquo rsquohersquo rsquomyrsquo rsquobrotherrsquo rsquoisrsquo lsquoThe boy who is standing there is my brotherrsquo
Issue
$ Possibility-I
amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause
amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz
khadZaa hairsquo
101212
23
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause
and vaha lsquohersquo in the main clause is captured by the feature coref
101212
24
Relative Clause Alternative-II
Relative Clauses (Contdhellip)
For relative clauses our current decision Follow Possibility-II In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the
verb khadzaa hai lsquois standingrsquo of the relclause The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo
relation The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by
the feature coref
101212
25
(3) anubhava karta ndash k4a
Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta Here dukh lsquounhappyrsquo is the karta Here mujhko lsquoto mersquo is a subtype of sampradan This sampradan is different from the sampradan (k4mdashbeneficiary) We call it as anubhava karta represented by k4a
anubhava karta ndash k4a (Contd )
Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon
was visible to mersquo Verb
101212
26
anubhava karta ndash k4a (Contdhellip)
Ex-2
Ex-3
anubhava karta ndash k4a (Contdhellip)
101212
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo
lsquoHad she been not sick she would have definitely come to the partyrsquo
Issue
$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause
101212
29
Possibility - I
agar-to paired-ccof paired-ccof
agar to ccof ccof
naa hotii aatii
Possibility - II
101212
30
Conditionals (Contd) Possibility-I is not possible because agar-to is the head of the tree
which is an abstract node ie it is not a lexical node For conditionals our current decision Follow Possibility-II In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo
clause Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo
clause is the main clause
(6) Participles (vmod)
In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)
The arguments occurs only once in the sentence but is semantically related to both the verbs The shared argument syntactically always attaches with the main verb For the other verb this argument is semantically realized but not syntactically
101212
31
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Every day, having written a letter, he tears it up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’.
PaadZataa hai
  k1 → vaha
  k7t → rojZa
  k2 → patra
  vmod → likhakara
    k1 → vaha
    k2 → patra
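The sharing of arguments between the main verb and the participle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class and the `semantic_args` helper are hypothetical names, not part of the treebank tools, and the choice of k1 and k2 as the shared relations simply follows the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a toy dependency tree (hypothetical helper, not a treebank API)."""
    form: str
    label: str = "root"                      # relation to the parent: k1, k2, vmod, ...
    children: list = field(default_factory=list)

    def add(self, form, label):
        child = Node(form, label)
        self.children.append(child)
        return child


def semantic_args(verb, parent=None, shared=("k1", "k2")):
    """Karaka arguments of `verb`; a vmod participle also sees the shared
    arguments that are syntactically attached to the main verb."""
    args = {c.label: c.form for c in verb.children if c.label.startswith("k")}
    if verb.label == "vmod" and parent is not None:
        for c in parent.children:
            if c.label in shared and c.label not in args:
                args[c.label] = c.form
    return args


# Tree for: vaha rojZa patra likhakara PaadZataa hai
main = Node("PaadZataa hai")
main.add("vaha", "k1")
main.add("rojZa", "k7t")
main.add("patra", "k2")
participle = main.add("likhakara", "vmod")

print(semantic_args(main))
print(semantic_args(participle, parent=main))
```

The participle has no syntactic children of its own here; its k1 and k2 are recovered from the main verb, which mirrors "semantically realized but not syntactically".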
(7) Ellipsis
• How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
maan luungii
  k1 → mai
  k2 → NULL__NP (vo)
    nmod__relc → kahoge
      k1 → tum
      k2 → jo bhi
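A minimal sketch of the null-head insertion, assuming a toy representation in which a tree is a dictionary from head to a list of (label, dependent) arcs. The helper names `attach` and `attach_relative_clause` are hypothetical; only the labels k1, k2, nmod__relc and the node name NULL__NP come from the slides.

```python
def attach(tree, head, dependent, label):
    """Record one labelled arc head -> dependent."""
    tree.setdefault(head, []).append((label, dependent))


def attach_relative_clause(tree, main_verb, rel_verb, role, correlative=None):
    """Attach a relative clause to its correlative pronoun.

    If the correlative (e.g. vo) is elided, a NULL__NP node stands in for it,
    so the relative clause still has a parent node.
    """
    head = correlative if correlative is not None else "NULL__NP"
    attach(tree, main_verb, head, role)
    attach(tree, head, rel_verb, "nmod__relc")
    return head


# tum jo bhi kahoge (vo) mai maan luungii
tree = {}
attach(tree, "maan luungii", "mai", "k1")
head = attach_relative_clause(tree, "maan luungii", "kahoge", role="k2")
attach(tree, "kahoge", "tum", "k1")
attach(tree, "kahoge", "jo bhi", "k2")

for parent, arcs in tree.items():
    print(parent, arcs)
```

Passing `correlative="vo"` would build the same tree with the overt pronoun in place of the null node.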
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don't listen to anyone’
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation.
• It is used for coordinating as well as subordinating conjunctions.
• The dependency trees show the conjunctions as heads.
• In coordination, the conjunction is the head and takes the coordinated elements as its children.
• A subordinating conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
[Figure: dependency trees for Coordinate Conjunction (ccof) and Subordinate Conjunction]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence.
• The annotation scheme should be able to account for this relation in the dependency tree.
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation).
• This means that the noun or adjective in a conjunct verb sequence will have a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit.
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd.)
kiyaa
  k1 → maine
  k2 → usase
  pof → prashna
    nmod → ek
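The effect of the pof analysis can be sketched with a flat list of arcs. The triple format (dependent, label, head) and the two helper functions are assumptions made for illustration; the labels and words are those of the tree above.

```python
# Arcs for: maine usase ek prashna kiyaa, written as (dependent, label, head)
arcs = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),   # intra-chunk relation inside [ek prashna]NP
]


def conjunct_verbs(arcs):
    """Recover noun-verb conjunct verbs from pof arcs."""
    return [f"{dep} {head}" for dep, label, head in arcs if label == "pof"]


def modifiers(arcs, word):
    """Everything that attaches directly to `word`."""
    return [dep for dep, label, head in arcs if head == word]


print(conjunct_verbs(arcs))      # the noun-verb unit
print(modifiers(arcs, "prashna"))  # ek modifies the noun, not the unit
```

Because prashna is its own node, ek can attach to it directly, while the pof arc still lets a consumer reassemble prashna kiyaa as one semantic unit.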
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson]agent created the [first disposable syringe]theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]theme was created in 1952 by [Jonas Salk]agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
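The filtering step can be sketched as follows. The candidate records, their field names and the `answer_who` function are hypothetical; the role labels are written by hand here, whereas a real system would obtain them from a semantic role labeller.

```python
# Two candidate sentences with hand-written role labels (illustrative only).
candidates = [
    {
        "predicate": "create",
        "roles": {
            "agent": "Becton Dickinson",
            "theme": "the first disposable syringe",
        },
    },
    {
        "predicate": "create",
        "roles": {
            "agent": "Jonas Salk",
            "theme": "the first effective polio vaccine",
        },
    },
]


def answer_who(predicate, theme, candidates):
    """Return the agent of the first candidate whose predicate and theme
    match the question; word overlap alone is not enough."""
    for cand in candidates:
        same_predicate = cand["predicate"] == predicate
        same_theme = cand["roles"].get("theme", "").lower() == theme.lower()
        if same_predicate and same_theme:
            return cand["roles"].get("agent")
    return None


print(answer_who("create", "the first effective polio vaccine", candidates))
```

Both sentences contain the words of the question, but only the second has the vaccine as the theme of create, so the first candidate is filtered out.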
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
An example
• [John]ARG0 rings [the bell]ARG1 (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2 (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
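As a rough illustration, the two rolesets of ring can be held in a small lookup table and used to interpret the labelled arguments. The dictionary layout and the `explain` function are assumptions for this sketch; actual PropBank frame files are stored in their own file format, which is not reproduced here.

```python
# Toy frame file for the lemma "ring", following the slide.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "Arg0": "Causer of ringing",
            "Arg1": "Thing rung",
            "Arg2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "Arg1": "Surrounding entity",
            "Arg2": "Surrounded entity",
        },
    },
}


def explain(roleset_id, labelled_args):
    """Pair each labelled argument with its roleset-specific description."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return [(label, text, roles[label]) for label, text in labelled_args]


print(explain("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
print(explain("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same label, Arg1, means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset must be chosen before the numbered arguments can be interpreted.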
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
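The four empty-argument types and how they are inserted can be summarised as a small table in code. The dictionary layout and the `inserted` helper are hypothetical; the type names, descriptions and insertion modes are taken from the slide.

```python
# Empty argument types: name -> (description, insertion mode)
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def inserted(mode):
    """Names of the empty arguments inserted in the given mode."""
    return sorted(name for name, (_, m) in EMPTY_ARGUMENTS.items() if m == mode)


print(inserted("automatic"))
print(inserted("manual"))
```

Note that pro and PRO differ only in case, so any lookup of this kind has to be case-sensitive.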
PropBank Tagset: Numbered Arguments
Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif-ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif-ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2: दरवाज़े ने खुला
  darvaaze-ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze-ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif-ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif-ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’
[Figure: annotated trees for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan-ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
  raam-ne mohan-ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) ‘sleep’ → ‘make someone sleep’
• Add –vaa
  – (sulaa → sulvaa) ‘make someone sleep’ → ‘cause someone to fall asleep’
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa-ne Atif-ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN-ne aayaa-se Atif-ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
Causatives
[Figure: annotated trees for causative-1 and causative-2]
Causatives: classes
[Figure: classes of causatives]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi-kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
Complex predicate
compl-pred-2: राम रवि से लड़ बैठा
  raam ravi-se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
    • Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree figure; node labels: VP-Pred, VP, NP, NP, V; leaves: Atif, kitab, paRhegaa]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree figure; node labels: VP-Pred, VP, NP-P, NP, V; leaves: Atif -ne, kitab, paRhii]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree figure; node labels: VP-Pred, VP, NP, V; leaves: Atif, soyegaa]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree figure; node labels: VP-Pred, VP, NP1, NP, CASE1, V; leaves: darvaazaa, khulegaa]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree figure; node labels: VP-Pred, VP, NP1, NP, CASE1, V, VP, NP-P; leaves: us kamre mein, cuuhe, hain]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Tree figure; node labels: VP-Pred, NP-P, NP, V, VP-Pred, NP-P, VP; leaves: Ram -ne, Mohan -ko, kitaab, dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree figure; node labels: VP-Pred, NP1, NP, CASE1, V, VP, VP, NP-P, VP, NP, VP-Pred, NP, SCR2, VP, NP2; leaves: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree figure; node labels: VP-Pred, VP, NP, NP, EXTR1, V, VP-Pred, NP, VP, NP1, CASE1, V, VP-Pred, NP-P, CP1, C, V-Aux, VP, VP; leaves: Ram, jaantaa, hai, ki, Sita, der se, aayegi]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’
[Tree figure; node labels: VP-Pred, NP-P, NP, SCR1, V, VP-Pred, NP, PRO, VP, VP, NP1, CP, C, COMP, V-Aux, VP, VP-Pred, VP, NP-P, V, NP, SCR3, VP, NP, NP; leaves: jo kitaab, tumne, dii, thii, vah, maine, paRhii]
Complex Predicate
राम रवि को याद कर रहा था
  raam ravi-ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’
[Tree figure; node labels: VP-Pred, VP, NP, NP-P, VP1, V, V-Aux, VP, VP, V-Aux, V’, NP, NP, CASE1, N; leaves: Raam, Ravi -ko, yaad, kar, rahaa, thaa]
Causative
[Tree figure for the causative]
Relations which do not fall under dependency relation
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
Dependency Relation Types
Some Hindi Constructions
(1) Causative Constructions
Ex: maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’
Issue:
• Possibility-I: Go by syntactic analysis
  – khilvaa ‘cause to eat’ is the verb root
  – maaz ne has karta vibhakti, so mark as k1
  – aayaa se has karana vibhakti, so mark as k3
  – bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd.)
Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb, and it is morphologically related to the base verb khaa ‘eat’
• The Paninian framework provides the relations:
  – prayojaka karta ‘causer’ (pk1): the causer in a causative construction
  – prayojya karta ‘causee’ (jk1): the causee in a causative construction
  – madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’
Causative Constructions (Contd.)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
• As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively
• For causatives, our current decision: follow Possibility-II
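The relabelling from the syntactic analysis (Possibility-I) to the causative relations (Possibility-II) can be sketched as a simple mapping. The function `relabel_causative` and the list-of-pairs input are hypothetical. The sketch assumes the se-marked phrase is a mediator-causer; it does not cover the instrument reading (cammaca se), where the example above keeps k1 and k3.

```python
# Possibility-I label -> Possibility-II label, as stated on the slide.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(args):
    """Map syntactic karaka labels to causative relations; other labels
    (e.g. k2) are left unchanged."""
    return [(chunk, CAUSATIVE_LABELS.get(label, label)) for chunk, label in args]


# maaz ne aayaa se bacce ko khaanaa khilvaayaa, labelled as in Possibility-I
syntactic = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]

print(relabel_causative(syntactic))
```

Whether a given se-phrase is a mediator-causer or an instrument is a decision the mapping cannot make on its own, which is why the sketch takes the Possibility-I labels as given.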
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue:
• Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
[Figure: Relative Clause, Possibility-I]
Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in the main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
[Figure: Relative Clause, Possibility-II]
Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause
• The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
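A small sketch of Possibility-II, assuming each node is stored with its head, its relation label and an optional coref feature. The dictionary layout and the function `relative_clauses` are hypothetical. Labels that the slides do not state are left as None rather than guessed.

```python
# jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai (only the nodes
# discussed on the slide are listed)
nodes = {
    "vaha": {"head": "hai", "label": None},
    "khadZaa hai": {"head": "vaha", "label": "nmod__relc"},
    "jo ladZakaa": {"head": "khadZaa hai", "label": None, "coref": "vaha"},
}


def relative_clauses(nodes):
    """For each nmod__relc arc, return (modified word, root of the relative
    clause, elements inside the clause that are coreferent with the word)."""
    result = []
    for name, node in nodes.items():
        if node["label"] != "nmod__relc":
            continue
        modified = node["head"]
        coreferent = [
            other
            for other, info in nodes.items()
            if info["head"] == name and info.get("coref") == modified
        ]
        result.append((modified, name, coreferent))
    return result


print(relative_clauses(nodes))
```

The tree arc links the whole clause to vaha, while the coref feature, which is not a tree arc, records that jo ladZakaa and vaha refer to the same entity.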
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
‘I-Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a
anubhava karta – k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
‘ram-Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’
[Figure: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo
lsquoHad she been not sick she would have definitely come to the partyrsquo
Issue
$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause
101212
29
Possibility - I
agar-to paired-ccof paired-ccof
agar to ccof ccof
naa hotii aatii
Possibility - II
101212
30
Conditionals (Contd) Possibility-I is not possible because agar-to is the head of the tree
which is an abstract node ie it is not a lexical node For conditionals our current decision Follow Possibility-II In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo
clause Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo
clause is the main clause
(6) Participles (vmod)
In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)
The arguments occurs only once in the sentence but is semantically related to both the verbs The shared argument syntactically always attaches with the main verb For the other verb this argument is semantically realized but not syntactically
101212
31
Participles (vmod) (Contd )
Ex vaha rojZa patra likhakara PaadZataa hai
rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo
lsquoHaving letters written everyday he tearsrsquo
101212
32
Participles (vmod) (Contd )
The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa
lsquotearsrsquo is shared with another participle verb likhakar lsquohaving
writtenrsquo
Participles (vmod) (contd)
Paadzataa hai k1 k7t k2 vmod
vaha rojZa pawra likhakar k1 k2
vaha pawra
101212
33
(7)Ellipsis
How to show dependencies when the head is missing Ex tum jo bhi kahoge (vo) mai maan luungii lsquoyou lsquowhateverrsquo lsquowill sayrsquo lsquothatrsquo lsquoIrsquo lsquowill believersquo lsquoI will believe whatever you sayrsquo In the above example vo lsquothatrsquo is missing which becomes the parent node
for relative clause lsquotum jo bhi kahogersquo We insert a null element ie NULL_NP for vo lsquothatrsquo to show the dependency
Ellipsis (Contd)
maan luungii k1 k2
mai NULL__NP (vo) nmod__relc
kahoge k1 k2
tum jo bhi
101212
34
Ellipsis (Contd)
Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo No explicit conjunct Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate
Non-dependency Relations
ccof ndash Conjunction pof ndash Complex Predicates fragof -- Fragment of
101212
35
(1) Conjunction (ccof)
ccof relation doesnrsquot reflects a dependency relation It is used for coordinating as well as subordinating conjunctions The dependency trees will show the conjuncts as heads In coordinating conjuncts the conjunct is the head and takes the coordinating
elements as its children In subordinating conjunct it would take the clause to which it is syntactically
attached (the subordinate clause) as its child
Conjunction (ccof) (Contdhellip)
Coordinate Conjunction
$ Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa lsquoramrsquo lsquoErgrsquo lsquofoodrsquo lsquoatersquo lsquoandrsquo lsquositarsquo Ergrsquo lsquoapplersquo lsquoatersquo
lsquoRam ate food and Sita ate an applersquo
Subordinate Conjunction
$ Ex raam ne kahaa ki vo kal aayegaa lsquoramrsquo lsquoErgrsquo lsquosaidrsquo thatrsquo lsquohersquo lsquotomorrowrsquo lsquocome-Futrsquo lsquoRam said that he will come tomorrowrsquo
101212
36
Coordinate Conjunction (ccof)
Subordinate Conjunction
101212
37
(2) Conjunct Verbs
Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa
lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence
The annotation scheme should be able to account for this relation in the
dependency tree If prashna kiyaa is grouped as a single verb chunk it will not be possible to
mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation) It means noun or an adjective in the conjunct verb sequence will have a POF relation with
the verb This way the relation between ek and prashna becomes an intra-chunk relation as they will
now become part of a single NP chunk Conjunct verbs are chunked separately but semantically they constitute a single unit It captures the fact that the noun-verb sequence is a conjunct verb by linking them with
POF relation
101212
38
Conjunct Verbs (Contd)
kiyaa k1 k2 pof
maine usase prashna
nmod
ek
101212
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut
Thedoorwillopen
unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened
unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen
101212
14
Intransitive: Unergative

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’

unerg-2
आतिफ़ ने सोया
Atif ne soyaa
Atif.erg sleep.m.sg.pst
‘Atif slept’

unerg-3
आतिफ़ को सोना पड़ेगा
Atif ko sonaa paRegaa
Atif.dat sleep.inf compel.fut
‘Atif will have to sleep’

Existential

exist-1
उस कमरे में चूहे हैं
us kamre meM cuuhe haiM
that room in rats be.pres.pl
‘There are rats in that room’

• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject

unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Yesterday night the moon was seen behind the clouds’

dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Yesterday night I saw the moon behind the clouds’

[Figures: annotated structures for unacc-4 and dat-subj-1]

The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive

ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’

ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’

Causatives

• Hindi has two ways of forming the causative
• Add –aa
– (so → sulaa) sleep → make someone sleep
• Add –vaa
– (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives

unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’

causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
‘The maid caused the child to sleep’

causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’

[Figures: annotated structures for causative-1 and causative-2]
Causatives: classes

Complex predicates

• These are cases such as bharosaa karnaa ‘trust(n) do(v)’, ‘trust’
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
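The bracketed annotation for bharosaa kiyaa can be represented as labelled spans attached to the noun-verb predicate. The sketch below is illustrative only; LabeledSpan and bracket are hypothetical names, and the data is limited to the example on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSpan:
    text: str    # the argument chunk, e.g. "abhay ne"
    label: str   # the PropBank label taken from the noun frame


def bracket(spans, predicate):
    """Render labelled arguments followed by the complex predicate."""
    args = " ".join(f"[{s.text} {s.label}]" for s in spans)
    return f"{args} {predicate}"


# Arguments licensed by the noun frame for bharosaa 'trust'.
args = [LabeledSpan("abhay ne", "ARG0"), LabeledSpan("sitaa par", "ARG1")]
print(bracket(args, "bharosaa kiyaa"))
# prints: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
```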
Complex Predicate

compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’

compl-pred-2
राम रवि से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.inst fight sit.perf
‘Ram regrettably fought with Ravi’

Complement Clause

compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
‘Ram knows that Sita will arrive late’
Phrase Structure Representation

Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks

• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS

• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS

• PS represents relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS

• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS

Basic Transitive Clause (1)

• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2

[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]
Basic Transitive Clause (2)

• The representation is maintained when we have an ergative construction

[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels: VP-Pred, VP, NP, V]

Intransitive Clause: Unergative

• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object

[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]
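Intransitive Clause: Unaccusative

• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)

[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP, V; the NP darvaazaa carries index 1, shared with a CASE trace]

Existentials

• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct

[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP, V; the NP cuuhe carries index 1, shared with a CASE trace]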
Ditransitive

• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position

[Tree diagram for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP, V]

Putting it All Together: Dative Subjects

कल रात बादलों में मुझको चाँद दिखा

[Tree diagram; terminals: kal raat, baadaloM mein, mujhko, caaMd, dikhaa; node labels: VP-Pred, VP, NP, V; traces: CASE (index 1), SCR (index 2)]

• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki

राम जानता है कि सीता देर से आएगी

[Tree diagram; terminals: Ram, jaantaa, hai, ki, Sita, der se, aayegi; node labels: VP-Pred, VP, NP, V, CP, C; traces: EXTR (index 1), CASE (index 1)]

Relative Clause

जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’

[Tree diagram; terminals: jo kitaab, tumne, dii, thii, vah, maine, paRhii; node labels: VP-Pred, VP, NP, V, CP, C; empty elements: PRO, COMP; traces: SCR]
Complex Predicate

राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’

[Tree diagram; terminals: Raam, Ravi-ko, yaad, kar, rahaa, thaa; node labels: VP-Pred, VP, NP, N, V, V’; trace: CASE (index 1)]

Causative
Some Hindi Constructions

(1) Causative Constructions

maaz ne aayaa se bacce ko khaanaa khilvaayaa
‘mother’ ‘Erg’ ‘maid’ ‘by’ ‘child’ ‘Acc’ ‘food’ ‘eat-Caus’
‘Mother caused the maid to feed the child’

Issue
• Possibility-I: Go by syntactic analysis
– khilvaa ‘cause to eat’ is the verb root
– maaz ne has karta vibhakti, so mark as k1
– aayaa se has karana vibhakti, so mark as k3
– bacce ko has sampradan vibhakti, so mark as k4
Causative Constructions (Contd …)

Possibility-II
• The verb khilvaa ‘cause to eat’ is a causative verb and it is morphologically related to the base verb khaa ‘eat’
• Paninian framework provides the relations
– prayojaka karta ‘causer’ (pk1): the causer in a causative construction
– prayojya karta ‘causee’ (jk1): the causee in a causative construction
– madhyastha karta ‘mediator causer’ (mk1): the mediator-causer in the causative construction
• Do we mark the above dependency roles?
• If we mark these relations, then the root will be khaa ‘eat’

Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’

As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.

For causatives, our current decision: Follow Possibility-II
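The relabelling from Possibility-I to Possibility-II can be sketched as a fixed mapping over dependency labels. This is an illustration only, assuming the se-phrase has already been identified as a mediator rather than an instrument (an instrument such as cammaca se keeps k3); the names SYNTACTIC_TO_CAUSATIVE and relabel_causative are hypothetical.

```python
# Mapping stated on the slide: k1, k3, k4 become pk1, mk1, jk1 respectively.
SYNTACTIC_TO_CAUSATIVE = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel_causative(deps):
    """Map Possibility-I labels to Possibility-II labels; others pass through."""
    return [(chunk, SYNTACTIC_TO_CAUSATIVE.get(label, label))
            for chunk, label in deps]


# Possibility-I analysis of: maaz ne aayaa se bacce ko khaanaa khilvaayaa
possibility_1 = [
    ("maaz ne", "k1"),
    ("aayaa se", "k3"),
    ("bacce ko", "k4"),
    ("khaanaa", "k2"),
]
possibility_2 = relabel_causative(possibility_1)
```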
(2) Relative Clauses (nmod__relc)

Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’

Issue
• Possibility-I
– Provides relation between vaha ‘he’ in main clause and jo ladZakaa ‘the boy’ in rel clause
– The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
– jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
[Figure: Relative Clause, Possibility-I]

Relative Clauses (nmod__relc)

Possibility-II
• The verb khadZaa hai ‘is standing’ is the root of the relative clause
• The modifier of vaha ‘he’ in main clause is the entire relative clause
• Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref

[Figure: Relative Clause, Possibility-II]

Relative Clauses (Contd…)

For relative clauses, our current decision: Follow Possibility-II
• In Possibility-II, jo ladZakaa ‘the boy’ in the rel clause attaches to the verb khadZaa hai ‘is standing’ of the rel clause
• The rel clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation
• The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref
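The Possibility-II analysis can be written down as two arcs plus a coref feature. The sketch is a simplified illustration: Arc, head_of and the coref dictionary are hypothetical names, and the k1 label on the first arc is an assumption, since the slide text only says that jo ladZakaa attaches to the relative-clause verb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    dependent: str
    head: str
    relation: str


arcs = [
    # Label k1 is assumed here; the slide does not name this relation.
    Arc("jo ladZakaa", "khadZaa hai", "k1"),
    # The whole relative clause modifies vaha in the main clause.
    Arc("khadZaa hai", "vaha", "nmod__relc"),
]

# The link between the relative element and its antecedent is a feature,
# not a dependency arc.
coref = {"jo ladZakaa": "vaha"}


def head_of(chunk):
    """Return the head of a chunk, or None if no arc is recorded for it."""
    for arc in arcs:
        if arc.dependent == chunk:
            return arc.head
    return None
```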
(3) anubhava karta – k4a

Ex-1: mujhko dukh hai
‘I.Dat’ ‘unhappy’ ‘is’
‘I am unhappy’
• Here ko vibhakti in mujhko ‘to me’ tells that it is not a karta
• Here dukh ‘unhappy’ is the karta
• Here mujhko ‘to me’ is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd…)

Ex-2 (base verb): raam ne (agent) caaMd dekhaa
‘ram’ ‘Erg’ ‘moon’ ‘saw’
‘Ram saw the moon’
Ex-3 (derived intransitive verb): raam ko (experiencer) caaMd dikhaa
‘ram.Dat’ ‘moon’ ‘appeared’
‘The moon was visible to Ram’

[Figures: dependency trees for Ex-2 and Ex-3]
(4) Relation samanadhikaran – rs

Ex-1: raam ne kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
‘Ram said that he will come tomorrow’
• In Ex-1, the clause ‘ki vo kal aayegaa’ is the object, i.e. karma
• In Ex-2, the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs

[Figures: dependency trees for Ex-1 and Ex-2]
(5) Conditionals

Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would have definitely come to the party’

Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause

Possibility-I
agar-to
    agar (paired-ccof)
        naa hotii (ccof)
    to (paired-ccof)
        aatii (ccof)

Possibility-II
[Figure: dependency tree for Possibility-II]

Conditionals (Contd)
• Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node
• For conditionals, our current decision: Follow Possibility-II
• In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause
(6) Participles (vmod)

• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both the verbs
• The shared argument syntactically always attaches to the main verb
• For the other verb, this argument is semantically realized but not syntactically

Participles (vmod) (Contd)

Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Every day, having written a letter, he tears it up’

The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakara ‘having written’

PaadZataa hai
    vaha (k1)
    rojZa (k7t)
    patra (k2)
    likhakara (vmod)
        vaha (k1)
        patra (k2)
(7) Ellipsis

How to show dependencies when the head is missing?

Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency

Ellipsis (Contd)

maan luungii
    mai (k1)
    NULL__NP (vo) (k2)
        kahoge (nmod__relc)
            tum (k1)
            jo bhi (k2)

Ellipsis (Contd)

Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don’t listen to anyone’
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)

NULL_CCP (aur)
    badZe_ho_gaye (ccof)
    nahii_sunate (ccof)
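The null-element strategy for the first ellipsis example can be sketched as a small tree repair: a node is added for the missing head and the orphaned relative clause is re-attached to it. The function and variable names below (insert_null, roots, tree) are hypothetical, and the tree is a simplified dictionary from each chunk to its (head, relation) pair.

```python
# Before repair: the relative-clause verb has no head because vo is elided.
tree = {
    "maan luungii": (None, "root"),
    "mai": ("maan luungii", "k1"),
    "kahoge": (None, "nmod__relc"),
    "tum": ("kahoge", "k1"),
    "jo bhi": ("kahoge", "k2"),
}


def roots(t):
    """Chunks without a head; a well-formed tree has exactly one."""
    return [node for node, (head, _) in t.items() if head is None]


def insert_null(t, null_id, head, rel, orphans):
    """Add a null node under `head` and re-attach the orphans to it."""
    repaired = dict(t)
    repaired[null_id] = (head, rel)
    for orphan in orphans:
        _, orphan_rel = repaired[orphan]
        repaired[orphan] = (null_id, orphan_rel)
    return repaired


fixed = insert_null(tree, "NULL__NP", "maan luungii", "k2", ["kahoge"])
```

After the repair the tree has a single root, and the relative clause hangs off the inserted node with its nmod__relc relation intact.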
Non-dependency Relations

• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of

(1) Conjunction (ccof)

• The ccof relation doesn’t reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd…)

Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’

Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’

[Figures: dependency trees for coordinate conjunction (ccof) and subordinate conjunction]
(2) Conjunct Verbs

Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
• The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is being modified by the adjective ek ‘one’, and not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)

• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation). It means the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they will now be part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• The POF relation captures the fact that the noun-verb sequence is a conjunct verb

Conjunct Verbs (Contd)

kiyaa
    maine (k1)
    usase (k2)
    prashna (pof)
        ek (nmod)
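The chunking decision and the pof link can be illustrated with a small data sketch. The names chunks, arcs and conjunct_verbs are hypothetical; the labels are those shown in the tree for maine usase ek prashna kiyaa.

```python
# Chunks after splitting the conjunct verb: [ek prashna]NP [kiyaa]VG
chunks = [("maine", "NP"), ("usase", "NP"), ("ek prashna", "NP"), ("kiyaa", "VG")]

# Dependent -> (head, relation). ek modifies prashna, not the noun-verb unit.
arcs = {
    "maine": ("kiyaa", "k1"),
    "usase": ("kiyaa", "k2"),
    "prashna": ("kiyaa", "pof"),
    "ek": ("prashna", "nmod"),
}


def conjunct_verbs(arcs):
    """Recover noun-verb units by following pof links."""
    return [f"{dep} {head}" for dep, (head, rel) in arcs.items() if rel == "pof"]


print(conjunct_verbs(arcs))  # prints: ['prashna kiyaa']
```

Keeping the noun in its own chunk lets the modifier attach to it, while the pof link still allows the semantic unit to be reassembled.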
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis

Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?

• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches

• Who created the first effective polio vaccine?
– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing

• Who created the first effective polio vaccine?
– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling

• Who created the first effective polio vaccine?
– [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer

• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
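The filtering step can be sketched as a check on the theme of create. The example assumes the two candidate sentences have already been role-labelled as on the slide; the dictionary layout and the function name answer_who are hypothetical and stand in for the output of a real SRL system.

```python
# Role-labelled candidates, hand-copied from the slide (not produced by a tool).
candidates = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, candidates):
    """Return agents of `predicate` whose theme matches the question's theme."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]


print(answer_who("create", "the first effective polio vaccine", candidates))
# prints: ['Jonas Salk']
```

Both sentences contain every word of the question, so word matching alone cannot separate them; the theme check is what removes the wrong candidate.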
We need semantic information

• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information

• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, difficult to have a standard set of thematic roles
Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example

• [John ARG0] rings [the bell ARG1]   (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)

ring.01: Make sound of bell
    Arg0  Causer of ringing
    Arg1  Thing rung
    Arg2  Ring for

ring.02: To surround
    Arg1  Surrounding entity
    Arg2  Surrounded entity
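A frame file with two rolesets can be modelled as a nested dictionary. This is a sketch for illustration, not the actual PropBank file format (real frame files are XML); FRAME_FILE and explain are hypothetical names, and the role descriptions are copied from the example.

```python
# One frame file for the polysemous predicate "ring", with two rolesets.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "ARG0": "Causer of ringing",
            "ARG1": "Thing rung",
            "ARG2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "ARG1": "Surrounding entity",
            "ARG2": "Surrounded entity",
        },
    },
}


def explain(roleset_id, labeled_args):
    """Map each labelled argument to its verb-specific role description."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: roles[label] for text, label in labeled_args}


bell = explain("ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
lake = explain("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
```

The same label, ARG1, resolves to different descriptions depending on the roleset, which is the point of defining roles verb by verb.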
Hindi PropBank

• Annotating Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling

Frame files for Hindi

• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [18751902 predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut
Thedoorwillopen
unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened
unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen
101212
14
Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept
unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep
Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo
bull Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs
101212
15
Da3veSubject
unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst
YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst
YesterdaynightIsawthemoonbehindtheclouds
unacc4
datsubj1
TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on
101212
16
Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan
ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan
Causa3ves
bull Hindihastwowaysofformingthecausa3vebull Addndashaa
ndash (sosulaa)sleepmakesomeonesleepbull Addndashvaa
ndash (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep
bull WeintroducethelabelARGAtoanalyzecausersbull SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees
bull ARGALMNSforintermediatecausers
101212
17
Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
causa3veL1 आया आतफ़ को स6लाया
aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo
causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa
motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo
Causa3ves
Causa3veL1
Causa3veL2
101212
18
Causa3vesclasses
Complexpredicates
bull Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust
bull Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa
101212
19
ComplexPredicate
complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo
Complexpredicate
complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo
101212
20
ComplementClause
complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut
RamknowsthatSitawillarrivelate
101212
1
PhraseStructureRepresenta3on
OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu
PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks
bull DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow
bull Developedinconjunc3onwithDSandPBbull InspiredbyChomskyantradi3on
101212
2
BackgroundforPS
bull Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren
ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull LanguageVuniversalprinciplesbull LanguageVspecificparameters
bull PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach
BasicPrinciplesofPS
bull PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)
bull Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces
bull Transforma3onalgrammar
bull Monostratalrepresenta3onbull NotunlikeEnglishPennTreebank
101212
3
SpecificAssump3onsaboutRepresenta3onMadebyPS
bull Phrasestructurebull No3onoflexicalheadswithprojec3ons(XVbartheorysortof)andassociatedfunc3onalprojec3onsndash Nounswithpostposi3onsndash Verbswithauxiliariesandcomplemen3zers(ki)
bull Binarybranchingndash Theore3calreasonsndash TobedifferentfromDS
BasicTransi3veClause(1)
bull Therearetwoprivilegedposi3onsintheverbalprojec3oncorrespondingusuallytoDSrsquosk1andk2
VPVPred
VP
NP
NPA3f
kitab
V
paRhegaaआततफिकताबपढ़गा
101212
4
BasicTransi3veClause(2)
bull Therepresenta3onismaintainedwhenwehaveanerga3veconstruc3on
VPVPred
VP
NPVP
NPA3fVne
kitab
V
paRhii
आततफिकताबपढ़ी
Intrasi3veClauseUnerga3ve
bull PSmakesadis3nc3onbetweenunerga3veandunaccusa3ve
bull Inunerga3vetheresimplyisnoobject
VPVPred
VP
NP
A3fV
soyegaa
आततफसोएगा
101212
5
Intrasi3veClauseUnaccusa3ve
bull Argumentstartsinlowerposi3on(becauseoflexicalseman3cs)andmovestohigherposi3on(becausehigherposi3onhasnooccupant)
VPVPred
VP
NP1
NPdarvaazaa
CASE1
V
khulegaa
दरवाज़ाख89गा
Existen3als
bull Existen3alho`bersquoisunaccusa3ve(becauseagentVfree)andloca3onisanadjunct
उस कमlt=चA
VPVPred
VP
NP1
NPcuuhe
CASE1
V
hain
VP
NPVP
uskamremein
101212
6
Ditransi3ve
bull TherecipientisintroducedasadjoinedtotheVPVPredafixedbutnotstructuralposi3on
VPVPred
NPVP
NP
RamVne
kitaab
V
dii
VPVPred
NPVP
MohanVko
VP
राममोहन कोिकताबदी
PugngitAllTogetherDa3veSubjects
कलरा बादलE=म8झकोचाGददखा
VPVPred
NP1
NP
caaMd
CASE1
V
dikhaa
VP
VP
NPVP
baadaloMmein
VP
NP
kalraat
VPVPred
NP
SCR2
VPNP2
mujhko
bull Dikhaaisinterpretedseman3callyasaditrasi3vesomeonemakessomethingappeartosomeone
bull Sincetheagentisabsentthelowerargumentraisestothehigherposi3on(likeunaccusa3ve)
bull Theda3vebeneficiaryisbasegeneratedinthefixedda3veposi3on(adjoinedtoVPVPred)andthenscrambleselsewhere
101212
7
ComplementClauseswithki
रामजानताAि सीताIरJआतएगी
VPVPred
VP
NP
NPRam
EXTR1
V VPVPred
NP
VP
NP1
Sita
CASE1
V
aayegi
VPVPred
NPVP
CP1
C
ki
dersejaantaa
VVAux
VP
VP
hai
Rela3veClause
VPVPredNPVP
NP
tumne
SCR1
V
dii
VPVPred
NP
PRO
VP
VP
NP1
jokitaab
CP
C
COMPVVAux
thii
VP VPVPred
VP
NPVP
maineV
paRhii
NPSCR3
VP
NP
NP
vah
जो कताब त8म दी थी वह L पढ़ ली jokitaabtumnediithiivahmainepaRhliiwhichbookfyouerggivefsgpstbefsgpstthatIergreadreflfsgpstIhavereadthebookwhichyougavemersquo
101212
8
ComplexPredicate
VPVPred
VP
NP
NPVP1Raam
RaviVko V
kar
VVAux
VP
VP
rahaa
VVAux
thaa
Vrsquo
NP
NP
CASE1
N
yaad
राम रव को याद कर रहा था raamravikoyaadkarrahaathaaRamRaviaccrememberdoprogmsgbemsgpstRamwasrememberingRavi
Causa3ve
101212
21
Causative Constructions (Contd hellip)
Possibility-II
$ The verb khilvaa lsquocause to eatrsquo is a causative verb and it is morphologically related to the base verb khaa lsquoeatrsquo
$ Paninian framework provides the relations
amp prayojaka karta causerlsquo (pk1) The causer in a causative construction amp prayojya karta causeelsquo (jk1) The causee in a causative construction amp madhyastha karta mediator causerlsquo (mk1) The mediator-causer in the causative
construction
Causative Constructions (Contd hellip)
Possibility-II
$ Do we mark the above dependency roles $ If we mark these relations then root will be khaa lsquoeatrsquo
101212
22
Ex maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa lsquoMother fed the child with the spoon Ex maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa Mother made the maid to feed the childlsquo As there is morphological relatedness between the base verb khaa lsquoeatrsquo and
causative verb khilvaa lsquocause to eatrsquo we mark pk1 mk1 jk1 instead of k1 k3 k4 respectively
For causatives our current decision Follow Possibility-II
Causative Constructions (Contd hellip)
(2) Relative Clauses (nmod__relc)
Ex jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
rsquowhorsquo rsquoboyrsquo rsquotherersquo rsquostandrsquo rsquoisrsquo rsquohersquo rsquomyrsquo rsquobrotherrsquo rsquoisrsquo lsquoThe boy who is standing there is my brotherrsquo
Issue
$ Possibility-I
amp Provides relation between vaha lsquohersquo in main clause and jo ladZakaa lsquothe boyrsquo in rel clause
amp The dependency of jo ladZakaa lsquothe boyrsquo is on vaha lsquohersquo amp jo ladZakaa lsquothe boyrsquo is the root of the relative clause lsquojo ladZakaa vahaaz
khadZaa hairsquo
101212
23
Relative Clause Possibility-I
Relative Clauses (nmod__relc)
Possibility-II
$ The verb khadZaa hai rsquois standingrsquo is the root of the relative clause $ The modifier of vaha rsquohersquo in main clause is the entire relative clause $ Here the relation between jo ladZakaa lsquothe boyrsquo in the relative clause
and vaha lsquohersquo in the main clause is captured by the feature coref
101212
24
Relative Clause Alternative-II
Relative Clauses (Contdhellip)
For relative clauses our current decision Follow Possibility-II In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the
verb khadzaa hai lsquois standingrsquo of the relclause The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo
relation The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by
the feature coref
101212
25
(3) anubhava karta ndash k4a
Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta Here dukh lsquounhappyrsquo is the karta Here mujhko lsquoto mersquo is a subtype of sampradan This sampradan is different from the sampradan (k4mdashbeneficiary) We call it as anubhava karta represented by k4a
anubhava karta ndash k4a (Contd )
Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon
was visible to mersquo Verb
101212
26
anubhava karta ndash k4a (Contdhellip)
Ex-2
Ex-3
anubhava karta ndash k4a (Contdhellip)
101212
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo
lsquoHad she been not sick she would have definitely come to the partyrsquo
Issue
$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause
Possibility-I
[Tree diagram; recoverable labels: agar-to, paired-ccof, agar, to, ccof, naa hotii, aatii]
Possibility-II
[Tree diagram]
Conditionals (Contd.)
Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
For conditionals, our current decision: follow Possibility-II.
In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause. Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle). The argument occurs only once in the sentence but is semantically related to both verbs. The shared argument always attaches syntactically to the main verb. For the other verb this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Having written letters every day, he tears them up’
Participles (vmod) (Contd.)
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
[Tree diagram: PaadZataa hai takes vaha (k1), rojZa (k7t), patra (k2) and likhakar (vmod); likhakar in turn shows vaha (k1) and patra (k2) as its shared arguments]
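The shared-argument idea can be sketched in a few lines of Python. This is only an illustration, not the treebank's own file format: the tuple layout and the helper `shared_arguments` are hypothetical, and the sketch assumes that the participle shares exactly the k1 and k2 of the verb it modifies, as in this example.

```python
# Syntactic dependencies of the example as (dependent, head, relation).
# The shared arguments attach only to the main verb.
deps = [
    ("vaha", "PaadZataa hai", "k1"),
    ("rojZa", "PaadZataa hai", "k7t"),
    ("patra", "PaadZataa hai", "k2"),
    ("likhakar", "PaadZataa hai", "vmod"),
]


def shared_arguments(deps, participle, shared_rels=("k1", "k2")):
    """Return the arguments a participle shares with the verb it modifies.

    Assumption for illustration: the participle shares the k1 and k2
    of the verb it attaches to through vmod.
    """
    main_verb = next(h for d, h, r in deps if d == participle and r == "vmod")
    return {r: d for d, h, r in deps if h == main_verb and r in shared_rels}


print(shared_arguments(deps, "likhakar"))
```

The arguments are stored once, under the main verb, and recovered for the participle on demand, which mirrors the statement that they are realized semantically but not syntactically for the second verb.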
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing. We insert a null element, i.e. NULL__NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
[Tree diagram: maan luungii takes mai (k1) and NULL__NP (vo) (k2); kahoge attaches to NULL__NP by nmod__relc and takes tum (k1) and jo bhi (k2)]
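A minimal sketch of the null-element insertion follows. The `Node` class and the helper `attach_via_null` are hypothetical names chosen for illustration, and the placeholder relation "?" on the headless verb is an assumption of this sketch, not a treebank label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    form: str
    head: Optional[str]  # form of the parent node; None when there is no head
    rel: str


def attach_via_null(tree: List[Node], null_form: str, null_head: str,
                    null_rel: str, orphan: str, orphan_rel: str) -> List[Node]:
    """Insert a null node and hang the headless subtree under it."""
    new_tree = [Node(null_form, null_head, null_rel)]
    for node in tree:
        if node.form == orphan:
            new_tree.append(Node(node.form, null_form, orphan_rel))
        else:
            new_tree.append(node)
    return new_tree


def children(tree: List[Node], head: str):
    return [(n.form, n.rel) for n in tree if n.head == head]


# Before insertion: the relative-clause verb kahoge has no overt head.
tree = [
    Node("maan luungii", None, "root"),
    Node("mai", "maan luungii", "k1"),
    Node("kahoge", None, "?"),
    Node("tum", "kahoge", "k1"),
    Node("jo bhi", "kahoge", "k2"),
]

full = attach_via_null(tree, "NULL__NP", "maan luungii", "k2",
                       "kahoge", "nmod__relc")
print(children(full, "NULL__NP"))
```

After insertion the tree has a single root again, and the relative clause is reachable from the main verb through the null node.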
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don’t listen to anyone’
No explicit conjunct. Insert a NULL element to show the dependencies (if it is essential).
[Tree diagram: NULL_CCP (aur) takes badZe_ho_gaye and nahii_sunate by ccof]
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation. It is used for coordinating as well as subordinating conjunctions. The dependency trees will show the conjuncts as heads. In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children. In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
[Figures: dependency trees for the coordinate and subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
The adjective ek ‘one’ modifies only the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence. The annotation scheme should be able to account for this relation in the dependency tree. If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation). That is, the noun or adjective in a conjunct verb sequence will have a POF relation with the verb. This way the relation between ek and prashna becomes an intra-chunk relation, as they now form a single NP chunk. Conjunct verbs are chunked separately, but semantically they constitute a single unit. Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd.)
[Tree diagram: kiyaa takes maine (k1), usase (k2) and prashna (pof); ek attaches to prashna by nmod]
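The chunking decision can be illustrated with a small sketch. The tuple layout and the helper `conjunct_verbs` are hypothetical; only the relation labels (k1, k2, pof, nmod) come from the slides.

```python
# Dependencies for "maine usase ek prashna kiyaa" as (dependent, head, relation).
# prashna and kiyaa sit in separate chunks, [ek prashna]NP and [kiyaa]VG,
# so ek can modify prashna alone.
deps = [
    ("maine", "kiyaa", "k1"),
    ("usase", "kiyaa", "k2"),
    ("prashna", "kiyaa", "pof"),
    ("ek", "prashna", "nmod"),
]


def conjunct_verbs(deps):
    """Recover noun-verb conjunct verb units from pof links."""
    return [f"{dep} {head}" for dep, head, rel in deps if rel == "pof"]


def modifiers(deps, word):
    """Return the words that directly modify `word`."""
    return [dep for dep, head, rel in deps if head == word and rel == "nmod"]


print(conjunct_verbs(deps))
print(modifiers(deps, "prashna"))
```

The pof link keeps the semantic unit recoverable even though the noun and the verb are chunked separately.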
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
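The filtering step can be sketched as follows. This is a toy illustration under stated assumptions: the dictionaries stand in for the output of a semantic role labeller, and the function name `answer` and the key names are hypothetical.

```python
# Question: who (agent) created (predicate) the first effective polio vaccine (theme)?
question = {"predicate": "create", "theme": "the first effective polio vaccine"}

# Role-labelled candidate sentences, as on the slide.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer(question, candidates):
    """Keep only candidates whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == question["predicate"]
            and c["theme"] == question["theme"]]


print(answer(question, candidates))
```

Both candidates contain the words of the question, so word matching alone cannot separate them; the theme role is what rules out the first sentence.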
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
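A frame file with two rolesets can be modelled as a nested mapping. The sketch below is illustrative only: actual PropBank frame files are not stored this way, and the names `FRAMESETS` and `describe` are hypothetical.

```python
# Rolesets for the polysemous predicate "ring", as listed on the slide.
FRAMESETS = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing",
                  "ARG1": "Thing rung",
                  "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity",
                  "ARG2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to its verb-specific role description."""
    roles = FRAMESETS[roleset_id]["roles"]
    return {text: roles[label] for text, label in labelled_args}


print(describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same numbered label, ARG1, means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset has to be chosen before the labels can be interpreted.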
Hindi PropBank
• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files:
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
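The four empty-argument types and how each is inserted can be kept in a small lookup table. This is a sketch for illustration; the names `EMPTY_ARGUMENTS` and `inserted` are hypothetical and do not come from the PropBank tooling.

```python
# Empty argument type -> (what it marks, how it is inserted).
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of non-finite complement and adjunct clauses", "automatic"),
    "RELPRO": ("gap in participial relative clauses", "automatic"),
    "GAP": ("elided argument in co-ordinated clauses", "manual"),
}


def inserted(mode):
    """List the empty-argument types inserted in the given mode."""
    return sorted(name for name, (_, how) in EMPTY_ARGUMENTS.items() if how == mode)


print(inserted("automatic"))
print(inserted("manual"))
```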
PropBank Tagset: Numbered Arguments
Numbered arguments            Numbered arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs:
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
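The labelling rule for intransitives can be written as a lookup followed by a single decision. The sketch assumes the verb classes have already been decided by the diagnostic tests; the table `VERB_CLASS` and the function `sole_argument_label` are hypothetical names used for illustration.

```python
# Verb classes for the example verbs on the slide (assumed already diagnosed).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```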
Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’
Intransitive Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’
[Figures: annotated trees for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation.
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
[Figures: annotated trees for causative-1 and causative-2]
Causatives: classes
[Figure not recoverable]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: ‘trust’
• Such cases are handled using a noun frame for bharosaa
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
compl-pred-2: राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
2
BackgroundforPS
bull Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren
ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull LanguageVuniversalprinciplesbull LanguageVspecificparameters
bull PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach
BasicPrinciplesofPS
bull PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)
bull Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces
bull Transforma3onalgrammar
bull Monostratalrepresenta3onbull NotunlikeEnglishPennTreebank
101212
3
SpecificAssump3onsaboutRepresenta3onMadebyPS
bull Phrasestructurebull No3onoflexicalheadswithprojec3ons(XVbartheorysortof)andassociatedfunc3onalprojec3onsndash Nounswithpostposi3onsndash Verbswithauxiliariesandcomplemen3zers(ki)
bull Binarybranchingndash Theore3calreasonsndash TobedifferentfromDS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[Tree diagram; recoverable nodes: VP-Pred, VP, NP (Atif), NP (kitab), V (paRhegaa)]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Tree diagram; recoverable nodes: VP-Pred, VP, NP-P (Atif-ne), NP (kitab), V (paRhii)]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा
[Tree diagram; recoverable nodes: VP-Pred, VP, NP (Atif), V (soyegaa)]
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा
[Tree diagram; recoverable nodes: VP-Pred, VP, NP1 (darvaazaa), trace CASE1, V (khulegaa)]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram; recoverable nodes: VP-Pred, VP, NP1 (cuuhe), trace CASE1, V (hain), adjoined NP-P (us kamre mein)]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Tree diagram; recoverable nodes: VP-Pred, NP-P (Ram-ne), NP-P (Mohan-ko), VP, NP (kitaab), V (dii)]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram; recoverable nodes: VP-Pred, VP, NP (kal raat), NP-P (baadaloM mein), NP2 (mujhko), trace SCR2, NP1 (caaMd), trace CASE1, V (dikhaa)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram; recoverable nodes: VP-Pred, VP, NP (Ram), trace EXTR1, V (jaantaa), V-Aux (hai), CP1, C (ki), NP1 (Sita), trace CASE1, NP-P (der se), V (aayegi)]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
[Tree diagram; recoverable nodes: VP-Pred, VP, CP, C (COMP), NP1 (jo kitaab), NP-P (tumne), trace SCR1, NP (PRO), V (dii), V-Aux (thii), NP (vah), trace SCR3, NP-P (maine), V (paRhii)]
Complex Predicate
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
[Tree diagram; recoverable nodes: VP-Pred, VP, NP (Raam), NP-P1 (Ravi-ko), V’, NP, trace CASE1, N (yaad), V (kar), V-Aux (rahaa), V-Aux (thaa)]
Causative
[Tree diagram not recoverable]
Causative Constructions (Contd.)
Ex: maaz ne (k1) cammaca se (k3) bacce ko khaanaa (k2) khilavaayaa
‘Mother fed the child with the spoon’
Ex: maaz ne (pk1) aayaa se (mk1) bacce ko (jk1) khaanaa (k2) khilavaayaa
‘Mother made the maid feed the child’
As there is morphological relatedness between the base verb khaa ‘eat’ and the causative verb khilvaa ‘cause to eat’, we mark pk1, mk1, jk1 instead of k1, k3, k4 respectively.
For causatives, our current decision: follow Possibility-II.
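The relabelling described above amounts to a fixed mapping over the plain karaka labels. The sketch below is illustrative: the function `relabel` is a hypothetical helper, and the plain labels k1, k3, k4 assigned to the input words are an assumption made to show the mapping at work.

```python
# Causative-specific labels replace the plain labels k1, k3, k4;
# every other label (here k2) is left unchanged.
CAUSATIVE_LABELS = {"k1": "pk1", "k3": "mk1", "k4": "jk1"}


def relabel(args):
    """Rewrite plain karaka labels as causative labels where a mapping exists."""
    return [(word, CAUSATIVE_LABELS.get(label, label)) for word, label in args]


# Arguments of khilavaayaa, with plain labels assumed for illustration.
plain = [("maaz", "k1"), ("aayaa", "k3"), ("bacce", "k4"), ("khaanaa", "k2")]
print(relabel(plain))
```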
(2) Relative Clauses (nmod__relc)
Ex: jo ladZakaa vahaaz khadZaa hai vaha meraa bhaaii hai
‘who’ ‘boy’ ‘there’ ‘stand’ ‘is’ ‘he’ ‘my’ ‘brother’ ‘is’
‘The boy who is standing there is my brother’
Issue:
• Possibility-I
  – Provides a relation between vaha ‘he’ in the main clause and jo ladZakaa ‘the boy’ in the relative clause
  – The dependency of jo ladZakaa ‘the boy’ is on vaha ‘he’
  – jo ladZakaa ‘the boy’ is the root of the relative clause ‘jo ladZakaa vahaaz khadZaa hai’
[Tree diagram: Relative Clause, Possibility-I]
• Possibility-II
  – The verb khadZaa hai ‘is standing’ is the root of the relative clause
  – The modifier of vaha ‘he’ in the main clause is the entire relative clause
  – Here the relation between jo ladZakaa ‘the boy’ in the relative clause and vaha ‘he’ in the main clause is captured by the feature coref
[Tree diagram: Relative Clause, Possibility-II]
Relative Clauses (Contd.)
For relative clauses, our current decision: follow Possibility-II.
In Possibility-II, jo ladZakaa ‘the boy’ in the relative clause attaches to the verb khadZaa hai ‘is standing’ of the relative clause. The relative clause attaches to vaha ‘he’ of the main clause by the ‘nmod__relc’ relation. The relation between jo ladZakaa ‘the boy’ and vaha ‘he’ is captured by the feature coref.
101212
25
(3) anubhava karta ndash k4a
Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta Here dukh lsquounhappyrsquo is the karta Here mujhko lsquoto mersquo is a subtype of sampradan This sampradan is different from the sampradan (k4mdashbeneficiary) We call it as anubhava karta represented by k4a
anubhava karta ndash k4a (Contd )
Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon
was visible to mersquo Verb
101212
26
anubhava karta ndash k4a (Contdhellip)
Ex-2
Ex-3
anubhava karta ndash k4a (Contdhellip)
101212
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo
lsquoHad she been not sick she would have definitely come to the partyrsquo
Issue
$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause
101212
29
Possibility - I
agar-to paired-ccof paired-ccof
agar to ccof ccof
naa hotii aatii
Possibility - II
101212
30
Conditionals (Contd) Possibility-I is not possible because agar-to is the head of the tree
which is an abstract node ie it is not a lexical node For conditionals our current decision Follow Possibility-II In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo
clause Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo
clause is the main clause
(6) Participles (vmod)
In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)
The arguments occurs only once in the sentence but is semantically related to both the verbs The shared argument syntactically always attaches with the main verb For the other verb this argument is semantically realized but not syntactically
101212
31
Participles (vmod) (Contd )
Ex vaha rojZa patra likhakara PaadZataa hai
rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo
lsquoHaving letters written everyday he tearsrsquo
101212
32
Participles (vmod) (Contd )
The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa
lsquotearsrsquo is shared with another participle verb likhakar lsquohaving
writtenrsquo
Participles (vmod) (contd)
Paadzataa hai k1 k7t k2 vmod
vaha rojZa pawra likhakar k1 k2
vaha pawra
101212
33
(7)Ellipsis
How to show dependencies when the head is missing Ex tum jo bhi kahoge (vo) mai maan luungii lsquoyou lsquowhateverrsquo lsquowill sayrsquo lsquothatrsquo lsquoIrsquo lsquowill believersquo lsquoI will believe whatever you sayrsquo In the above example vo lsquothatrsquo is missing which becomes the parent node
for relative clause lsquotum jo bhi kahogersquo We insert a null element ie NULL_NP for vo lsquothatrsquo to show the dependency
Ellipsis (Contd)
maan luungii k1 k2
mai NULL__NP (vo) nmod__relc
kahoge k1 k2
tum jo bhi
101212
34
Ellipsis (Contd)
Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo No explicit conjunct Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate
Non-dependency Relations
ccof ndash Conjunction pof ndash Complex Predicates fragof -- Fragment of
101212
35
(1) Conjunction (ccof)
ccof relation doesnrsquot reflects a dependency relation It is used for coordinating as well as subordinating conjunctions The dependency trees will show the conjuncts as heads In coordinating conjuncts the conjunct is the head and takes the coordinating
elements as its children In subordinating conjunct it would take the clause to which it is syntactically
attached (the subordinate clause) as its child
Conjunction (ccof) (Contdhellip)
Coordinate Conjunction
$ Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa lsquoramrsquo lsquoErgrsquo lsquofoodrsquo lsquoatersquo lsquoandrsquo lsquositarsquo Ergrsquo lsquoapplersquo lsquoatersquo
lsquoRam ate food and Sita ate an applersquo
Subordinate Conjunction
$ Ex raam ne kahaa ki vo kal aayegaa lsquoramrsquo lsquoErgrsquo lsquosaidrsquo thatrsquo lsquohersquo lsquotomorrowrsquo lsquocome-Futrsquo lsquoRam said that he will come tomorrowrsquo
101212
36
Coordinate Conjunction (ccof)
Subordinate Conjunction
101212
37
(2) Conjunct Verbs
Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa
lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence
The annotation scheme should be able to account for this relation in the
dependency tree If prashna kiyaa is grouped as a single verb chunk it will not be possible to
mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation) It means noun or an adjective in the conjunct verb sequence will have a POF relation with
the verb This way the relation between ek and prashna becomes an intra-chunk relation as they will
now become part of a single NP chunk Conjunct verbs are chunked separately but semantically they constitute a single unit It captures the fact that the noun-verb sequence is a conjunct verb by linking them with
POF relation
101212
38
Conjunct Verbs (Contd)
kiyaa k1 k2 pof
maine usase prashna
nmod
ek
101212
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut
Thedoorwillopen
unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened
unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen
101212
14
Intransitive: Unergative
unerg-1  आतिफ़ सोएगा
         Atif soyegaa
         Atif sleep.m.sg.fut
         'Atif will sleep'
unerg-2  आतिफ़ ने सोया
         Atif ne soyaa
         Atif.erg sleep.m.sg.pst
         'Atif slept'
unerg-3  आतिफ़ को सोना पड़ेगा
         Atif ko sonaa paRegaa
         Atif.dat sleep.inf compel.fut
         'Atif will have to sleep'

Existential
exist-1  उस कमरे में चूहे हैं
         us kamre meM cuuhe haiM
         that room in rats be.pres.pl
         'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4     कल रात बादलों में चाँद दिखा
            kal raat baadaloM meiM caaMd dikhaa
            yesterday night clouds in moon see(unacc).pst
            'Yesterday night, the moon was seen behind the clouds'
dat-subj-1  कल रात बादलों में मुझको चाँद दिखा
            kal raat baadaloM meiM mujhko caaMd dikhaa
            yesterday night clouds in me.dat moon see(unacc).pst
            'Yesterday night, I saw the moon behind the clouds'
[Figures: annotated trees for unacc-4 and dat-subj-1, not recoverable from extraction]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1  राम मोहन को किताब देगा
           raam mohan ko kitaab degaa
           Ram Mohan.dat book.f give.m.sg.fut
           'Ram will give a book to Mohan'
ditrans-2  राम ने मोहन को किताब दी
           raam ne mohan ko kitaab dii
           Ram.erg Mohan.dat book.f give.f.sg.pst
           'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) 'sleep' → 'make someone sleep'
• Add -vaa
  – (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1      आतिफ़ सोएगा
             Atif soyegaa
             Atif sleep.m.sg.fut
             'Atif will sleep'
causative-1  आया ने आतिफ़ को सुलाया
             aayaa ne Atif ko sulaayaa
             maid.erg Atif.acc sleep.caus.pst
             'The maid caused the child to sleep'
causative-2  माँ ने आया से आतिफ़ को सुलवाया
             maaN ne aayaa se Atif ko sulvaayaa
             mother.erg maid.by Atif.acc sleep.caus.pst
             'The mother made the maid cause the child to sleep'

Causatives
[Figures: annotated trees for causative-1 and causative-2, not recoverable from extraction]
Causatives: classes
[Slide content not recoverable from extraction]

Complex predicates
• These are cases such as bharosaa karnaa, 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
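One way to picture the noun frame is as a lexicon entry keyed on the noun rather than on the light verb. The sketch below is an assumption-laden illustration: the dictionary layout, the function name and the role descriptions ("truster", "thing trusted") are hypothetical; only the noun, the light verb and the two labelled arguments come from the slide.

```python
# Hypothetical noun-frame lexicon. The role descriptions are invented for
# illustration; the slide only shows that ARG0 and ARG1 are used.
NOUN_FRAMES = {
    "bharosaa": {
        "light_verb": "karnaa",
        "roles": {
            "ARG0": "truster (assumed description)",
            "ARG1": "thing trusted (assumed description)",
        },
    },
}


def label_complex_predicate(noun, arguments):
    """Render labelled arguments, checking each label against the noun's frame.

    arguments is a list of (phrase, label) pairs in sentence order.
    """
    roles = NOUN_FRAMES[noun]["roles"]
    unknown = [label for _, label in arguments if label not in roles]
    if unknown:
        raise ValueError(f"labels not licensed by the frame for {noun}: {unknown}")
    return " ".join(f"[{phrase} {label}]" for phrase, label in arguments)
```

Called with the slide's example, `label_complex_predicate("bharosaa", [("abhay ne", "ARG0"), ("sitaa par", "ARG1")])` reproduces the bracketing shown above.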
Complex Predicate
compl-pred-1  राम रवि की प्रतीक्षा कर रहा था
              raam ravi kii pratiikshaa kar rahaa thaa
              Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
              'Ram was waiting for Ravi'

Complex predicate
compl-pred-2  राम रवी से लड़ बैठा
              raam ravi se ladZa baithaa
              Ram Ravi.inst fight sit.perf
              'Ram regrettably fought with Ravi'

Complement Clause
compl-cl-1  राम जानता है कि सीता देर से आएगी
            raam jaantaa hai ki siitaa der se aayegii
            Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
            'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP, NP, V; words: Atif, kitab, paRhegaa]
आतिफ़ किताब पढ़ेगा
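The binary-branching assumption is easy to check mechanically once a tree is written down. The sketch below is only an illustration: the nested-tuple encoding and the function names are hypothetical, and the bracketing of the transitive clause (subject NP under VP-Pred, object NP and V under VP) is an assumed reading of the lost diagram, not a transcription of it.

```python
# A tree is a tuple (label, child, child, ...); a leaf is a plain string (a word).
# The bracketing below is an assumption about the lost diagram.
TRANSITIVE = (
    "VP-Pred",
    ("NP", "Atif"),
    ("VP", ("NP", "kitab"), ("V", "paRhegaa")),
)


def is_binary(tree):
    """True if no node in the tree has more than two children."""
    if isinstance(tree, str):
        return True
    children = tree[1:]
    return len(children) <= 2 and all(is_binary(child) for child in children)


def words(tree):
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:
        result.extend(words(child))
    return result
```

A flat node such as `("VP", "a", "b", "c")` fails the check, which is the kind of structure a dependency-style analysis would allow and PS deliberately avoids.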
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP-P, NP, V; words: Atif-ne, kitab, paRhii]
आतिफ़ ने किताब पढ़ी
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP, V; words: Atif, soyegaa]
आतिफ़ सोएगा
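Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP1, NP, CASE1, V; words: darvaazaa, khulegaa]
दरवाज़ा खुलेगा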
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP1, NP, CASE1, V, NP-P; words: us kamre mein, cuuhe, hain]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP-P, NP, V; words: Ram-ne, Mohan-ko, kitaab, dii]
राम ने मोहन को किताब दी
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP1, NP2, NP, NP-P, CASE1, SCR2, V; words: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, V, V-Aux, EXTR1, CASE1; words: Ram, jaantaa, hai, ki, Sita, der se, aayegi]
Relative Clause
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, V, V-Aux, PRO, SCR1, SCR3; words: jo kitaab, tumne, dii, thii, vah, maine, paRhii]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
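Complex Predicate
[Tree diagram not recoverable from extraction. Node labels: VP-Pred, VP, NP, NP-P, V, V', V-Aux, N, CASE1; words: Raam, Ravi-ko, yaad, kar, rahaa, thaa]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'

Causative
[Slide content not recoverable from extraction]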
Relative Clause Possibility-I
[Figure: dependency tree, not recoverable from extraction]

Relative Clauses (nmod__relc)
Possibility-II
• The verb khadZaa hai 'is standing' is the root of the relative clause
• The modifier of vaha 'he' in the main clause is the entire relative clause
• Here the relation between jo ladZakaa 'the boy' in the relative clause and vaha 'he' in the main clause is captured by the feature coref
Relative Clause Alternative-II
[Figure: dependency tree, not recoverable from extraction]

Relative Clauses (Contd.)
• For relative clauses, our current decision: follow Possibility-II
• In Possibility-II, jo ladZakaa 'the boy' in the relative clause attaches to the verb khadZaa hai 'is standing' of the relative clause
• The relative clause attaches to vaha 'he' of the main clause by the 'nmod__relc' relation
• The relation between jo ladZakaa 'the boy' and vaha 'he' is captured by the feature coref
(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
      'I.Dat' 'unhappy' 'is'
      'I am unhappy'
• Here the ko vibhakti in mujhko 'to me' tells us that it is not a karta
• Here dukh 'unhappy' is the karta
• Here mujhko 'to me' is a subtype of sampradan
• This sampradan is different from the sampradan (k4, beneficiary)
• We call it anubhava karta, represented by k4a

anubhava karta – k4a (Contd.)
Ex-2: raam ne (agent) caaMd dekhaa          [base verb]
      'ram' 'Erg' 'moon' 'saw'
      'Ram saw the moon'
Ex-3: raam ko (experiencer) caaMd dikhaa    [derived intransitive verb]
      'ram.Dat' 'moon' 'appeared'
      'The moon was visible to Ram'
anubhava karta – k4a (Contd.)
[Figures: dependency trees for Ex-2 and Ex-3, not recoverable from extraction]
(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
      'Ram said that he will come tomorrow'
• In Ex-1 the clause 'ki vo kal aayegaa' is the object, i.e. karma
• In Ex-2 the clause 'ki vo kal aayegaa' is the complement of the object yaha 'this', so it attaches to yaha as rs

Relation samanadhikaran – rs (Contd.)
[Figures: dependency trees for Ex-1 and Ex-2, not recoverable from extraction]
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would definitely have come to the party'

Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
    agar (paired-ccof)
        naa hotii (ccof)
    to (paired-ccof)
        aatii (ccof)

Possibility - II
[Figure: dependency tree, not recoverable from extraction]
Conditionals (Contd)
• Possibility-I is not possible because agar-to is the head of the tree, and it is an abstract node, i.e. not a lexical node
• For conditionals, our current decision: follow Possibility-II
• In Possibility-II the agar 'if' clause is dependent on the to 'then' clause
• Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause

(6) Participles (vmod)
• In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle)
• The argument occurs only once in the sentence but is semantically related to both verbs
• The shared argument always attaches syntactically to the main verb
• For the other verb this argument is semantically realized but not syntactically
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Having written letters every day, he tears them up'

Participles (vmod) (Contd.)
• The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'

Participles (vmod) (Contd.)
PaadZataa hai
    vaha (k1)
    rojZa (k7t)
    pawra (k2)
    likhakar (vmod)
        vaha (k1, shared)
        pawra (k2, shared)
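The sharing can be made explicit with a small lookup over the dependency tree: the participle has no syntactic children of its own for the shared arguments, so they are read off its head. This is a sketch under stated assumptions: the dictionary encoding and the function name are hypothetical, and treating exactly k1 and k2 as shareable reflects this one example rather than a rule stated in the guidelines.

```python
# Each token maps to (head, relation); the root has head None.
# Only the syntactic attachments are stored, as in the annotation.
TREE = {
    "PaadZataa hai": (None, "root"),
    "vaha": ("PaadZataa hai", "k1"),
    "rojZa": ("PaadZataa hai", "k7t"),
    "pawra": ("PaadZataa hai", "k2"),
    "likhakar": ("PaadZataa hai", "vmod"),
}


def semantic_arguments(tree, participle, shared=("k1", "k2")):
    """Return the main verb's arguments that the participle shares semantically.

    The set of shareable relations is an assumption made for this example.
    """
    head, relation = tree[participle]
    if relation != "vmod":
        raise ValueError(f"{participle!r} is not attached by vmod")
    return {
        rel: token
        for token, (token_head, rel) in tree.items()
        if token_head == head and rel in shared
    }
```

For the example, `semantic_arguments(TREE, "likhakar")` recovers vaha as k1 and pawra as k2 of the participle, while rojZa (k7t) stays with the main verb only.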
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
• In the above example vo 'that', which is the parent node for the relative clause 'tum jo bhi kahoge', is missing
• We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency

Ellipsis (Contd)
maan luungii
    mai (k1)
    NULL__NP (vo) (k2)
        kahoge (nmod__relc)
            tum (k1)
            jo bhi (k2)
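The repair step can be sketched as a function that adds the null node and re-attaches the orphaned relative clause to it. The encoding and the name `attach_via_null` are hypothetical; the tokens, relations and the NULL__NP node follow the tree shown on the slide.

```python
# Each token maps to (head, relation). Before the repair, the relative-clause
# verb 'kahoge' has no head because its head noun vo 'that' is elided.
BEFORE = {
    "maan luungii": (None, "root"),
    "mai": ("maan luungii", "k1"),
    "kahoge": (None, None),
    "tum": ("kahoge", "k1"),
    "jo bhi": ("kahoge", "k2"),
}


def attach_via_null(tree, orphan, null_token, null_head, null_relation, orphan_relation):
    """Return a new tree with a null node inserted as the head of the orphan."""
    repaired = dict(tree)  # copy, so the input tree is left untouched
    repaired[null_token] = (null_head, null_relation)
    repaired[orphan] = (null_token, orphan_relation)
    return repaired


AFTER = attach_via_null(
    BEFORE,
    orphan="kahoge",
    null_token="NULL__NP",
    null_head="maan luungii",
    null_relation="k2",
    orphan_relation="nmod__relc",
)
```

After the repair every token except the main verb has a head, so the result is a single connected tree.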
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    'The children have grown up; they don't listen to anyone'
• No explicit conjunct
• Insert a NULL element to show the dependencies (if it is essential)
NULL_CCP (aur)
    badZe_ho_gaye (ccof)
    nahii_sunate (ccof)

Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
• The ccof relation does not reflect a dependency relation
• It is used for coordinating as well as subordinating conjunctions
• The dependency trees will show the conjuncts as heads
• In coordinating conjuncts, the conjunct is the head and takes the coordinated elements as its children
• In subordinating conjuncts, it takes the clause to which it is syntactically attached (the subordinate clause) as its child

Conjunction (ccof) (Contd.)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
      'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
      'Ram ate food and Sita ate an apple'
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
      'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
      'Ram said that he will come tomorrow'

Coordinate Conjunction (ccof), Subordinate Conjunction
[Figures: dependency trees for the two examples, not recoverable from extraction]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
• The adjective ek 'one' modifies only the noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned', not the entire noun-verb sequence
• The annotation scheme should be able to account for this relation in the dependency tree
• If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)
• To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
• The dependency relation of prashna with kiyaa will be POF ('Part OF' relation)
• That is, the noun or adjective in the conjunct verb sequence will have a POF relation with the verb
• This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk
• Conjunct verbs are chunked separately, but semantically they constitute a single unit
• Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb
Conjunct Verbs (Contd)
kiyaa
    maine (k1)
    usase (k2)
    prashna (pof)
        ek (nmod)
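The two-chunk analysis can be written down as chunks plus relations, which makes both properties checkable: the noun and verb are linked by pof, and the adjective sits in the same chunk as the noun. This is an illustrative sketch; the data layout and function names are hypothetical, and the NP tags on the chunks for maine and usase are an assumption (the slide only gives [ek prashna]NP [kiyaa]VG).

```python
# Chunks as (tag, words). The tags of the first two chunks are assumed.
CHUNKS = [
    ("NP", ["maine"]),
    ("NP", ["usase"]),
    ("NP", ["ek", "prashna"]),
    ("VG", ["kiyaa"]),
]

# dependent -> (head, relation), following the tree on the slide
RELATIONS = {
    "maine": ("kiyaa", "k1"),
    "usase": ("kiyaa", "k2"),
    "prashna": ("kiyaa", "pof"),
    "ek": ("prashna", "nmod"),
}


def conjunct_verbs(relations):
    """Return (noun, verb) pairs that form a conjunct verb, i.e. are linked by pof."""
    return [(dep, head) for dep, (head, rel) in relations.items() if rel == "pof"]


def same_chunk(chunks, first, second):
    """True if both words belong to one chunk (an intra-chunk relation)."""
    return any(first in members and second in members for _, members in chunks)
```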
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
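The filtering step can be sketched in a few lines once each candidate sentence carries its role labels. This is a toy illustration, not a question answering system: the candidate records are hand-written from the two bracketed sentences, and the function name `answer_who` and the exact-match comparison are assumptions made for brevity.

```python
# Role-labelled candidates, written by hand from the two example sentences.
CANDIDATES = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "The first effective polio vaccine",
    },
]


def answer_who(predicate, theme, candidates):
    """Return agents of candidates whose predicate and theme match the question."""
    wanted = theme.lower()
    return [
        candidate["agent"]
        for candidate in candidates
        if candidate["predicate"] == predicate
        and candidate["theme"].lower() == wanted
    ]
```

Both sentences contain every word of the question, so word matching cannot separate them; only the theme role does. Here `answer_who("create", "the first effective polio vaccine", CANDIDATES)` keeps Jonas Salk and drops Becton Dickinson.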
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Framefiles
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of framefiles
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its framefile

An example
• John rings the bell
ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for

An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for
ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity

An example
• [John ARG0] rings [the bell ARG1]              (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)
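A framefile with two rolesets can be pictured as a nested dictionary, with the roleset ID selecting which role descriptions apply. The sketch below is only an illustration under stated assumptions: real framefiles are distributed as XML, the dotted IDs (ring.01, ring.02) follow the usual PropBank convention, and the function name is hypothetical.

```python
# Rolesets and role descriptions copied from the example slides.
# The dictionary layout is an illustrative stand-in for a real framefile.
FRAMEFILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "Arg0": "Causer of ringing",
            "Arg1": "Thing rung",
            "Arg2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "Arg1": "Surrounding entity",
            "Arg2": "Surrounded entity",
        },
    },
}


def describe_arguments(roleset_id, labelled):
    """Map each labelled phrase to its role description under the given roleset.

    labelled is a list of (phrase, label) pairs such as ("John", "ARG0").
    """
    roles = FRAMEFILE[roleset_id]["roles"]
    # "ARG0".capitalize() gives "Arg0", matching the keys used in the framefile
    return {phrase: roles[label.capitalize()] for phrase, label in labelled}
```

The same label means different things under different rolesets: ARG1 is the thing rung under ring.01 but the surrounding entity under ring.02, which is why the roleset must be chosen before the labels are interpreted.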
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of framefiles
  – Empty argument insertion
  – Semantic role labelling

Framefiles for Hindi
• Two types of framefiles
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped (null) arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
101212
24
Relative Clause Alternative-II
Relative Clauses (Contdhellip)
For relative clauses our current decision Follow Possibility-II In Possibility-II jo ladZakaa lsquothe boyrsquo in the rel clause attaches with the
verb khadzaa hai lsquois standingrsquo of the relclause The relclause attaches with vaha lsquohersquo of main clause by lsquonmod__relcrsquo
relation The relation between jo ladZakaa lsquothe boyrsquo and vaha lsquohersquo is captured by
the feature coref
101212
25
(3) anubhava karta ndash k4a
Ex-1 mujhko dukh hai lsquoIDatrsquo lsquounhappy lsquoisrsquo lsquoI am unhappyrsquo Here ko vibhakti in mujhko lsquoto mersquo tells that it is not a karta Here dukh lsquounhappyrsquo is the karta Here mujhko lsquoto mersquo is a subtype of sampradan This sampradan is different from the sampradan (k4mdashbeneficiary) We call it as anubhava karta represented by k4a
anubhava karta ndash k4a (Contd )
Ex-2 raam ne (agent) caaMd dekhaa Base verb lsquoramrsquo lsquoErgrsquo lsquomoonrsquo lsquosawrsquo lsquoRam saw the moonrsquo Ex-3 raam ko (experiencer) caaMd dikhaa Derived lsquoramDatrsquo lsquomoonrsquo lsquoappearedrsquo Intransitive lsquoMoon
was visible to mersquo Verb
101212
26
anubhava karta ndash k4a (Contdhellip)
Ex-2
Ex-3
anubhava karta ndash k4a (Contdhellip)
101212
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals Ex agara vaha biimaara na hotii to paartii me jZarUra aatii lsquoifrsquo lsquoshersquo lsquosickrsquo lsquonotrsquo lsquohappenedrsquo lsquothenrsquo lsquopartyrsquo lsquoinrsquo definitelyrsquo lsquocomersquo
lsquoHad she been not sick she would have definitely come to the partyrsquo
Issue
$ Possibility-I Abstract node $ Possibility-II One clause depends on the other clause
101212
29
Possibility - I
agar-to paired-ccof paired-ccof
agar to ccof ccof
naa hotii aatii
Possibility - II
101212
30
Conditionals (Contd) Possibility-I is not possible because agar-to is the head of the tree
which is an abstract node ie it is not a lexical node For conditionals our current decision Follow Possibility-II In Possibility-II the agar lsquoifrsquo clause is dependent on the to lsquothenrsquo
clause Here the agar lsquoifrsquo clause is the subordinate clause and to lsquothenrsquo
clause is the main clause
(6) Participles (vmod)
In non-adjectival partiples an argument of a verb (main) is shared with another verb(participle)
The arguments occurs only once in the sentence but is semantically related to both the verbs The shared argument syntactically always attaches with the main verb For the other verb this argument is semantically realized but not syntactically
101212
31
Participles (vmod) (Contd )
Ex vaha rojZa patra likhakara PaadZataa hai
rsquohersquo rsquodailyrsquo rsquoletterrsquo rsquohaving writtenrsquo rsquotearrsquo rsquoisrsquo
lsquoHaving letters written everyday he tearsrsquo
101212
32
Participles (vmod) (Contd )
The arguments vaha lsquohersquo and pawra lsquoletterrsquo of the verb PaadZataa
lsquotearsrsquo is shared with another participle verb likhakar lsquohaving
writtenrsquo
Participles (vmod) (contd)
Paadzataa hai k1 k7t k2 vmod
vaha rojZa pawra likhakar k1 k2
vaha pawra
101212
33
(7)Ellipsis
How to show dependencies when the head is missing Ex tum jo bhi kahoge (vo) mai maan luungii lsquoyou lsquowhateverrsquo lsquowill sayrsquo lsquothatrsquo lsquoIrsquo lsquowill believersquo lsquoI will believe whatever you sayrsquo In the above example vo lsquothatrsquo is missing which becomes the parent node
for relative clause lsquotum jo bhi kahogersquo We insert a null element ie NULL_NP for vo lsquothatrsquo to show the dependency
Ellipsis (Contd)
maan luungii k1 k2
mai NULL__NP (vo) nmod__relc
kahoge k1 k2
tum jo bhi
101212
34
Ellipsis (Contd)
Ex bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
lsquochildrenrsquo lsquobigrsquo lsquohappenrsquo lsquoisrsquo lsquono onersquo lsquoGenrsquo lsquomatterrsquo lsquonotrsquo lsquolistenrsquo ldquoThe children have grown up they dont listen to anyonerdquo No explicit conjunct Insert a NULL element to show the dependencies (if it is essential) NULL_CCP (aur) ccof ccof badZe_ho_gaye nahii_sunate
Non-dependency Relations
ccof ndash Conjunction pof ndash Complex Predicates fragof -- Fragment of
101212
35
(1) Conjunction (ccof)
ccof relation doesnrsquot reflects a dependency relation It is used for coordinating as well as subordinating conjunctions The dependency trees will show the conjuncts as heads In coordinating conjuncts the conjunct is the head and takes the coordinating
elements as its children In subordinating conjunct it would take the clause to which it is syntactically
attached (the subordinate clause) as its child
Conjunction (ccof) (Contdhellip)
Coordinate Conjunction
$ Ex raam ne khaanaa khaayaa aur siitaa ne seb khaayaa lsquoramrsquo lsquoErgrsquo lsquofoodrsquo lsquoatersquo lsquoandrsquo lsquositarsquo Ergrsquo lsquoapplersquo lsquoatersquo
lsquoRam ate food and Sita ate an applersquo
Subordinate Conjunction
$ Ex raam ne kahaa ki vo kal aayegaa lsquoramrsquo lsquoErgrsquo lsquosaidrsquo thatrsquo lsquohersquo lsquotomorrowrsquo lsquocome-Futrsquo lsquoRam said that he will come tomorrowrsquo
101212
36
Coordinate Conjunction (ccof)
Subordinate Conjunction
101212
37
(2) Conjunct Verbs
Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa
lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence
The annotation scheme should be able to account for this relation in the
dependency tree If prashna kiyaa is grouped as a single verb chunk it will not be possible to
mark the appropriate relation between ek and prashna

Conjunct Verbs (Contd)
• To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
• The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb.
• This way the relation between ek and prashna becomes an intra-chunk relation, as both now belong to a single NP chunk.
• Conjunct verbs are chunked separately, but semantically they constitute a single unit. Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.

Conjunct Verbs (Contd)
kiyaa
    k1   → maine
    k2   → usase
    pof  → prashna
               nmod → ek
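
The conjunct verb tree can also be written as a small edge list. The following is a minimal Python sketch under stated assumptions: the `Edge` tuple and the helpers `children` and `conjunct_verb_parts` are hypothetical names introduced here for illustration, and this is not the treebank's actual file format.

```python
from collections import namedtuple

# One dependency edge: dependent --relation--> head (illustrative structure only).
Edge = namedtuple("Edge", ["dependent", "relation", "head"])

# Edges for "maine usase ek prashna kiyaa", following the tree on the slide.
EDGES = [
    Edge("maine", "k1", "kiyaa"),
    Edge("usase", "k2", "kiyaa"),
    Edge("prashna", "pof", "kiyaa"),  # noun part of the conjunct verb
    Edge("ek", "nmod", "prashna"),    # modifies the noun, not the noun-verb sequence
]

def children(head, edges):
    """Return (dependent, relation) pairs attached to `head`."""
    return [(e.dependent, e.relation) for e in edges if e.head == head]

def conjunct_verb_parts(verb, edges):
    """Return the nouns or adjectives linked to `verb` by the pof relation."""
    return [dep for dep, rel in children(verb, edges) if rel == "pof"]

print(conjunct_verb_parts("kiyaa", EDGES))
print(children("prashna", EDGES))
```

Because prashna keeps its own node, the modifier ek can attach to it directly, which would not be possible if prashna kiyaa were a single verb chunk.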

Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson]agent created the [first disposable syringe]theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]theme was created in 1952 by [Jonas Salk]agent at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
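
The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role labels are copied by hand from the two candidate sentences (a real system would get them from a semantic role labeller), and `answer_who` is a hypothetical helper, not part of any PropBank tool.

```python
# Hand-written role labels for the two candidate sentences on the slides.
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Return the agents of `predicate` whose theme matches the questioned theme."""
    wanted = theme.lower()
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"].lower() == wanted]

# "Who created the first effective polio vaccine?"
print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Both sentences contain the words of the question, so word matching alone cannot separate them; comparing the theme of create is what removes the Becton Dickinson sentence.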

We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example
• [John]ARG0 rings [the bell]ARG1 (ring.01)
• [Tall aspen trees]ARG1 ring [the lake]ARG2 (ring.02)

ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
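
To make the idea of a frame file with several rolesets concrete, here is a minimal Python sketch of the two rolesets of ring. The `Roleset` class, the `RING_FRAME` dictionary and the `explain` helper are assumptions made for this illustration; actual PropBank frame files are stored in their own format, which is not shown here.

```python
from dataclasses import dataclass, field

@dataclass
class Roleset:
    """One usage of a predicate with its numbered roles (hypothetical container)."""
    roleset_id: str
    gloss: str
    roles: dict = field(default_factory=dict)

# Role descriptions copied from the slide; the dictionary layout is an assumption.
RING_FRAME = {
    "ring.01": Roleset("ring.01", "Make sound of bell",
                       {"ARG0": "Causer of ringing",
                        "ARG1": "Thing rung",
                        "ARG2": "Ring for"}),
    "ring.02": Roleset("ring.02", "To surround",
                       {"ARG1": "Surrounding entity",
                        "ARG2": "Surrounded entity"}),
}

def explain(roleset_id, labelled_args):
    """Pair each labelled argument with its role description in the chosen roleset."""
    roles = RING_FRAME[roleset_id].roles
    return {text: roles[label] for text, label in labelled_args}

print(explain("ring.01", [("John", "ARG0"), ("the bell", "ARG1")]))
print(explain("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")]))
```

The same label (ARG1) means "thing rung" in one roleset and "surrounding entity" in the other, which is why the roleset must be chosen before the numbered arguments can be interpreted.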

Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
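
The four empty-argument types and how they are inserted can be summarised in a small lookup table. This Python sketch only restates the slide; the `EMPTY_ARGUMENTS` table and the `inserted` helper are hypothetical names, and nothing here reflects how the annotation tools are actually implemented.

```python
# name -> (description, insertion mode), as listed on the slide.
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}

def inserted(mode):
    """Return the empty-argument types inserted in the given mode, sorted by name."""
    return sorted(name for name, (_, how) in EMPTY_ARGUMENTS.items() if how == mode)

print(inserted("automatic"))
print(inserted("manual"))
```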

PropBank Tagset
Numbered Arguments             Numbered Arguments with function tags
ARGA   Causer                  ARGA-MNS   Indirect causer
ARG0   Agent, experiencer      ARG0-MNS   Induced causer
ARG1   Theme, patient          ARG0-GOL   Causee with a ‘recipient’ role
ARG2   Recipient               ARG2-ATR   Attribute
ARG3   Instrument              ARG2-GOL   Goal
                               ARG2-SOU   Source
                               ARG2-LOC   Location
                               ARG2-DIR   Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP   Temporal
ARGM-MNR   Manner
ARGM-LOC   Location
ARGM-PRP   Purpose
ARGM-CAU   Cause
ARGM-DIS   Discourse
ARGM-ADV   Adverb
ARGM-MNS   Means
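
As a reading aid for the tagset, the following Python sketch stores the labels listed on the slides and splits a label into its base and function tag. It assumes labels are written with a hyphen (for example ARG2-LOC); `TAGSET`, `split_label` and `describe` are hypothetical names introduced for this example.

```python
# Labels and glosses as listed in the tagset slides.
TAGSET = {
    "ARGA": "Causer", "ARG0": "Agent, experiencer", "ARG1": "Theme, patient",
    "ARG2": "Recipient", "ARG3": "Instrument",
    "ARGA-MNS": "Indirect causer", "ARG0-MNS": "Induced causer",
    "ARG0-GOL": "Causee with a 'recipient' role", "ARG2-ATR": "Attribute",
    "ARG2-GOL": "Goal", "ARG2-SOU": "Source",
    "ARG2-LOC": "Location", "ARG2-DIR": "Direction",
    "ARGM-TMP": "Temporal", "ARGM-MNR": "Manner", "ARGM-LOC": "Location",
    "ARGM-PRP": "Purpose", "ARGM-CAU": "Cause", "ARGM-DIS": "Discourse",
    "ARGM-ADV": "Adverb", "ARGM-MNS": "Means",
}

def split_label(label):
    """Split 'ARG2-LOC' into ('ARG2', 'LOC'); a bare label gives (label, None)."""
    base, _, function_tag = label.partition("-")
    return base, (function_tag or None)

def describe(label):
    """Return (kind, gloss) for a label, raising ValueError if it is not in the tagset."""
    if label not in TAGSET:
        raise ValueError(f"unknown PropBank label: {label}")
    base, _ = split_label(label)
    kind = "modifier" if base == "ARGM" else "numbered"
    return kind, TAGSET[label]

print(split_label("ARG2-LOC"))
print(describe("ARGM-TMP"))
```

Note that the same function tag can carry a different gloss depending on its base: MNS is "Indirect causer" on ARGA, "Induced causer" on ARG0, and "Means" on ARGM, so the sketch keys the table on the full label.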

Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
    Atif kitab paRhegaa
    Atif book.f read.m.sg.fut
    ‘Atif will read the book’
trans-2: आतिफ़ ने किताब पढ़ी
    Atif ne kitaab paRhii
    Atif.erg book.f read.f.sg.pst
    ‘Atif read the book’
trans-3: आतिफ़ को किताब पढ़नी पड़ी
    Atif ko kitaab paRhnii paRii
    Atif.dat book.f read.f.inf compel.f.pst
    ‘Atif had to read the book’

Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
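
The labelling rule for intransitives can be stated as a one-line function. The Python sketch below assumes the verb class is already known (for instance from the diagnostic tests mentioned above); the small `VERB_CLASS` lexicon only contains the verbs named on the slide, and `sole_argument_label` is a hypothetical helper.

```python
# Verb classes for the examples named on the slide; a real lexicon would be much larger.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

def sole_argument_label(verb):
    """Return the PropBank label of the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"

for verb in ("Kula", "Puta", "haMsa"):
    print(verb, sole_argument_label(verb))
```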

Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
    darvaazaa khulegaa
    door.m.sg.d open.m.sg.fut
    ‘The door will open’
unacc-2: दरवाज़े ने खुला
    darvaaze ne khulaa
    door.m.sg.obl.erg open.pst
    ‘The door opened’
unacc-3: दरवाज़े को खुलना पड़ेगा
    darvaaze ko khulnaa paRegaa
    door.m.sg.obl.dat open.inf compel.fut
    ‘The door will have to open’

Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    ‘Atif will sleep’
unerg-2: आतिफ़ ने सोया
    Atif ne soyaa
    Atif.erg sleep.m.sg.pst
    ‘Atif slept’
unerg-3: आतिफ़ को सोना पड़ेगा
    Atif ko sonaa paRegaa
    Atif.dat sleep.inf compel.fut
    ‘Atif will have to sleep’

Existential
exist-1: उस कमरे में चूहे हैं
    us kamre meM cuuhe haiM
    that room in rats be.pres.pl
    ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
    kal raat baadaloM meiM caaMd dikhaa
    yesterday night clouds in moon see(unacc).pst
    ‘Yesterday night, the moon was seen behind the clouds’
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
    kal raat baadaloM meiM mujhko caaMd dikhaa
    yesterday night clouds in me.dat moon see(unacc).pst
    ‘Yesterday night, I saw the moon behind the clouds’
[Figures: PropBank annotations for unacc-4 and dat-subj-1]
• The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive
ditrans-1: राम मोहन को किताब देगा
    raam mohan ko kitaab degaa
    Ram Mohan.dat book.f give.m.sg.fut
    ‘Ram will give a book to Mohan’
ditrans-2: राम ने मोहन को किताब दी
    raam ne mohan ko kitaab dii
    Ram.erg Mohan.dat book.f give.f.sg.pst
    ‘Ram gave a book to Mohan’

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives
unerg-1: आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    ‘Atif will sleep’
causative-1: आया ने आतिफ़ को सुलाया
    aayaa ne Atif ko sulaayaa
    maid.erg Atif.acc sleep.caus.pst
    ‘The maid caused the child to sleep’
causative-2: माँ ने आया से आतिफ़ को सुलवाया
    maaN ne aayaa se Atif ko sulvaayaa
    mother.erg maid.by Atif.acc sleep.caus.pst
    ‘The mother made the maid cause the child to sleep’
[Figures: PropBank annotations for causative-1 and causative-2]

Causatives: classes
[Figure or table not recovered]

Complex predicates
• These are cases such as bharosaa karnaa ‘trust(n) do(v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne]ARG0 [sitaa par]ARG1 bharosaa kiyaa

Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
    raam ravi kii pratiikshaa kar rahaa thaa
    Ram Ravi.gen wait do.prog.m.sg be.m.sg.pst
    ‘Ram was waiting for Ravi’
compl-pred-2: राम रवी से लड़ बैठा
    raam ravi se ladZa baithaa
    Ram Ravi.inst fight sit.perf
    ‘Ram regrettably fought with Ravi’

Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
    raam jaantaa hai ki siita der se aayegii
    Ram know.hab.m.sg be.sg.pres that Sita late.part come.f.sg.fut
    ‘Ram knows that Sita will arrive late’

Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP-Pred, VP, NP, V]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii); node labels VP-Pred, VP, NP-P, NP, V]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels VP-Pred, VP, NP, V]
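
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels VP-Pred, VP, NP, V; the moved NP is co-indexed with a CASE trace (index 1)]

Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels VP-Pred, VP, NP-P, NP, V; the NP cuuhe is co-indexed with a CASE trace (index 1)]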

Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position
[Figure: PS tree for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels VP-Pred, VP, NP-P, NP, V]

Putting it All Together: Dative Subjects
[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels VP-Pred, VP, NP-P, NP, V; traces CASE (index 1) and SCR (index 2)]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki
[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels VP-Pred, VP, V-Aux, CP, C, NP-P, NP, V; traces EXTR (index 1) and CASE (index 1)]

Relative Clause
[Figure: PS tree for the relative clause example below; node labels VP-Pred, VP, V-Aux, CP, C, COMP, NP-P, NP, V, PRO; traces SCR (indices 1 and 3)]
जो किताब तुमने दी थी वह मैंने पढ़ ली
    jo kitaab tumne dii thii vah maine paRh lii
    which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
    ‘I have read the book which you gave me’

Complex Predicate
[Figure: PS tree for the example below; node labels VP-Pred, VP, V-Aux, V’, V, NP-P, NP, N; the noun yaad is associated with a CASE trace (index 1)]
राम रवि को याद कर रहा था
    raam ravi ko yaad kar rahaa thaa
    Ram Ravi.acc remember do.prog.m.sg be.m.sg.pst
    ‘Ram was remembering Ravi’

Causative
[Figure not recovered]

(3) anubhava karta – k4a
Ex-1: mujhko dukh hai
    ‘I.Dat’ ‘unhappy’ ‘is’
    ‘I am unhappy’
• Here the ko vibhakti in mujhko ‘to me’ tells us that it is not a karta.
• Here dukh ‘unhappy’ is the karta.
• mujhko ‘to me’ is a subtype of sampradan.
• This sampradan is different from the sampradan proper (k4, beneficiary).
• We call it anubhava karta, represented by k4a.

anubhava karta – k4a (Contd)
Ex-2: raam ne (agent) caaMd dekhaa [base verb]
    ‘ram’ ‘Erg’ ‘moon’ ‘saw’
    ‘Ram saw the moon’
Ex-3: raam ko (experiencer) caaMd dikhaa [derived intransitive verb]
    ‘ram.Dat’ ‘moon’ ‘appeared’
    ‘The moon was visible to Ram’
[Figures: dependency trees for Ex-2 and Ex-3]

(4) Relation samanadhikaran – rs
Ex-1: raam ne kahaa ki vo kal aayegaa
    ‘Ram said that he will come tomorrow’
Ex-2: raam ne yaha kahaa ki vo kal aayegaa
    ‘Ram said that he will come tomorrow’
• In Ex-1 the clause ‘ki vo kal aayegaa’ is the object, i.e. karma.
• In Ex-2 the clause ‘ki vo kal aayegaa’ is the complement of the object yaha ‘this’, so it attaches to yaha as rs.
[Figures: dependency trees for Ex-1 and Ex-2]

(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    ‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
    ‘Had she not been sick, she would have definitely come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause

Possibility-I
agar-to
    paired-ccof → agar
                      ccof → naa hotii
    paired-ccof → to
                      ccof → aatii

Possibility-II
[Figure not recovered]

Conditionals (Contd)
• Possibility-I is not possible because the head of the tree, agar-to, is an abstract node, i.e. it is not a lexical node.
• For conditionals, our current decision is to follow Possibility-II.
• In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause.
• Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.

(6) Participles (vmod)
• In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
• The argument occurs only once in the sentence but is semantically related to both verbs.
• Syntactically, the shared argument always attaches to the main verb.
• For the other verb, this argument is realized semantically but not syntactically.

Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
    ‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
    ‘Having written letters every day, he tears them up’
• The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.

Participles (vmod) (Contd)
PaadZataa hai
    k1   → vaha
    k7t  → rojZa
    k2   → patra
    vmod → likhakar
               k1 → vaha   (shared argument)
               k2 → patra  (shared argument)
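
One way to picture argument sharing is to keep the syntactic attachments and the shared (semantic-only) links in two separate lists. The Python sketch below is an illustration under that assumption; the two-list layout and the `arguments` helper are hypothetical and are not the treebank's storage format. The noun is spelled patra, as in the example sentence.

```python
MAIN_VERB = "PaadZataa hai"

# Syntactic attachments: every word attaches exactly once, to the main verb.
SYNTACTIC = [
    ("vaha", "k1", MAIN_VERB),
    ("rojZa", "k7t", MAIN_VERB),
    ("patra", "k2", MAIN_VERB),
    ("likhakar", "vmod", MAIN_VERB),
]

# Shared arguments: semantically related to the participle, not attached to it syntactically.
SHARED = [
    ("vaha", "k1", "likhakar"),
    ("patra", "k2", "likhakar"),
]

def arguments(verb, include_shared):
    """Return {relation: dependent} for `verb`, optionally including shared arguments."""
    edges = SYNTACTIC + (SHARED if include_shared else [])
    return {rel: dep for dep, rel, head in edges if head == verb}

print(arguments("likhakar", include_shared=False))
print(arguments("likhakar", include_shared=True))
```

Looking only at the syntactic edges, the participle likhakar has no arguments of its own; they become visible once the shared links are taken into account.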

(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    ‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
    ‘I will believe whatever you say’
• In the above example vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
• We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.

Ellipsis (Contd)
maan luungii
    k1 → mai
    k2 → NULL__NP (vo)
             nmod__relc → kahoge
                              k1 → tum
                              k2 → jo bhi
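
The effect of inserting the null node can be checked mechanically: without it the sentence has two unattached heads, with it there is a single root. The following Python sketch assumes a plain (dependent, relation, head) edge list; `attach_with_null` and `roots` are hypothetical helpers written for this example.

```python
def roots(edges):
    """Return the nodes that act as a head but are not attached to anything, sorted."""
    heads = {head for _, _, head in edges}
    dependents = {dep for dep, _, _ in edges}
    return sorted(heads - dependents)

def attach_with_null(edges, orphan, orphan_relation, new_head, null_relation,
                     null_label="NULL__NP"):
    """Insert a null node under `new_head` and hang `orphan` from the null node."""
    return edges + [
        (null_label, null_relation, new_head),
        (orphan, orphan_relation, null_label),
    ]

# Edges recoverable from the overt words of "tum jo bhi kahoge mai maan luungii".
overt = [
    ("mai", "k1", "maan luungii"),
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]

tree = attach_with_null(overt, orphan="kahoge", orphan_relation="nmod__relc",
                        new_head="maan luungii", null_relation="k2")

print(roots(overt))
print(roots(tree))
```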

Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    ‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
    ‘The children have grown up; they don’t listen to anyone’
• No explicit conjunct.
• Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
    ccof → badZe_ho_gaye
    ccof → nahii_sunate

Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
101212
26
anubhava karta ndash k4a (Contdhellip)
Ex-2
Ex-3
anubhava karta ndash k4a (Contdhellip)
101212
27
(4) Relation samanadhikaran- rs
Ex-1 raam ne kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo $ Ex-2 raam ne yaha kahaa ki vo kal aayegaa lsquoRam said that he will come tomorrowrsquo In Ex-1 the clause lsquoki vo kal aayegaarsquo is the object ie karma In Ex-2 the clause lsquoki vo kal aayegaarsquo is the complement of the object
yaha lsquothisrsquo so it attahes to yaha as rs
Relation samanadhikaran- rs (Contdhellip)
Ex-1
101212
28
Relation samanadhikaran- rs (Contd) ndash Ex-2
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
‘if’ ‘she’ ‘sick’ ‘not’ ‘happened’ ‘then’ ‘party’ ‘in’ ‘definitely’ ‘come’
‘Had she not been sick, she would definitely have come to the party’
Issue
• Possibility-I: Abstract node
• Possibility-II: One clause depends on the other clause
Possibility-I
agar-to
    agar (paired-ccof)
        naa hotii (ccof)
    to (paired-ccof)
        aatii (ccof)
Possibility-II
[Figure: dependency tree for Possibility-II]
Conditionals (Contd)
Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
For conditionals, our current decision is to follow Possibility-II.
In Possibility-II the agar ‘if’ clause is dependent on the to ‘then’ clause. Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
In non-adjectival participles, an argument of a verb (main) is shared with another verb (participle).
The argument occurs only once in the sentence but is semantically related to both verbs. The shared argument always attaches syntactically to the main verb. For the other verb this argument is semantically realized but not syntactically.
Participles (vmod) (Contd)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Having written letters, he tears them up every day’
Participles (vmod) (Contd)
The arguments vaha ‘he’ and patra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
Participles (vmod) (Contd)
PaadZataa hai
    vaha (k1)
    rojZa (k7t)
    patra (k2)
    likhakar (vmod)
        vaha (k1)
        patra (k2)
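The argument sharing described for participles can be sketched in a few lines of Python. This is only an illustration, not part of the treebank tooling: the edge list mirrors the tree for the example sentence, while the function name `shared_arguments` and the restriction of sharing to k1 and k2 are assumptions made for this sketch.

```python
# Syntactic dependency edges as (head, relation, dependent), following the
# tree for "vaha rojZa patra likhakara PaadZataa hai".
EDGES = [
    ("PaadZataa hai", "k1", "vaha"),
    ("PaadZataa hai", "k7t", "rojZa"),
    ("PaadZataa hai", "k2", "patra"),
    ("PaadZataa hai", "vmod", "likhakar"),
]

# Assumption for this sketch: only k1 and k2 can be shared.
SHAREABLE = ("k1", "k2")


def shared_arguments(participle, edges, shareable=SHAREABLE):
    """Return the arguments a vmod participle shares with its main verb."""
    # Find the main verb that the participle modifies.
    heads = [h for h, rel, d in edges if rel == "vmod" and d == participle]
    if not heads:
        return {}
    main_verb = heads[0]
    # Relations the participle already fills syntactically are not shared.
    own = {rel for h, rel, d in edges if h == participle}
    return {
        rel: d
        for h, rel, d in edges
        if h == main_verb and rel in shareable and rel not in own
    }


print(shared_arguments("likhakar", EDGES))
```

The syntactic edges stay attached to the main verb only; the participle's arguments are recovered on demand, which matches the statement that they are realized semantically but not syntactically.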
(7) Ellipsis
How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
In the above example vo ‘that’, which would be the parent node for the relative clause ‘tum jo bhi kahoge’, is missing. We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd)
maan luungii
    mai (k1)
    NULL__NP (vo) (k2)
        kahoge (nmod__relc)
            tum (k1)
            jo bhi (k2)
Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don’t listen to anyone’
No explicit conjunct. Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
    badZe_ho_gaye (ccof)
    nahii_sunate (ccof)
Non-dependency Relations
• ccof – Conjunction
• pof – Complex Predicates
• fragof – Fragment of
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation. It is used for coordinating as well as subordinating conjunctions. The dependency trees show the conjunctions as heads.
In coordination, the conjunction is the head and takes the coordinated elements as its children.
In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd…)
Coordinate Conjunction
• Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
  ‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
  ‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
• Ex: raam ne kahaa ki vo kal aayegaa
  ‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
  ‘Ram said that he will come tomorrow’
[Figures: dependency trees for the coordinate conjunction (ccof) and the subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree. If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd)
To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation). That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb.
This way the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk.
Conjunct verbs are chunked separately but semantically they constitute a single unit. Linking them with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd)
kiyaa
    maine (k1)
    usase (k2)
    prashna (pof)
        ek (nmod)
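The pof analysis can be illustrated with a small Python sketch over the tree for this example. It is only an illustration under stated assumptions: the edge-list encoding and the helper names `conjunct_verbs` and `modifiers` are hypothetical and not taken from the treebank tools.

```python
# Dependency edges as (head, relation, dependent) for
# "maine usase ek prashna kiyaa".
EDGES = [
    ("kiyaa", "k1", "maine"),
    ("kiyaa", "k2", "usase"),
    ("kiyaa", "pof", "prashna"),
    ("prashna", "nmod", "ek"),
]


def conjunct_verbs(edges):
    """Return (noun_or_adjective, verb) pairs linked by the pof relation."""
    return [(dep, head) for head, rel, dep in edges if rel == "pof"]


def modifiers(word, edges):
    """Return the nmod dependents of a word."""
    return [dep for head, rel, dep in edges if head == word and rel == "nmod"]


print(conjunct_verbs(EDGES))
print(modifiers("prashna", EDGES))
print(modifiers("kiyaa", EDGES))
```

Because prashna is its own node, ek can attach to it directly, while the pof edge still lets a consumer recover prashna kiyaa as one semantic unit.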
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
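The filtering step can be made concrete with a short Python sketch. It assumes the two candidate sentences have already been labelled with agent and theme roles; the `Proposition` class and the function `answer_who` are hypothetical names introduced only for this illustration, and exact string matching on the theme is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Proposition:
    """One labelled predicate with its semantic roles (role name -> text)."""
    predicate: str
    roles: dict


# The two candidate answers, labelled as in the example above.
CANDIDATES = [
    Proposition("create", {"agent": "Becton Dickinson",
                           "theme": "first disposable syringe"}),
    Proposition("create", {"agent": "Jonas Salk",
                           "theme": "The first effective polio vaccine"}),
]


def answer_who(predicate, theme, propositions):
    """Return the agents of propositions whose predicate and theme match."""
    return [
        p.roles["agent"]
        for p in propositions
        if p.predicate == predicate
        and "agent" in p.roles
        and p.roles.get("theme", "").lower() == theme.lower()
    ]


print(answer_who("create", "the first effective polio vaccine", CANDIDATES))
```

Both sentences contain the words of the question, so word matching alone cannot separate them; only the second has the vaccine as the theme of create, so only its agent is returned.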
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
An example
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
An example
• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
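A frame file with several rolesets can be pictured as a small lookup table. The Python sketch below encodes the two ring rolesets from the example and checks labelled arguments against them. The dictionary layout and the function `describe` are assumptions made for illustration; actual PropBank frame files are distributed in their own file format, which is not shown here.

```python
# The two rolesets of "ring", as listed in the example.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "ARG0": "Causer of ringing",
            "ARG1": "Thing rung",
            "ARG2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "ARG1": "Surrounding entity",
            "ARG2": "Surrounded entity",
        },
    },
}


def describe(roleset_id, labelled_args):
    """Pair each labelled span with the verb-specific meaning of its label.

    Raises ValueError if a label is not defined for the roleset.
    """
    roles = FRAME_FILE[roleset_id]["roles"]
    described = []
    for label, span in labelled_args:
        if label not in roles:
            raise ValueError(f"{label} is not a role of {roleset_id}")
        described.append((span, label, roles[label]))
    return described


print(describe("ring.01", [("ARG0", "John"), ("ARG1", "the bell")]))
print(describe("ring.02", [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")]))
```

The same label, ARG1, means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is the sense in which roles are defined verb by verb.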
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
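As a small illustration of how the tagset is composed, the Python sketch below splits a label into its base argument and its function tag. It assumes labels are written with a hyphen between the two parts (for example ARG2-GOL or ARGM-TMP); the function `split_label` and the two lookup tables are hypothetical helpers, filled only with the entries listed in the tagset.

```python
# Base labels and their general descriptions, as listed in the tagset.
BASE_LABELS = {
    "ARGA": "Causer",
    "ARG0": "Agent, experiencer",
    "ARG1": "Theme, patient",
    "ARG2": "Recipient",
    "ARG3": "Instrument",
    "ARGM": "Modifier",
}

# Function tags on modifier arguments, as listed in the tagset.
MODIFIER_TAGS = {
    "TMP": "Temporal", "MNR": "Manner", "LOC": "Location", "PRP": "Purpose",
    "CAU": "Cause", "DIS": "Discourse", "ADV": "Adverb", "MNS": "Means",
}


def split_label(label):
    """Split 'ARG2-GOL' into ('ARG2', 'GOL'); plain labels get None as tag."""
    base, _, function_tag = label.partition("-")
    if base not in BASE_LABELS:
        raise ValueError(f"unknown base label: {base}")
    return base, function_tag or None


print(split_label("ARG2-GOL"))
print(split_label("ARG1"))
base, tag = split_label("ARGM-TMP")
print(base, MODIFIER_TAGS[tag])
```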
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1
    आतिफ़ किताब पढ़ेगा
    Atif kitab paRhegaa
    Atif book.f read.m.sg.fut
    ‘Atif will read the book’
trans-2
    आतिफ़ ने किताब पढ़ी
    Atif ne kitaab paRhii
    Atif.erg book.f read.f.sg.pst
    ‘Atif read the book’
trans-3
    आतिफ़ को किताब पढ़नी पड़ी
    Atif ko kitaab paRhnii paRii
    Atif.dat book.f read.f.inf compel.f.pst
    ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
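The labelling rule for intransitives can be written as a two-step lookup. The Python sketch below uses only the three verbs given as examples; the table and function names are hypothetical, and in practice the verb class would come from the frame files after applying the diagnostic tests, not from a hard-coded dictionary.

```python
# Verb classes for the example verbs (an illustrative, hand-made table).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

# The single argument of an unaccusative is Arg1, of an unergative Arg0.
CORE_LABEL = {"unaccusative": "ARG1", "unergative": "ARG0"}


def single_argument_label(verb):
    """Return the label of the only argument of an intransitive verb."""
    return CORE_LABEL[VERB_CLASS[verb]]


for verb in ("Kula", "Puta", "haMsa"):
    print(verb, single_argument_label(verb))
```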
Intransitive Unaccusative
unacc-1
    दरवाज़ा खुलेगा
    darvaazaa khulegaa
    door.m.sg.d open.m.sg.fut
    ‘The door will open’
unacc-2
    दरवाज़े ने खुला
    darvaaze ne khulaa
    door.m.sg.obl.erg open.pst
    ‘The door opened’
unacc-3
    दरवाज़े को खुलना पड़ेगा
    darvaaze ko khulnaa paRegaa
    door.m.sg.obl.dat open.inf compel.fut
    ‘The door will have to open’
Intransitive Unergative
unerg-1
    आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    ‘Atif will sleep’
unerg-2
    आतिफ़ ने सोया
    Atif ne soyaa
    Atif.erg sleep.m.sg.pst
    ‘Atif slept’
unerg-3
    आतिफ़ को सोना पड़ेगा
    Atif ko sonaa paRegaa
    Atif.dat sleep.inf compel.fut
    ‘Atif will have to sleep’
Existential
exist-1
    उस कमरे में चूहे हैं
    us kamre meM cuuhe haiM
    that room in rats be.pres.pl
    ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
    कल रात बादलों में चाँद दिखा
    kal raat baadaloM meiM caaMd dikhaa
    yesterday night clouds in moon see(unacc).pst
    ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1
    कल रात बादलों में मुझको चाँद दिखा
    kal raat baadaloM meiM mujhko caaMd dikhaa
    yesterday night clouds in me.dat moon see(unacc).pst
    ‘Yesterday night I saw the moon behind the clouds’
[Figures: PropBank annotations for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1
    राम मोहन को किताब देगा
    raam mohan ko kitaab degaa
    Ram Mohan.dat book.f give.m.sg.fut
    ‘Ram will give a book to Mohan’
ditrans-2
    राम ने मोहन को किताब दी
    raam ne mohan ko kitaab dii
    Ram.erg Mohan.dat book.f give.f.sg.pst
    ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
    आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    ‘Atif will sleep’
causative-1
    आया ने आतिफ़ को सुलाया
    aayaa ne Atif ko sulaayaa
    maid.erg Atif.acc sleep.caus.pst
    ‘The maid caused the child to sleep’
causative-2
    माँ ने आया से आतिफ़ को सुलवाया
    maaN ne aayaa se Atif ko sulvaayaa
    mother.erg maid.by Atif.acc sleep.caus.pst
    ‘The mother made the maid cause the child to sleep’
Causatives
[Figures: PropBank annotations for causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1
    राम रवि की प्रतीक्षा कर रहा था
    raam ravi kii pratiikshaa kar rahaa thaa
    Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
    ‘Ram was waiting for Ravi’
Complex predicate
compl-pred-2
    राम रवि से लड़ बैठा
    raam ravi se ladZa baithaa
    Ram Ravi.inst fight sit.perf
    ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1
    राम जानता है कि सीता देर से आएगी
    raam jaantaa hai ki siitaa der se aayegii
    Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
    ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
bull Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren
ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull LanguageVuniversalprinciplesbull LanguageVspecificparameters
bull PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach
BasicPrinciplesofPS
bull PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)
bull Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces
bull Transforma3onalgrammar
bull Monostratalrepresenta3onbull NotunlikeEnglishPennTreebank
101212
3
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
आतिफ़ किताब पढ़ेगा
[Figure: PS tree, approximately [VP-Pred [NP Atif] [VP [NP kitab] [V paRhegaa]]]]
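The basic transitive clause can be encoded as a nested structure to make the two argument positions and the binary-branching assumption explicit. The Python sketch below is an illustration only: the exact tree shape is a reading of the slide's figure, the hyphenated label VP-Pred is assumed, and the helper names `leaves` and `is_binary` are hypothetical.

```python
# A node is (label, child, ...) and a pre-terminal is (label, word).
# Assumed shape: the higher NP corresponds to k1, the lower NP to k2.
TREE = (
    "VP-Pred",
    ("NP", "Atif"),
    ("VP",
     ("NP", "kitab"),
     ("V", "paRhegaa")),
)


def is_preterminal(node):
    """True if the node directly dominates a single word."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Return the words of the tree in surface order."""
    if is_preterminal(node):
        return [node[1]]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def is_binary(node):
    """True if no node in the tree has more than two children."""
    if is_preterminal(node):
        return True
    children = node[1:]
    return len(children) <= 2 and all(is_binary(c) for c in children)


print(leaves(TREE))
print(is_binary(TREE))
```

Reading off the leaves gives the surface word order, while the nesting records which NP sits in the higher and which in the lower argument position.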
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी
[Figure: PS tree, approximately [VP-Pred [NP-P Atif ne] [VP [NP kitab] [V paRhii]]]]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
आतिफ़ सोएगा
[Figure: PS tree with the nodes VP-Pred, VP, NP Atif, V soyegaa]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
दरवाज़ा खुलेगा
[Figure: PS tree with NP1 darvaazaa coindexed with the trace CASE1; other nodes: VP-Pred, VP, V khulegaa]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and location is an adjunct
उस कमरे में चूहे हैं
[Figure: PS tree with NP1 cuuhe coindexed with the trace CASE1, V hain, and the NP-P us kamre mein; other nodes: VP-Pred, VP]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी
[Figure: PS tree with the NP-P Mohan ko adjoined to VP-Pred; other nodes: NP-P Ram ne, NP kitaab, V dii, VP]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Figure: PS tree with NP1 caaMd coindexed with the trace CASE1, V dikhaa, the NP-P baadaloM mein, the NP kal raat, and NP2 mujhko coindexed with the trace SCR2]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like the unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
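Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Figure: PS tree for this sentence. The matrix clause contains the trace EXTR1, coindexed with CP1 headed by C ki; inside the CP, NP1 Sita is coindexed with the trace CASE1. Other recoverable nodes: VP-Pred, VP, NP Ram, der se, V jaantaa, V-Aux hai, V aayegi.]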
Relative Clause
[Figure: PS tree for the example below. Recoverable node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, CP, C (COMP), PRO, NP1 jo kitaab, and the traces SCR1 and SCR3.]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
[Figure: PS tree for the example below. Recoverable node labels: VP-Pred, VP, NP Raam, NP-P Ravi ko, V’, N yaad, V kar, V-Aux rahaa, V-Aux thaa, and the trace CASE1.]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causative
101212
1
PhraseStructureRepresenta3on
OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu
PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks
bull DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow
bull Developedinconjunc3onwithDSandPBbull InspiredbyChomskyantradi3on
101212
2
BackgroundforPS
bull Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren
ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull LanguageVuniversalprinciplesbull LanguageVspecificparameters
bull PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa)
[Tree diagram: VP-Pred and VP projections over NP Atif, NP kitab and V paRhegaa]
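The binary-branching assumption can be checked mechanically once a tree is written down. The sketch below encodes one plausible bracketing of the transitive clause as nested tuples. The exact attachment of the two NPs is an assumption made for illustration (the original diagram is not recoverable here), and the helper names are hypothetical rather than part of any treebank tool.

```python
# A phrase is (label, child, child, ...); a leaf is (label, word).
# This bracketing is an illustrative assumption, not the treebank's file format.
tree = ("VP-Pred",
        ("NP", "Atif"),
        ("VP",
         ("NP", "kitab"),
         ("V", "paRhegaa")))

def is_leaf(node):
    """A leaf pairs a category label with a single word."""
    return len(node) == 2 and isinstance(node[1], str)

def is_binary(node):
    """True if every phrasal node has exactly two children."""
    if is_leaf(node):
        return True
    children = node[1:]
    return len(children) == 2 and all(is_binary(c) for c in children)

def words(node):
    """Return the surface word order (the yield of the tree)."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]

print(is_binary(tree), " ".join(words(tree)))
```

Reading the yield off the tree gives the surface order, which is the "interface to phonology" side of the representation described earlier.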
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी (Atif-ne kitab paRhii)
[Tree diagram: same configuration as (1), with the subject Atif marked by the postposition ne]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
आतिफ़ सोएगा (Atif soyegaa)
[Tree diagram: VP-Pred and VP projections over NP Atif and V soyegaa]
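Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
दरवाज़ा खुलेगा (darvaazaa khulegaa)
[Tree diagram: NP darvaazaa in the higher position, coindexed (1) with a CASE trace in the lower position; V khulegaa]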
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं (us kamre mein cuuhe hain)
[Tree diagram: NP cuuhe coindexed (1) with a CASE trace; us kamre mein attached as an adjunct; V hain]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii)
[Tree diagram: Mohan-ko adjoined to VP-Pred, above Ram-ne, kitaab and V dii]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa)
[Tree diagram: NP caaMd coindexed (1) with a CASE trace; mujhko coindexed (2) with a scrambling trace (SCR); baadaloM mein and kal raat adjoined; V dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi)
[Tree diagram: the ki clause is a CP, coindexed (1) with an extraposition trace (EXTR) in the matrix clause; inside it, Sita is coindexed with a CASE trace]
Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
    jo kitaab tumne dii thii vah maine paRh lii
    which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
    'I have read the book which you gave me'
[Tree diagram: the relative clause jo kitaab tumne dii thii is a CP with an empty complementizer (COMP); the diagram also shows scrambling traces (SCR) and PRO]
Complex Predicate
राम रवि को याद कर रहा था
    raam ravi-ko yaad kar rahaa thaa
    Ram Ravi-acc remember do prog.m.sg be.m.sg.pst
    'Ram was remembering Ravi'
[Tree diagram: the noun yaad and the verb kar combine under V', with the auxiliaries rahaa and thaa above]
Causative
[Tree diagram]
Relation samanadhikaran - rs (Contd.) – Ex-2
(5) Conditionals
Ex: agara vaha biimaara na hotii to paartii me jZarUra aatii
    'if' 'she' 'sick' 'not' 'happened' 'then' 'party' 'in' 'definitely' 'come'
    'Had she not been sick, she would definitely have come to the party'
Issue:
    Possibility-I: Abstract node
    Possibility-II: One clause depends on the other clause
Possibility - I
agar-to
    paired-ccof: agar
        ccof: naa hotii
    paired-ccof: to
        ccof: aatii
Possibility - II
[Tree diagram]
Conditionals (Contd.)
Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
For conditionals, our current decision: follow Possibility-II.
In Possibility-II the agar 'if' clause is dependent on the to 'then' clause. Here the agar 'if' clause is the subordinate clause and the to 'then' clause is the main clause.
(6) Participles (vmod)
In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
The argument occurs only once in the sentence but is semantically related to both verbs. The shared argument always attaches syntactically to the main verb. For the other verb this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
    'he' 'daily' 'letter' 'having written' 'tear' 'is'
    'Every day, having written letters, he tears them up'
Participles (vmod) (Contd.)
The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with the participle verb likhakar 'having written'.
Participles (vmod) (Contd.)
PaadZataa hai
    k1: vaha
    k7t: rojZa
    k2: pawra
    vmod: likhakar
        k1: vaha (shared)
        k2: pawra (shared)
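The sharing described above can be recovered from the syntactic arcs alone: the participle hangs off the main verb by vmod, and its semantic arguments are looked up among the main verb's dependents. This is a minimal sketch. The arc list follows the slide's tree, while the function name and the choice of k1 and k2 as the shareable relations are assumptions for illustration.

```python
# Arcs are (dependent, relation, head), following the tree on the slide.
arcs = [
    ("vaha", "k1", "PaadZataa hai"),
    ("rojZa", "k7t", "PaadZataa hai"),
    ("pawra", "k2", "PaadZataa hai"),
    ("likhakar", "vmod", "PaadZataa hai"),
]

def shared_arguments(arcs, participle, shared_relations=("k1", "k2")):
    """Return the main verb's arguments that the participle shares semantically."""
    # The participle attaches to the main verb with the vmod relation.
    main_verb = next(h for d, r, h in arcs if d == participle and r == "vmod")
    # The shared arguments are attached syntactically to the main verb only.
    return {r: d for d, r, h in arcs if h == main_verb and r in shared_relations}

print(shared_arguments(arcs, "likhakar"))
```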
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
In the above example vo 'that', which would be the parent node for the relative clause 'tum jo bhi kahoge', is missing. We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency.
Ellipsis (Contd.)
maan luungii
    k1: mai
    k2: NULL_NP (vo)
        nmod__relc: kahoge
            k1: tum
            k2: jo bhi
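The null-element insertion above can be sketched as a small function over dependency arcs. The relation names come from the slide; the function name, its parameters, and the arc representation are hypothetical and only illustrate the idea.

```python
def attach_relative_clause(arcs, relc_head, main_verb, relation, correlative=None):
    """Attach a relative clause to its correlative, inserting NULL_NP if it is missing.

    Arcs are (dependent, relation, head). If a correlative is given, it is
    assumed to be attached to the main verb already.
    """
    arcs = list(arcs)
    head = correlative
    if head is None:
        # No overt vo 'that': insert a null element to act as the parent node.
        head = "NULL_NP"
        arcs.append((head, relation, main_verb))
    arcs.append((relc_head, "nmod__relc", head))
    return arcs

arcs = [
    ("mai", "k1", "maan luungii"),
    ("tum", "k1", "kahoge"),
    ("jo bhi", "k2", "kahoge"),
]
result = attach_relative_clause(arcs, "kahoge", "maan luungii", "k2")
for arc in result:
    print(arc)
```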
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    "The children have grown up; they don't listen to anyone"
No explicit conjunct. Insert a NULL element to show the dependencies (if it is essential):
NULL_CCP (aur)
    ccof: badZe_ho_gaye
    ccof: nahii_sunate
Non-dependency Relations
    ccof – Conjunction
    pof – Complex Predicates
    fragof – Fragment of
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation. It is used for coordinating as well as subordinating conjunctions. The dependency trees show the conjunctions as heads.
In coordination, the conjunction is the head and takes the coordinated elements as its children.
In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
    Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
        'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
        'Ram ate food and Sita ate an apple'
Subordinate Conjunction
    Ex: raam ne kahaa ki vo kal aayegaa
        'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
        'Ram said that he will come tomorrow'
Coordinate Conjunction (ccof)
[Tree diagram]
Subordinate Conjunction
[Tree diagram]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; ek does not modify the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree. If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
To overcome this problem we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF ('Part OF' relation). That is, the noun or adjective in the conjunct verb sequence has a POF relation with the verb.
This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit. The POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd.)
kiyaa
    k1: maine
    k2: usase
    pof: prashna
        nmod: ek
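The chunking decision above can be illustrated with a short sketch: because the noun is its own node, the pof arc identifies the conjunct verb while the modifier still attaches to the noun alone. The arcs follow the slide's tree; the function names and the (dependent, relation, head) representation are assumptions for illustration.

```python
# Arcs are (dependent, relation, head), following the tree on the slide.
arcs = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]

def conjunct_verbs(arcs):
    """Rebuild each conjunct verb from its pof arc (noun or adjective + verb)."""
    return [f"{dep} {head}" for dep, rel, head in arcs if rel == "pof"]

def modifiers_of(arcs, word):
    """Return the dependents attached to a word by nmod."""
    return [dep for dep, rel, head in arcs if head == word and rel == "nmod"]

print(conjunct_verbs(arcs))           # the semantic unit
print(modifiers_of(arcs, "prashna"))  # ek modifies the noun only
print(modifiers_of(arcs, "kiyaa"))    # nothing modifies the verb here
```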
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
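The filtering step can be sketched in a few lines once role labels are available. Here the agent and theme of each candidate are entered by hand from the slide; no SRL system is run, and the record layout and function name are hypothetical.

```python
# Role labels are copied by hand from the slide; this is not SRL output.
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]

def answer_who(predicate, theme, candidates):
    """Return the agents of candidates whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]

print(answer_who("create", "the first effective polio vaccine", candidates))
```

Both sentences contain the question's words, so word matching alone cannot separate them; only the theme role does.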
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1]   (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]   (ring.02)
ring.01: Make sound of bell
    Arg0: Causer of ringing
    Arg1: Thing rung
    Arg2: Ring for
ring.02: To surround
    Arg1: Surrounding entity
    Arg2: Surrounded entity
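The two rolesets can be written down as a small lookup table to show how one label means different things under different rolesets. The role descriptions are copied from the slide; the dictionary layout and the function name are hypothetical stand-ins, not PropBank's actual frame file format.

```python
# Rolesets copied from the slide; the layout is a hypothetical stand-in
# for a frame file, not PropBank's actual file format.
FRAME_FILES = {
    "ring": {
        "ring.01": {"ARG0": "Causer of ringing",
                    "ARG1": "Thing rung",
                    "ARG2": "Ring for"},
        "ring.02": {"ARG1": "Surrounding entity",
                    "ARG2": "Surrounded entity"},
    }
}

def describe(lemma, roleset, labelled_spans):
    """Map each labelled span to its role description; reject labels the roleset lacks."""
    roles = FRAME_FILES[lemma][roleset]
    unknown = sorted(set(labelled_spans) - set(roles))
    if unknown:
        raise ValueError(f"{roleset} has no role(s): {', '.join(unknown)}")
    return {span: roles[label] for label, span in labelled_spans.items()}

print(describe("ring", "ring.01", {"ARG0": "John", "ARG1": "the bell"}))
print(describe("ring", "ring.02", {"ARG1": "Tall aspen trees", "ARG2": "the lake"}))
```

Note that ARG1 is the thing rung under ring.01 but the surrounding entity under ring.02, which is why the roleset ID has to accompany the labels.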
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
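The labels in both tables share one shape: a base (ARGA, ARG0 to ARG3, or ARGM) optionally followed by a hyphen and a function tag. A minimal sketch of splitting such labels is given below; the function names are hypothetical and the hyphenated spelling is assumed from the tagset as reconstructed here.

```python
def parse_label(label):
    """Split a label such as 'ARG0-GOL' into (base, function_tag or None)."""
    base, _, tag = label.partition("-")
    return base, (tag or None)

def is_modifier(label):
    """Modifier arguments all use the ARGM base."""
    return parse_label(label)[0] == "ARGM"

for label in ["ARGA", "ARG0-GOL", "ARG2-LOC", "ARGM-LOC"]:
    print(label, parse_label(label), is_modifier(label))
```

Keeping the base and the function tag apart makes it easy to see that LOC, for example, occurs both on a numbered argument (ARG2-LOC) and on a modifier (ARGM-LOC).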
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
    Atif kitab paRhegaa
    Atif book.f read.m.sg.fut
    'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
    Atif-ne kitaab paRhii
    Atif-erg book.f read.f.sg.pst
    'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
    Atif-ko kitaab paRhnii paRii
    Atif-dat book.f read.f.inf compel.f.pst
    'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
    darvaazaa khulegaa
    door.m.sg.d open.m.sg.fut
    'The door will open'
unacc-2: दरवाज़े ने खुला
    darvaaze ne khulaa
    door.m.sg.obl erg open.pst
    'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
    darvaaze-ko khulnaa paRegaa
    door.m.sg.obl.dat open.inf compel.fut
    'The door will have to open'
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
    Atif soyegaa
    Atif sleep.m.sg.fut
    'Atif will sleep'
unerg-2: आतिफ़ ने सोया
    Atif-ne soyaa
    Atif-erg sleep.m.sg.pst
    'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
    Atif-ko sonaa paRegaa
    Atif-dat sleep.inf compel.fut
    'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
    us kamre meM cuuhe haiM
    that room in rats be.pres.pl
    'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
    kal raat baadaloM meiM caaMd dikhaa
    yesterday night clouds in moon see(unacc).pst
    'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
    kal raat baadaloM meiM mujhko caaMd dikhaa
    yesterday night clouds in me.dat moon see(unacc).pst
    'Yesterday night I saw the moon behind the clouds'
[Figures: annotations for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
    raam mohan-ko kitaab degaa
    Ram Mohan-dat book.f give.m.sg.fut
    'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
    raam-ne mohan-ko kitaab dii
    Ram-erg Mohan-dat book.f give.f.sg.pst
    'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) 'sleep' → 'make someone sleep'
• Add -vaa
  – (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
101212
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut
Thedoorwillopen
unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened
unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen
101212
14
Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept
unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep
Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo
bull Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs
101212
15
Dative Subject
unacc-4
कल रात बादलों में चाँद दिखा
kal raat baadaloM meiM caaMd dikhaa
yesterday night clouds in moon see(unacc).pst
‘Yesterday night the moon was seen behind the clouds’
dat-subj-1
कल रात बादलों में मुझको चाँद दिखा
kal raat baadaloM meiM mujhko caaMd dikhaa
yesterday night clouds in me.dat moon see(unacc).pst
‘Yesterday night I saw the moon behind the clouds’
[Figures: PropBank annotation of unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1
राम मोहन को किताब देगा
raam mohan ko kitaab degaa
Ram Mohan.dat book.f give.m.sg.fut
‘Ram will give a book to Mohan’
ditrans-2
राम ने मोहन को किताब दी
raam ne mohan ko kitaab dii
Ram.erg Mohan.dat book.f give.f.sg.pst
‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) sleep → make someone sleep
• Add –vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
आतिफ़ सोएगा
Atif soyegaa
Atif sleep.m.sg.fut
‘Atif will sleep’
causative-1
आया ने आतिफ़ को सुलाया
aayaa ne Atif ko sulaayaa
maid.erg Atif.acc sleep.caus.pst
‘The maid caused the child to sleep’
causative-2
माँ ने आया से आतिफ़ को सुलवाया
maaN ne aayaa se Atif ko sulvaayaa
mother.erg maid.by Atif.acc sleep.caus.pst
‘The mother made the maid cause the child to sleep’
[Figures: PropBank annotation of causative-1 and causative-2]
Causatives: classes
[Figure: causative classes]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
Complex Predicate
compl-pred-1
राम रवि की प्रतीक्षा कर रहा था
raam ravi kii pratiikshaa kar rahaa thaa
Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
‘Ram was waiting for Ravi’
compl-pred-2
राम रवी से लड़ बैठा
raam ravi se ladZa baithaa
Ram Ravi.inst fight sit.perf
‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1
राम जानता है कि सीता देर से आएगी
raam jaantaa hai ki siitaa der se aayegii
Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]
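A binary-branching verbal projection of this kind can be sketched as a small tree structure. The bracketing used below, [VP-Pred [NP Atif] [VP [NP kitab] [V paRhegaa]]], is an assumed reading of the slide's tree diagram, and the `Node` class and helper functions are hypothetical illustrations, not the treebank's actual file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""  # filled only on leaves


def is_binary(node: Node) -> bool:
    """True if no node in the tree has more than two children."""
    return len(node.children) <= 2 and all(is_binary(c) for c in node.children)


def terminals(node: Node) -> List[str]:
    """Left-to-right leaf words, i.e. the surface word order."""
    if not node.children:
        return [node.word]
    return [w for c in node.children for w in terminals(c)]


# Assumed bracketing: [VP-Pred [NP Atif] [VP [NP kitab] [V paRhegaa]]]
tree = Node("VP-Pred", [
    Node("NP", word="Atif"),
    Node("VP", [Node("NP", word="kitab"), Node("V", word="paRhegaa")]),
])

print(is_binary(tree))   # True
print(terminals(tree))   # ['Atif', 'kitab', 'paRhegaa']
```

The two NP positions in this sketch correspond to the two privileged positions mentioned on the slide.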
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, V; trace CASE1]
Existentials
• Existential ho ‘be’ is unaccusative (because agent-free), and the location is an adjunct
[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP-P, NP1, NP, V; trace CASE1]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]
Putting it All Together: Dative Subjects
[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP-P, NP1, NP2, NP, V; traces CASE1 and SCR2]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP-P, NP1, NP, CP1, C, V, V-Aux; traces EXTR1 and CASE1]
Relative Clause
[Tree diagram; node labels: VP-Pred, VP, NP-P, NP1, NP, CP, C, COMP, V, V-Aux; empty elements PRO, SCR1, SCR3]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
‘I have read the book which you gave me’
Complex Predicate
[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, N, V’, V, V-Aux; trace CASE1]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causative
[Tree diagram]
Conditionals (Contd.)
Possibility-I is not possible because agar-to, the head of the tree, is an abstract node, i.e. it is not a lexical node.
For conditionals, our current decision is to follow Possibility-II.
In Possibility-II, the agar ‘if’ clause is dependent on the to ‘then’ clause.
Here the agar ‘if’ clause is the subordinate clause and the to ‘then’ clause is the main clause.
(6) Participles (vmod)
In non-adjectival participles, an argument of the main verb is shared with another verb (the participle).
The argument occurs only once in the sentence but is semantically related to both verbs. Syntactically, the shared argument always attaches to the main verb. For the other verb, this argument is realized semantically but not syntactically.
Participles (vmod) (Contd.)
Ex: vaha rojZa patra likhakara PaadZataa hai
‘he’ ‘daily’ ‘letter’ ‘having written’ ‘tear’ ‘is’
‘Having written letters, he tears them up every day’
Participles (vmod) (Contd.)
The arguments vaha ‘he’ and pawra ‘letter’ of the verb PaadZataa ‘tears’ are shared with the participle verb likhakar ‘having written’.
Participles (vmod) (Contd.)
PaadZataa hai
  k1: vaha
  k7t: rojZa
  k2: pawra
  vmod: likhakar
    k1: vaha (shared)
    k2: pawra (shared)
(7) Ellipsis
How do we show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
‘you’ ‘whatever’ ‘will say’ ‘that’ ‘I’ ‘will believe’
‘I will believe whatever you say’
In the above example, vo ‘that’, which would be the parent node of the relative clause ‘tum jo bhi kahoge’, is missing.
We insert a null element, i.e. NULL_NP, for vo ‘that’ to show the dependency.
Ellipsis (Contd.)
maan luungii
  k1: mai
  k2: NULL__NP (vo)
    nmod__relc: kahoge
      k1: tum
      k2: jo bhi
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
‘The children have grown up; they don’t listen to anyone’
There is no explicit conjunction. Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof: badZe_ho_gaye
  ccof: nahii_sunate
Non-dependency Relations
ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation. It is used for coordinating as well as subordinating conjunctions.
The dependency trees show the conjunctions as heads.
In coordination, the conjunction is the head and takes the coordinated elements as its children.
In subordination, the conjunction takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
[Figures: dependency trees for the coordinate and subordinate conjunction examples]
(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
The noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’ is modified by the adjective ek ‘one’; ek does not modify the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree.
If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.
Conjunct Verbs (Contd.)
To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
The dependency relation of prashna with kiyaa is POF (‘Part OF’ relation): a noun or an adjective in the conjunct verb sequence has a POF relation with the verb.
This way the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit. The POF relation captures the fact that the noun-verb sequence is a conjunct verb.
Conjunct Verbs (Contd.)
kiyaa
  k1: maine
  k2: usase
  pof: prashna
    nmod: ek
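The conjunct-verb analysis can be illustrated with a minimal sketch that stores the dependency tree for maine usase ek prashna kiyaa as (dependent, relation, head) triples. The relation labels come from the slide; the triple encoding and the helper names are assumptions made for illustration only and do not reflect the treebank's storage format.

```python
# (dependent, relation, head) triples for: maine usase ek prashna kiyaa
ARCS = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]


def dependents(head, arcs=ARCS):
    """Return (dependent, relation) pairs attached to the given head."""
    return [(dep, rel) for dep, rel, h in arcs if h == head]


def conjunct_verbs(arcs=ARCS):
    """Pair each verb with the noun or adjective linked to it by pof."""
    return [(dep, head) for dep, rel, head in arcs if rel == "pof"]


print(conjunct_verbs())        # [('prashna', 'kiyaa')]
print(dependents("prashna"))   # [('ek', 'nmod')]
```

Because prashna is its own node, the modifier ek can attach to it directly while the pof arc still records that prashna kiyaa forms one semantic unit.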
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1 Motivation
2 Introducing PropBank
3 Frame file definition
4 Hindi PropBank
5 Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson agent] created the [first disposable syringe theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine theme] was created in 1952 by [Jonas Salk agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
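The filtering step in the polio-vaccine example can be sketched as follows. The role annotations are written by hand from the slide; the dictionary layout and the function `answer_who` are hypothetical and stand in for the output of a real semantic role labeller.

```python
# Hand-written role annotations for the two candidate sentences on the slide.
CANDIDATES = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, candidates=CANDIDATES):
    """Return the agents of sentences whose predicate and theme match the question."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]


print(answer_who("create", "the first effective polio vaccine"))  # ['Jonas Salk']
```

Both sentences contain every content word of the question, so word matching alone cannot separate them; only the theme role distinguishes the right answer.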
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• John rings the bell
ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for
An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01  Make sound of bell
  Arg0  Causer of ringing
  Arg1  Thing rung
  Arg2  Ring for
ring.02  To surround
  Arg1  Surrounding entity
  Arg2  Surrounded entity
An example
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
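As a rough sketch, the two rolesets of ring can be held in a dictionary keyed by roleset ID. The role descriptions are the ones on the slides; the dictionary layout and the `describe` helper are assumptions made for illustration and do not mirror the actual frame-file format.

```python
# Rolesets for "ring" as listed on the slides, keyed by roleset ID.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing",
                  "ARG1": "Thing rung",
                  "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity",
                  "ARG2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each labelled argument to its verb-specific role description."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}


print(describe("ring.01", [("ARG0", "John"), ("ARG1", "the bell")]))
print(describe("ring.02", [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")]))
```

The same label (ARG1) means "thing rung" under ring.01 and "surrounding entity" under ring.02, which is why the roleset ID must be chosen before the arguments are interpreted.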
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments, recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
ARGA  Causer
ARG0  Agent, experiencer
ARG1  Theme, patient
ARG2  Recipient
ARG3  Instrument
Numbered Arguments with function tags
ARGA-MNS  Indirect causer
ARG0-MNS  Induced causer
ARG0-GOL  Causee with a ‘recipient’ role
ARG2-ATR  Attribute
ARG2-GOL  Goal
ARG2-SOU  Source
ARG2-LOC  Location
ARG2-DIR  Direction
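A label in this tagset can be read as a base argument plus an optional function tag. The sketch below assumes the labels are written with a hyphen (for example ARG2-GOL or ARGM-TMP), which is a reading of the garbled slide text; the function names are hypothetical.

```python
def split_label(label: str):
    """Split a PropBank label into (base, function_tag); the tag is None if absent."""
    base, _, tag = label.partition("-")
    return base, (tag or None)


def is_modifier(label: str) -> bool:
    """True for modifier arguments such as ARGM-TMP."""
    return split_label(label)[0] == "ARGM"


print(split_label("ARG2-GOL"))   # ('ARG2', 'GOL')
print(split_label("ARG0"))       # ('ARG0', None)
print(is_modifier("ARGM-TMP"))   # True
```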
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransitive: Unaccusative
unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'
unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'
Intransitive: Unergative
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'
unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figures: annotated structures for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so → sulaa) 'sleep' → 'make someone sleep'
• Add –vaa
  – (sulaa → sulvaa) 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Causatives
[Figures: annotated structures for causative-1 and causative-2]
Causatives: classes
[Figure not preserved]
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa:
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
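The bracketed notation used for the bharosaa example above is easy to read mechanically. The following is a minimal sketch of pulling the labelled spans out of such a line; the pattern `SPAN` and the function `read_annotation` are hypothetical helpers, and the bracket format is assumed to be "[words LABEL]" exactly as on the slide, not an official PropBank file format.

```python
import re

# Matches "[words LABEL]" where LABEL is the last token, e.g. "[abhay ne ARG0]".
SPAN = re.compile(r"\[([^\[\]]+?)\s+(ARG[A-Z0-9-]+)\]")


def read_annotation(line: str):
    """Return (list of (text, label) spans, remaining predicate text)."""
    spans = [(m.group(1), m.group(2)) for m in SPAN.finditer(line)]
    # Whatever is left after removing the bracketed spans is the predicate.
    predicate = SPAN.sub("", line).strip()
    return spans, predicate


spans, predicate = read_annotation("[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa")
print(spans)
print(predicate)
```

Here the predicate that remains is the noun plus light verb, which is the unit the noun frame for bharosaa describes.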
Complex Predicate
compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
Complex Predicate
compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram: आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels VP, VP-Pred, NP, V]
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram: आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii); node labels VP, VP-Pred, NP, V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
[Tree diagram: आतिफ़ सोएगा (Atif soyegaa); node labels VP, VP-Pred, NP, V]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
[Tree diagram: दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels VP, VP-Pred, NP, NP1, CASE1, V]
Existentials
• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct
[Tree diagram: उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels VP, VP-Pred, NP, NP1, CASE1, V]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram: राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii); node labels VP, VP-Pred, NP, V]
Putting it All Together: Dative Subjects
[Tree diagram: कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels VP, VP-Pred, NP, NP1, NP2, CASE1, SCR2, V]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Tree diagram: राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels VP, VP-Pred, NP, NP1, CASE1, V, V-Aux, CP1, C, EXTR1]
Relative Clause
[Tree diagram: jo kitaab tumne dii thii vah maine paRhii; node labels VP, VP-Pred, NP, NP1, V, V-Aux, CP, C, COMP, PRO, SCR1, SCR3]
  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
Complex Predicate
[Tree diagram: Raam Ravi ko yaad kar rahaa thaa; node labels VP, VP-Pred, VP1, NP, N, CASE1, V, V', V-Aux]
  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
Causative
[Tree diagram not preserved]
Participles (vmod) (Contd.)
The arguments vaha 'he' and pawra 'letter' of the verb PaadZataa 'tears' are shared with another participle verb, likhakar 'having written'.
Participles (vmod) (Contd.)
PaadZataa hai
  k1   → vaha
  k7t  → rojZa
  k2   → pawra
  vmod → likhakar
           k1 → vaha
           k2 → pawra
(7) Ellipsis
How to show dependencies when the head is missing?
Ex: tum jo bhi kahoge (vo) mai maan luungii
    'you' 'whatever' 'will say' 'that' 'I' 'will believe'
    'I will believe whatever you say'
In the above example vo 'that', which would be the parent node for the relative clause 'tum jo bhi kahoge', is missing.
We insert a null element, i.e. NULL_NP, for vo 'that' to show the dependency.
Ellipsis (Contd.)
maan luungii
  k1 → mai
  k2 → NULL_NP (vo)
         nmod__relc → kahoge
                        k1 → tum
                        k2 → jo bhi
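The tree above, with NULL_NP standing in for the missing vo, can be built with a very small node type. This is a sketch only: `Node` and `edges` are hypothetical names, and the representation is not the treebank's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    form: str
    relation: Optional[str] = None            # relation to the parent; None for the root
    children: List["Node"] = field(default_factory=list)

    def add(self, form: str, relation: str) -> "Node":
        """Attach a new dependent with the given relation and return it."""
        child = Node(form, relation)
        self.children.append(child)
        return child


def edges(node: Node):
    """List (head, relation, dependent) triples in depth-first order."""
    out = []
    for child in node.children:
        out.append((node.form, child.relation, child.form))
        out.extend(edges(child))
    return out


root = Node("maan luungii")
root.add("mai", "k1")
null_np = root.add("NULL_NP", "k2")           # inserted for the missing vo 'that'
relc = null_np.add("kahoge", "nmod__relc")    # the relative clause hangs off the null head
relc.add("tum", "k1")
relc.add("jo bhi", "k2")

for triple in edges(root):
    print(triple)
```

Without the inserted node, the relative clause headed by kahoge would have no nominal parent to attach to.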
Ellipsis (Contd.)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
    'children' 'big' 'happen' 'is' 'no one' 'Gen' 'matter' 'not' 'listen'
    "The children have grown up; they don't listen to anyone"
No explicit conjunct. Insert a NULL element to show the dependencies (if it is essential).
NULL_CCP (aur)
  ccof → badZe_ho_gaye
  ccof → nahii_sunate
Non-dependency Relations
ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of
(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation. It is used for coordinating as well as subordinating conjunctions.
The dependency trees will show the conjuncts as heads.
In coordinating conjuncts, the conjunct is the head and takes the coordinating elements as its children.
In subordinating conjuncts, the conjunct takes the clause to which it is syntactically attached (the subordinate clause) as its child.
Conjunction (ccof) (Contd.)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
    'ram' 'Erg' 'food' 'ate' 'and' 'sita' 'Erg' 'apple' 'ate'
    'Ram ate food and Sita ate an apple'
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
    'ram' 'Erg' 'said' 'that' 'he' 'tomorrow' 'come-Fut'
    'Ram said that he will come tomorrow'
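The coordinate example above can be written as head, relation, dependent triples with the conjunct aur as the head. This is a hedged sketch: the ccof edges follow the description in the text, but the k1 and k2 labels inside each clause and the `#1`/`#2` suffixes that tell the two occurrences of khaayaa apart are assumptions made for illustration. `root_of` and `conjoined` are hypothetical helper names.

```python
# (head, relation, dependent) triples for:
# raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
coordinate = [
    ("aur", "ccof", "khaayaa#1"),
    ("aur", "ccof", "khaayaa#2"),
    ("khaayaa#1", "k1", "raam ne"),      # assumed label
    ("khaayaa#1", "k2", "khaanaa"),      # assumed label
    ("khaayaa#2", "k1", "siitaa ne"),    # assumed label
    ("khaayaa#2", "k2", "seb"),          # assumed label
]


def root_of(triples):
    """Return the one node that is a head but never a dependent."""
    heads = {head for head, _, _ in triples}
    dependents = {dep for _, _, dep in triples}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return next(iter(roots))


def conjoined(triples, conjunct):
    """Elements attached to the conjunct by ccof, in order."""
    return [dep for head, rel, dep in triples if head == conjunct and rel == "ccof"]


print(root_of(coordinate), conjoined(coordinate, "aur"))
```

For the subordinate example, the same pattern would give ki a single ccof child, the verb aayegaa of the subordinate clause.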
[Tree diagrams: Coordinate Conjunction (ccof); Subordinate Conjunction]
37
(2) Conjunct Verbs
Ex maine usase ek prashna kiyaa lsquoI-ergrsquo lsquohim-instrsquo lsquoonersquo lsquoquestionrsquo lsquodidrsquo lsquoI asked him a questionrsquo The noun prashna lsquoquestionrsquo within the conjunct verb sequence prashna kiyaa
lsquoquestionedrsquo is being modified by the adjective ek lsquoonersquo and not the entire noun-verb sequence
The annotation scheme should be able to account for this relation in the
dependency tree If prashna kiyaa is grouped as a single verb chunk it will not be possible to
mark the appropriate relation between ek and prashna
Conjunct Verbs (Contd)
To overcome this problem we break ek prashna kiyaa into two separate chunks [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF (lsquoPart OFrsquo relation) It means noun or an adjective in the conjunct verb sequence will have a POF relation with
the verb This way the relation between ek and prashna becomes an intra-chunk relation as they will
now become part of a single NP chunk Conjunct verbs are chunked separately but semantically they constitute a single unit It captures the fact that the noun-verb sequence is a conjunct verb by linking them with
POF relation
Conjunct Verbs (Contd.)
kiyaa
  k1  → maine
  k2  → usase
  pof → prashna
          nmod → ek
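The chunking decision described above, together with the resulting tree, can be sketched as follows. Only the chunks [ek prashna]NP and [kiyaa]VG are given in the text; treating maine and usase as NP chunks, taking the last token of a chunk as its head, and labelling every intra-chunk modifier nmod are simplifying assumptions. The function name `dependencies` is hypothetical.

```python
# Chunked sentence: maine usase ek prashna kiyaa
chunks = [
    ("NP", ["maine"]),             # assumed chunk label
    ("NP", ["usase"]),             # assumed chunk label
    ("NP", ["ek", "prashna"]),     # from the text: [ek prashna]NP
    ("VG", ["kiyaa"]),             # from the text: [kiyaa]VG
]

# Inter-chunk relations of each chunk head to the verb, as in the tree above.
inter_chunk = {"maine": "k1", "usase": "k2", "prashna": "pof"}


def dependencies(chunks, inter_chunk, verb):
    """Build (head, relation, dependent) triples from chunks."""
    triples = []
    for _label, tokens in chunks:
        head = tokens[-1]                      # assumption: last token heads the chunk
        for modifier in tokens[:-1]:
            # Intra-chunk relation, e.g. ek modifying prashna.
            triples.append((head, "nmod", modifier))
        if head != verb:
            triples.append((verb, inter_chunk[head], head))
    return triples


triples = dependencies(chunks, inter_chunk, "kiyaa")
for triple in triples:
    print(triple)
```

Because prashna is the head of its own NP chunk, ek can attach to it directly while prashna itself still attaches to kiyaa with pof.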
Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder
Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena
Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
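The filtering step described above can be sketched in a few lines. The role analyses below are written by hand to match the two labelled sentences; no semantic role labeller is run here, and the dictionary layout and the function name `answer_who` are hypothetical.

```python
# Hand-written role analyses of the two candidate sentences.
analyses = [
    {
        "predicate": "create",
        "agent": "Becton Dickinson",
        "theme": "the first disposable syringe",
    },
    {
        "predicate": "create",
        "agent": "Jonas Salk",
        "theme": "the first effective polio vaccine",
    },
]


def answer_who(predicate, theme, analyses):
    """Return agents of sentences whose predicate and theme both match the question."""
    wanted = theme.lower()
    return [
        a["agent"]
        for a in analyses
        if a["predicate"] == predicate and a["theme"].lower() == wanted
    ]


answers = answer_who("create", "the first effective polio vaccine", analyses)
print(answers)
```

Both sentences contain every word of the question, so word matching alone cannot separate them; only the comparison of themes rules out the first one.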
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John: ARG0] rings [the bell: ARG1]
• [Tall aspen trees: ARG1] ring [the lake: ARG2]
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
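The two rolesets for ring can be held in a small dictionary, which makes it easy to look up what a numbered argument means for a given sense. This is a sketch only; the layout of `FRAME_FILE` and the function `describe` are hypothetical and do not reproduce the actual frame file format.

```python
# Rolesets for "ring", copied from the example above.
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "ARG0": "Causer of ringing",
            "ARG1": "Thing rung",
            "ARG2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "ARG1": "Surrounding entity",
            "ARG2": "Surrounded entity",
        },
    },
}


def describe(roleset_id, labelled_spans):
    """Map each labelled span to the role description in the chosen roleset."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: roles[label] for text, label in labelled_spans}


bell = describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
lake = describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
print(bell)
print(lake)
```

The same label ARG1 means 'Thing rung' under ring.01 and 'Surrounding entity' under ring.02, which is why the roleset has to be chosen before the numbered arguments can be interpreted.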
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers

Intransitive: Unaccusative
unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  ‘The door opened’
unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  ‘The door will have to open’

Intransitive: Unergative
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  ‘Atif will have to sleep’

Existential
exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject
unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night the moon was seen behind the clouds’
dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night I saw the moon behind the clouds’
[Figures: annotated structures for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive
ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers

Causatives
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
[Figures: annotated structures for causative-1 and causative-2]

Causatives: classes

Complex predicates
• These are cases such as bharosaa karnaa ‘trust(n) do(v)’, i.e. ‘trust’
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa

Complex Predicate
compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
compl-pred-2
  राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  ‘Ram regrettably fought with Ravi’

Complement Clause
compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siitaa der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’

Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach

Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank

Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS’s k1 and k2
[Figure: PS tree for आतिफ़ किताब पढ़ेगा (Atif kitaab paRhegaa); node labels VP-Pred, VP, NP, V]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Figure: PS tree for आतिफ़ ने किताब पढ़ी (Atif ne kitaab paRhii); same VP-Pred / VP skeleton, with the postposition ne on the subject NP]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Figure: PS tree for आतिफ़ सोएगा (Atif soyegaa); node labels VP-Pred, VP, NP, V]

Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Figure: PS tree for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels VP-Pred, VP, NP, V, with a CASE element coindexed (index 1) with an NP]

Existentials
• Existential ho ‘be’ is unaccusative (because agent-free) and the location is an adjunct
[Figure: PS tree for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); the NP cuuhe is coindexed (index 1) with a CASE element, and us kamre mein is adjoined to VP]

Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Figure: PS tree for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii); Mohan ko is adjoined to VP-Pred]

Putting it All Together: Dative Subjects
[Figure: PS tree for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); caaMd is coindexed (index 1) with a CASE element, mujhko (index 2) with an SCR element]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki
[Figure: PS tree for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels VP-Pred, VP, NP, V, V-Aux, CP, C, with EXTR and CASE elements carrying index 1]

Relative Clause
[Figure: PS tree for the sentence below; node labels VP-Pred, VP, NP, V, V-Aux, CP, C, COMP, PRO, with SCR elements]
  जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  ‘I have read the book which you gave me’

Complex Predicate
[Figure: PS tree for the sentence below; node labels VP-Pred, VP, NP, V, V′, N, V-Aux, with a CASE element carrying index 1]
  राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  ‘Ram was remembering Ravi’

Causative

Ellipsis (Contd)
Ex: bacce badZe ho gaye hai (aur) kisii kii baat nahii sunate
‘children’ ‘big’ ‘happen’ ‘is’ ‘no one’ ‘Gen’ ‘matter’ ‘not’ ‘listen’
“The children have grown up; they don’t listen to anyone”
No explicit conjunct. Insert a NULL element to show the dependencies (if it is essential).
[Figure: NULL_CCP (aur) as head, with badZe_ho_gaye and nahii_sunate each attached by ccof]

Non-dependency Relations
ccof – Conjunction
pof – Complex Predicates
fragof – Fragment of

(1) Conjunction (ccof)
The ccof relation does not reflect a dependency relation.
It is used for coordinating as well as subordinating conjunctions.
The dependency trees show the conjunct (the conjunction word) as the head.
In coordination, the conjunct is the head and takes the coordinated elements as its children.
In subordination, it takes the clause to which it is syntactically attached (the subordinate clause) as its child.

Conjunction (ccof) (Contd…)
Coordinate Conjunction
Ex: raam ne khaanaa khaayaa aur siitaa ne seb khaayaa
‘ram’ ‘Erg’ ‘food’ ‘ate’ ‘and’ ‘sita’ ‘Erg’ ‘apple’ ‘ate’
‘Ram ate food and Sita ate an apple’
Subordinate Conjunction
Ex: raam ne kahaa ki vo kal aayegaa
‘ram’ ‘Erg’ ‘said’ ‘that’ ‘he’ ‘tomorrow’ ‘come-Fut’
‘Ram said that he will come tomorrow’
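The two ccof configurations can be made concrete as edge lists. The following is a rough sketch only: the triple layout `(dependent, relation, head)`, the node names with `_1` / `_2` suffixes and the helper function are assumptions for illustration, and only the ccof edges described in the text are encoded.

```python
# Each edge is (dependent, relation, head); this layout is an illustrative assumption.
COORDINATE = [
    ("khaayaa_1", "ccof", "aur"),  # raam ne khaanaa khaayaa
    ("khaayaa_2", "ccof", "aur"),  # siitaa ne seb khaayaa
]
SUBORDINATE = [
    ("aayegaa", "ccof", "ki"),     # vo kal aayegaa, the subordinate clause
]


def children_by_relation(edges, head, relation):
    """List the dependents attached to `head` by `relation`, in edge order."""
    return [dep for dep, rel, h in edges if h == head and rel == relation]
```

In both cases the conjunction is the head: `aur` has two ccof children, while `ki` has exactly one.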
[Figures: dependency trees for the coordinate conjunction (ccof) and subordinate conjunction examples]

(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
‘I-erg’ ‘him-inst’ ‘one’ ‘question’ ‘did’
‘I asked him a question’
The adjective ek ‘one’ modifies only the noun prashna ‘question’ within the conjunct verb sequence prashna kiyaa ‘questioned’, not the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree.
If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd)
To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG
The dependency relation of prashna with kiyaa will be POF (‘Part OF’ relation). That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb.
This way, the relation between ek and prashna becomes an intra-chunk relation, as they are now part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit.
Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.

Conjunct Verbs (Contd)
kiyaa
  k1   -> maine
  k2   -> usase
  pof  -> prashna
            nmod -> ek
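The tree above can be encoded directly to show how the pof link lets a program recover the conjunct verb while keeping ek attached to the noun alone. This is a minimal sketch; the edge-triple layout and both helper functions are hypothetical and not taken from the treebank tools.

```python
# Edges for "maine usase ek prashna kiyaa" as (dependent, relation, head).
EDGES = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),  # intra-chunk relation inside [ek prashna]NP
]


def conjunct_verbs(edges):
    """Rebuild each noun-verb unit from its pof edge."""
    return [f"{dep} {head}" for dep, rel, head in edges if rel == "pof"]


def modifiers_of(edges, word):
    """Return the nmod dependents of a word."""
    return [dep for dep, rel, head in edges if head == word and rel == "nmod"]
```

Because prashna and kiyaa stay separate nodes, `modifiers_of(EDGES, "prashna")` finds ek, while `modifiers_of(EDGES, "kiyaa")` finds nothing.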

Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Framefile definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be ‘the first effective polio vaccine’
• The theme in the first sentence was ‘the first disposable syringe’
• We can filter out the wrong answer
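The filtering step can be sketched as follows. The role labels are written in by hand to mirror the two candidate sentences; no semantic role labeller is being run, and the `Proposition` class and `answer_who` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """One labelled predicate with its agent and theme (hand-annotated here)."""
    predicate: str
    agent: str
    theme: str


CANDIDATES = [
    Proposition("create", "Becton Dickinson", "first disposable syringe"),
    Proposition("create", "Jonas Salk", "first effective polio vaccine"),
]


def answer_who(predicate, theme, candidates):
    """Answer 'Who <predicate> <theme>?' by keeping only propositions whose theme matches."""
    return [p.agent for p in candidates if p.predicate == predicate and p.theme == theme]
```

Both sentences contain every word of the question, so word matching cannot separate them; requiring the theme of create to match is what removes the Becton Dickinson sentence.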

We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, it is difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example
• John rings the bell
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for

An example
• John rings the bell
• Tall aspen trees ring the lake
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

An example
• [John] rings [the bell]  (ring.01)
• [Tall aspen trees] ring [the lake]  (ring.02)

An example
• [John ARG0] rings [the bell ARG1]  (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2]  (ring.02)
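A frame file with two rolesets can be modelled as a small lookup structure. This is an in-memory stand-in built for illustration, not the actual frame file format; the `FRAMEFILE` layout and the `undefined_labels` helper are assumptions, and only the roleset IDs and role descriptions come from the example.

```python
# Stand-in for the 'ring' frame file; roleset IDs and role descriptions follow the example.
FRAMEFILE = {
    "ring.01": {
        "gloss": "make sound of bell",
        "roles": {"ARG0": "causer of ringing", "ARG1": "thing rung", "ARG2": "ring for"},
    },
    "ring.02": {
        "gloss": "to surround",
        "roles": {"ARG1": "surrounding entity", "ARG2": "surrounded entity"},
    },
}


def undefined_labels(roleset_id, labelled_args):
    """Return the labels in an annotation that the chosen roleset does not define."""
    defined = FRAMEFILE[roleset_id]["roles"]
    return sorted(set(labelled_args) - set(defined))
```

Checking an annotation against its roleset catches a slip such as labelling "Tall aspen trees" as ARG0 under ring.02, which has no ARG0.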

Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Framefiles for Hindi
• Two types of framefiles
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
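The split between automatic and manual insertion can be captured in a small table. This sketch assumes nothing beyond the list above; the `EMPTY_ARGUMENTS` name and the `needs_annotator` helper are hypothetical.

```python
# Empty-argument types with their description and how they are inserted.
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def needs_annotator(kind):
    """True if this empty-argument type has to be inserted by hand."""
    return EMPTY_ARGUMENTS[kind][1] == "manual"
```

The keys are case-sensitive on purpose: `pro` and `PRO` are different categories with different insertion modes, so lowercasing the input would conflate them.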
101212
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut
Thedoorwillopen
unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened
unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen
101212
14
Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept
unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep
Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo
bull Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs
101212
15
Da3veSubject
unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst
YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst
YesterdaynightIsawthemoonbehindtheclouds
unacc4
datsubj1
TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on
101212
16
Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan
ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan
Causa3ves
bull Hindihastwowaysofformingthecausa3vebull Addndashaa
ndash (sosulaa)sleepmakesomeonesleepbull Addndashvaa
ndash (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep
bull WeintroducethelabelARGAtoanalyzecausersbull SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees
bull ARGALMNSforintermediatecausers
101212
17
Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
causa3veL1 आया आतफ़ को स6लाया
aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo
causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa
motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo
Causa3ves
Causa3veL1
Causa3veL2
101212
18
Causa3vesclasses
Complexpredicates
bull Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust
bull Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa
101212
19
ComplexPredicate
complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo
Complexpredicate
complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo
101212
20
ComplementClause
complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut
RamknowsthatSitawillarrivelate
101212
1

Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition

Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi inspired by Chomskyan program, but not following any specific Chomskyan approach

Basic Principles of PS
• PS represents relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank

Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS

Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa)
[Tree diagram; node labels: VP-Pred, VP, NP, V; terminals: Atif, kitab, paRhegaa]

Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii)
[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, V; terminals: Atif, ne, kitab, paRhii]

Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
आतिफ़ सोएगा (Atif soyegaa)
[Tree diagram; node labels: VP-Pred, VP, NP, V; terminals: Atif, soyegaa]

Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
दरवाज़ा खुलेगा (darvaazaa khulegaa)
[Tree diagram; node labels: VP-Pred, VP, NP1, NP, CASE1, V; terminals: darvaazaa, khulegaa]

Existentials
• Existential ho 'be' is unaccusative (because agent-free), and location is an adjunct
उस कमरे में चूहे हैं (us kamre mein cuuhe hain)
[Tree diagram; node labels: VP-Pred, VP, NP1, NP, NP-P, CASE1, V; terminals: us kamre mein, cuuhe, hain]

Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii)
[Tree diagram; node labels: VP-Pred, VP, NP-P, NP, V; terminals: Ram, ne, Mohan, ko, kitaab, dii]

Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa)
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, NP1, NP2, CASE1, SCR2, V; terminals: kal raat, baadaloM mein, mujhko, caaMd, dikhaa]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere

Complement Clauses with ki
राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi)
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, NP1, CP1, C, V, V-Aux, EXTR1, CASE1; terminals: Ram, jaantaa, hai, ki, Sita, der se, aayegi]

Relative Clause
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you-erg give.f.sg.pst be.f.sg.pst that I-erg read refl.f.sg.pst
  'I have read the book which you gave me'
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, NP1, CP, C, COMP, V, V-Aux, PRO, SCR1, SCR3; terminals: jo kitaab, tumne, dii, thii, vah, maine, paRhii]
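
Complex Predicate
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi-acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, V, V', V-Aux, N, CASE1; terminals: Raam, Ravi, ko, yaad, kar, rahaa, thaa]

Causative
[Tree diagram not recoverable from the extracted text]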

Coordinate Conjunction (ccof)
Subordinate Conjunction

(2) Conjunct Verbs
Ex: maine usase ek prashna kiyaa
    'I-erg' 'him-inst' 'one' 'question' 'did'
    'I asked him a question'
The noun prashna 'question' within the conjunct verb sequence prashna kiyaa 'questioned' is modified by the adjective ek 'one'; the adjective does not modify the entire noun-verb sequence.
The annotation scheme should be able to account for this relation in the dependency tree. If prashna kiyaa is grouped as a single verb chunk, it will not be possible to mark the appropriate relation between ek and prashna.

Conjunct Verbs (Contd)
To overcome this problem, we break ek prashna kiyaa into two separate chunks: [ek prashna]NP [kiyaa]VG.
The dependency relation of prashna with kiyaa will be POF ('Part OF' relation). That is, a noun or an adjective in the conjunct verb sequence will have a POF relation with the verb.
This way, the relation between ek and prashna becomes an intra-chunk relation, as they now become part of a single NP chunk.
Conjunct verbs are chunked separately, but semantically they constitute a single unit. Linking the noun and the verb with the POF relation captures the fact that the noun-verb sequence is a conjunct verb.

Conjunct Verbs (Contd)
Dependency tree (head → dependent, with relation label):
  kiyaa → maine (k1)
  kiyaa → usase (k2)
  kiyaa → prashna (pof)
  prashna → ek (nmod)
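
The dependency tree for maine usase ek prashna kiyaa can also be written down as data. A minimal Python sketch, assuming a simple (dependent, relation, head) triple encoding; the helper names `conjunct_verbs` and `dependents_of` are hypothetical and are not part of any treebank tool.

```python
# Dependency arcs for "maine usase ek prashna kiyaa",
# encoded as (dependent, relation, head); labels follow the slide.
arcs = [
    ("maine", "k1", "kiyaa"),
    ("usase", "k2", "kiyaa"),
    ("prashna", "pof", "kiyaa"),
    ("ek", "nmod", "prashna"),
]


def conjunct_verbs(arcs):
    """Return (noun_or_adjective, verb) pairs linked by the POF relation."""
    return [(dep, head) for dep, rel, head in arcs if rel == "pof"]


def dependents_of(word, arcs):
    """Return the words attached directly to `word`."""
    return [dep for dep, _, head in arcs if head == word]


print(conjunct_verbs(arcs))            # the noun-verb pair forming the conjunct verb
print(dependents_of("prashna", arcs))  # ek attaches to the noun, not to the verb
```

Because ek attaches to prashna rather than to kiyaa, the modifier relation is preserved, while the pof arc still marks prashna kiyaa as one semantic unit.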

Overview
• Introduction to the nature of syntactic representations (Rambow, 15 minutes)
• Introduction to the morphology, syntax and lexical semantics of Hindi and Urdu (Sharma, 40 minutes)
• The morphological representation for Hindi and Urdu, including encoding issues, tokenization, part-of-speech tags and morphological representation (Sharma and Rambow, 20 minutes)
• The dependency representation (DS) for Hindi and Urdu syntax: principles, representation and examples (Sharma, 25 minutes)
• The lexical semantic representation (PB) for Hindi and Urdu: principles, representation and examples (Vaidya, 25 minutes)
• The phrase structure representation (PS) for Hindi and Urdu syntax: principles, representation and examples (Rambow, 25 minutes)
• Sample initial experiments in Hindi and Urdu NLP using the HUTB (Sharma and Rambow, 15 minutes)

Lexical Semantic Representation for Hindi & Urdu: principles, representation and analysis
Ashwini Vaidya, University of Colorado, Boulder

Contents
1. Motivation
2. Introducing PropBank
3. Frame file definition
4. Hindi PropBank
5. Linguistic Phenomena

Why is semantic information important?
• Imagine an automatic question answering system
• Who created the first effective polio vaccine?
• Two possible choices:
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Semantic Role labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson: agent] created the [first disposable syringe: theme] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine: theme] was created in 1952 by [Jonas Salk: agent] at the University of Pittsburgh

SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
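
The filtering step can be sketched in a few lines of Python. This is a minimal illustration, assuming the role annotations are already available; they are hand-written here, since the slides do not describe a particular labeller. The function name `answer_who` and the dictionary layout are hypothetical.

```python
# Hand-written role annotations for the two candidate sentences (assumed input;
# a real system would obtain these from a semantic role labeller).
candidates = [
    {"predicate": "create",
     "agent": "Becton Dickinson",
     "theme": "the first disposable syringe"},
    {"predicate": "create",
     "agent": "Jonas Salk",
     "theme": "the first effective polio vaccine"},
]


def answer_who(predicate, theme, candidates):
    """Return the agents of `predicate` whose theme matches the question's theme."""
    return [c["agent"] for c in candidates
            if c["predicate"] == predicate and c["theme"] == theme]


print(answer_who("create", "the first effective polio vaccine", candidates))
```

Both sentences contain the words of the question, so word matching alone cannot separate them; only the theme comparison filters out the syringe sentence.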

We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation

Semantic information
• Semantic information about verbs and participants expressed through semantic roles
• Agent, Experiencer, Theme, Result etc.
• However, difficult to have a standard set of thematic roles

Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information

PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file

An example
• John rings the bell
• Tall aspen trees ring the lake

ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity

Identifying the arguments:
• [John] rings [the bell] (ring.01)
• [Tall aspen trees] ring [the lake] (ring.02)

Labelling the arguments:
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
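
The two rolesets can be written as a small lookup table to show how one frame file covers both senses of ring. A minimal Python sketch: the role descriptions are transcribed from the slide, but the dictionary layout and the helper `describe` are assumptions for illustration and do not reproduce PropBank's actual frame file format.

```python
# Rolesets for "ring", transcribed from the slide. The layout is an
# assumption for illustration, not PropBank's frame file format.
FRAMES = {
    "ring.01": {"gloss": "Make sound of bell",
                "roles": {"Arg0": "Causer of ringing",
                          "Arg1": "Thing rung",
                          "Arg2": "Ring for"}},
    "ring.02": {"gloss": "To surround",
                "roles": {"Arg1": "Surrounding entity",
                          "Arg2": "Surrounded entity"}},
}


def describe(roleset_id, labelled_args):
    """Pair each labelled argument with its roleset-specific description."""
    roles = FRAMES[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}


print(describe("ring.01", [("Arg0", "John"), ("Arg1", "the bell")]))
print(describe("ring.02", [("Arg1", "Tall aspen trees"), ("Arg2", "the lake")]))
```

The same label (Arg1) means 'thing rung' under ring.01 and 'surrounding entity' under ring.02, which is why roles are defined per roleset rather than globally.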

Hindi PropBank
• Annotating Hindi PropBank involves three steps:
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling

Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875, 1902 predicates]

Empty Arguments
• PropBank inserts 4 types of empty arguments:
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
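
The four empty-argument types and their insertion mode can be kept in a small table. A minimal Python sketch, with descriptions paraphrased from the slide; the names `EMPTY_ARGUMENTS` and `inserted_by` are hypothetical.

```python
# Empty-argument types and how each is inserted, as listed on the slide.
EMPTY_ARGUMENTS = {
    "pro": {"description": "dropped argument recoverable from discourse context",
            "insertion": "manual"},
    "PRO": {"description": "empty subject of a non-finite complement or adjunct clause",
            "insertion": "automatic"},
    "RELPRO": {"description": "gap in a participial relative clause",
               "insertion": "automatic"},
    "GAP": {"description": "elided argument in a co-ordinated clause",
            "insertion": "manual"},
}


def inserted_by(mode):
    """Return the empty-argument types inserted in the given mode, sorted."""
    return sorted(name for name, info in EMPTY_ARGUMENTS.items()
                  if info["insertion"] == mode)


print(inserted_by("automatic"))
print(inserted_by("manual"))
```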

PropBank Tagset

Numbered Arguments          Numbered Arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction

PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
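
Labels in this tagset have a regular shape: a core part (ARGA, ARG0 to ARG3, or ARGM) optionally followed by a three-letter function tag. A minimal Python sketch of splitting a label into those parts, assuming labels are written with a hyphen (for example ARG2-LOC); the function `parse_label` and the returned dictionary layout are hypothetical.

```python
import re

# Matches labels such as ARG0, ARGA-MNS, ARG2-LOC and ARGM-TMP.
LABEL = re.compile(r"^ARG([0-9AM])(?:-([A-Z]{3}))?$")


def parse_label(label):
    """Split a PropBank-style label into its core and optional function tag."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a PropBank-style label: {label!r}")
    core, function_tag = match.groups()
    # ARGM labels are modifiers; everything else is listed as a numbered argument.
    kind = "modifier" if core == "M" else "numbered"
    return {"core": "ARG" + core, "function_tag": function_tag, "kind": kind}


print(parse_label("ARG2-LOC"))
print(parse_label("ARGM-TMP"))
print(parse_label("ARGA"))
```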

Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates

Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif-erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif-dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'

Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
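
The labelling rule for intransitives can be stated as a one-line lookup. A minimal Python sketch, assuming the verb classes are already known; here they are hard-coded from the examples on the slide, whereas in practice they would come from the diagnostic tests. The names `VERB_CLASS` and `sole_argument_label` are hypothetical.

```python
# Verb classes taken from the examples on the slide; a real lexicon would be
# built with the diagnostic tests rather than hard-coded.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the label of the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```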

Intransitive Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl-erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl-dat open.inf compel.fut
  'The door will have to open'

Intransitive Unergative
Unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif-erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif-dat sleep.inf compel.fut
  'Atif will have to sleep'

Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs

Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night, the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me-dat moon see(unacc).pst
  'Yesterday night, I saw the moon behind the clouds'
[Figures: annotated examples unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation

Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan-dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram-erg Mohan-dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'

Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa): 'sleep' → 'make someone sleep'
• Add -vaa
  – (sulaa → sulvaa): 'make someone sleep' → 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
1
Overviewbull Introduc3ontothenatureofsyntac3crepresenta3ons(Rambow15minutes)bull Introduc3ontothemorphologysyntaxandlexicalseman3csofHindiandUrdu
(Sharma40minutes)bull Themorphologicalrepresenta3onforHindiandUrduincludingencodingissues
tokeniza3onpartLofLspeechtagsandmorphologicalrepresenta3on(SharmaandRambow20minutes)
bull Thedependencyrepresenta3on(DS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Sharma25minutes)
bull Thelexicalseman3crepresenta3on(PB)forHindiandUrduprinciplesrepresenta3onandexamples(Vaidya25minutes)
bull Thephrasestructurerepresenta3on(PS)forHindiandUrdusyntaxprinciplesrepresenta3onandexamples(Rambow25minutes)
bull Sampleini3alexperimentsinHindiandUrduNLPusingtheHUTB(SharmaandRambow15minutes)
LexicalSeman3cRepresenta3onforHindiampUrduprinciplesrepresenta3onandanalysis
AshwiniVaidyaUniversityofColoradoBoulder
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Semantic Role Labelling
• Who created the first effective polio vaccine?
– [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
– [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
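The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Proposition` class, the `answer_who` function, and the hand-written role labels are hypothetical and stand in for the output of a real semantic role labeller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """One predicate with its role-labelled arguments (hypothetical structure)."""
    predicate: str
    agent: str
    theme: str


# Hand-labelled candidates copied from the slides; a real system would
# obtain these labels from an SRL tagger.
CANDIDATES = [
    Proposition("create", agent="Becton Dickinson", theme="the first disposable syringe"),
    Proposition("create", agent="Jonas Salk", theme="the first effective polio vaccine"),
]


def answer_who(predicate, theme, candidates):
    """Return the agents of propositions whose predicate and theme match the question."""
    return [c.agent for c in candidates if c.predicate == predicate and c.theme == theme]


answers = answer_who("create", "the first effective polio vaccine", CANDIDATES)
print(answers)  # ['Jonas Salk']
```

Both candidate sentences contain every word of the question, so word matching cannot separate them; only the theme label rules out the syringe sentence.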
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
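As a rough illustration of how a frame file groups rolesets, the two senses of ring can be held in a small Python dictionary. The dictionary layout and the `describe` helper are assumptions made for this sketch and do not reproduce the actual frame file format.

```python
# Hypothetical in-memory stand-in for the frame file of "ring".
FRAME_FILE = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {"ARG0": "Causer of ringing", "ARG1": "Thing rung", "ARG2": "Ring for"},
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {"ARG1": "Surrounding entity", "ARG2": "Surrounded entity"},
    },
}


def describe(roleset_id, labelled_args):
    """Map each (label, text) pair to its role description in the chosen roleset."""
    roles = FRAME_FILE[roleset_id]["roles"]
    return {text: roles[label] for label, text in labelled_args}


print(describe("ring.01", [("ARG0", "John"), ("ARG1", "the bell")]))
print(describe("ring.02", [("ARG1", "Tall aspen trees"), ("ARG2", "the lake")]))
```

The same label (ARG1) means "Thing rung" in one roleset and "Surrounding entity" in the other, which is why the roleset ID has to be chosen before the numbered arguments can be interpreted.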
Hindi PropBank
• Annotating Hindi PropBank involves three steps
– Creation of frame files
– Empty argument insertion
– Semantic role labelling
Frame files for Hindi
• Two types of frame files
– Frames for simple verbs [385 frames, 703 predicates]
– Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
– pro: dropped null arguments, recoverable from discourse context
– PRO: empty subjects of non-finite complement and adjunct clauses
– RELPRO: gaps in participial relative clauses
– GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
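The split between automatic and manual insertion can be written down as a small lookup table. This is a sketch only; the `EMPTY_ARGS` table and `needs_annotator` helper are hypothetical names, and the descriptions are paraphrased from the slide.

```python
# Empty-argument types from the slide, paired with how each one is inserted.
# Keys are case-sensitive: "pro" and "PRO" are different categories.
EMPTY_ARGS = {
    "pro": ("dropped null argument, recoverable from discourse context", "manual"),
    "PRO": ("empty subject of a non-finite complement or adjunct clause", "automatic"),
    "RELPRO": ("gap in a participial relative clause", "automatic"),
    "GAP": ("elided argument in a co-ordinated clause", "manual"),
}


def needs_annotator(kind):
    """True if this empty-argument type has to be inserted by hand."""
    return EMPTY_ARGS[kind][1] == "manual"


manual = sorted(k for k in EMPTY_ARGS if needs_annotator(k))
print(manual)  # ['GAP', 'pro']
```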
PropBank Tagset: Numbered Arguments
ARGA: Causer
ARG0: Agent, experiencer
ARG1: Theme, patient
ARG2: Recipient
ARG3: Instrument
PropBank Tagset: Numbered Arguments with function tags
ARGA-MNS: Indirect causer
ARG0-MNS: Induced causer
ARG0-GOL: Causee with a 'recipient' role
ARG2-ATR: Attribute
ARG2-GOL: Goal
ARG2-SOU: Source
ARG2-LOC: Location
ARG2-DIR: Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP: Temporal
ARGM-MNR: Manner
ARGM-LOC: Location
ARGM-PRP: Purpose
ARGM-CAU: Cause
ARGM-DIS: Discourse
ARGM-ADV: Adverb
ARGM-MNS: Means
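A label from this tagset splits into a base and an optional tag, which a short Python helper can check. The `parse_label` function is a hypothetical sketch: it only tests whether the base and the tag appear somewhere in the lists above, and does not verify that a particular combination (for example ARG1 with GOL) is actually used in the treebank.

```python
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
FUNCTION_TAGS = {"MNS", "GOL", "ATR", "SOU", "LOC", "DIR"}
MODIFIER_TAGS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def parse_label(label):
    """Split a label such as 'ARG2-GOL' into (base, tag); tag is None if absent."""
    base, _, tag = label.partition("-")
    if base == "ARGM":
        # Modifier arguments always carry a tag.
        if tag not in MODIFIER_TAGS:
            raise ValueError(f"unknown modifier tag in {label!r}")
        return base, tag
    if base not in NUMBERED:
        raise ValueError(f"unknown argument label {label!r}")
    if tag and tag not in FUNCTION_TAGS:
        raise ValueError(f"unknown function tag in {label!r}")
    return base, tag or None


print(parse_label("ARG2-GOL"))
print(parse_label("ARG0"))
print(parse_label("ARGM-TMP"))
```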
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1: आतिफ़ किताब पढ़ेगा
  Atif kitaab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2: आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif erg book.f read.f.sg.pst
  'Atif read the book'
trans-3: आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
– Unaccusatives, e.g. Kula (open), Puta (explode)
– Unergatives, e.g. haMsa (laugh)
• Single argument of unaccusatives takes Arg1; unergatives take Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
– E.g. animacy test, cognate object test, among others
Intransitive: Unaccusative
unacc-1: दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2: दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl erg open.pst
  'The door opened'
unacc-3: दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl dat open.inf compel.fut
  'The door will have to open'
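The labelling rule for the single argument of an intransitive verb can be sketched as a lookup. The `VERB_CLASS` table and the function name are hypothetical; the class of each verb is taken from the examples on the slide and would in practice come from the diagnostic tests, which this sketch does not implement.

```python
# Verb classes for the example verbs on the slide (transliterated stems).
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def sole_argument_label(verb):
    """Return the PropBank label for the single argument of an intransitive verb."""
    return "ARG1" if VERB_CLASS[verb] == "unaccusative" else "ARG0"


for verb in VERB_CLASS:
    print(verb, sole_argument_label(verb))
```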
Intransitive: Unergative
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2: आतिफ़ ने सोया
  Atif ne soyaa
  Atif erg sleep.m.sg.pst
  'Atif slept'
unerg-3: आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1: उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4: कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1: कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figures: annotated structures labelled unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1: राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2: राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram erg Mohan dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
– (so -> sulaa) sleep -> make someone sleep
• Add -vaa
– (sulaa -> sulvaa) make someone sleep -> cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1: आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1: आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid erg Atif acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2: माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother erg maid by Atif acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Causatives
[Figures: annotated structures labelled Causative-1 and Causative-2]
Causatives: classes
[Figure: content not recoverable from the extraction]
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)': 'trust'
• Such cases are handled using a noun frame for bharosaa
  [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa
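The point that the roles of a complex predicate come from the noun rather than from the light verb can be illustrated with a small lookup. Everything here is a hypothetical sketch: the `NOUN_FRAMES` table, the role descriptions "truster" and "thing trusted", and the `label_complex_predicate` function are invented for illustration and are not taken from the Hindi PropBank frame files.

```python
# Hypothetical noun frame for bharosaa 'trust'; role descriptions are illustrative.
NOUN_FRAMES = {
    "bharosaa": {"ARG0": "truster", "ARG1": "thing trusted"},
}

# Light verbs contribute no roles of their own in this sketch.
LIGHT_VERBS = {"kar"}  # 'do'


def label_complex_predicate(noun, light_verb, labelled_chunks):
    """Look up roles in the noun frame and attach a description to each chunk."""
    if light_verb not in LIGHT_VERBS:
        raise ValueError(f"{light_verb!r} is not treated as a light verb here")
    frame = NOUN_FRAMES[noun]
    return [(text, label, frame[label]) for text, label in labelled_chunks]


result = label_complex_predicate(
    "bharosaa", "kar", [("abhay ne", "ARG0"), ("sitaa par", "ARG1")]
)
for row in result:
    print(row)
```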
Complex Predicate
compl-pred-1: राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratikshaa kar rahaa thaa
  Ram Ravi gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2: राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1: राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
– Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
– Motivated by claims about language acquisition in children
– Develop a theory of syntax such that the syntax of a language can be explained by
  • Language-universal principles
  • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
– Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
– Nouns with postpositions
– Verbs with auxiliaries and complementizers (ki)
• Binary branching
– Theoretical reasons
– To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram for आतिफ़ किताब पढ़ेगा (Atif kitab paRhegaa); node labels: VP-Pred, VP, NP, V]
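Binary branching and the two argument positions can be illustrated with a minimal tree structure in Python. The bracketing below is an assumption reconstructed from the node labels that survive from the slide (VP-Pred, VP, NP, V); the `Node` class and helper functions are hypothetical and are not the treebank's own format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A phrase structure node: either a leaf with a word or an inner node."""
    label: str
    children: list = field(default_factory=list)
    word: str = ""


def is_binary(node):
    """True if no node in the tree has more than two children."""
    return len(node.children) <= 2 and all(is_binary(c) for c in node.children)


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if not node.children:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


# Assumed bracketing: the higher NP is the sister of VP, the lower NP the sister of V.
tree = Node("VP-Pred", [
    Node("NP", word="Atif"),
    Node("VP", [Node("NP", word="kitab"), Node("V", word="paRhegaa")]),
])

print(is_binary(tree))
print(" ".join(leaves(tree)))
```

In this sketch the NP directly under VP-Pred and the NP directly under VP play the part of the two privileged positions, which the slide relates to k1 and k2 in DS.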
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram for आतिफ़ ने किताब पढ़ी (Atif ne kitab paRhii); node labels: VP-Pred, VP, NP-P, NP, V]
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In unergative, there simply is no object
[Tree diagram for आतिफ़ सोएगा (Atif soyegaa); node labels: VP-Pred, VP, NP, V]
Intransitive Clause: Unaccusative
• Argument starts in lower position (because of lexical semantics) and moves to higher position (because higher position has no occupant)
[Tree diagram for दरवाज़ा खुलेगा (darvaazaa khulegaa); node labels: VP-Pred, VP, NP1, NP, CASE1, V]
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and location is an adjunct
[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, CASE1, V, NP-P]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram for राम ने मोहन को किताब दी (Ram ne Mohan ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]
Putting it All Together: Dative Subjects
[Tree diagram for कल रात बादलों में मुझको चाँद दिखा (kal raat baadaloM mein mujhko caaMd dikhaa); node labels: VP-Pred, VP, NP-P, NP, NP1, NP2, CASE1, SCR2, V]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
[Tree diagram for राम जानता है कि सीता देर से आएगी (Ram jaantaa hai ki Sita der se aayegi); node labels: VP-Pred, VP, NP, NP1, NP-P, CP1, C, EXTR1, CASE1, V, V-Aux]
Relative Clause
[Tree diagram for the sentence below; node labels: VP-Pred, VP, NP-P, NP, NP1, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux]
जो किताब तुमने दी थी वह मैंने पढ़ ली
  jo kitaab tumne dii thii vah maine paRh lii
  which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
  'I have read the book which you gave me'
Complex Predicate
[Tree diagram for the sentence below; node labels: VP-Pred, VP, VP1, NP-P, NP, V, V-Aux, V', N, CASE1]
राम रवि को याद कर रहा था
  raam ravi ko yaad kar rahaa thaa
  Ram Ravi acc remember do prog.m.sg be.m.sg.pst
  'Ram was remembering Ravi'
Causative
101212
2
Contents1 Mo3va3on
2 IntroducingPropBank3 Framefiledefini3on4 HindiPropBank5 Linguis3cPhenomena
Whyisseman3cinforma3onimportant
bull Imagineanautoma3cques3onansweringsystembull Whocreatedthefirsteffec3vepoliovaccinebull Twopossiblechoicesndash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
101212
3
WordMatches
bull Whocreatedthefirsteffec3vepoliovaccinendash BectonDickinsoncreatedthefirstdisposablesyringeforusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash Thefirsteffec3vepoliovaccinewascreatedin1952byJonasSalkattheUniversityofPiasburgh
Parsing
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinson]createdthe[firstdisposablesyringe]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccine]wascreatedin1952by[JonasSalk]attheUniversityofPiasburgh
101212
4
Seman3cRolelabelling
bull Whocreatedthefirsteffec3vepoliovaccinendash [BectonDickinsonagent]createdthe[firstdisposablesyringetheme]forusewiththemassadministra3onofthefirsteffec3vepoliovaccine
ndash [Thefirsteffec3vepoliovaccinetheme]wascreatedin1952by[JonasSalkagent]attheUniversityofPiasburgh
SRLgivesustherightanswer
bull Weneedseman3cinforma3ontoprefertherightanswer
bull Thethemeofcreateshouldbelsquothefirsteffec3vepoliovaccinersquo
bull Thethemeinthefirstsentencewaslsquothefirstdisposablesyringersquo
bull Wecanfilteroutthewronganswer
101212
5
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
Weneedseman3cinforma3on
bull Tofindoutabouteventsandtheirpar3cipantsbull Tocaptureseman3cinforma3onacrosssyntac3cvaria3on
101212
6
Seman3cinforma3on
bull Seman3cinforma3onaboutverbsandpar3cipantsexpressedthroughseman3croles
bull AgentExperiencerThemeResultetcbull Howeverdifficulttohaveastandardsetofthema3croles
Proposi3onBank
bull Proposi3onBank(PropBank)providesawaytocarryoutgeneralpurposeSeman3crolelabelling
bull APropBankisalargeannotatedcorpusofpredicateLargumentinforma3on
bull Asetofseman3crolesisdefinedforeachverbbull Asyntac3callyparsedcorpusisthentaggedwithverbLspecificseman3croleinforma3on
101212
7
PropBankFramefiles
bull PropBankdefinesseman3crolesonaverbLbyLverbbasis
bull Thisisdefinedinaverblexiconconsis3ngofframefiles
bull Eachpredicatewillhaveasetofrolesassociatedwithadis3nctusage
bull Apolysemouspredicatecanhaveseveralrolesetswithinitsframefile
Anexample
bull Johnringsthebellring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
101212
8
Anexample
bull Johnringsthebellbull Tallaspentreesringthelakering01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringfor
ring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Anexample
bull [John]rings[thebell]bull [Tallaspentrees]ring[thelake]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut
Thedoorwillopen
unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened
unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen
101212
14
Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept
unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep
Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo
bull Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs
101212
15
Da3veSubject
unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst
YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst
YesterdaynightIsawthemoonbehindtheclouds
unacc4
datsubj1
TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on
101212
16
Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan
ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan
Causa3ves
bull Hindihastwowaysofformingthecausa3vebull Addndashaa
ndash (sosulaa)sleepmakesomeonesleepbull Addndashvaa
ndash (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep
bull WeintroducethelabelARGAtoanalyzecausersbull SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees
bull ARGALMNSforintermediatecausers
101212
17
Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
causa3veL1 आया आतफ़ को स6लाया
aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo
causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa
motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo
Causa3ves
Causa3veL1
Causa3veL2
101212
18
Causa3vesclasses
Complexpredicates
bull Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust
bull Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa
101212
19
ComplexPredicate
complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo
Complexpredicate
complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo
101212
20
ComplementClause
complLclL1राम जानता P क सीता Hर I आएगी raamjaantaahaikisiitaderseaayegii RamknowhabmsgbesgpresthatSitalatepartcomefsgfut
RamknowsthatSitawillarrivelate
101212
1
PhraseStructureRepresenta3on
OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu
PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks
bull DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow
bull Developedinconjunc3onwithDSandPBbull InspiredbyChomskyantradi3on
101212
2
BackgroundforPS
bull Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren
ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull LanguageVuniversalprinciplesbull LanguageVspecificparameters
bull PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach
BasicPrinciplesofPS
bull PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)
bull Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces
bull Transforma3onalgrammar
bull Monostratalrepresenta3onbull NotunlikeEnglishPennTreebank
101212
3
SpecificAssump3onsaboutRepresenta3onMadebyPS
bull Phrasestructurebull No3onoflexicalheadswithprojec3ons(XVbartheorysortof)andassociatedfunc3onalprojec3onsndash Nounswithpostposi3onsndash Verbswithauxiliariesandcomplemen3zers(ki)
bull Binarybranchingndash Theore3calreasonsndash TobedifferentfromDS
BasicTransi3veClause(1)
bull Therearetwoprivilegedposi3onsintheverbalprojec3oncorrespondingusuallytoDSrsquosk1andk2
VPVPred
VP
NP
NPA3f
kitab
V
paRhegaaआततफिकताबपढ़गा
101212
4
BasicTransi3veClause(2)
bull Therepresenta3onismaintainedwhenwehaveanerga3veconstruc3on
VPVPred
VP
NPVP
NPA3fVne
kitab
V
paRhii
आततफिकताबपढ़ी
Intrasi3veClauseUnerga3ve
bull PSmakesadis3nc3onbetweenunerga3veandunaccusa3ve
bull Inunerga3vetheresimplyisnoobject
VPVPred
VP
NP
A3fV
soyegaa
आततफसोएगा
101212
5
Intrasi3veClauseUnaccusa3ve
bull Argumentstartsinlowerposi3on(becauseoflexicalseman3cs)andmovestohigherposi3on(becausehigherposi3onhasnooccupant)
VPVPred
VP
NP1
NPdarvaazaa
CASE1
V
khulegaa
दरवाज़ाख89गा
Existentials
• Existential ho 'be' is unaccusative (because agent-free), and the location is an adjunct
[Tree diagram for उस कमरे में चूहे हैं (us kamre mein cuuhe hain); node labels: VP-Pred, VP, NP1, NP, CASE1, V, NP-P]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred: a fixed but not structural position
[Tree diagram for राम ने मोहन को किताब दी (Ram-ne Mohan-ko kitaab dii); node labels: VP-Pred, VP, NP-P, NP, V]
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram for kal raat baadaloM mein mujhko caaMd dikhaa; node labels include VP-Pred, VP, NP, NP-P, NP1, NP2, CASE1, SCR2, V]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like an unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram for Ram jaantaa hai ki Sita der se aayegi; node labels include VP-Pred, VP, NP, NP1, NP-P, CP1, C, EXTR1, CASE1, V, V-Aux]
Relative Clause
[Tree diagram; node labels include VP-Pred, VP, NP, NP1, NP-P, CP, C, COMP, PRO, SCR1, SCR3, V, V-Aux]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
[Tree diagram; node labels include VP-Pred, VP, NP, NP-P1, CASE1, N, V, V', V-Aux]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causative
Word Matches
• Who created the first effective polio vaccine?
  – Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine
  – The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh
Parsing
• Who created the first effective polio vaccine?
  – [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh
Semantic Role Labelling
• Who created the first effective polio vaccine?
  – [Becton Dickinson]_agent created the [first disposable syringe]_theme for use with the mass administration of the first effective polio vaccine
  – [The first effective polio vaccine]_theme was created in 1952 by [Jonas Salk]_agent at the University of Pittsburgh
SRL gives us the right answer
• We need semantic information to prefer the right answer
• The theme of create should be 'the first effective polio vaccine'
• The theme in the first sentence was 'the first disposable syringe'
• We can filter out the wrong answer
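The filtering step can be sketched in a few lines. This is an illustrative sketch only: the dictionary layout, the `normalize` helper, and the `answer` function are assumptions, and the role assignments are typed in by hand from the two sentences above rather than produced by a labeller. The idea is that a candidate sentence answers the question only if its theme matches the theme the question asks about; the agent of that sentence is then the answer.

```python
from typing import Dict, List

# Hand-entered role assignments for the two candidate sentences.
candidates: List[Dict] = [
    {"predicate": "create",
     "roles": {"agent": "Becton Dickinson",
               "theme": "first disposable syringe"}},
    {"predicate": "create",
     "roles": {"agent": "Jonas Salk",
               "theme": "The first effective polio vaccine"}},
]


def normalize(text: str) -> str:
    """Lower-case and drop a leading article so fillers compare cleanly."""
    words = text.lower().split()
    if words and words[0] in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)


def answer(predicate: str, known_role: str, known_filler: str,
           wanted_role: str, sentences: List[Dict]) -> List[str]:
    """Return fillers of wanted_role from sentences whose known_role matches."""
    hits = []
    for s in sentences:
        if s["predicate"] != predicate:
            continue
        filler = s["roles"].get(known_role)
        if filler is None or normalize(filler) != normalize(known_filler):
            continue  # e.g. the theme is the syringe, not the vaccine
        if wanted_role in s["roles"]:
            hits.append(s["roles"][wanted_role])
    return hits


print(answer("create", "theme", "the first effective polio vaccine",
             "agent", candidates))
```

Word matching alone would accept both sentences, since both contain every word of the question; the role check is what rejects the first one.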
We need semantic information
• To find out about events and their participants
• To capture semantic information across syntactic variation
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame Files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John]_ARG0 rings [the bell]_ARG1 (ring.01)
• [Tall aspen trees]_ARG1 ring [the lake]_ARG2 (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
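The two rolesets for ring can be written down as data, which makes the verb-specific nature of the numbered arguments concrete. The sketch below is an in-memory stand-in, not the actual frame file format: the `Roleset` class, the `RING_FRAMES` dictionary, and the `describe` function are hypothetical names introduced here. It looks up what a numbered argument means for a given roleset and rejects labels that the roleset does not define.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Roleset:
    roleset_id: str
    gloss: str
    roles: Dict[str, str]  # numbered argument -> verb-specific description


# Contents copied from the ring example; the container is an assumption.
RING_FRAMES: Dict[str, Roleset] = {
    "ring.01": Roleset("ring.01", "Make sound of bell", {
        "Arg0": "Causer of ringing",
        "Arg1": "Thing rung",
        "Arg2": "Ring for",
    }),
    "ring.02": Roleset("ring.02", "To surround", {
        "Arg1": "Surrounding entity",
        "Arg2": "Surrounded entity",
    }),
}


def describe(roleset_id: str, labelled: List[Tuple[str, str]]) -> List[str]:
    """Spell out each (span, argument) pair using the roleset's definitions."""
    roleset = RING_FRAMES[roleset_id]
    lines = []
    for span, arg in labelled:
        if arg not in roleset.roles:
            raise ValueError(f"{arg} is not defined for {roleset_id}")
        lines.append(f"{span}: {arg} ({roleset.roles[arg]})")
    return lines


for line in describe("ring.01", [("John", "Arg0"), ("the bell", "Arg1")]):
    print(line)
for line in describe("ring.02", [("Tall aspen trees", "Arg1"), ("the lake", "Arg2")]):
    print(line)
```

Note that Arg1 means 'thing rung' under ring.01 but 'surrounding entity' under ring.02, so a label is only interpretable together with its roleset ID.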
Hindi PropBank
• Annotating the Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame Files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [18751902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped null arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
PropBank Tagset: Numbered Arguments
Numbered arguments          Numbered arguments with function tags
ARGA  Causer                ARGA-MNS  Indirect causer
ARG0  Agent, experiencer    ARG0-MNS  Induced causer
ARG1  Theme, patient        ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient             ARG2-ATR  Attribute
ARG3  Instrument            ARG2-GOL  Goal
                            ARG2-SOU  Source
                            ARG2-LOC  Location
                            ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
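For quick reference, the tagset can be held in three small lookup tables. This is a sketch under stated assumptions: the hyphenated spelling of the labels (for example `ARG2-LOC`) is a reading of the slide text and may differ from the spelling used in the released annotation files, and the `classify` function is a hypothetical helper, not part of any PropBank tool.

```python
from typing import Dict, Tuple

NUMBERED: Dict[str, str] = {
    "ARGA": "Causer",
    "ARG0": "Agent, experiencer",
    "ARG1": "Theme, patient",
    "ARG2": "Recipient",
    "ARG3": "Instrument",
}

FUNCTION_TAGGED: Dict[str, str] = {
    "ARGA-MNS": "Indirect causer",
    "ARG0-MNS": "Induced causer",
    "ARG0-GOL": "Causee with a 'recipient' role",
    "ARG2-ATR": "Attribute",
    "ARG2-GOL": "Goal",
    "ARG2-SOU": "Source",
    "ARG2-LOC": "Location",
    "ARG2-DIR": "Direction",
}

MODIFIERS: Dict[str, str] = {
    "ARGM-TMP": "Temporal",
    "ARGM-MNR": "Manner",
    "ARGM-LOC": "Location",
    "ARGM-PRP": "Purpose",
    "ARGM-CAU": "Cause",
    "ARGM-DIS": "Discourse",
    "ARGM-ADV": "Adverb",
    "ARGM-MNS": "Means",
}


def classify(label: str) -> Tuple[str, str]:
    """Return (kind, description) for a label, or raise KeyError if unknown."""
    for kind, table in (("numbered", NUMBERED),
                        ("numbered with function tag", FUNCTION_TAGGED),
                        ("modifier", MODIFIERS)):
        if label in table:
            return kind, table[label]
    raise KeyError(label)


print(classify("ARG2-LOC"))
print(classify("ARGM-LOC"))
```

The last two calls show why the tables are kept apart: location can be either a function-tagged numbered argument (ARG2-LOC) or a modifier (ARGM-LOC).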
Linguistic phenomena
• Simple transitive
• Unaccusative and unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex predicates
Simple Transitive
trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book'
trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book'
trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book'
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
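The labelling rule for intransitives reduces to a lookup once the verb class is known. The sketch below assumes the class has already been decided by the diagnostic tests, which are not implemented here; the two sets contain only the example verbs listed above, and `sole_argument_label` is a hypothetical helper name.

```python
# Verb classes are assumed to come from the diagnostic tests; only the
# example verbs from the list above are included.
UNACCUSATIVE = {"Kula", "Puta"}   # open, explode
UNERGATIVE = {"haMsa"}            # laugh


def sole_argument_label(verb: str) -> str:
    """Label for the single argument of an intransitive verb."""
    if verb in UNACCUSATIVE:
        return "Arg1"  # theme-like argument
    if verb in UNERGATIVE:
        return "Arg0"  # agent-like argument
    raise KeyError(f"verb class unknown for {verb!r}")


print(sole_argument_label("Kula"))
print(sole_argument_label("haMsa"))
```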
Intransitive Unaccusative
unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open'
unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened'
unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open'
Intransitive Unergative
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'
unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Figures labelled unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
• Add -aa
  – (so → sulaa) sleep → make someone sleep
• Add -vaa
  – (sulaa → sulvaa) make someone sleep → cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Causatives
[Figures labelled causative-1 and causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust(n) do(v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne]_ARG0 [sitaa par]_ARG1 bharosaa kiyaa
Complex Predicate
compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
compl-pred-2
  राम रवी से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
101212
1
PhraseStructureRepresenta3on
OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu
PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks
bull DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow
bull Developedinconjunc3onwithDSandPBbull InspiredbyChomskyantradi3on
101212
2
BackgroundforPS
bull Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren
ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull LanguageVuniversalprinciplesbull LanguageVspecificparameters
bull PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach
BasicPrinciplesofPS
bull PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)
bull Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces
bull Transforma3onalgrammar
bull Monostratalrepresenta3onbull NotunlikeEnglishPennTreebank
101212
3
SpecificAssump3onsaboutRepresenta3onMadebyPS
bull Phrasestructurebull No3onoflexicalheadswithprojec3ons(XVbartheorysortof)andassociatedfunc3onalprojec3onsndash Nounswithpostposi3onsndash Verbswithauxiliariesandcomplemen3zers(ki)
bull Binarybranchingndash Theore3calreasonsndash TobedifferentfromDS
BasicTransi3veClause(1)
bull Therearetwoprivilegedposi3onsintheverbalprojec3oncorrespondingusuallytoDSrsquosk1andk2
VPVPred
VP
NP
NPA3f
kitab
V
paRhegaaआततफिकताबपढ़गा
101212
4
BasicTransi3veClause(2)
bull Therepresenta3onismaintainedwhenwehaveanerga3veconstruc3on
VPVPred
VP
NPVP
NPA3fVne
kitab
V
paRhii
आततफिकताबपढ़ी
Intrasi3veClauseUnerga3ve
bull PSmakesadis3nc3onbetweenunerga3veandunaccusa3ve
bull Inunerga3vetheresimplyisnoobject
VPVPred
VP
NP
A3fV
soyegaa
आततफसोएगा
101212
5
Intrasi3veClauseUnaccusa3ve
bull Argumentstartsinlowerposi3on(becauseoflexicalseman3cs)andmovestohigherposi3on(becausehigherposi3onhasnooccupant)
VPVPred
VP
NP1
NPdarvaazaa
CASE1
V
khulegaa
दरवाज़ाख89गा
Existen3als
bull Existen3alho`bersquoisunaccusa3ve(becauseagentVfree)andloca3onisanadjunct
उस कमlt=चA
VPVPred
VP
NP1
NPcuuhe
CASE1
V
hain
VP
NPVP
uskamremein
101212
6
Ditransi3ve
bull TherecipientisintroducedasadjoinedtotheVPVPredafixedbutnotstructuralposi3on
VPVPred
NPVP
NP
RamVne
kitaab
V
dii
VPVPred
NPVP
MohanVko
VP
राममोहन कोिकताबदी
PugngitAllTogetherDa3veSubjects
कलरा बादलE=म8झकोचाGददखा
VPVPred
NP1
NP
caaMd
CASE1
V
dikhaa
VP
VP
NPVP
baadaloMmein
VP
NP
kalraat
VPVPred
NP
SCR2
VPNP2
mujhko
bull Dikhaaisinterpretedseman3callyasaditrasi3vesomeonemakessomethingappeartosomeone
bull Sincetheagentisabsentthelowerargumentraisestothehigherposi3on(likeunaccusa3ve)
bull Theda3vebeneficiaryisbasegeneratedinthefixedda3veposi3on(adjoinedtoVPVPred)andthenscrambleselsewhere
101212
7
ComplementClauseswithki
रामजानताAि सीताIरJआतएगी
VPVPred
VP
NP
NPRam
EXTR1
V VPVPred
NP
VP
NP1
Sita
CASE1
V
aayegi
VPVPred
NPVP
CP1
C
ki
dersejaantaa
VVAux
VP
VP
hai
Rela3veClause
VPVPredNPVP
NP
tumne
SCR1
V
dii
VPVPred
NP
PRO
VP
VP
NP1
jokitaab
CP
C
COMPVVAux
thii
VP VPVPred
VP
NPVP
maineV
paRhii
NPSCR3
VP
NP
NP
vah
जो कताब त8म दी थी वह L पढ़ ली jokitaabtumnediithiivahmainepaRhliiwhichbookfyouerggivefsgpstbefsgpstthatIergreadreflfsgpstIhavereadthebookwhichyougavemersquo
Complex Predicate
[Tree diagram. Node labels: VP-Pred, VP, NP (Raam), NP-P1 (Ravi-ko), V (kar), V-Aux (rahaa), V-Aux (thaa), V', NP, NP (CASE1), N (yaad), VP, VP]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
‘Ram was remembering Ravi’
Causative
Semantic information
• Semantic information about verbs and participants is expressed through semantic roles
• Agent, Experiencer, Theme, Result, etc.
• However, it is difficult to have a standard set of thematic roles
Proposition Bank
• Proposition Bank (PropBank) provides a way to carry out general-purpose semantic role labelling
• A PropBank is a large annotated corpus of predicate-argument information
• A set of semantic roles is defined for each verb
• A syntactically parsed corpus is then tagged with verb-specific semantic role information
PropBank Frame files
• PropBank defines semantic roles on a verb-by-verb basis
• This is defined in a verb lexicon consisting of frame files
• Each predicate will have a set of roles associated with a distinct usage
• A polysemous predicate can have several rolesets within its frame file
An example
• [John ARG0] rings [the bell ARG1] (ring.01)
• [Tall aspen trees ARG1] ring [the lake ARG2] (ring.02)
ring.01: Make sound of bell
  Arg0: Causer of ringing
  Arg1: Thing rung
  Arg2: Ring for
ring.02: To surround
  Arg1: Surrounding entity
  Arg2: Surrounded entity
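The roleset idea in this example can be illustrated with a small lookup. This is a minimal Python sketch under stated assumptions: the `FRAMES` dictionary and the `describe` function are hypothetical names, and the dictionary is only an in-memory stand-in for a frame file, whose real storage format is not shown in these slides. It pairs each labelled argument with the role description from the roleset that the sentence uses.

```python
# Hypothetical in-memory stand-in for the "ring" frame file.
FRAMES = {
    "ring.01": {
        "gloss": "Make sound of bell",
        "roles": {
            "ARG0": "Causer of ringing",
            "ARG1": "Thing rung",
            "ARG2": "Ring for",
        },
    },
    "ring.02": {
        "gloss": "To surround",
        "roles": {
            "ARG1": "Surrounding entity",
            "ARG2": "Surrounded entity",
        },
    },
}


def describe(roleset_id, labelled_args):
    """Pair each (text, label) argument with its role description.

    Raises ValueError if a label is not defined for the chosen roleset.
    """
    roles = FRAMES[roleset_id]["roles"]
    described = []
    for text, label in labelled_args:
        if label not in roles:
            raise ValueError(f"{label} is not defined for {roleset_id}")
        described.append((text, label, roles[label]))
    return described


john = describe("ring.01", [("John", "ARG0"), ("the bell", "ARG1")])
trees = describe("ring.02", [("Tall aspen trees", "ARG1"), ("the lake", "ARG2")])
```

The same label (ARG1) means "Thing rung" under ring.01 and "Surrounding entity" under ring.02, which is why the roleset ID has to be chosen before the arguments can be interpreted.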
Hindi PropBank
• Annotating Hindi PropBank involves three steps
  – Creation of frame files
  – Empty argument insertion
  – Semantic role labelling
Frame files for Hindi
• Two types of frame files
  – Frames for simple verbs [385 frames, 703 predicates]
  – Frames for nominals in complex predicates [1875; 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  – pro: dropped (null) arguments recoverable from discourse context
  – PRO: empty subjects of non-finite complement and adjunct clauses
  – RELPRO: gaps in participial relative clauses
  – GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
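The split between automatic and manual insertion can be written down as a small table. The following Python sketch is illustrative only: `Insertion`, `EMPTY_ARGUMENTS`, and `needs_annotator` are hypothetical names, and the descriptions are paraphrased from the slide. Note that `pro` and `PRO` are distinct types, so the keys are case-sensitive.

```python
from enum import Enum


class Insertion(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


# Empty-argument types and how each one is inserted, as listed on the slide.
EMPTY_ARGUMENTS = {
    "pro": ("dropped (null) argument recoverable from discourse context",
            Insertion.MANUAL),
    "PRO": ("empty subject of a non-finite complement or adjunct clause",
            Insertion.AUTOMATIC),
    "RELPRO": ("gap in a participial relative clause",
               Insertion.AUTOMATIC),
    "GAP": ("elided argument in a co-ordinated clause",
            Insertion.MANUAL),
}


def needs_annotator(empty_type):
    """Return True if this empty-argument type is inserted manually."""
    _description, how = EMPTY_ARGUMENTS[empty_type]
    return how is Insertion.MANUAL


automatic_types = sorted(
    name for name, (_d, how) in EMPTY_ARGUMENTS.items()
    if how is Insertion.AUTOMATIC
)
```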
PropBank Tagset

Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a ‘recipient’ role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
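The tagset above has a regular shape: a base label, optionally followed by a hyphen and a tag. As a minimal sketch, the Python below splits a label and rejects combinations that are not listed in the two tables. The label inventory is copied from the slides; the function `parse_label` and its three "kind" strings are assumptions made for illustration and are not part of any PropBank tooling.

```python
# Label inventory copied from the tagset tables.
NUMBERED = {"ARGA", "ARG0", "ARG1", "ARG2", "ARG3"}
FUNCTION_TAGGED = {
    "ARGA": {"MNS"},
    "ARG0": {"MNS", "GOL"},
    "ARG2": {"ATR", "GOL", "SOU", "LOC", "DIR"},
}
MODIFIER_TAGS = {"TMP", "MNR", "LOC", "PRP", "CAU", "DIS", "ADV", "MNS"}


def parse_label(label):
    """Split a label into (base, tag, kind).

    kind is "numbered", "numbered+function", or "modifier".
    Raises ValueError for any label not listed in the tables.
    """
    base, _sep, tag = label.partition("-")
    if base == "ARGM":
        if tag in MODIFIER_TAGS:
            return base, tag, "modifier"
    elif base in NUMBERED:
        if tag == "":
            return base, None, "numbered"
        if tag in FUNCTION_TAGGED.get(base, set()):
            return base, tag, "numbered+function"
    raise ValueError(f"unknown label: {label}")
```

For example, `parse_label("ARG2-LOC")` is accepted as a numbered argument with a function tag, while `parse_label("ARG1-LOC")` raises, because the tables list no function tags for ARG1.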
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  ‘Atif will read the book’
trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  ‘Atif read the book’
trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  ‘Atif had to read the book’
Unaccusative & Unergative
• Distinction between intransitive verbs
  – Unaccusatives, e.g. Kula (open), Puta (explode)
  – Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  – E.g. animacy test, cognate object test, among others
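The labelling rule for the single argument of an intransitive verb can be stated in a few lines. This Python sketch assumes the verb class has already been decided by the diagnostic tests; the `VERB_CLASS` table contains only the three verbs named on the slide, and `single_argument_label` is a hypothetical helper name.

```python
# Toy classification using only the verbs named on the slide.
# In practice the class comes from diagnostic tests, not from a fixed table.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}


def single_argument_label(verb):
    """Return the label for the sole argument of an intransitive verb."""
    verb_class = VERB_CLASS[verb]
    if verb_class == "unaccusative":
        return "ARG1"
    if verb_class == "unergative":
        return "ARG0"
    raise ValueError(f"not an intransitive class: {verb_class}")
```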
Intransitive: Unaccusative
unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  ‘The door will open’
unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  ‘The door opened’
unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  ‘The door will have to open’
Intransitive: Unergative
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  ‘Atif slept’
unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  ‘Atif will have to sleep’
Existential
exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  ‘There are rats in that room’
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  ‘Yesterday night, the moon was seen behind the clouds’
dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  ‘Yesterday night, I saw the moon behind the clouds’
[Figures: annotated structures for unacc-4 and dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  ‘Ram will give a book to Mohan’
ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  ‘Ram gave a book to Mohan’
Causatives
• Hindi has two ways of forming the causative
• Add –aa
  – (so, sulaa) sleep, make someone sleep
• Add –vaa
  – (sulaa, sulvaa) make someone sleep, cause someone to fall asleep
• We introduce the label ARGA to analyze causers
• Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
• ARGA-MNS for intermediate causers
Causatives
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  ‘Atif will sleep’
causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  ‘The maid caused the child to sleep’
causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  ‘The mother made the maid cause the child to sleep’
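The causative labels introduced earlier (ARGA for causers, ARGA-MNS for intermediate causers, ARG0 and its subtypes for causees) can be applied to the two causative sentences with a short helper. This is a hedged Python sketch: `label_causative` is a hypothetical function, and because the slides do not say which ARG0 subtype a given causee receives, that choice is left as an optional parameter rather than decided by the code.

```python
def label_causative(causer, causee, intermediate=None, causee_subtype=None):
    """Return (participant, label) pairs for a causative clause.

    causee_subtype may be None, "GOL", or "MNS"; choosing between them is
    left to the caller because the slides do not spell out the criterion.
    """
    if causee_subtype not in (None, "GOL", "MNS"):
        raise ValueError(f"unknown ARG0 subtype: {causee_subtype}")
    labels = [(causer, "ARGA")]
    if intermediate is not None:
        # An intermediate causer sits between the causer and the causee.
        labels.append((intermediate, "ARGA-MNS"))
    causee_label = "ARG0" if causee_subtype is None else f"ARG0-{causee_subtype}"
    labels.append((causee, causee_label))
    return labels


# causative-1: aayaa ne Atif ko sulaayaa
first = label_causative(causer="aayaa", causee="Atif")

# causative-2: maaN ne aayaa se Atif ko sulvaayaa
second = label_causative(causer="maaN", causee="Atif", intermediate="aayaa")
```

Under these assumptions the maid is the causer (ARGA) in causative-1 but the intermediate causer (ARGA-MNS) in causative-2, where the mother takes over the ARGA label.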
[Figures: annotated structures for causative-1 and causative-2]
Causatives: classes
[Figure not recoverable]
Complex predicates
• These are cases such as bharosaa karnaa ‘trust (n) do (v)’: trust
• Such cases are handled using a noun frame for bharosaa
  [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
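The bracketed notation used for the bharosaa example can be read mechanically. The Python sketch below is an illustration only: the regular expression and the function `parse_bracketed` are assumptions about how such a line might be parsed, not a format defined by the treebank. It extracts each bracketed argument with its label and returns whatever is left over as the predicate.

```python
import re

# Matches "[some words LABEL]", where LABEL is the last token in the bracket.
BRACKET = re.compile(r"\[([^\[\]]+?)\s+(ARG[A-Z0-9]+(?:-[A-Z]+)?)\]")


def parse_bracketed(annotation):
    """Return (arguments, predicate) for a bracketed annotation line.

    arguments is a list of (text, label) pairs in surface order;
    predicate is the unbracketed remainder with whitespace normalised.
    """
    arguments = [(m.group(1), m.group(2)) for m in BRACKET.finditer(annotation)]
    predicate = " ".join(BRACKET.sub(" ", annotation).split())
    return arguments, predicate


args, predicate = parse_bracketed(
    "[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa"
)
```

Here the predicate that remains is the noun plus light verb, bharosaa kiyaa, so the roles ARG0 and ARG1 are looked up in the noun frame for bharosaa rather than in a frame for the verb alone.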
Complex Predicate
compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  ‘Ram was waiting for Ravi’
compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  ‘Ram regrettably fought with Ravi’
Complement Clause
compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  ‘Ram knows that Sita will arrive late’
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu

Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  – Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  – Nouns with postpositions
  – Verbs with auxiliaries and complementizers (ki)
• Binary branching
  – Theoretical reasons
  – To be different from DS
101212
9
Anexample
bull [JohnARG0]rings[thebellARG1]bull [TallaspentreesARG1]ring[thelakeARG2]ring01 Makesoundofbell
Arg0 Causerofringing
Arg1 Thingrung
Arg2 Ringforring02 Tosurround
Arg1 Surroundingen3ty
Arg2 Surroundeden3ty
Ring01
Ring02
HindiPropBank
bull Annota3ngHindiPropBankinvolvesthreestepsndash Crea3onofframefilesndash Emptyargumentinser3onndash Seman3crolelabelling
101212
10
FramefilesforHindi
bull Twotypesofframefilesndash Framesforsimpleverbs[385frames703predicates]ndash Framesfornominalsincomplexpredicates[18751902predicates]
EmptyArguments
bull PropBankinserts4typesofemptyargumentsndash prodroppednullargumentsrecoverablefromdiscoursecontext
ndash PROemptysubjectsofnonLfinitecomplementandadjunctclausesndash RELPROgapsinpar3cipialrela3veclausesndash GAPelidedargumentsincoLordinatedclauses
bull PROandRELPROareinsertedautoma3callybull GAPandproareinsertedmanually
101212
11
PropBankTagsetNumberedArguments NumberedArgumentswithfunc=ontags
ARGACauser ARGALMNSIndirectcauser
ARG0Agentexperiencer ARG0LMNSInducedcauser
ARG1Themepa3ent ARG0LGOLCauseewithalsquorecipientrsquorole
ARG2Recipient ARG2LATRAaribute
ARG3Instrument ARG2LGOLGoal
ARG2LSOUSource
ARG2LLOCLoca3on
ARG2LDIRDirec3on
PropBankTagsetModifierArguments
ARGMLTMPTemporalARGMLMNRManner
ARGMLLOCLoca3on
ARGMLPRPPurpose
ARGMLCAUCause
ARGMLDISDiscourse
ARGMLADVAdverb
ARGMLMNSMeans
101212
12
Linguis3cphenomena
bull Simpletransi3vebull Unaccusa3veandUnerga3vebull Existen3albull Da3vesubjectbull Ditransi3vebull Causa3vesbull ComplexPredicates
SimpleTransi3vetransL1 आतफ़ कताब पढ़गा A3fkitabpaRhegaa A3fbookfreadmsgfut A3fwillreadthebook
transL2 आतफ़ कताब पढ़ी A3fnekitaabpaRhii A3fergbookfreadfsgpst A3freadthebook
transL3 आतफ़ को कताब पढ़नी पड़ी A3fkokitaabpaRhniipaRii A3fdatbookfreadfinfcompelfpst A3fhadtoreadthebook
101212
13
Unaccusa3veampUnerga3ve
bull Dis3nc3onbetweenintransi3veverbsndash unaccusa3vesegKula(open)Puta(explode)ndash Unerga3veseghaMsa(laugh)
bull Singleargumentofunaccusa3vestakesArg1unerga3vestakeArg0
bull Diagnos3ctestsareusedtodis3nguishunaccusa3vesfromunerga3vesndash Eganimacytestcognateobjecttestamongothers
Intransi3veUnaccusa3veunaccL1 दरवाज़ा ख67गा darvaazaakhulegaa doormsgdopenmsgfut
Thedoorwillopen
unaccL2 दरवाज़ ख6ला darvaaze nekhulaa doormsgoblergopenpst Thedooropened
unaccL3 दरवाज़ को ख6लना पड़गा darvaazekokhulnaapaRegaa doormsgobldatopeninfcompelfut Thedoorwillhavetoopen
101212
14
Intransi3veUnerga3veUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
unergL2 आतफ़ सोया A3fnesoyaa A3fergsleepmsgpst A3fslept
unergL3 आतफ़ को सोना पड़गा A3fkosonaapaRegaa A3fdat sleepinfcompelfut A3fwillhavetosleep
Existen3alexistL1 उस कमgt चAB C uskamremeMcuuhehaiM thatroominratsbeprespl Thereareratsinthatroomrsquo
bull Wedis3nguishbetweenexisten3alandcopulasentencetypesbymeansofdifferentrolesetIDs
101212
15
Da3veSubject
unaccL4 कल रात बादलD चाEद दखा kalraatbaadaloMmeiMcaaMddikhaa yesterdaynightcloudsinmoonsee(unacc)pst
YesterdaynightthemoonwasseenbehindthecloudsdatLsubjL1 कल रात बादलD म6झको चाEद दखा kalraatbaadaloMmeiMmujhkocaaMddikhaa yesterdaynightcloudsinmedatmoonsee(unacc)pst
YesterdaynightIsawthemoonbehindtheclouds
unacc4
datsubj1
TheARG0analysisofda3vesubjectsmaychangeinfuturePBannota3on
101212
16
Ditransi3veditransL1 राम मोहन को कताब Hगा raammohankokitaabdegaa RamMohandatbookfgivemsgfut RamgaveabooktoMohan
ditransL2 राम मोहन को कताब दी raamnemohankokitaabdii RamergMohandatbookfgivefsgpst RamgaveabooktoMohan
Causa3ves
bull Hindihastwowaysofformingthecausa3vebull Addndashaa
ndash (sosulaa)sleepmakesomeonesleepbull Addndashvaa
ndash (sulaasulvaa)makesomeonesleepcausesomeonetofallasleep
bull WeintroducethelabelARGAtoanalyzecausersbull SubtypesofARG0(ARG0LGOLARG0LMNS)forcausees
bull ARGALMNSforintermediatecausers
101212
17
Causa3vesUnergL1 आतफ़ सोएगा A3fsoyegaa A3fsleepmsgfut A3fwillsleep
causa3veL1 आया आतफ़ को स6लाया
aayaaneA3fkosulaayaa maidergA3faccsleepcauspst lsquoThemaidcausedthechildtosleeprsquo
causa3veL2 माE आया I आतफ़ को स6लवाया maaNneaayaaseA3fkosulvaayaa
motherergmaidbyA3faccsleepcauspst lsquoThemothermadethemaidtocausethechildtosleeprsquo
Causa3ves
Causa3veL1
Causa3veL2
101212
18
Causa3vesclasses
Complexpredicates
bull Thesearecasessuchasbharosaakarnaa`trust(n)do(v)rsquotrust
bull Suchcasesarehandledusinganounframeforbharosaa[abhayneARG0][sitaaparARG1]bharosaakiyaa
101212
19
ComplexPredicate
complLpredL1 राम रव की JतीKा कर रहा था raamravikiipra9kshaakarrahaathaa RamRavigenwaitdoprogmsgbemsgpst Ramwaswai3ngforRavirsquo
Complexpredicate
complLpredL2 रामरवीIलड़बMठा raamraviseladZabaithaa RamRaviinstfightsitperf RamregreaablyfoughtwithRavirsquo
Complement Clause
compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late.'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
Basic Transitive Clause (1)
• There are two privileged positions in the verbal projection, corresponding usually to DS's k1 and k2
[Tree diagram; labels in extraction order: VP-Pred, VP, NP, NP, Atif, kitab, V, paRhegaa]
आतिफ़ किताब पढ़ेगा
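A binary-branching tree of this kind can be sketched as nested tuples. The bracketing used below is an assumption reconstructed from the node labels and the "two privileged positions" remark (the diagram itself did not survive extraction), and the helper names `leaves` and `is_binary` are hypothetical.

```python
from typing import List, Tuple, Union

# A tree is either a leaf word or a tuple (label, child, ...).
Tree = Union[str, Tuple]

# ASSUMED bracketing for "Atif kitab paRhegaa": the higher position
# (roughly DS k1) is sister to VP, the lower one (roughly DS k2) is
# sister to V.
TRANSITIVE: Tree = (
    "VP-Pred",
    ("NP", "Atif"),
    ("VP", ("NP", "kitab"), ("V", "paRhegaa")),
)


def leaves(tree: Tree) -> List[str]:
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def is_binary(tree: Tree) -> bool:
    """Check that no node has more than two children."""
    if isinstance(tree, str):
        return True
    children = tree[1:]
    return len(children) <= 2 and all(is_binary(c) for c in children)
```

Reading off the leaves gives back the surface word order, which is what a monostratal representation with traces is meant to preserve.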
Basic Transitive Clause (2)
• The representation is maintained when we have an ergative construction
[Tree diagram; labels in extraction order: VP-Pred, VP, NP-P, NP, Atif-ne, kitab, V, paRhii]
आतिफ़ ने किताब पढ़ी
Intransitive Clause: Unergative
• PS makes a distinction between unergative and unaccusative
• In the unergative, there simply is no object
[Tree diagram; labels in extraction order: VP-Pred, VP, NP, Atif, V, soyegaa]
आतिफ़ सोएगा
Intransitive Clause: Unaccusative
• The argument starts in the lower position (because of lexical semantics) and moves to the higher position (because the higher position has no occupant)
[Tree diagram; labels in extraction order: VP-Pred, VP, NP1, NP, darvaazaa, CASE1, V, khulegaa]
दरवाज़ा खुलेगा
Existentials
• Existential ho 'be' is unaccusative (because agent-free) and the location is an adjunct
उस कमरे में चूहे हैं
[Tree diagram; labels in extraction order: VP-Pred, VP, NP1, NP, cuuhe, CASE1, V, hain, VP, NP-P, us kamre mein]
Ditransitive
• The recipient is introduced as adjoined to the VP-Pred, a fixed but not structural position
[Tree diagram; labels in extraction order: VP-Pred, NP-P, NP, Ram-ne, kitaab, V, dii, VP-Pred, NP-P, Mohan-ko, VP]
राम ने मोहन को किताब दी
Putting it All Together: Dative Subjects
कल रात बादलों में मुझको चाँद दिखा
[Tree diagram; labels in extraction order: VP-Pred, NP1, NP, caaMd, CASE1, V, dikhaa, VP, VP, NP-P, baadaloM mein, VP, NP, kal raat, VP-Pred, NP, SCR2, VP, NP2, mujhko]
• Dikhaa is interpreted semantically as a ditransitive: someone makes something appear to someone
• Since the agent is absent, the lower argument raises to the higher position (like unaccusative)
• The dative beneficiary is base-generated in the fixed dative position (adjoined to VP-Pred) and then scrambles elsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram; labels in extraction order: VP-Pred, VP, NP, NP, Ram, EXTR1, V, VP-Pred, NP, VP, NP1, Sita, CASE1, V, aayegi, VP-Pred, NP-P, CP1, C, ki, der se, jaantaa, V-Aux, VP, VP, hai]
Relative Clause
[Tree diagram; labels in extraction order: VP-Pred, NP-P, NP, tumne, SCR1, V, dii, VP-Pred, NP, PRO, VP, VP, NP1, jo kitaab, CP, C, COMP, V-Aux, thii, VP, VP-Pred, VP, NP-P, maine, V, paRhii, NP, SCR3, VP, NP, NP, vah]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me.'
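Complex Predicate
[Tree diagram; labels in extraction order: VP-Pred, VP, NP, NP-P1, Raam, Ravi-ko, V, kar, V-Aux, VP, VP, rahaa, V-Aux, thaa, V', NP, NP, CASE1, N, yaad]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi.'
Causative
[Slide content not preserved in the extraction]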
Frame files for Hindi
• Two types of frame files
  - Frames for simple verbs [385 frames, 703 predicates]
  - Frames for nominals in complex predicates [1875, 1902 predicates]
Empty Arguments
• PropBank inserts 4 types of empty arguments
  - pro: dropped null arguments recoverable from discourse context
  - PRO: empty subjects of non-finite complement and adjunct clauses
  - RELPRO: gaps in participial relative clauses
  - GAP: elided arguments in co-ordinated clauses
• PRO and RELPRO are inserted automatically
• GAP and pro are inserted manually
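The four empty-argument types and how each is inserted can be captured in a small lookup, sketched below. The names `Insertion`, `EMPTY_ARGUMENTS` and `needs_annotator` are hypothetical and only restate the slide; they do not describe the actual annotation pipeline.

```python
from enum import Enum


class Insertion(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


# Type -> (description from the slide, how the empty argument is inserted).
EMPTY_ARGUMENTS = {
    "pro": ("dropped null argument recoverable from discourse context",
            Insertion.MANUAL),
    "PRO": ("empty subject of a non-finite complement or adjunct clause",
            Insertion.AUTOMATIC),
    "RELPRO": ("gap in a participial relative clause",
               Insertion.AUTOMATIC),
    "GAP": ("elided argument in a co-ordinated clause",
            Insertion.MANUAL),
}


def needs_annotator(kind: str) -> bool:
    """True if this empty-argument type is inserted manually."""
    return EMPTY_ARGUMENTS[kind][1] is Insertion.MANUAL


manual_types = sorted(k for k in EMPTY_ARGUMENTS if needs_annotator(k))
```

Note that the lookup is case-sensitive, since pro and PRO are distinct types.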
PropBank Tagset
Numbered Arguments            Numbered Arguments with function tags
ARGA  Causer                  ARGA-MNS  Indirect causer
ARG0  Agent, experiencer      ARG0-MNS  Induced causer
ARG1  Theme, patient          ARG0-GOL  Causee with a 'recipient' role
ARG2  Recipient               ARG2-ATR  Attribute
ARG3  Instrument              ARG2-GOL  Goal
                              ARG2-SOU  Source
                              ARG2-LOC  Location
                              ARG2-DIR  Direction
PropBank Tagset: Modifier Arguments
ARGM-TMP  Temporal
ARGM-MNR  Manner
ARGM-LOC  Location
ARGM-PRP  Purpose
ARGM-CAU  Cause
ARGM-DIS  Discourse
ARGM-ADV  Adverb
ARGM-MNS  Means
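For readers who want to work with these labels programmatically, the tagset can be written down as plain dictionaries. This is a sketch only: the table contents come from the slide, while the names `describe` and `base_argument` are hypothetical helpers introduced for illustration.

```python
NUMBERED = {
    "ARGA": "Causer",
    "ARG0": "Agent, experiencer",
    "ARG1": "Theme, patient",
    "ARG2": "Recipient",
    "ARG3": "Instrument",
}

FUNCTION_TAGGED = {
    "ARGA-MNS": "Indirect causer",
    "ARG0-MNS": "Induced causer",
    "ARG0-GOL": "Causee with a 'recipient' role",
    "ARG2-ATR": "Attribute",
    "ARG2-GOL": "Goal",
    "ARG2-SOU": "Source",
    "ARG2-LOC": "Location",
    "ARG2-DIR": "Direction",
}

MODIFIERS = {
    "ARGM-TMP": "Temporal",
    "ARGM-MNR": "Manner",
    "ARGM-LOC": "Location",
    "ARGM-PRP": "Purpose",
    "ARGM-CAU": "Cause",
    "ARGM-DIS": "Discourse",
    "ARGM-ADV": "Adverb",
    "ARGM-MNS": "Means",
}


def describe(label: str) -> str:
    """Look a label up in the three tables; raise KeyError if unknown."""
    for table in (NUMBERED, FUNCTION_TAGGED, MODIFIERS):
        if label in table:
            return table[label]
    raise KeyError(label)


def base_argument(label: str) -> str:
    """Strip the function tag: 'ARG2-LOC' -> 'ARG2'."""
    return label.split("-", 1)[0]
```

Keeping the function-tagged labels in their own table mirrors the two-column layout of the slide.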
Linguistic phenomena
• Simple transitive
• Unaccusative and Unergative
• Existential
• Dative subject
• Ditransitive
• Causatives
• Complex Predicates
Simple Transitive
trans-1
  आतिफ़ किताब पढ़ेगा
  Atif kitab paRhegaa
  Atif book.f read.m.sg.fut
  'Atif will read the book.'
trans-2
  आतिफ़ ने किताब पढ़ी
  Atif ne kitaab paRhii
  Atif.erg book.f read.f.sg.pst
  'Atif read the book.'
trans-3
  आतिफ़ को किताब पढ़नी पड़ी
  Atif ko kitaab paRhnii paRii
  Atif.dat book.f read.f.inf compel.f.pst
  'Atif had to read the book.'
Unaccusative & Unergative
• Distinction between intransitive verbs
  - Unaccusatives, e.g. Kula (open), Puta (explode)
  - Unergatives, e.g. haMsa (laugh)
• The single argument of unaccusatives takes Arg1; that of unergatives takes Arg0
• Diagnostic tests are used to distinguish unaccusatives from unergatives
  - E.g. animacy test, cognate object test, among others
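The labelling rule for intransitives amounts to a lookup on verb class, as in the sketch below. The mini-lexicon only contains the three verbs named on the slide, and the names `VERB_CLASS` and `sole_argument_label` are hypothetical. In practice the class of a verb is decided by the diagnostic tests, not by a table like this one.

```python
# Verb classes for the three examples given on the slide.
VERB_CLASS = {
    "Kula": "unaccusative",   # open
    "Puta": "unaccusative",   # explode
    "haMsa": "unergative",    # laugh
}

# Label of the single argument, per verb class.
SOLE_ARGUMENT = {
    "unaccusative": "ARG1",
    "unergative": "ARG0",
}


def sole_argument_label(verb: str) -> str:
    """Return the PropBank label of an intransitive verb's only argument."""
    return SOLE_ARGUMENT[VERB_CLASS[verb]]
```

For example, the door in 'the door will open' would be ARG1, while the laugher of haMsa would be ARG0.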
Intransitive Unaccusative
unacc-1
  दरवाज़ा खुलेगा
  darvaazaa khulegaa
  door.m.sg.d open.m.sg.fut
  'The door will open.'
unacc-2
  दरवाज़े ने खुला
  darvaaze ne khulaa
  door.m.sg.obl.erg open.pst
  'The door opened.'
unacc-3
  दरवाज़े को खुलना पड़ेगा
  darvaaze ko khulnaa paRegaa
  door.m.sg.obl.dat open.inf compel.fut
  'The door will have to open.'
Intransitive Unergative
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep.'
unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept.'
PhraseStructureRepresenta3on
OwenRambowCCLSColumbiaUniversityrambowcclscolumbiaedu
PhraseStructure(PS)Representa3onintheHindiandUrduTreebanks
bull DevisedbyRajeshBhaLUniversityofMassachuseLsAmherstndash AssistedbyAnnahitaFarudiandOwenRambow
bull Developedinconjunc3onwithDSandPBbull InspiredbyChomskyantradi3on
101212
2
BackgroundforPS
bull Chomskyanprogramndash Mo3vatedbyclaimsaboutlanguageacquisi3oninchildren
ndash Developatheoryofsyntaxsuchthatsyntaxofalanguagecanbeexplainedbybull LanguageVuniversalprinciplesbull LanguageVspecificparameters
bull PSforHindiinspiredbyChomskyanprogrambutnotfollowinganyspecificChomskyanapproach
BasicPrinciplesofPS
bull PSrepresentsrela3onbetweenlexicalpredicateargumentstructure(interfacetolexicon)andsurfacewordorder(interfacetophonologyandseman3csroughlyspeaking)
bull Thesetwolevelsarerelatedbyderiva3onsndash Wordsandcons3tuentsmoveandleavetraces
bull Transforma3onalgrammar
bull Monostratalrepresenta3onbull NotunlikeEnglishPennTreebank
101212
3
SpecificAssump3onsaboutRepresenta3onMadebyPS
bull Phrasestructurebull No3onoflexicalheadswithprojec3ons(XVbartheorysortof)andassociatedfunc3onalprojec3onsndash Nounswithpostposi3onsndash Verbswithauxiliariesandcomplemen3zers(ki)
bull Binarybranchingndash Theore3calreasonsndash TobedifferentfromDS
BasicTransi3veClause(1)
bull Therearetwoprivilegedposi3onsintheverbalprojec3oncorrespondingusuallytoDSrsquosk1andk2
VPVPred
VP
NP
NPA3f
kitab
V
paRhegaaआततफिकताबपढ़गा
101212
4
BasicTransi3veClause(2)
bull Therepresenta3onismaintainedwhenwehaveanerga3veconstruc3on
VPVPred
VP
NPVP
NPA3fVne
kitab
V
paRhii
आततफिकताबपढ़ी
Intrasi3veClauseUnerga3ve
bull PSmakesadis3nc3onbetweenunerga3veandunaccusa3ve
bull Inunerga3vetheresimplyisnoobject
VPVPred
VP
NP
A3fV
soyegaa
आततफसोएगा
101212
5
Intrasi3veClauseUnaccusa3ve
bull Argumentstartsinlowerposi3on(becauseoflexicalseman3cs)andmovestohigherposi3on(becausehigherposi3onhasnooccupant)
VPVPred
VP
NP1
NPdarvaazaa
CASE1
V
khulegaa
दरवाज़ाख89गा
Existen3als
bull Existen3alho`bersquoisunaccusa3ve(becauseagentVfree)andloca3onisanadjunct
उस कमlt=चA
VPVPred
VP
NP1
NPcuuhe
CASE1
V
hain
VP
NPVP
uskamremein
101212
6
Ditransi3ve
bull TherecipientisintroducedasadjoinedtotheVPVPredafixedbutnotstructuralposi3on
VPVPred
NPVP
NP
RamVne
kitaab
V
dii
VPVPred
NPVP
MohanVko
VP
राममोहन कोिकताबदी
PugngitAllTogetherDa3veSubjects
कलरा बादलE=म8झकोचाGददखा
VPVPred
NP1
NP
caaMd
CASE1
V
dikhaa
VP
VP
NPVP
baadaloMmein
VP
NP
kalraat
VPVPred
NP
SCR2
VPNP2
mujhko
bull Dikhaaisinterpretedseman3callyasaditrasi3vesomeonemakessomethingappeartosomeone
bull Sincetheagentisabsentthelowerargumentraisestothehigherposi3on(likeunaccusa3ve)
bull Theda3vebeneficiaryisbasegeneratedinthefixedda3veposi3on(adjoinedtoVPVPred)andthenscrambleselsewhere
101212
7
ComplementClauseswithki
रामजानताAि सीताIरJआतएगी
VPVPred
VP
NP
NPRam
EXTR1
V VPVPred
NP
VP
NP1
Sita
CASE1
V
aayegi
VPVPred
NPVP
CP1
C
ki
dersejaantaa
VVAux
VP
VP
hai
Rela3veClause
VPVPredNPVP
NP
tumne
SCR1
V
dii
VPVPred
NP
PRO
VP
VP
NP1
jokitaab
CP
C
COMPVVAux
thii
VP VPVPred
VP
NPVP
maineV
paRhii
NPSCR3
VP
NP
NP
vah
जो कताब त8म दी थी वह L पढ़ ली jokitaabtumnediithiivahmainepaRhliiwhichbookfyouerggivefsgpstbefsgpstthatIergreadreflfsgpstIhavereadthebookwhichyougavemersquo
Complex Predicate
[Tree diagram; node labels: VP-Pred, VP, NP, NP-P, V, V-Aux, V', N; words: Raam, Ravi -ko, CASE1 (trace), yaad, kar, rahaa, thaa]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi.acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causative
Intransitive: Unergative
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
unerg-2
  आतिफ़ ने सोया
  Atif ne soyaa
  Atif.erg sleep.m.sg.pst
  'Atif slept'
unerg-3
  आतिफ़ को सोना पड़ेगा
  Atif ko sonaa paRegaa
  Atif.dat sleep.inf compel.fut
  'Atif will have to sleep'
Existential
exist-1
  उस कमरे में चूहे हैं
  us kamre meM cuuhe haiM
  that room in rats be.pres.pl
  'There are rats in that room'
• We distinguish between existential and copula sentence types by means of different roleset IDs
Dative Subject
unacc-4
  कल रात बादलों में चाँद दिखा
  kal raat baadaloM meiM caaMd dikhaa
  yesterday night clouds in moon see(unacc).pst
  'Yesterday night the moon was seen behind the clouds'
dat-subj-1
  कल रात बादलों में मुझको चाँद दिखा
  kal raat baadaloM meiM mujhko caaMd dikhaa
  yesterday night clouds in me.dat moon see(unacc).pst
  'Yesterday night I saw the moon behind the clouds'
[Annotation diagrams: unacc-4, dat-subj-1]
The ARG0 analysis of dative subjects may change in future PB annotation
Ditransitive
ditrans-1
  राम मोहन को किताब देगा
  raam mohan ko kitaab degaa
  Ram Mohan.dat book.f give.m.sg.fut
  'Ram will give a book to Mohan'
ditrans-2
  राम ने मोहन को किताब दी
  raam ne mohan ko kitaab dii
  Ram.erg Mohan.dat book.f give.f.sg.pst
  'Ram gave a book to Mohan'
Causatives
• Hindi has two ways of forming the causative
  • Add -aa
    - (so, sulaa) 'sleep', 'make someone sleep'
  • Add -vaa
    - (sulaa, sulvaa) 'make someone sleep', 'cause someone to fall asleep'
• We introduce the label ARGA to analyze causers
  • Subtypes of ARG0 (ARG0-GOL, ARG0-MNS) for causees
  • ARGA-MNS for intermediate causers
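As a rough illustration of how these labels might line up with the so / sulaa / sulvaa series, the sketch below assigns ARGA to the outermost causer and ARGA-MNS to any intermediate causer. The function name, the participant names, and the default causee label are assumptions: the slide lists ARG0-GOL and ARG0-MNS as causee subtypes, but which one applies to a given verb is not recoverable from the text, so the causee label is left as a parameter.

```python
# Labels (ARGA, ARGA-MNS, ARG0) come from the slide; the mapping from
# participants to labels is an assumption made for illustration.
def label_participants(causers, causee, causee_label="ARG0"):
    """causers: outermost causer first; causee: the participant who sleeps."""
    labels = {}
    for position, name in enumerate(causers):
        labels[name] = "ARGA" if position == 0 else "ARGA-MNS"
    labels[causee] = causee_label
    return labels

examples = {
    "so": label_participants([], "Atif"),                    # plain unergative
    "sulaa": label_participants(["aayaa"], "Atif"),          # one causer
    "sulvaa": label_participants(["maaN", "aayaa"], "Atif"), # causer plus intermediate causer
}

for verb, labels in examples.items():
    print(verb, labels)
```

Each added causative suffix introduces one more causer, which is why the label inventory needs a way to tell the outermost causer from an intermediate one.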
Causatives
unerg-1
  आतिफ़ सोएगा
  Atif soyegaa
  Atif sleep.m.sg.fut
  'Atif will sleep'
causative-1
  आया ने आतिफ़ को सुलाया
  aayaa ne Atif ko sulaayaa
  maid.erg Atif.acc sleep.caus.pst
  'The maid caused the child to sleep'
causative-2
  माँ ने आया से आतिफ़ को सुलवाया
  maaN ne aayaa se Atif ko sulvaayaa
  mother.erg maid.by Atif.acc sleep.caus.pst
  'The mother made the maid cause the child to sleep'
Causatives
[Annotation diagrams: causative-1, causative-2]
Causatives: classes
Complex predicates
• These are cases such as bharosaa karnaa 'trust (n) do (v)', i.e. 'trust'
• Such cases are handled using a noun frame for bharosaa: [abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa
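The bracketed annotation in the example above can be read mechanically. The sketch below is a minimal parser for that notation only; the regular expression, the function name, and the word spacing of the example string are assumptions, not a PropBank tool or file format.

```python
import re

# Example from the slide, with word boundaries restored by assumption.
annotated = "[abhay ne ARG0] [sitaa par ARG1] bharosaa kiyaa"

# A bracket holds a phrase followed by its argument label.
ARGUMENT = re.compile(r"\[(.+?)\s+(ARG[^\]\s]*)\]")

def parse_frame(sentence):
    """Split an annotated sentence into labelled arguments and the predicate words."""
    arguments = {label: phrase for phrase, label in ARGUMENT.findall(sentence)}
    predicate = ARGUMENT.sub("", sentence).split()
    return arguments, predicate

arguments, predicate = parse_frame(annotated)
print(arguments)
print(predicate)
```

What remains outside the brackets is the noun plus light verb, which is the unit the noun frame for bharosaa describes.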
Complex Predicate
compl-pred-1
  राम रवि की प्रतीक्षा कर रहा था
  raam ravi kii pratiikshaa kar rahaa thaa
  Ram Ravi.gen wait do prog.m.sg be.m.sg.pst
  'Ram was waiting for Ravi'
Complex Predicate
compl-pred-2
  राम रवि से लड़ बैठा
  raam ravi se ladZa baithaa
  Ram Ravi.inst fight sit.perf
  'Ram regrettably fought with Ravi'
Complement Clause
compl-cl-1
  राम जानता है कि सीता देर से आएगी
  raam jaantaa hai ki siita der se aayegii
  Ram know.hab.m.sg be.sg.pres that Sita late part come.f.sg.fut
  'Ram knows that Sita will arrive late'
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  - Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by Chomskyan tradition
Background for PS
• Chomskyan program
  - Motivated by claims about language acquisition in children
  - Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program, but does not follow any specific Chomskyan approach
Basic Principles of PS
• PS represents the relation between lexical predicate-argument structure (interface to lexicon) and surface word order (interface to phonology and semantics, roughly speaking)
• These two levels are related by derivations
  - Words and constituents move and leave traces
• Transformational grammar
• Monostratal representation
• Not unlike the English Penn Treebank
Specific Assumptions about Representation Made by PS
• Phrase structure
• Notion of lexical heads with projections (X-bar theory, sort of) and associated functional projections
  - Nouns with postpositions
  - Verbs with auxiliaries and complementizers (ki)
• Binary branching
  - Theoretical reasons
  - To be different from DS
BasicTransi3veClause(1)
bull Therearetwoprivilegedposi3onsintheverbalprojec3oncorrespondingusuallytoDSrsquosk1andk2
VPVPred
VP
NP
NPA3f
kitab
V
paRhegaaआततफिकताबपढ़गा
101212
4
BasicTransi3veClause(2)
bull Therepresenta3onismaintainedwhenwehaveanerga3veconstruc3on
VPVPred
VP
NPVP
NPA3fVne
kitab
V
paRhii
आततफिकताबपढ़ी
Intrasi3veClauseUnerga3ve
bull PSmakesadis3nc3onbetweenunerga3veandunaccusa3ve
bull Inunerga3vetheresimplyisnoobject
VPVPred
VP
NP
A3fV
soyegaa
आततफसोएगा
101212
5
Intrasi3veClauseUnaccusa3ve
bull Argumentstartsinlowerposi3on(becauseoflexicalseman3cs)andmovestohigherposi3on(becausehigherposi3onhasnooccupant)
VPVPred
VP
NP1
NPdarvaazaa
CASE1
V
khulegaa
दरवाज़ाख89गा
Existen3als
bull Existen3alho`bersquoisunaccusa3ve(becauseagentVfree)andloca3onisanadjunct
उस कमlt=चA
VPVPred
VP
NP1
NPcuuhe
CASE1
V
hain
VP
NPVP
uskamremein
101212
6
Ditransi3ve
bull TherecipientisintroducedasadjoinedtotheVPVPredafixedbutnotstructuralposi3on
VPVPred
NPVP
NP
RamVne
kitaab
V
dii
VPVPred
NPVP
MohanVko
VP
राममोहन कोिकताबदी
PugngitAllTogetherDa3veSubjects
कलरा बादलE=म8झकोचाGददखा
VPVPred
NP1
NP
caaMd
CASE1
V
dikhaa
VP
VP
NPVP
baadaloMmein
VP
NP
kalraat
VPVPred
NP
SCR2
VPNP2
mujhko
bull Dikhaaisinterpretedseman3callyasaditrasi3vesomeonemakessomethingappeartosomeone
bull Sincetheagentisabsentthelowerargumentraisestothehigherposi3on(likeunaccusa3ve)
bull Theda3vebeneficiaryisbasegeneratedinthefixedda3veposi3on(adjoinedtoVPVPred)andthenscrambleselsewhere
Complement Clauses with ki
राम जानता है कि सीता देर से आएगी
[Tree diagram; node labels include VP-Pred, VP, NP (Ram), EXTR1, V (jaantaa), V-Aux (hai), CP1, C (ki), NP1 (Sita), CASE1, NP-P (der se), V (aayegi)]
Relative Clause
[Tree diagram; node labels include VP-Pred, VP, CP, C (COMP), NP-P (tumne), NP-P (maine), NP1 (jo kitaab), NP (vah), PRO, V (dii), V-Aux (thii), V (paRhii), and the markers SCR1 and SCR3]
जो किताब तुमने दी थी वह मैंने पढ़ ली
jo kitaab tumne dii thii vah maine paRh lii
which book.f you.erg give.f.sg.pst be.f.sg.pst that I.erg read refl.f.sg.pst
'I have read the book which you gave me'
Complex Predicate
[Tree diagram; node labels include VP-Pred, VP, NP (Raam), NP-P with index 1 (Ravi-ko), V (kar), V-Aux (rahaa), V-Aux (thaa), V', NP, CASE1, N (yaad)]
राम रवि को याद कर रहा था
raam ravi ko yaad kar rahaa thaa
Ram Ravi acc remember do prog.m.sg be.m.sg.pst
'Ram was remembering Ravi'
Causative
Phrase Structure Representation
Owen Rambow, CCLS, Columbia University, rambow@ccls.columbia.edu
Phrase Structure (PS) Representation in the Hindi and Urdu Treebanks
• Devised by Rajesh Bhatt, University of Massachusetts, Amherst
  – Assisted by Annahita Farudi and Owen Rambow
• Developed in conjunction with DS and PB
• Inspired by the Chomskyan tradition
Background for PS
• Chomskyan program
  – Motivated by claims about language acquisition in children
  – Develop a theory of syntax such that the syntax of a language can be explained by
    • Language-universal principles
    • Language-specific parameters
• PS for Hindi is inspired by the Chomskyan program but does not follow any specific Chomskyan approach