Structural and Applied Linguistics
Issue 9
ISSN 0202-2400
LA FILOLÓGICA POR LA CAUSA
SAINT PETERSBURG STATE UNIVERSITY
STRUCTURAL AND APPLIED LINGUISTICS
Interuniversity collection
Issue 9
UDC 80+618.31 BBK 81.1
C83
Editorial board: Prof. L. N. Belyaeva, Prof. A. S. Gerd (executive editor), Prof. O. N. Grinbaum, Prof. M. A. Marusenko
Secretary of the editorial board: V. I. Rubiner
Reviewer: Cand. Sc. (Philology), Assoc. Prof. I. P. Pankov
Published by decision of the Editorial and Publishing Council of the Faculty of Philology, St. Petersburg State University
Structural and Applied Linguistics. Issue 9: Interuniversity collection / ed. by A. S. Gerd. - St. Petersburg: St. Petersburg University Press, 2012. - 356 pp.
The collection (Issue 8 appeared in 2010) contains articles on a wide range of problems in theoretical and applied linguistics and on the application of mathematical methods in linguistics.
Intended for specialists in the theory of language and in applied and theoretical linguistics.
BBK 81.1
© St. Petersburg State University, 2012
V. P. Zakharov
INTERNATIONAL STANDARDS IN CORPUS LINGUISTICS
Summary. The paper discusses the development of standards in corpus linguistics. The recommendations of the Text Encoding Initiative (TEI) project are analysed in detail. Tools and methods for specifying metadata and linguistic annotation are considered.
Keywords: corpus linguistics, language corpora, standards, tagging, annotation, Text Encoding Initiative.
© V. P. Zakharov, 2012
1. Introduction
Corpus linguistics is a complex linguistic discipline that has taken shape over recent decades on the basis of electronic computing. It studies the construction of linguistic corpora, the ways of processing the data they contain, and the methodology of their creation and use. It is fair to say that all modern linguistic research and all work on compiling dictionaries and grammars is in one way or another oriented toward representative text corpora. The development of modern intelligent software systems for natural language text processing likewise requires a large experimental linguistic base. The demand for corpus data has coincided with the emergence of the corresponding technical capabilities.
Corpora are, as a rule, intended for repeated use by many users, so their markup and their linguistic support must be unified in some way. Standards for corpora usually concern the compatibility of markup types; these are sometimes called "encoding standards". A second important issue is the comparability of different corpora, including assessments of their suitability for various tasks; these are called "evaluation standards". The greatest difficulty lies in standardizing the transcription of spoken language and the encoding of historical corpora. In the graphic recording of spoken language some progress has been made even without a single standard binding on everyone (owing mainly to the existence of precedents). For describing the non-verbal component of natural language communication, however, no standards have yet been developed, which hinders further progress in this area (Baranov 2007).
Standardization of corpora and compatibility of data types also matter for the comparability of different corpora. Corpora can be evaluated both quantitatively and qualitatively. Quantitative data about corpora make it possible to judge their size, their composition according to various criteria, and the linguostatistical parameters of a corpus or its subcorpora. Qualitative evaluation means assessing and comparing corpora on the basis of an analysis of the results they return.
The suitability of corpora for various linguistic tasks likewise calls for its own "evaluation standards".
2. Overview of international standards in corpus linguistics
At present, international experience has produced de facto standards for representing metadata, both linguistic and extralinguistic. They are based on the descriptions of texts and corpora developed within the Text Encoding Initiative (TEI) and the ISLE Project (International Standards for Language Engineering), and on the recommendations of EAGLES (Expert Advisory Group on Language Engineering Standards). The first to be named among them are CDIF (Corpus Document Interchange Format, www.natcorp.ox.ac.uk/archive/vault/tgcw30.pdf), CES (Corpus Encoding Standard, http://www.cs.vassar.edu/CES/CES1.html#Contents), and XCES (Corpus Encoding Standard for XML, http://www.xces.org/).
These and other standards are currently being collected and consolidated under the auspices of ISO/TC 37, a technical committee of the International Organization for Standardization. Many of them relate directly to corpus linguistics, for example: ISO 24614-1:2010, Word segmentation of written texts, Part 1: Basic concepts and general principles; ISO 24610-1:2006, Feature structures, Part 1: Feature structure representation; ISO 24610-2:2011, Feature structures, Part 2: Feature system declaration; ISO/DIS 24611, Morpho-syntactic annotation framework; ISO 24613:2008, Lexical markup framework; ISO 24615:2010, Syntactic annotation framework (SynAF); and others. Under the common title "Language resource management", these standards describe:
• the principles and methods of terminology standardization;
• the development of terminology standards;
• terminological dictionaries;
• the creation of language resources;
• computational lexicography;
• terminological documentation;
• encoding in the field of terminology and linguistic resources;
• the use of terminology and other language resources in language engineering and content management.
3. Standards of the Text Encoding Initiative project
The most thoroughly developed recommendations are those of the Text Encoding Initiative (TEI). The project to create a text encoding system began with a seminar at Vassar College in 1987, attended by representatives of text archives, learned societies, and research centres. The purpose of the meeting was to discuss whether a standard scheme for encoding text documents could be created. In 1988 TEI was launched as a project in its own right.
The TEI system gives recommendations on the electronic publication of texts (text identification, representation, analysis and interpretation, and a metalanguage for description and encoding). It is designed mainly for text documents, but it also allows data in other formats, such as graphics and audio materials, to be described and identified. The main goal of the project is to develop formats for data interchange in the humanities.
The TEI recommendations are intended to:
1) define a single syntax for the format;
2) define a metalanguage for describing schemes of data representation and encoding;
3) describe existing encoding schemes in the project's metalanguage;
4) propose a set of description schemes for different data and different tasks;
5) ensure maximum compatibility with existing standards;
6) support the conversion of the encoding schemes of existing machine-readable texts into the syntax of the new format without adding any new information to those texts;
7) make it possible to use data in TEI format without special software.
The basic concepts and structure of TEI remained practically unchanged for about ten years. The third version of TEI (TEI P3) was published in 1994 and supplemented in 1999. The fourth version (TEI P4), created in 2001, was a small extension connected with the introduction of XML (Guidelines ... - electronic version). The latest version of the recommendations (TEI P5) was published in 2005. It added basic tag sets for new document types, means of describing the physical condition of a document (manuscripts in particular) and, importantly for linguists, instructions on representing linguistic description for text corpora. Work on the TEI recommendations continues. The plan is to pay more attention to such aspects of text representation as grammatical annotation, historical description, and the description of a document's physical condition, and to continue developing basic tag sets for various languages and document types.
TEI is supported by international organizations such as the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing.
The procedure of describing texts is called markup or encoding. Any representation of a text on a computer uses some form of markup. One of the reasons for developing TEI was the existence of a huge number of mutually incompatible encoding systems, together with the growing range of uses for electronic texts.
An encoding scheme is defined using SGML or XML. These languages allow the scheme to be defined formally in terms of elements and attributes, together with rules governing their placement in the text.
3.1. Structure of a Text Encoding Initiative text
As applied to corpora, all TEI tags fall into several groups, in particular: metadata, structural elements of the text, and special (linguistic) metainformation.
A document in TEI format has two main parts: the header (the <teiHeader> element) and the text proper (<text>). The header is in effect an electronic version of a title page. It can contain such information as the bibliographic data of the source, details of the encoding, a non-bibliographic description, and a revision log.
A TEI text may be unitary (a single work) or composite (a collection). In either case the text may have front matter, a body, and back matter. In a composite text the body may consist of groups, each of which may in turn contain groups or texts.
The body is divided into paragraphs (<p>), divisions (<div>), and subdivisions (<divn>, where n denotes the level of the subdivision and can range from 1 to 7). Other elements of text structure in TEI are headings, notes, line and page numbers, and so on.
In addition, TEI makes it possible to mark individual elements of the text and state the reason for marking them:
<emph> - a phrase highlighted for linguistic or rhetorical effect;
<foreign> - a word or phrase in a foreign language (that is, not the language of the main text);
<term> - a word or phrase treated in the text as a technical term;
<title> - the title of a work (a book, article, journal, etc.).
TEI offers highly detailed markup for a great variety of texts and their components. These include verse and drama, spoken language, dictionaries, manuscripts, factual databases (names, dates, persons, place names, etc.), tables, formulae and graphics, graphs, networks, trees, and more. The encoding of data for different languages is discussed separately.
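The two-part document structure and the highlighting elements described in this section can be illustrated with a short sketch. It is only an illustration under stated assumptions: the sample document, the constant `HIGHLIGHT_TAGS`, and the function `highlighted_spans` are hypothetical and not part of TEI, and the sample omits the TEI namespace that real TEI P5 files declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal TEI document: a header followed by a text whose body
# holds one division with one paragraph. The TEI namespace is left out to
# keep the sketch short.
SAMPLE = """
<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample document</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>In <title>War and Peace</title> the phrase
           <foreign>comme il faut</foreign> is <emph>not</emph> a
           <term>collocation</term> in the strict sense.</p>
      </div>
    </body>
  </text>
</TEI>
"""

# The four highlighting elements listed in the section above.
HIGHLIGHT_TAGS = ("emph", "foreign", "term", "title")


def highlighted_spans(tei_xml: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for highlighted elements inside <text>."""
    root = ET.fromstring(tei_xml)
    text = root.find("text")  # the header is deliberately skipped
    spans = []
    for element in text.iter():
        if element.tag in HIGHLIGHT_TAGS:
            spans.append((element.tag, "".join(element.itertext()).strip()))
    return spans


spans = highlighted_spans(SAMPLE)
for tag, content in spans:
    print(f"<{tag}>: {content}")
```

Because the search starts at <text>, the <title> inside the header is not reported. This mirrors the separation between metadata and text that the format prescribes.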
3.2. Recommendations for creating language corpora
The markup of language corpora deserves special attention.
In TEI, language corpora are treated as composite corpora, i.e. unified wholes consisting of many texts. The reason is that, although each individual text fragment in a corpus is entitled to be considered a text in its own right, for research purposes each fragment is regarded as a component of a larger object. Corpora have much in common with other types of composite texts (for example, anthologies and collections). Notably, different components of composite texts may have different structural characteristics (a corpus may, for instance, combine verse and prose), and different components are then served by elements of different TEI modules.
In addition to the core TEI tags, a number of specialized tag sets are offered for working with corpora.
Let us consider the main tags and capabilities of the standard in light of the variety of corpus types and of the tasks addressed in corpus linguistics.
The following tags organize the main levels of a corpus:
<teiCorpus> - contains the whole corpus encoded in TEI format; a corpus consists of a corpus header and one or more <TEI> elements, each of which contains a text header and the text itself;
<TEI> (TEI document) - contains a single TEI-conformant document, which consists of a TEI header and a text;
<teiHeader> (header) - contains the descriptive and declarative information about the text in the form of an electronic title page placed before every TEI-conformant text;
    @type - indicates the kind of document to which the header belongs (whether the document is a corpus or an individual text);
<text> - contains a single text of any kind, unitary or composite, for example a poem or a play, a cycle of essays, a novel, a dictionary, or a corpus fragment;
<group> - contains a composite text made up of distinct texts (or groups of texts) that are for some reason regarded as a unit, for example the texts of a single author, a cycle of poems, etc.
The <teiCorpus> tag is intended for encoding large corpora, but it can also be useful for encoding newspapers, computerized anthologies, and other composite texts. The individual parts of a corpus are encoded with separate <TEI> tags, and the whole corpus is enclosed in a <teiCorpus> tag. Each part of the corpus has the standard structure of a document enclosed in <TEI>: a <teiHeader> followed by a <text>. The corpus as a whole also carries its own <teiHeader>, which can describe both the corpus itself and the way its different parts are encoded.
Header information that relates to the whole corpus rather than to its individual components must be placed in the corpus-level <teiHeader>, before the texts contained in the corpus. This two-level structure makes it possible to add metainformation at the level of the corpus, at the level of an individual text, or at both levels at once.
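The two-level structure can be made concrete with a small sketch that reads the corpus-level header and the headers of the contained texts separately. The sample corpus and the helper names `header_title` and `describe_corpus` are assumptions made for this illustration, and the TEI namespace is again omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus: one corpus-level header followed by two <TEI>
# documents, each with its own header and text.
CORPUS = """
<teiCorpus>
  <teiHeader>
    <fileDesc><titleStmt><title>Demo newspaper corpus</title></titleStmt></fileDesc>
  </teiHeader>
  <TEI>
    <teiHeader>
      <fileDesc><titleStmt><title>Report, 1 May</title></titleStmt></fileDesc>
    </teiHeader>
    <text><body><p>First text.</p></body></text>
  </TEI>
  <TEI>
    <teiHeader>
      <fileDesc><titleStmt><title>Editorial, 2 May</title></titleStmt></fileDesc>
    </teiHeader>
    <text><body><p>Second text.</p></body></text>
  </TEI>
</teiCorpus>
"""


def header_title(header: ET.Element) -> str:
    """Read the title recorded in a header's file description."""
    return header.findtext("fileDesc/titleStmt/title")


def describe_corpus(corpus_xml: str) -> tuple[str, list[str]]:
    """Return the corpus-level title and the titles of the contained texts."""
    root = ET.fromstring(corpus_xml)
    # find() looks only at direct children, so this is the corpus header.
    corpus_title = header_title(root.find("teiHeader"))
    text_titles = [header_title(tei.find("teiHeader")) for tei in root.findall("TEI")]
    return corpus_title, text_titles


corpus_title, text_titles = describe_corpus(CORPUS)
print(corpus_title, text_titles)
```

Reading direct children only keeps the corpus header apart from the text headers. That distinction is what lets metainformation be stated at either level.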
A text is understood as any piece of discourse, complete or incomplete, unitary or composite, that is regarded as a unit. The term "composite text" denotes a text that contains other texts.
The tags listed above can be combined in different ways to encode all kinds of composite corpora.
The components of a corpus are texts in their own right, yet it is often convenient to treat the corpus as a single whole: this simplifies both building the corpus and marking it up. A corpus can thus be regarded as a special semiotic object with its own semantics, syntax, and pragmatics.
In some cases the design of a corpus is reflected in its internal structure. For example, a corpus of newspaper extracts may be organized so that the extracts are grouped by type (reports, editorials, reviews, etc.), with a further classification within each type by date, place of publication, and so on.
To show that a corpus consists of a number of subcorpora, the corpus itself, or a higher-level subcorpus, can be represented as a composite text using the <group> tag for combined texts. Components can also be grouped by means of the text classification tags.
Anthologies and collections are often treated as texts in their own right, if only because of their historical integrity. The parts of an anthology may nevertheless need to be treated as independent objects of study as well. The standard supports both approaches.
Thus the <group> tag is intended to simplify the encoding of collections, anthologies, and cycles. As noted above, it can also be used to show that a corpus consists of subcorpora.
All composite texts share one characteristic: the texts of which they consist may, but need not, have a uniform structure. If all the contained texts are encoded with the same module, no problem arises. If, however, different modules are required to encode them, all of these modules must be included in the schema.
3.2.1. Internal structure of the text
The description of the text proper is organized as a set of tagged and untagged characteristics along certain situational parameters (metadata), each of which is served by its own tag with its attributes:
<channel> (original channel of data transmission) - describes the channel through which the text was obtained; for written data the options include print, manuscript, and e-mail; for spoken data: radio broadcast, telephone conversation, recorded conversation;
    @mode - indicates whether the data are spoken or written;
<constitution> - describes the internal composition of a text or text fragment, i.e. whether it is fragmentary, complete, etc.;
    @type - specifies what the text is composed of;
<derivation> - describes the degree of authenticity of the text;
    @type - describes what changes the text has undergone;
<domain> - describes in general terms the circumstances in which the text was produced and the audience for which it was intended: whether the text was produced in private conversation or in public, in a classroom or at a religious event, etc.;
    @type - sorts texts by the circumstances in which they were produced;
<factuality> - describes how far the content of the text is factual: whether the text describes a fictional world or the real one;
    @type - classifies texts by degree of factuality;
<interaction> - describes the extent and nature of the speech act in which the text was created or reproduced: whether the text was a reply, an exclamation, a comment, etc.;
    @type - specifies the degree of interaction between the active and passive participants;
    @active - gives the number of active participants in the speech act (addressors);
    @passive - gives the number of passive participants in the speech act (addressees);
<preparedness> - states whether the text is prepared or spontaneous;
    @type - contains a keyword indicating the degree of preparedness of the text;
<purpose> - states the purpose or communicative function of the text;
    @type - specifies that purpose.
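The situational parameters listed above lend themselves to a simple tabular reading. The sketch below parses a hypothetical <textDesc> and collects each parameter with its attributes. The attribute values are illustrative, modelled on values suggested in the TEI Guidelines, and the function name `situational_parameters` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical description of an informal telephone conversation.
TEXT_DESC = """
<textDesc n="informal-phone-call">
  <channel mode="s">telephone conversation</channel>
  <constitution type="single"/>
  <derivation type="original"/>
  <domain type="domestic"/>
  <factuality type="fact"/>
  <interaction type="complete" active="plural" passive="many"/>
  <preparedness type="none"/>
  <purpose type="inform"/>
</textDesc>
"""


def situational_parameters(text_desc_xml: str) -> dict[str, dict[str, str]]:
    """Map each situational parameter (tag name) to its attributes."""
    root = ET.fromstring(text_desc_xml)
    return {child.tag: dict(child.attrib) for child in root}


params = situational_parameters(TEXT_DESC)
for name, attributes in params.items():
    print(name, attributes)
```

With the parameters in a dictionary, texts can be compared or filtered on any combination of them, for example all spoken texts whose purpose is to inform.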
It should be noted that some texts, literary ones above all, do not lend themselves easily to this kind of parametrization. In such cases the tag responsible for the "disputed" parameter is given the content "not applicable" or another phrase with the same meaning.
Since there are many systems for classifying and describing texts, every text can be described along many parameters. Despite the efforts of many corpus linguists, textual scholars, sociolinguists, and literary scholars, no compromise has yet been reached on how many and which systems should be taken into account. Rather than trying to derive a single text taxonomy in order to capture the distinctive characteristics of a text, it appears best to use the finite set of situational parameters described above in various combinations, without tying them to any particular taxonomy. At the same time, text types should be taken into account so that, in combination with the parameters proposed here, the internal structure of each text type can be described as simply and adequately as possible. This approach has the following analytical advantages:
• it provides a relatively consistent and uniform description of texts;
• it allows interpretable comparisons between different fragments of a corpus;
• it allows analysts to create and compare new text types according to the particular parameters of greatest interest;
• it applies equally to spoken and to written data.
It is often convenient to regard a specific set of tags serving the situational parameters as defining a text type of its own. This can also be appropriate when one and the same set of characteristics appears in the descriptions of different texts making up a corpus. The set of text types defined in this way should be regarded as a corpus-specific taxonomy.
TEI's detailed treatment of the markup of spoken corpora deserves special mention. The <profileDesc> tag may contain a <particDesc> tag, which holds additional information about the speakers or, where needed, about persons mentioned or discussed in a written text. Although the term "participant in a speech act" is used, it is understood that any being given a voice in a text is described by the same scheme unless stated otherwise. An identified character in a play or novel can be treated as a full participant in a speech act. If elements of the namesdates module have been added to the schema (see "Names, Dates, People, and Places"), <particDesc> may contain detailed information about a speaker or a group of speakers, for example their names and other individual characteristics. Once a speaker has been identified, the speaker can be assigned a code by which he or she is referred to anywhere in the encoded text, for example as a value of the @who attribute. The @who attribute points to the description of one or more participants.
The <settingDesc> tag is used to indicate the setting in which the speech act takes place. The description of the setting may be continuous untagged prose (like a description of the stage set before the start of a play), or it may be detailed and tagged.
If there are several descriptions of the setting, several <setting> tags are used:
<setting> - contains a detailed description of the setting in which the speech act takes place.
If the participants in the interaction are in different places, the optional @who attribute (available on <setting> as on any tag of the att.ascribed class) can be used to assign descriptions of different settings to different participants.
The aspects of the speech situation are encoded with the following tags:
<name> (proper name) - contains a proper noun or an equivalent noun phrase;
<date> - contains a date (in any format);
<time> - contains a phrase indicating the time of day (in any format);
<locale> - contains a brief untagged description of the place where the speech act takes place: in a room, in a restaurant, on a park bench, etc.;
<activity> - contains a brief untagged description of what a participant is doing during the speech act (if anything).
When the namesdates module is added to the schema, additional specialized tags become available: <orgName> and <persName>.
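How @who ties a setting to a participant can be shown in a few lines. The sketch below assumes a hypothetical <profileDesc> in which participants are declared as <person> elements with `xml:id` identifiers. The sample data and the function name `settings_by_participant` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: two participants in different places.
PROFILE = """
<profileDesc>
  <particDesc>
    <person xml:id="P1"><persName>Anna</persName></person>
    <person xml:id="P2"><persName>Boris</persName></person>
  </particDesc>
  <settingDesc>
    <setting who="#P1">
      <name>Saint Petersburg</name>
      <time>evening</time>
      <locale>kitchen</locale>
      <activity>cooking</activity>
    </setting>
    <setting who="#P2">
      <locale>park bench</locale>
      <activity>reading</activity>
    </setting>
  </settingDesc>
</profileDesc>
"""

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def settings_by_participant(profile_xml: str) -> dict[str, dict[str, str]]:
    """Map each participant's name to the setting ascribed to them via @who."""
    root = ET.fromstring(profile_xml)
    names = {p.get(XML_ID): p.findtext("persName") for p in root.iter("person")}
    result = {}
    for setting in root.iter("setting"):
        description = {child.tag: child.text for child in setting}
        # @who may list several pointers separated by whitespace.
        for pointer in setting.get("who", "").split():
            result[names[pointer.lstrip("#")]] = description
    return result


settings = settings_by_participant(PROFILE)
print(settings)
```

A <setting> without @who would simply be skipped by this sketch. A fuller treatment would apply such a setting to all participants.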
A TEI-conformant document may have more than one header only if it is a corpus encoded in TEI format. Both the corpus itself and all the texts that make it up must have headers. Every tag specified in the corpus header automatically applies to each contained text, unless it is redefined there. A tag specified in the header of a contained text but absent from the corpus header applies to that text only. If a tag is specified both in the corpus header and in the header of a contained text, the specification in the corpus header is ignored for that text.
All attributes of header tags behave in the same way. Thanks to this system, metainformation common to all the texts of the corpus needs to be entered only once. Information for an individual text is added separately only where it differs from the common baseline.
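The override rule just described can be sketched as a dictionary merge in which text-level values replace corpus-level ones. This is only an illustration of the rule, not a feature of any TEI tool: the sample corpus and the names `text_desc` and `effective_parameters` are assumptions, and the merge is done per tag, as the rule is stated above.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus: the corpus header sets defaults, text t1 overrides
# <channel>, text t2 specifies nothing of its own.
CORPUS = """
<teiCorpus>
  <teiHeader>
    <profileDesc>
      <textDesc>
        <channel mode="w">print</channel>
        <purpose type="inform"/>
      </textDesc>
    </profileDesc>
  </teiHeader>
  <TEI n="t1">
    <teiHeader>
      <profileDesc>
        <textDesc><channel mode="s">radio broadcast</channel></textDesc>
      </profileDesc>
    </teiHeader>
    <text><body><p>Transcript.</p></body></text>
  </TEI>
  <TEI n="t2">
    <teiHeader><profileDesc/></teiHeader>
    <text><body><p>Article.</p></body></text>
  </TEI>
</teiCorpus>
"""


def text_desc(header: ET.Element) -> dict[str, dict[str, str]]:
    """Collect the situational parameters stated in one header, if any."""
    desc = header.find("profileDesc/textDesc")
    if desc is None:
        return {}
    return {child.tag: dict(child.attrib) for child in desc}


def effective_parameters(corpus_xml: str) -> dict[str, dict[str, dict[str, str]]]:
    """Resolve, per text, corpus-level defaults overridden by text-level values."""
    root = ET.fromstring(corpus_xml)
    corpus_level = text_desc(root.find("teiHeader"))
    resolved = {}
    for tei in root.findall("TEI"):
        local = text_desc(tei.find("teiHeader"))
        # Later keys win: a tag given locally replaces the corpus-level one.
        resolved[tei.get("n")] = {**corpus_level, **local}
    return resolved


resolved = effective_parameters(CORPUS)
print(resolved)
```

Text t1 keeps the corpus-level <purpose> but uses its own <channel>, while t2 inherits both. That is the "enter once, override where needed" behaviour described above.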
3.2.2. Metainformation
In the TEI standard, metainformation is called contextual information. Examples are the age, sex, and geographical origin of the participants in a speech act and their socio-economic status; the price and publication date of a newspaper; the general subject matter or the imprint of a book; and so on. Information of this kind is of prime importance for corpus linguistics. It serves as an organizing principle when a corpus is built, for example when one needs to check that, with respect to some characteristic, the sample is evenly represented across the whole corpus or is represented in proportion to the number of fragments taken for the corpus. It also serves as a criterion for selecting fragments when searching and analysing a corpus, as when specific linguistic features are to be studied for a particular community or subset of texts.
This information must be recorded in the appropriate section of the TEI header. Metainformation about all the documents is kept in a separate file so that a subset of the corpus can be conveniently selected by particular features.
The document metadescription tag <teiHeader> has the following attributes:
1) id - a unique identifier of the document in the corpus (usually it matches the file name without its extension; since it forms the basis for the identifiers of words and sentences, it can be shortened to a unique short name);
2) target - the name of the file that contains the document;
3) type='text' - the type of description; in our case it is always "text", although descriptions of groups of documents are also possible;
4) lang='ru' - the language in which the document is written; in our case it is always "ru". TEI specifies languages according to ISO 639. The lang attribute sets a default value, which can be overridden for an individual sentence or word if a Russian text includes a fragment in another language; TEI also provides the <foreign> tag for foreign-language insertions.
The full metadescription of a document consists of the following groups of tags:
1) <fileDesc> - information about the text of the document;
2) <profileDesc> - information about the genre of the document;
3) <encodingDesc> - information about the structure of the document's markup (or a reference to a standard one);
4) <revisionDesc> - information about the revision history of the document.
Besides <fileDesc>, <profileDesc> can be useful: it contains information about the general class of the text, for example fiction, journalism, spoken language, and so on.
The file description <fileDesc> consists of the following elements:
1) <titleStmt> - bibliographic information about the text;
2) <publicationStmt> - bibliographic information about the publication;
3) <sourceDesc> - information about the source from which the electronic version of the document was derived.
The bibliographic information in <titleStmt> includes the elements:
• <title> - the title;
• <author> - the author;
• <date> - the date of creation of the original document;
• <extent> - the size of the document in some conventional units. Their typology can be given in the type attribute, but counting in words is the natural choice, and rules for counting words must then be formulated. For example, a word can be defined as a sequence of characters between spaces, or only as a sequence of Cyrillic and Latin letters, or only as a sequence of Cyrillic letters. Multiword units such as так как ('because'), как-нибудь ('somehow'), Нью-Йорк ('New York'), and друг друга ('each other') can also be counted as single words. Since a corpus is evaluated, among other things, by its length in words, the rule used must be stated precisely;
• <sponsor> - an element in which the relevant sponsor can be named;
• <respStmt> - information about the person or persons who made an intellectual contribution to the creation of this electronic document (other than authors and sponsors). <respStmt> gives this information by means of the <name> and <resp> elements, the latter stating the nature of the contribution; for example, those responsible for the manual markup of the document can be recorded here.
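The note on <extent> implies that the same text yields different sizes under different counting rules. The sketch below compares the four rules mentioned there. The sample sentence, the list `MULTIWORD_UNITS`, and the function names are hypothetical choices made for this illustration.

```python
import re

# Hypothetical list of multiword units that count as one word each.
MULTIWORD_UNITS = ("так как", "друг друга")


def count_whitespace_tokens(text: str) -> int:
    """Rule 1: a word is any run of characters between spaces."""
    return len(text.split())


def count_letter_runs(text: str, cyrillic_only: bool = False) -> int:
    """Rules 2 and 3: a word is a run of Cyrillic (and optionally Latin) letters."""
    pattern = r"[а-яёА-ЯЁ]+" if cyrillic_only else r"[а-яёА-ЯЁa-zA-Z]+"
    return len(re.findall(pattern, text))


def count_with_multiword_units(text: str, units=MULTIWORD_UNITS) -> int:
    """Rule 4: whitespace tokens, with listed multiword units glued together."""
    lowered = text.lower()
    for unit in units:
        lowered = lowered.replace(unit, unit.replace(" ", "_"))
    return len(lowered.split())


sentence = "Он уехал в Нью-Йорк, так как любит jazz."
counts = {
    "whitespace": count_whitespace_tokens(sentence),
    "letters": count_letter_runs(sentence),
    "cyrillic": count_letter_runs(sentence, cyrillic_only=True),
    "multiword": count_with_multiword_units(sentence),
}
print(counts)
```

For this one sentence the four rules give 8, 9, 8, and 7 words respectively. The spread is why the header should state exactly which rule produced the value recorded in <extent>.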
t.{To KacaeTCJI Jl3blKOBbIX KOprrycoB, MeTa11tt<l>opMaQHJO MO:>KHO o6'be
AHHHTb B o6w;eM 3aronOBO"IHOM Tere Bcero Koprryca J1nl1 A06aBHTb B 3a
ronOBO"IHbie Tern Ka:>KAOfO <l>parMeHTa; o6'beAHHemte 3Tl1X Bap11aHTOB
TaK:>Ke B03MO>KHO.
Some further elements can be used inside the TEI header, but only if they are declared in the schema. These elements make it possible, for example, to characterize from various angles the conditions under which a speech act takes place, its physical properties, and its participants. Such information is created specifically for the needs of corpus linguistics. For this purpose, the following additional elements can be used inside the <profileDesc> tag, which is located inside the header tag (<teiHeader>):
<textDesc> (text description) - describes texts in terms of situational parameters;
<particDesc> (description of the participants in the speech act) - describes the participants in the speech act in a text of any type;
<settingDesc> (description of the setting) - describes the condition or conditions under which the speech act takes place (as untagged text or as a set of tagged features).
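The nesting of the header elements described above can be sketched in Python with the standard xml.etree.ElementTree module. This is a structural skeleton only, under stated assumptions: the header element is spelled teiHeader as in the TEI Guidelines, the function name and sample values are hypothetical, and the result is not validated against any TEI schema (a real header requires more content in several of these elements).

```python
import xml.etree.ElementTree as ET


def build_header(title: str, author: str, setting: str) -> ET.Element:
    """Build a bare header skeleton: fileDesc plus profileDesc."""
    header = ET.Element("teiHeader")

    # File description: titleStmt, publicationStmt, sourceDesc.
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    ET.SubElement(title_stmt, "author").text = author
    ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(file_desc, "sourceDesc")

    # Profile description: the corpus-oriented elements.
    profile = ET.SubElement(header, "profileDesc")
    ET.SubElement(profile, "textDesc")
    ET.SubElement(profile, "particDesc")
    setting_desc = ET.SubElement(profile, "settingDesc")
    # The setting is given here as untagged prose inside a paragraph.
    ET.SubElement(setting_desc, "p").text = setting
    return header


# Hypothetical values, for illustration only.
header = build_header("Sample interview", "Unknown",
                      "Recorded in a university office.")
xml_text = ET.tostring(header, encoding="unicode")
print(xml_text)
```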
4. Linguistic annotation of a corpus
A language corpus often contains analytic linguistic annotation that is used in various kinds of linguistic research. There are several mechanisms by which practically any type of annotation can be represented either in a standard form or according to a template specific to the given document.
Linguistic annotation is understood here as any annotation based on the linguistic characteristics of a text.
Linguistic annotation data can be attached to textual elements of different levels. For example, a word-class code or a part-of-speech code can be attached to each word (token) or to a group of tokens, which may be contiguous or discontiguous. A corresponding code can also be assigned to a sentence or to a syntactic relation.
The annotation procedure may be automatic, manual, or automatic with manual correction. The ease and accuracy with which annotation can be automated vary with the level at which it is required. The annotation method used can be stated in the <interpretation> tag, inside the tag that describes the encoding method, within the root tag <teiHeader>.
If different parts of the corpus have to be annotated according to different parameters, this can be indicated by means of the decls attribute.
Because recognizing and encoding many features of a text is costly, and because it is difficult to apply uniform techniques to all parts of a large corpus, encoders may find it convenient to divide the sets of elements to be encoded into four categories:
required - features in this category are encoded in the analysis of every text that makes up the corpus;
recommended - features in this category are encoded if considerations of economy allow it; if such a feature is present in the text but has not been encoded, this is indicated in the root tag;
optional - features in this category may or may not be encoded; if the corresponding tag gives no information about such a feature, this does not mean that the feature is absent from the text;
proscribed - features in this category are deliberately not encoded; they may be represented as unannotated text, placed inside a <gap> tag, or, as is most often the case, not mentioned at all.
5. Formats of linguistic annotation
5.1. Formats of morphological annotation
A distinction should be drawn between the format of the data structure and the format of the content. As far as structure is concerned, there are three main ways of annotating a text with linguistic information:
• simple appending: each word is followed by a short description of its features, for example gives_VVZ, where the code VVZ means the third person singular (Z) of a lexical verb (VV) (I);
• table: each column records a particular morphosyntactic feature (II);
• markup language: a set of means for recording linguistic information, organized as a language of keywords with its own syntax (III).
The door, which was equipped with neither bell nor knocker, was blistered and distained.
I. Example of annotation from the Associated Press Corpus (horizontal format, Lancaster University encoding):
[N The_AT door_NN1 ,_, [Fr [N which_DDQ N] [V was_VBDZ equipped_VVN [P with_IW [N neither_LE [ bell_NN1 nor_CC knocker_NN1 ] N] P] V] Fr] N] ,_, [V was_VBDZ [ blistered_VVN and_CC distained_VVN ] V] ._.
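The horizontal format lends itself to very simple processing. The following Python sketch, with a hypothetical function name, splits such a line on whitespace, skips the bracket labels, and separates each word from its tag at the last underscore. It assumes the line has been cleaned of recognition errors (NN1 rather than NNl, closing brackets written as "N]").

```python
LINE = (
    "[N The_AT door_NN1 ,_, [Fr [N which_DDQ N] [V was_VBDZ equipped_VVN "
    "[P with_IW [N neither_LE [ bell_NN1 nor_CC knocker_NN1 ] N] P] V] Fr] N] "
    ",_, [V was_VBDZ [ blistered_VVN and_CC distained_VVN ] V] ._."
)


def parse_horizontal(line: str) -> list[tuple[str, str]]:
    """Return (word, tag) pairs; bracket labels such as '[N' are skipped."""
    pairs = []
    for item in line.split():
        if "_" not in item:
            continue  # a syntactic bracket, not a tagged token
        word, tag = item.rsplit("_", 1)
        pairs.append((word, tag))
    return pairs


pairs = parse_horizontal(LINE)

# The same data in a two-column vertical layout, one token per line.
vertical = "\n".join(f"{word}\t{tag}" for word, tag in pairs)
print(vertical)
```

The sketch discards the bracketing; keeping it would require tracking the open and close labels as well.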
II. Example of annotation from the Associated Press Corpus (vertical format, Lancaster University morphosyntactic encoding):
The         AT     [N
door        NN1
,           ,
which       DDQ    [Fr[N]
was         VBDZ   [V
equipped    VVN
with        IW     [P
neither     LE     [N
bell        NN1    [
nor         CC
knocker     NN1    ]N]P]V]Fr]N]
,           ,
was         VBDZ   [V
blistered   VVN
and         CC
distained   VVN    V]
Most modern markup languages are based on the SGML/XML formalism, because it allows the attribute set to be documented explicitly and separates the markup of the document's structure, its content, and its presentation to the user. An example of annotation in XML syntax follows.
III. Example of annotation from a corpus of Russian texts (markup language, morphological encoding of the AOT internet service):
<?xml version="1.0" encoding="windows-1251" ?>
<text><p>
<s><w>Звонили<ana lemma="ЗВОНИТЬ" pos="Г" gram="мн,нс,нп,дст,прш" /></w>
<w>к<ana lemma="К" pos="ПРЕДЛ" gram="" /></w>
<w>вечерне<ana lemma="ВЕЧЕРНЯ" pos="С" gram="жр,ед,дт,пр,но" /><ana lemma="ВЕЧЕРНИЙ" pos="П" gram="ср,ед,кр" /></w>
<pun>.</pun></s>
<s><w>Торжественный<ana lemma="ТОРЖЕСТВЕННЫЙ" pos="П" gram="мр,ед,им,вн" /></w>
<w>гул<ana lemma="ГУЛ" pos="С" gram="мр,ед,им,вн,но" /></w>
<w>колоколов<ana lemma="КОЛОКОЛ" pos="С" gram="мр,мн,рд,но" /><ana lemma="КОЛОКОЛОВ" pos="С" gram="мр,фам,ед,им,од" /></w>
...
<pun>.</pun></s></p></text>
The unit of morphological annotation is the word (tag <w>). The form as it occurs in the text is written after this tag. The morphological analysis of the word is recorded in the <ana> element, which has the following attributes:
• lemma - the dictionary form in upper case;
• pos - the part of speech;
• gram - the morphological features.
A word may have several parallel analyses at once, given as a sequence of <ana> elements. After disambiguation (manual or automated), usually only one analysis remains in the output representation.
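A minimal Python sketch shows how such annotation can be read with the standard xml.etree.ElementTree module. The sample is a shortened fragment in the format described (the tag and attribute names w, ana, lemma, pos, gram come from the example; the function name and the reconstructed Cyrillic values are assumptions). It collects the analyses of each word and lists the words that remain ambiguous.

```python
import xml.etree.ElementTree as ET

# Shortened fragment; the XML declaration is omitted because the text is
# passed to the parser as a Python string rather than as encoded bytes.
SAMPLE = """<text><p><s>
<w>к<ana lemma="К" pos="ПРЕДЛ" gram="" /></w>
<w>вечерне<ana lemma="ВЕЧЕРНЯ" pos="С" gram="жр,ед,дт,пр,но" /><ana lemma="ВЕЧЕРНИЙ" pos="П" gram="ср,ед,кр" /></w>
<pun>.</pun></s></p></text>"""


def read_analyses(xml_text: str):
    """Return a list of (word form, [(lemma, pos, [features]), ...])."""
    root = ET.fromstring(xml_text)
    result = []
    for w in root.iter("w"):
        form = (w.text or "").strip()  # the text form precedes the <ana> children
        analyses = []
        for ana in w.findall("ana"):
            features = [g for g in ana.get("gram", "").split(",") if g]
            analyses.append((ana.get("lemma"), ana.get("pos"), features))
        result.append((form, analyses))
    return result


words = read_analyses(SAMPLE)
# Words with more than one <ana> element have not been disambiguated.
ambiguous = [form for form, analyses in words if len(analyses) > 1]
print(ambiguous)
```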
Content standards for XML corpora have not yet fully taken shape. Among the competing standards, the most significant are EAGLES (Expert Advisory Group on Language Engineering Standards), TEI (Text Encoding Initiative), and XCES (Corpus Encoding Standard for XML).
The EAGLES guidelines (Recommendations ... - electronic version) lay down general principles for creating and documenting corpora and their morphosyntactic annotation, together with a number of concrete solutions for annotating particular cases. In particular, they recommend lemmatization. EAGLES also offers two options for storing morphological annotation: each feature can be represented by a separate attribute (POS='W' number="sing"), or a complex morphological annotation can be used in which digits correspond to features; for example, feats="V3011141101200" means verb, 3rd person, singular, finite, indicative, past tense, active, main verb, non-phrasal, non-reflexive form of a verb (the list of recommended features and their values is part of the EAGLES recommendations). However, the EAGLES guidelines do not contain a ready-made tag set for building a corpus.
Existing corpora whose linguistic annotation is based on SGML/XML use a wide variety of encoding systems. For example, the BNC uses CDIF, which is based on TEI; the American National Corpus, the Croatian National Corpus, and others use XCES; ICE (International Corpus of English), the Czech National Corpus, and the Hungarian National Corpus use TEI, the most widely applied standard.
For Russian, the TEI standard was adapted by S. A. Sharov and S. O. Savchuk (Savchuk 2005) and used in building the Russian National Corpus.
Two examples of TEI annotation follow, at the morphological level (1) and the word-formation level (2):
(1) I didn't do it
<w lemma="i" feats="pp1">I</w>
<w>
  <w lemma="do" feats="vvd">did</w>
  <m type="negation">n't</m>
</w>
<w lemma="do" feats="vv0">do</w>
<w lemma="it" feats="pp3">it</w>
(2) comfortable
<w type="adjective">
  <m type="prefix" baseform="con">com</m>
  <m type="root">fort</m>
  <m type="suffix">able</m>
</w>
The most fully developed standard for linguistic annotation proper is XCES (Ide, Romary 2002), which is planned to become an international standard within the ISO TC37/SC4 project. XCES defines an abstract metamodel that provides the means to create any reasonable model of linguistic annotation satisfying the EAGLES guidelines. For this purpose it defines abstract tags for nodes, <struct>, and for their features, <feat>. Each node must be given a type, for example p-level, s-level, w-level, or m-level for paragraphs, sentences, words, and morphemes respectively. This makes it possible to represent multiword units as a single unit of analysis, for example as well as in English, or verbs with separable prefixes, for example zunehmen in German. A single word can also be decomposed within the annotation, for example zum as zu dem in German.
Among the standards for morphological annotation, the multilingual morphosyntactic specifications MULTEXT-East Version 4 (http://nl.ijs.si/ME/V4/) should also be mentioned.
5.2. Formats for encoding syntactic relations
There is a fairly wide range of languages for the syntactic annotation of texts. For example, the horizontal notation of the PennTreebank corpus stores trees as LISP lists:
(S (NP-SB (PPH-HD He)) (VP-OC (VVD-HD studied)) (NP-00 (ART-ND the) (NN-HD problem)))
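Such LISP-style lists can be read by a short recursive parser. The Python sketch below is one possible approach, not the format's reference implementation; the function names are hypothetical, and the input string assumes that the final parenthesis closing the S node is present (the printed example appears to omit it).

```python
TREE = ("(S (NP-SB (PPH-HD He)) (VP-OC (VVD-HD studied)) "
        "(NP-00 (ART-ND the) (NN-HD problem)))")


def parse_tree(text: str):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    try:
        tree = read_node()
    except IndexError:
        raise ValueError("unbalanced parentheses") from None
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(node) -> list[str]:
    """Collect the words of the tree from left to right."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


tree = parse_tree(TREE)
sentence = leaves(tree)
print(tree[0], sentence)
```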
TEI provides standard tags for syntactic relations:
• the <cl> tag - a clause, used to encode coordinate and subordinate clauses; it has two attributes: type, which gives the syntactic features of the clause, and function, which gives the function of the clause;
• the <phr> tag - a phrase (group of words); similarly, the type attribute gives its type (noun phrase, prepositional phrase, etc.) and function gives its function.
For a representation in terms of dependencies, special tags can be introduced, for example <depe>, with the attributes function and target, the latter referring to the identifier of the dependent word in the sentence.
An example of morphosyntactic annotation of a sentence in TEI:
Nineteen fifty-four when I was eighteen years old
<p>
<cl type="finite declarative" function="independent">
  <phr type="NP" function="subject">Nineteen fifty-four
    <cl type="finite relative declarative" function="appositive">when
      <phr type="NP" function="subject">I</phr>
      <phr type="VP" function="predicate">was eighteen years old</phr>
    </cl>
  </phr>
</cl> ...
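The nested <cl> and <phr> elements can be traversed with the standard xml.etree.ElementTree module. The Python sketch below is an illustration under assumptions: the function name is hypothetical, the fragment is limited to the complete <cl> element (the surrounding <p> is truncated in the example), and the type values NP and VP are readings of poorly legible attribute values. It lists each element with its depth, type, function, and text.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<cl type="finite declarative" function="independent">
<phr type="NP" function="subject">Nineteen fifty-four
<cl type="finite relative declarative" function="appositive">when
<phr type="NP" function="subject">I</phr>
<phr type="VP" function="predicate">was eighteen years old</phr>
</cl></phr></cl>"""


def describe(elem, depth=0):
    """Return (depth, tag, type, function, text) rows in document order."""
    # Join all text inside the element and normalize whitespace.
    text = " ".join("".join(elem.itertext()).split())
    rows = [(depth, elem.tag, elem.get("type"), elem.get("function"), text)]
    for child in elem:
        rows.extend(describe(child, depth + 1))
    return rows


rows = describe(ET.fromstring(SAMPLE))
for depth, tag, type_, function, text in rows:
    print("  " * depth + f"{tag} [{type_}; {function}]: {text}")
```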
6. Conclusion
In conclusion, both linguistic and extralinguistic annotation should rest on principles for describing texts and linguistic units that are sufficiently widespread and generally accepted. These principles have been worked out most thoroughly in the international standards, some of which were discussed above.
Uniform data representation formats make it possible in many cases to use the same software and to exchange corpus data. Standardization of data representation formats can be considered from two sides: with respect to their content and with respect to their structure.
The annotation parameters of corpora and their values should be sufficiently "natural", that is, they should correspond to generally accepted scientific classifications. The linguistic resources and software of corpus managers should support the processing of typical queries and the solution of typical tasks.
Notes
1 TEI P4: An XML Version of TEI Guidelines. http://www.tei-c.org/P4X/AB.htmlABTEI
2 Guidelines for Electronic Text Encoding and Interchange. XML-compatible edition / ed. by C. M. Sperberg-McQueen, Lou Burnard. http://www.tei-c.org/P4X/index.html
3 The current version of this document can be found at http://wwwtei.uic.edu/orgs/tei/intros/teiu5.tei or ftp://info.ox.ac.uk/pub/ota/TEI/doc/teiu5.tei
References
Baranov A. N. Vvedenie v prikladnuyu lingvistiku [Introduction to Applied Linguistics]. Moscow, 2007.
Savchuk S. O. Metatekstovaya razmetka v Natsional'nom korpuse russkogo yazyka: bazovye printsipy i osnovnye funktsii [Metatextual annotation in the Russian National Corpus: basic principles and main functions] // Natsional'nyi korpus russkogo yazyka: 2003-2005. Rezul'taty i perspektivy [Russian National Corpus: 2003-2005. Results and Prospects]. Moscow, 2005. P. 62-88.
Recommendations for the morphosyntactic annotation of corpora, EAG-TC-WG-MAC/R. ftp://ftp.ilc.pi.cnr.it/pub/eagles/corpora/annotate.ps.gz
Ide N., Romary L. Standards for Language Resources // Proceedings of Language Resources and Evaluation Conference (LREC02). Las Palmas (Spain), 2002. P. 59-65.
Guidelines for Electronic Text Encoding and Interchange / ed. by C. M. Sperberg-McQueen, L. Burnard. [S.l.], 2001. http://www.hcu.ox.ac.uk/TEI/P4X/index.html
Scholarly publication
STRUCTURAL AND APPLIED LINGUISTICS
Interuniversity collection
Issue 9
Editor L. A. Karpova
Computer layout by E. M. Voronkova
Signed for printing 13.07.12. Format 60x84 1/16.
Offset printing. Offset paper.
20.69 conventional printed sheets. Print run 250 copies. Order 2.f.C
St. Petersburg University Press.
199004, St. Petersburg, V.O., 6-ya liniya, 11/21.
Tel. (812)328-96-17; fax (812)328-44-22
www.unipress.ru
Printing house of the St. Petersburg University Press:
199061, St. Petersburg, Sredny pr., 41.