
Structural and Applied Linguistics

9

ISSN 0202-2400


LA FILOLÓGICA POR LA CAUSA

SAINT PETERSBURG STATE UNIVERSITY

STRUCTURAL AND APPLIED LINGUISTICS

Interuniversity collection

Issue 9


UDC 80+618.31 BBK 81.1

C83

Editorial board: Prof. L. N. Belyaeva, Prof. A. S. Gerd (editor-in-chief), Prof. O. N. Grinbaum, Prof. M. A. Marusenko

Secretary of the editorial board: V. I. Rubiner

Reviewer: I. P. Pankov, Cand. Sc. (Philology), Associate Professor

Published by resolution of the Editorial and Publishing Council of the Faculty of Philology, St. Petersburg State University

C83 Structural and Applied Linguistics. Issue 9: Interuniversity collection / ed. by A. S. Gerd. St. Petersburg: St. Petersburg University Press, 2012. 356 pp.

The collection (issue 8 appeared in 2010) contains articles on a wide range of problems of theoretical and applied linguistics and on the application of mathematical methods in linguistics.

Intended for specialists in the theory of language and in applied and theoretical linguistics.

BBK 81.1

© St. Petersburg State University, 2012



V. P. Zakharov

INTERNATIONAL STANDARDS IN CORPUS LINGUISTICS

Summary. The paper discusses the development of standards in corpus linguistics. The recommendations of the Text Encoding Initiative (TEI) project are analysed in detail. Tools and methods for adding contextual information (metadata) and linguistic annotation to a corpus are discussed.

Keywords: corpus linguistics, language corpora, standards, tagging, annotation, Text Encoding Initiative.

1. Introduction

Corpus linguistics is a complex linguistic discipline that has taken shape over recent decades on the basis of electronic computing. It studies the construction of linguistic corpora, the methods for processing the data they contain, and the methodology of creating and using them. It is fair to say that all modern linguistic research, as well as work on compiling dictionaries and grammars, relies in one way or another on representative text corpora. The development of modern intelligent software systems for natural language processing likewise requires a large experimental linguistic base. The demand for corpus data coincided with the emergence of the corresponding technical capabilities.

© V. P. Zakharov, 2012

Corpora are, as a rule, intended for repeated use by many users, so their annotation and linguistic support must be unified in a consistent way. Corpus standards usually concern the compatibility of annotation types; these are sometimes called "encoding standards". Equally important is the comparability of different corpora, including assessments of their suitability for various tasks; such standards are called "evaluation standards". The greatest difficulty lies in standardizing the transcription of spoken language and of historical corpora. While some progress has been made in the written representation of spoken language, even without a single standard binding on everyone (mainly thanks to the existence of precedents), no standards have yet been developed for describing the non-verbal component of natural language communication, which hampers further progress in this area (Baranov 2007).

Standardization of corpora and compatibility of data types also matter from the standpoint of comparing different corpora.

Corpora can be evaluated both quantitatively and qualitatively. Quantitative data make it possible to judge a corpus's size, its composition according to various criteria, and the linguo-statistical parameters of the corpus or its subcorpora. Qualitative evaluation means assessing and comparing corpora on the basis of the results they return.
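The quantitative side of corpus evaluation can be illustrated with a small Python sketch. It rests on simple assumptions: each text is already labelled with a genre, tokens are whitespace-separated, and the toy data and the function name `quantitative_profile` are hypothetical. It reports corpus size in tokens, the share of each genre, and the type/token ratio as one example of a linguo-statistical parameter.

```python
from collections import Counter

# Toy corpus (hypothetical): (genre, text) pairs.
CORPUS = [
    ("news", "the market fell and the market rose"),
    ("fiction", "the night was dark"),
    ("news", "prices rose"),
]

def quantitative_profile(docs):
    """Size in tokens, share of each genre, and type/token ratio."""
    tokens_by_genre = Counter()
    types = set()
    for genre, text in docs:
        tokens = text.split()              # naive whitespace tokenization
        tokens_by_genre[genre] += len(tokens)
        types.update(tokens)
    size = sum(tokens_by_genre.values())
    return {
        "size": size,
        "share": {g: n / size for g, n in tokens_by_genre.items()},
        "ttr": len(types) / size,
    }

profile = quantitative_profile(CORPUS)
print(profile["size"])  # 13
```

A real evaluation would use a proper tokenizer and the corpus's own classification scheme; the point here is only that such figures are computable once texts carry metadata.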

Questions of the suitability of corpora for various linguistic tasks also require their own "evaluation standards".

2. Overview of international standards in corpus linguistics

Drawing on international experience, de facto standards have by now emerged for representing metadata, both linguistic and extralinguistic. They are based on the descriptions of texts and corpora developed within the Text Encoding Initiative (TEI) and the ISLE Project (International Standards for Language Engineering), and on the recommendations of EAGLES (Expert Advisory Group on Language Engineering Standards). Foremost among them are CDIF (Corpus Document Interchange Format, www.natcorp.ox.ac.uk/archive/vault/tgcw30.pdf), CES (Corpus Encoding Standard, http://www.cs.vassar.edu/CES/CESl.htmlContents) and XCES (Corpus Encoding Standard for XML, http://www.xces.org/).

These and other standards are currently being "gathered" and generalized under the auspices of ISO/TC 37, a technical committee of the International Organization for Standardization. Many of them relate directly to corpus linguistics, for example:
• ISO 24614-1:2010, Word segmentation of written texts, Part 1: Basic concepts and general principles;
• ISO 24610-1:2006, Feature structures, Part 1: Feature structure representation;
• ISO 24610-2:2011, Feature structures, Part 2: Feature system declaration;
• ISO/DIS 24611, Morpho-syntactic annotation framework;
• ISO 24613:2008, Lexical markup framework;
• ISO 24615:2010, Syntactic annotation framework (SynAF);
and others. Under the common title "Language resource management", these standards describe:
• principles and methods of terminology standardization;
• the development of terminology standards;
• terminological dictionaries;
• the creation of language resources;
• computational lexicography;
• terminological documentation;
• encoding in the field of terminology and language resources;
• the use of terminology and other language resources in language engineering and content management.

3. Standards of the Text Encoding Initiative project

The most thoroughly developed recommendations are those of the Text Encoding Initiative (TEI) project. The effort to create a text encoding scheme grew out of a seminar held at Vassar College in 1987, attended by representatives of text archives, scholarly societies and research centres. The aim of the meeting was to discuss the feasibility of a standard encoding scheme for text documents. TEI was launched as a project proper in 1988.

The TEI system provides recommendations for the electronic publication of texts (text identification, representation, analysis and interpretation, and a metalanguage for description and encoding). It is designed mainly for text documents but also allows data in other formats, such as graphics and audio, to be described and identified. The main goal of the project is to develop formats for data interchange in the humanities.

The TEI recommendations are intended to:
1) define a single format syntax;
2) define a metalanguage for describing data representation and encoding schemes;
3) describe existing encoding schemes in the project's metalanguage;
4) offer a range of description schemes for different kinds of data and different tasks;
5) ensure maximum compatibility with existing standards;
6) support the conversion of existing machine-readable texts from their encoding schemes into the syntax of the new format without adding any new information to those texts;
7) make it possible to use data in TEI format without special software.

The basic concepts and structure of TEI remained practically unchanged for about ten years. The third version, TEI P3, was published in 1994 and supplemented in 1999. The fourth version, TEI P4, created in 2001, was a minor update associated with the adoption of XML (Guidelines ..., electronic version). The latest version of the recommendations, TEI P5, was published in 2005. It added sets of base tags for new document types, means of describing the physical condition of a document (in particular, of manuscripts) and, importantly for linguists, guidance on representing linguistic description in text corpora. Work on the TEI recommendations continues. More attention is planned for aspects of text representation such as grammatical annotation, historical description and the description of a document's physical condition, as well as further development of base tag sets for various languages and document types.

TEI is supported by international organizations such as the Association for Computers and the Humanities, the Association for Computational Linguistics and the Association for Literary and Linguistic Computing.

The process of describing texts is called markup or encoding. Any computer representation of a text uses some form of markup. One of the reasons for developing TEI was the existence of a huge number of mutually incompatible encoding systems, together with the growing range of uses for electronic texts.

Encoding schemes are defined using SGML and XML, which make it possible to specify a scheme formally in terms of elements and attributes, together with rules governing where they may appear in a text.
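To make "rules governing placement" concrete, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The `CONTENT_MODEL` table is a hypothetical, drastically simplified stand-in for a real schema (TEI itself is defined in ODD and usually validated against a generated RELAX NG schema); it only lists which child elements each element may contain, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified content model: allowed children per element.
CONTENT_MODEL = {
    "TEI": {"teiHeader", "text"},
    "text": {"front", "body", "back", "group"},
    "body": {"div", "p"},
    "div": {"head", "div", "p"},
    "p": {"emph", "foreign", "term", "title"},
}

def placement_errors(elem, path=""):
    """Return paths of child elements that the content model does not allow."""
    errors = []
    here = f"{path}/{elem.tag}"
    allowed = CONTENT_MODEL.get(elem.tag, set())
    for child in elem:
        if child.tag not in allowed:
            errors.append(f"{here}/{child.tag}")
        errors.extend(placement_errors(child, here))  # check deeper levels too
    return errors

good = ET.fromstring("<TEI><teiHeader/><text><body><p>Hi</p></body></text></TEI>")
bad = ET.fromstring("<TEI><text><p>Hi</p></text></TEI>")
print(placement_errors(good))  # []
print(placement_errors(bad))   # ['/TEI/text/p']
```

The second document fails because, under this toy model, a paragraph may not sit directly inside `<text>` without a `<body>`.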

3.1. Text structure in the Text Encoding Initiative

As applied to corpora, TEI tags fall into several groups, in particular: metadata, structural elements of the text, and special (linguistic) meta-information.

A TEI document has two main parts: the header (the <teiHeader> element) and the text proper (<text>). The header is in effect an electronic version of a title page. It can contain information such as the bibliographic data of the source, information about the encoding, a non-bibliographic description of the text and a revision log.

A TEI text may be unitary (a single work) or composite (a collection). In either case the text may have front matter, a body and back matter (<front>, <body>, <back>). In a composite text the body may consist of groups, each of which may in turn contain further groups or texts.

The body is divided into paragraphs (<p>), divisions (<div>) and subdivisions (<divn>, where n denotes the level of the subdivision and ranges from 1 to 7). Other structural elements of a TEI text include headings, notes, line and page numbers, and so on.
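The two-part layout of a TEI document and the division of its body can be sketched with ElementTree as well. The function name `minimal_tei` and the value `type="chapter"` are illustrative assumptions; the TEI namespace (`http://www.tei-c.org/ns/1.0`) and mandatory header elements such as `<publicationStmt>` and `<sourceDesc>` are left out, so the result is a structural outline rather than a schema-valid TEI file.

```python
import xml.etree.ElementTree as ET

def minimal_tei(title, paragraphs):
    """Build a TEI-like outline: <teiHeader> plus <text> with front/body/back."""
    tei = ET.Element("TEI")
    header = ET.SubElement(tei, "teiHeader")
    title_stmt = ET.SubElement(ET.SubElement(header, "fileDesc"), "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    text = ET.SubElement(tei, "text")
    ET.SubElement(text, "front")                      # front matter (empty here)
    body = ET.SubElement(text, "body")
    div = ET.SubElement(body, "div", type="chapter")  # one unnumbered division
    for para in paragraphs:
        ET.SubElement(div, "p").text = para           # paragraphs in the division
    ET.SubElement(text, "back")                       # back matter (empty here)
    return tei

doc = minimal_tei("Sample text", ["First paragraph.", "Second paragraph."])
print(ET.tostring(doc, encoding="unicode"))
```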

In addition, TEI makes it possible to highlight individual elements of a text while indicating the reason for the highlighting:

<emph> - a word or phrase emphasized for linguistic or rhetorical effect;
<foreign> - a word or phrase in a foreign language (a language other than that of the main text);
<term> - a word or phrase regarded in the text as a technical term;
<title> - the title of a work (a book, article, journal, etc.).
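Phrase-level tags like these are easy to harvest once a text is encoded. The sketch below collects each highlighted fragment together with its language; it assumes the TEI P5 convention of marking language with `xml:lang` (ElementTree reports that attribute under the XML namespace URI), and the sample paragraph and function name `highlighted` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute as ElementTree reports it after parsing.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<p>In <title>War and Peace</title> characters say
<foreign xml:lang="fr">mon cher</foreign>; linguists call this
<term>code-switching</term>, and it is <emph>frequent</emph>.</p>"""

def highlighted(p):
    """Return (tag, text, language) for each highlighting element."""
    found = []
    for tag in ("emph", "foreign", "term", "title"):
        for el in p.iter(tag):
            found.append((tag, el.text, el.get(XML_LANG)))
    return found

items = highlighted(ET.fromstring(SAMPLE))
for item in items:
    print(item)
```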

TEI contains very detailed provisions for marking up the most diverse texts or their components: verse and drama, spoken language, dictionaries, manuscripts, factual databases (names, dates, persons, place names, etc.), tables, formulae and graphics, graphs, diagrams, trees, and more. Questions of encoding data for different languages are discussed separately.

3.2. Recommendations for building language corpora

The markup of language corpora deserves particular attention.

In TEI, language corpora are treated as composite corpora, that is, unified wholes made up of many texts. The reason is that, although each individual text fragment in a corpus can legitimately be regarded as an independent text, for research purposes each fragment is considered a component of a larger object. Corpora and other kinds of composite texts (for example, anthologies and collections) have much in common. Notably, the components of a composite text may differ in structure (a corpus may, for example, combine verse and prose), in which case the components are handled by elements from different TEI modules. In addition to the core TEI tags, a number of specialized tag sets are offered for working with corpora.

Let us consider the main tags and capabilities of the standard in view of the variety of corpus types and of the tasks addressed in corpus linguistics.

The following tags are used to organize the main levels of a corpus:

<teiCorpus> - contains an entire corpus encoded in TEI format; the corpus consists of a corpus header and one or more <TEI> elements, each of which contains a text header and the text itself;

<TEI> (TEI document) - contains a single TEI-conformant document, which consists of a TEI header and a text;

<teiHeader> (header) - supplies the descriptive and declarative information that makes up an electronic title page placed before every TEI-conformant text;
@type - specifies the kind of document to which the header is attached (whether the document is a corpus or an individual text);

<text> - contains a single text of any kind, unitary or composite, for example a poem or a play, a collection of essays, a novel, a dictionary or a corpus sample;

<group> - contains a composite text made up of distinct texts (or groups of texts) that are regarded as a unit for some reason, for example the works of a single author, a cycle of poems, etc.

The <teiCorpus> tag is intended for encoding large corpora, but it can also be useful for encoding newspapers, electronic anthologies and other composite texts. Each part of the corpus is encoded in its own <TEI> element, and the whole corpus is enclosed in a <teiCorpus> element. Each part has the standard structure of a document enclosed in <TEI>: a <teiHeader> followed by a <text>. The corpus as a whole also has its own <teiHeader>, which may describe both the corpus itself and the way its various parts are encoded.
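Following the nesting just described, a corpus can be assembled programmatically. In this sketch the helper names `make_header` and `build_corpus` and the sample titles are hypothetical, each header is reduced to a title, and namespaces are omitted; it only illustrates the required ordering: the corpus header first, then one `<TEI>` per text, each with its own `<teiHeader>` followed by `<text>`.

```python
import xml.etree.ElementTree as ET

def make_header(title):
    """A header reduced to a title statement (real headers need more)."""
    header = ET.Element("teiHeader")
    stmt = ET.SubElement(ET.SubElement(header, "fileDesc"), "titleStmt")
    ET.SubElement(stmt, "title").text = title
    return header

def build_corpus(corpus_title, texts):
    """texts: list of (title, content) pairs -> <teiCorpus> element."""
    corpus = ET.Element("teiCorpus")
    corpus.append(make_header(corpus_title))    # corpus-level header comes first
    for title, content in texts:
        tei = ET.SubElement(corpus, "TEI")      # one <TEI> per corpus text
        tei.append(make_header(title))          # text-level header ...
        body = ET.SubElement(ET.SubElement(tei, "text"), "body")
        ET.SubElement(body, "p").text = content  # ... followed by <text>
    return corpus

corpus = build_corpus("Newspaper sample",
                      [("Report 1", "Text of a report."),
                       ("Editorial 1", "Text of an editorial.")])
print(ET.tostring(corpus, encoding="unicode"))
```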

Header information that applies to the corpus as a whole rather than to its individual components should be placed in the corpus-level <teiHeader>, before the texts within the corpus. This two-level structure makes it possible to add meta-information at the corpus level, at the level of an individual text, or at both levels at once.

A text is understood as any speech work, finished or unfinished, unitary or composite, that is regarded as a single whole. A composite text is a text that contains other texts.

The tags listed above can be combined in various ways to encode all kinds of composite corpora.


The components of a corpus are independent texts, but it is often convenient to treat the corpus as a single whole, since this simplifies both corpus building and annotation. A corpus can thus be regarded as a special semiotic object with its own semantics, syntax and pragmatics.

In some cases the design of a corpus is reflected in its internal structure. For example, a corpus of newspaper extracts may be organized so that the extracts are grouped by type (reports, editorials, reviews, etc.), with a further classification by date, place of publication and so on within each type.

To show that a corpus consists of several subcorpora, the corpus itself, or a higher-level subcorpus, can be represented as a composite text using the <group> tag intended for composite texts. Components can also be grouped by means of the text classification tags.

Anthologies and collections are often processed as independent texts, if only because of their historical integrity. However, the parts of an anthology may also need to be treated as independent objects of study. The standard provides for all of these cases.

The <group> tag is thus intended to simplify the encoding of collections, anthologies and cycles. As noted above, it can also be used to show that a corpus consists of subcorpora.
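The use of `<group>` to mark subcorpora can be illustrated by counting the texts in each subcorpus. The sample document, the `n` labels and the function `leaf_count` are hypothetical; the sketch follows the TEI pattern in which a composite `<text>` contains a `<group>` of further `<text>` elements, and it ignores other possible children of `<group>` such as `<head>`.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""<TEI><teiHeader/><text><group>
  <text n="reports"><group>
    <text><body><p>Report A</p></body></text>
    <text><body><p>Report B</p></body></text>
  </group></text>
  <text n="editorials"><body><p>Editorial A</p></body></text>
</group></text></TEI>""")

def leaf_count(text_el):
    """Number of innermost texts under a <text> (1 if it has no <group>)."""
    group = text_el.find("group")
    if group is None:
        return 1
    return sum(leaf_count(t) for t in group.findall("text"))

subcorpora = DOC.find("text/group")
sizes = {t.get("n"): leaf_count(t) for t in subcorpora.findall("text")}
print(sizes)  # {'reports': 2, 'editorials': 1}
```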

All composite texts share the following characteristic: the texts they consist of may, but need not, have a uniform structure. If all the internal texts are encoded with a single module, no problem arises. If, however, different modules are needed to encode them, all of these modules must be added to the schema.

3.2.1. Internal structure of the text

The description of the text proper is organized as a set of tagged and untagged characteristics for specified situational parameters (metadata), each handled by its own tag with its attributes:


<channel> (original channel of transmission) - describes the channel through which the text was obtained; for written data the options include a printed text, a manuscript or an e-mail; for spoken data, a radio broadcast, a telephone conversation or a recording of a conversation;
@mode - indicates whether the data are spoken or written;

<constitution> - describes the internal composition of a text or text fragment, e.g. whether it is fragmentary, complete, etc.;
@type - specifies what the text consists of;

<derivation> - describes the degree of authenticity of the text;
@type - describes what changes the text has undergone;

<domain> - describes in general terms the circumstances in which the text was produced and the audience for which it was intended: whether it was produced in a private conversation or in public, in a classroom or at a religious event, etc.;
@type - classifies texts by the circumstances in which they were produced;

<factuality> - describes how factual the content of the text is: whether the text describes a fictional or a real world;
@type - classifies texts by degree of factuality;

<interaction> - describes the nature of the speech act in which the text was produced or reproduced: whether the text was a response, an interjection, a commentary, etc.;
@type - specifies the degree of interaction between the active and passive participants in the communication;
@active - specifies the number of active participants in the speech act (addressors);
@passive - specifies the number of passive participants in the speech act (addressees);

<preparedness> - indicates whether the text was prepared or spontaneous;
@type - a keyword indicating the degree of preparedness of the text;

<purpose> - indicates the purpose or communicative function of the text;
@type - specifies that purpose.
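A `<textDesc>` covering all eight situational parameters might be generated as below. The element and attribute names are those listed above; the particular values (`s`, `single`, `original`, `public`, and so on) are given as we recall them from the TEI P5 Guidelines and should be checked against the current specification. The helper `text_desc` and the sample profile of a radio interview are hypothetical.

```python
import xml.etree.ElementTree as ET

# The eight situational parameters, in the order listed above.
PARAMETERS = ("channel", "constitution", "derivation", "domain",
              "factuality", "interaction", "preparedness", "purpose")

def text_desc(values):
    """values maps each parameter to its attributes, e.g. {"mode": "s"}."""
    missing = [p for p in PARAMETERS if p not in values]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    desc = ET.Element("textDesc")
    for name in PARAMETERS:              # emit the parameters in a fixed order
        ET.SubElement(desc, name, values[name])
    return desc

radio_interview = text_desc({
    "channel": {"mode": "s"},
    "constitution": {"type": "single"},
    "derivation": {"type": "original"},
    "domain": {"type": "public"},
    "factuality": {"type": "fact"},
    "interaction": {"type": "complete", "active": "plural", "passive": "many"},
    "preparedness": {"type": "none"},
    "purpose": {"type": "inform"},
})
print(ET.tostring(radio_interview, encoding="unicode"))
```

Requiring every parameter forces the encoder to make an explicit choice, including the "not applicable" case discussed next, rather than silently leaving a parameter out.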

It should be noted that some texts, literary texts in particular, do not lend themselves easily to such parameterization. In such cases the tag for the "disputed" parameter is given the content "not applicable" or another phrase with the same meaning.

Since there are many systems for classifying and describing texts, each text can be described by many parameters. Despite the efforts of many corpus linguists, textual scholars, sociolinguists and literary scholars, no consensus has yet been reached on how many and which of these systems should be taken into account.

The most sensible course seems to be not to derive a unique typology of texts in order to capture their distinctive characteristics, but to use, in various combinations, the finite set of situational parameters described above, without tying them to any particular textological typology. At the same time, text types should be taken into account, so that together with the parameters proposed here the internal structure of each text type can be described as simply and adequately as possible. This approach has the following analytical advantages:

• it provides a relatively consistent and uniform description of texts;
• it allows interpretable comparisons between different parts of the corpus;
• it allows analysts to create and compare new text types according to the particular parameters that interest them most;
• it applies equally to spoken and written data.

It is often convenient to regard a specific combination of the tags describing situational parameters as defining a distinct text type. This is especially appropriate when the same set of characteristics appears in the descriptions of different texts in the corpus. The set of text types defined in this way should be regarded as a corpus-specific typology.
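The idea that a recurring combination of parameter values defines a text type can be sketched as a simple grouping operation. The profiles below are hypothetical, only three parameters are used, and the value "inapplicable" stands in for the "not applicable" content mentioned above for parameters that do not fit a given text.

```python
from collections import defaultdict

# Hypothetical situational profiles of corpus texts (parameter -> value).
PROFILES = {
    "t1": {"mode": "w", "preparedness": "revised", "purpose": "inform"},
    "t2": {"mode": "s", "preparedness": "none", "purpose": "inform"},
    "t3": {"mode": "w", "preparedness": "revised", "purpose": "inform"},
    "t4": {"mode": "s", "preparedness": "scripted"},  # purpose left unspecified
}

def text_types(profiles, parameters):
    """Group text ids by their combination of parameter values."""
    types = defaultdict(list)
    for text_id, profile in profiles.items():
        signature = tuple(profile.get(p, "inapplicable") for p in parameters)
        types[signature].append(text_id)
    return dict(types)

types = text_types(PROFILES, ("mode", "preparedness", "purpose"))
for signature, ids in types.items():
    print(signature, ids)
```

Each distinct signature is one text type of this corpus-specific typology; texts t1 and t3 share a type because their parameter values coincide.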

TEI's treatment of spoken-language corpora deserves special mention. The <profileDesc> tag may contain a <particDesc> tag, which holds additional information about the speakers or, where necessary, about persons mentioned or discussed in a written text. Although the term "speech participant" is used, it is understood that any beings endowed with a voice in the text are described according to the same scheme unless otherwise stated. An identified character in a play or novel may be treated as a full participant in a speech act.

If elements of the namesdates module (see "Names, Dates, People, and Places") are added to the schema, <particDesc> may contain detailed information about a speaker or group of speakers, such as their names and other individual characteristics. Once a speaker has been identified, he or she can be assigned a code by which that speaker is referred to anywhere in the encoded text, for example as the value of the @who attribute. The @who attribute points to one or more participants whose individual characteristics are recorded in the header.
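The link between `<particDesc>` and `@who` can be followed mechanically. In this sketch, speakers are declared as `<person>` elements with `xml:id` inside `<listPerson>`, and utterances (`<u>`, from the TEI module for spoken texts) point to them with `who="#id"`; the sample dialogue and the function `turns_by_speaker` are hypothetical, `<persName>` presupposes the namesdates module, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = ET.fromstring("""<TEI><teiHeader><profileDesc><particDesc>
<listPerson>
  <person xml:id="spk1"><persName>Anna</persName></person>
  <person xml:id="spk2"><persName>Boris</persName></person>
</listPerson>
</particDesc></profileDesc></teiHeader>
<text><body>
  <u who="#spk1">Good morning.</u>
  <u who="#spk2">Morning!</u>
  <u who="#spk1">Shall we start?</u>
</body></text></TEI>""")

def turns_by_speaker(root):
    """Map each speaker's name to the utterances attributed to them via @who."""
    names = {p.get(XML_ID): p.findtext("persName") for p in root.iter("person")}
    turns = {}
    for u in root.iter("u"):
        for ref in u.get("who", "").split():   # @who may list several speakers
            turns.setdefault(names[ref.lstrip("#")], []).append(u.text)
    return turns

print(turns_by_speaker(DOC))
```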

Ter <settingDese> Hcnorrh3yeTcJI AJUI Toro, 'IT06b1 yKa3aTb, B KaKoYI

OKpy)f(alOIQeYI o6cTaHOBKe npoHCXOAHT petJeBoYI aKT. On11caHHe oKpy­

)f(alOIQeYI o6cTaHOBKJ1 MO)f(eT 6bITb CBJ13Hb!M HeTernpoBaHHblM TeKCTOM

(KaK on11caHHe ocpopMrreHHJI cu;eHbI nepeA HaqarroM cneKTaKlUI). 0Ho )f(e

MO)f(eT 6bITb IlOAp06HblM H TernpoBaHHblM

If several settings are described, several <setting> tags are used:
<setting> - contains a detailed description of the setting in which the speech act takes place.

If the participants in an interaction are in different places, the optional @who attribute (available on <setting>, as on any element of the att.ascribed attribute class) can be used to ascribe descriptions of different settings to different participants.
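Ascribing settings to participants through `@who` can be sketched as a lookup. The sample `<settingDesc>` and the function `setting_of` are hypothetical; the sketch assumes `@who` holds space-separated pointers such as `#spk1`, and it returns the first setting ascribed to the given speaker.

```python
import xml.etree.ElementTree as ET

SETTINGS = ET.fromstring("""<settingDesc>
  <setting who="#spk1">
    <locale>in a radio studio</locale><activity>hosting the programme</activity>
  </setting>
  <setting who="#spk2">
    <locale>at home, on the phone</locale><activity>cooking</activity>
  </setting>
</settingDesc>""")

def setting_of(speaker_ref, setting_desc):
    """Return (locale, activity) of the setting ascribed to a speaker, or None."""
    for setting in setting_desc.findall("setting"):
        if speaker_ref in setting.get("who", "").split():
            return setting.findtext("locale"), setting.findtext("activity")
    return None

print(setting_of("#spk2", SETTINGS))  # ('at home, on the phone', 'cooking')
```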

The components of the speech situation are encoded with the following tags:

<name> (proper name) - contains a proper name or an equivalent expression;

<date> - contains a date (in any format);

<time> - contains a phrase indicating a time of day (in any format);

<locale> - contains a brief untagged description of the place where the speech act takes place: in a room, in a restaurant, on a park bench, etc.;

<activity> - contains a brief untagged description of what a participant is doing during the speech act (if anything).

When the namesdates module is added to the schema, additional specialized tags become available: <orgName> and <persName>.

A TEI-conformant document may have more than one header only if it is a corpus encoded in TEI format. Both the corpus itself and all the texts that make it up must have headers. Any tag specified in the corpus header automatically applies to every internal text unless it is redefined there. A tag specified in the header of an internal text but absent from the corpus header applies only to that internal text. If a tag is specified both in the corpus header and in the header of an internal text, the corpus-level specification is ignored for that text.

All attributes of header tags behave in the same way. Thanks to this system, meta-information common to all texts of the corpus needs to be entered only once, and information for an individual internal text is added separately only where it differs from what is shared.
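The override rule can be modelled, in a deliberately simplified way, by treating each header as a flat mapping from tag names to values (real headers are nested XML, so this is only an analogy). The field names, values and the function `effective_header` are hypothetical.

```python
def effective_header(corpus_header, text_header):
    """Corpus-level values apply unless the text header redefines them."""
    merged = dict(corpus_header)   # start from what the corpus header declares
    merged.update(text_header)     # text-level declarations take precedence
    return merged

corpus_header = {"language": "ru", "publisher": "Newspaper X", "channel": "w"}
text_header = {"title": "Interview with a reader", "channel": "s"}
print(effective_header(corpus_header, text_header))
```

Here the text inherits `language` and `publisher` from the corpus, adds its own `title`, and overrides `channel`, which matches the three cases described above.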

3.2.2. Meta-information

In the TEI standard, meta-information is called contextual information. Examples include the age, sex and geographical origin of the participants in a speech act and their socio-economic status; the price and publication date of a newspaper; the general subject matter or imprint of a book, and so on. Information of this kind is of paramount importance for corpus linguistics. It serves as an organizing principle in building a corpus (for example, when one needs to check that, with respect to some characteristic, the sample is evenly distributed across the corpus or represented in proportion to the number of fragments taken to build it), and as a criterion for selecting fragments when searching and analysing the corpus (for example, when one needs to study the specific language features of a particular community or subset of texts).

This information should be recorded in the appropriate section of the TEI header. Meta-information about all documents is kept in a separate file so that subsets of the corpus can conveniently be selected by given features.
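The article does not specify the format of this separate metadata file, so the sketch below simply assumes a CSV table with one row per document. The column names, document ids and the helper `select` are hypothetical; the point is that subcorpus selection becomes a filter over metadata rather than over the texts themselves.

```python
import csv
import io

# Hypothetical metadata file: one row per corpus document.
METADATA = """id,genre,year,mode
doc001,news,2009,w
doc002,interview,2010,s
doc003,news,2011,w
doc004,fiction,2011,w
"""

def select(rows, **criteria):
    """Ids of documents whose metadata match every given field value."""
    return [row["id"] for row in rows
            if all(row[field] == value for field, value in criteria.items())]

rows = list(csv.DictReader(io.StringIO(METADATA)))
print(select(rows, genre="news"))           # ['doc001', 'doc003']
print(select(rows, mode="w", year="2011"))  # ['doc003', 'doc004']
```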

The document metadata tag <teiHeader> has the following attributes:
1) id - a unique identifier of the document in the corpus (it usually matches the file name without the extension; since it serves as the basis for word and sentence identifiers, it can be shortened to a unique short name);
2) target - the name of the file containing the document;
3) type='text' - the type of description; in our corpus it is always "text", although descriptions of groups of documents are also possible;
4) lang='ru' - the language of the document, in our corpus always "ru"; TEI specifies languages according to ISO 639 (the lang attribute sets a default value, which can be overridden for an individual sentence or word if a Russian text contains a fragment in another language; TEI also provides the <foreign> tag for foreign-language insertions).
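The attribute conventions above (which follow the article's own corpus rather than TEI P5, where `xml:id` and `xml:lang` would normally be used) can be derived from the file path. The word-identifier pattern `doc.sN.wM` is a hypothetical illustration of how the document id might serve as the basis for sentence and word identifiers; the function names are likewise assumptions.

```python
from pathlib import PurePosixPath

def header_attributes(path, lang="ru"):
    """Attributes for <teiHeader> as described above, derived from a file path."""
    p = PurePosixPath(path)
    return {"id": p.stem,       # file name without extension
            "target": p.name,   # file that holds the document
            "type": "text",
            "lang": lang}

def word_id(doc_id, sentence_no, word_no):
    """Hypothetical scheme: word ids built on top of the document id."""
    return f"{doc_id}.s{sentence_no}.w{word_no}"

attrs = header_attributes("corpus/news/doc001.xml")
print(attrs)
print(word_id(attrs["id"], 3, 5))  # doc001.s3.w5
```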

The complete metadata description of a document consists of the following groups of tags:
1) <fileDesc> - information about the text of the document;
2) <profileDesc> - information about the genre of the document;
3) <encodingDesc> - information about the markup structure of the document (or a reference to a standard one);
4) <revisionDesc> - information about the revision history of the document.

Besides <fileDesc>, <profileDesc> can be useful: it contains information about the general class of the text, e.g. fiction, journalism, spoken language, etc.

The file description <fileDesc> consists of the following elements:
1) <titleStmt> - bibliographic information about the text;
2) <publicationStmt> - bibliographic information about the publication;
3) <sourceDesc> - information about the source from which the electronic version of the document was obtained.

The bibliographic information in <titleStmt> includes the following elements:

• <title> - the title;

• <author> - the author;

• <date> - the date of creation of the original document;

• <extent> - the size of the document in some conventional units. Their typology can be given in the type attribute, but it is natural to count in words. The rules for counting words then have to be formulated: for example, any sequence of characters from one space to the next can be counted as a word; alternatively, only sequences of Cyrillic and Latin letters, or only of Cyrillic letters; multiword units such as так как 'since', как-нибудь 'somehow', Нью-Йорк 'New York' or друг друга 'each other' can also be counted as one word. Since a corpus is evaluated, among other things, by its length in words, this parameter must be specified precisely;

• <sponsor> - an element in which we can refer to the relevant sponsor;

• <respStmt> - information about the person or persons who made an intellectual contribution to the creation of this electronic document (other than the authors and sponsors). <respStmt> gives this information through the elements <name> and <resp>, the latter indicating the nature of the contribution; for example, we can list here the people responsible for the manual markup of the document.
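The word-counting conventions discussed for <extent> and the other <titleStmt> children can be combined in a short Python sketch. The rule names, the multiword list and the helpers `count_words` and `build_title_stmt` are hypothetical, and the element layout simply follows the list above. In the published TEI schemas some of these elements (e.g. <extent>) sit elsewhere in <fileDesc>, so a real header should be validated against the schema in use.

```python
import re
import xml.etree.ElementTree as ET

# Counting conventions mentioned in the text; the rule names are hypothetical labels.
RULES = {
    "space_to_space": re.compile(r"\S+"),                 # anything between spaces
    "letters_cyr_lat": re.compile(r"[A-Za-zА-Яа-яЁё]+"),  # Cyrillic or Latin letters only
    "letters_cyr": re.compile(r"[А-Яа-яЁё]+"),            # Cyrillic letters only
}
# Hypothetical list of multiword units to be counted as a single word.
MULTIWORDS = ["так как", "как-нибудь", "Нью-Йорк", "друг друга"]

def count_words(text, rule, merge_multiwords=False):
    if merge_multiwords:
        for unit in MULTIWORDS:
            # "так как" -> "таккак": with spaces and hyphens removed it matches as one token
            text = text.replace(unit, re.sub(r"[\s-]", "", unit))
    return len(RULES[rule].findall(text))

def build_title_stmt(title, author, date, text, rule, sponsor, contributors):
    """Assemble the elements listed above; `contributors` holds (name, resp) pairs."""
    stmt = ET.Element("titleStmt")
    ET.SubElement(stmt, "title").text = title
    ET.SubElement(stmt, "author").text = author
    ET.SubElement(stmt, "date").text = date
    # The counting unit goes into the type attribute, as suggested in the text.
    extent = ET.SubElement(stmt, "extent", type="words")
    extent.text = str(count_words(text, rule))
    ET.SubElement(stmt, "sponsor").text = sponsor
    for name, resp in contributors:
        rs = ET.SubElement(stmt, "respStmt")
        ET.SubElement(rs, "resp").text = resp
        ET.SubElement(rs, "name").text = name
    return stmt

sample = "В 2012 г. John приехал, так как его звали."
print(count_words(sample, "space_to_space"))                          # 9
print(count_words(sample, "letters_cyr_lat"))                         # 8
print(count_words(sample, "letters_cyr"))                             # 7
print(count_words(sample, "letters_cyr_lat", merge_multiwords=True))  # 7
```

The same sentence yields 9, 8 or 7 words depending on the convention, which is why the counting rule has to be stated along with the figure in <extent>.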

As for language corpora, metainformation can be combined in a common header for the whole corpus or added to the header of each fragment; a combination of these options is also possible.
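Combining a corpus-level header with fragment-level headers amounts to an inheritance rule: a fragment takes every corpus-level value it does not redefine. Here is a minimal Python sketch, assuming the metadata has been flattened into key-value pairs (all keys and values are hypothetical):

```python
from collections import ChainMap

corpus_header = {"lang": "ru", "sponsor": "Hypothetical Foundation", "markup": "manual"}
fragments = {
    "frag1": {"author": "A. Author", "date": "1901"},
    "frag2": {"author": "B. Author", "markup": "automatic"},  # overrides the corpus value
}

def effective_header(fragment_id):
    # Lookups search the fragment header first, then fall back to the corpus header.
    return dict(ChainMap(fragments[fragment_id], corpus_header))

print(effective_header("frag2")["markup"])  # automatic
print(effective_header("frag1")["markup"])  # manual
```

Storing shared values once at corpus level and only differences per fragment keeps the headers short without losing any information.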

Inside the TEI header, some further elements can be used, but only if they are declared in the description schema. These elements make it possible, for example, to characterize from different angles the conditions under which a speech act takes place, its physical features, and its participants. Such information is created specifically for the needs of corpus linguistics. For this purpose, the following additional elements can be used inside the <profileDesc> tag, which is located inside the header tag (<teiHeader>):

<textDesc> (text description) - describes texts in terms of situational parameters;

<particDesc> (participant description) - describes the participants of the speech act in a text of any type;

<settingDesc> (setting description) - describes the setting or settings in which the speech act takes place (either as untagged prose or as a set of tagged characteristics).
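The following simplified Python sketch shows how these elements might be assembled inside <teiHeader>. It rests on several assumptions: <textDesc> and <settingDesc> are reduced to free text, a <person> element with `id` and `role` attributes stands in for the participant description, and the sample participants are invented. The actual TEI content models are richer and should be checked against the schema.

```python
import xml.etree.ElementTree as ET

def build_profile_desc(text_description, participants, setting):
    """Simplified <profileDesc>: free-text <textDesc> and <settingDesc>,
    one <person> per (id, role, description) triple inside <particDesc>."""
    profile = ET.Element("profileDesc")
    ET.SubElement(profile, "textDesc").text = text_description
    partic = ET.SubElement(profile, "particDesc")
    for pid, role, description in participants:
        # "id" as in TEI P4; TEI P5 would use xml:id instead.
        ET.SubElement(partic, "person", id=pid, role=role).text = description
    ET.SubElement(profile, "settingDesc").text = setting
    return profile

header = ET.Element("teiHeader")
header.append(build_profile_desc(
    "spoken dialogue, spontaneous",
    [("S1", "interviewer", "female, about 40"), ("S2", "informant", "male, 18")],
    "recorded at home, 1954",
))
print(ET.tostring(header, encoding="unicode"))
```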

4. Linguistic markup of a corpus

A language corpus often contains analytical linguistic markup used in various kinds of linguistic research. Several mechanisms exist for representing practically any type of markup, either in a standard form or according to a template specific to the given document.

Linguistic markup is understood as any markup based on the linguistic characteristics of the text.

Linguistic markup data can be attached to textual elements of different levels. For example, a word-class or part-of-speech code can be attached to each word (token) or to a group of tokens, which may be continuous or discontinuous. Likewise, a code can be assigned to a sentence or to a syntactic relation.

Markup can be produced automatically, manually, or automatically with manual correction. How easily and accurately it can be automated depends on the level at which it is required. The markup method used can be recorded in the <interpretation> tag inside the tag describing the encoding method (<encodingDesc>), within the root tag <teiHeader>. If different parts of a corpus need to be marked up according to different parameters, this can be indicated with the decls attribute.

Because recognizing a text and encoding many of its characteristics is costly, and because it is difficult to apply uniform technologies to all parts of a large corpus, encoders may find it convenient to divide the sets of elements subject to encoding into four categories:


required (obligatory) - characteristics in this category are encoded in the analysis of every text in the corpus;

recommended (desirable) - characteristics in this category are encoded if economic considerations allow; if such a characteristic is present in a text but has not been encoded, this is noted in the root tag;

optional (facultative) - characteristics in this category may or may not be encoded; if the corresponding tag contains no information about such a characteristic, this does not mean that it is absent from the text;

proscribed (excluded) - characteristics in this category are deliberately not encoded; they may be left as unmarked text, placed inside a <gap> tag, or not mentioned at all, which is most often the case.
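The four categories can be turned into a simple consistency check for annotated tokens. Below is a minimal Python sketch, assuming each token's annotation is a dictionary of feature names; the feature names and the policy table are hypothetical:

```python
from enum import Enum

class Level(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"
    PROSCRIBED = "proscribed"

# Hypothetical encoding policy for a corpus: feature name -> category.
POLICY = {
    "pos": Level.REQUIRED,
    "lemma": Level.REQUIRED,
    "stress": Level.RECOMMENDED,
    "semantics": Level.OPTIONAL,
    "speaker_identity": Level.PROSCRIBED,
}

def check_token(features, policy=POLICY):
    """Return the policy violations for one token's feature dictionary."""
    problems = []
    for name, level in policy.items():
        if level is Level.REQUIRED and name not in features:
            problems.append(f"missing required feature: {name}")
        elif level is Level.PROSCRIBED and name in features:
            problems.append(f"proscribed feature present: {name}")
    return problems

def unencoded_recommended(features, policy=POLICY):
    """Recommended features that were not encoded; such omissions
    are to be noted in the header."""
    return [name for name, level in policy.items()
            if level is Level.RECOMMENDED and name not in features]

print(check_token({"pos": "N", "lemma": "bell"}))             # []
print(check_token({"pos": "N", "speaker_identity": "X"}))
print(unencoded_recommended({"pos": "N", "lemma": "bell"}))   # ['stress']
```

Only the required and proscribed categories produce errors. Missing recommended features are reported so that they can be declared, and missing optional features are ignored, since their absence says nothing about the text.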

5. Formats of linguistic markup

5.1. Formats of morphological markup

A distinction should be made between the format of the data structure and the format of its content. In terms of structure, there are three main ways of marking up a text with linguistic information:

• simple appending: each word is followed by a short description of its features, e.g. gives_vvz, where the code vvz means that this is the third person singular (Z) of a lexical verb (VV) (I);

• table: each column records a particular morphosyntactic feature (II);

• markup language: a set of means for recording linguistic information, organized as a keyword language with its own syntax (III).

The door, which was equipped with neither bell nor knocker, was blistered and distained.

I. Example of markup from the Associated Press Corpus (horizontal format, Lancaster University encoding):

[N The_AT door_NN1 ,_, [Fr [N which_DDQ N] [V was_VBDZ equipped_VVN [P with_IW [N neither_LE [ bell_NN1 nor_CC knocker_NN1 ] N] P] V] Fr] N] ,_, [V was_VBDZ [ blistered_VVN and_CC distained_VVN ] V] ._.

II. Example of markup from the Associated Press Corpus (vertical format, Lancaster University morphosyntactic encoding):

The AT [N
door NN1
, ,
which DDQ [Fr[N]
was VBDZ [V
equipped VVN
with IW [P
neither LE [N
bell NN1 [
nor CC
knocker NN1 ]N]P]V]Fr]N]
, ,
was VBDZ [V
blistered VVN
and CC
distained VVN V]
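The relation between formats I and II can be made concrete with a small converter. This Python sketch assumes that bracket labels are separate whitespace-delimited tokens, as in example I, and that the tag follows the last underscore. The function name is hypothetical, and its bracket notation differs slightly from the Lancaster vertical files: an opening and a closing bracket attached to the same word are simply concatenated.

```python
def horizontal_to_vertical(line):
    """Convert word_TAG tokens with separate bracket tokens (format I) into
    one token per line (format II): word, tag, then any bracket labels."""
    rows = []      # each row: [word, tag, brackets]
    pending = []   # opening brackets waiting for the next word
    for tok in line.split():
        if tok.startswith("["):
            pending.append(tok)            # "[N", "[Fr" or a bare "["
        elif tok.endswith("]") and "_" not in tok:
            if rows:
                rows[-1][2] += tok         # "N]" or "]" closes after the previous word
        else:
            word, _, tag = tok.rpartition("_")  # split on the last underscore
            rows.append([word, tag, "".join(pending)])
            pending = []
    # Drop the empty bracket column so plain rows read "word TAG".
    return "\n".join(" ".join(part for part in row if part) for row in rows)

print(horizontal_to_vertical("[N The_AT door_NN1 N] ,_, was_VBDZ"))
```

Splitting on the last underscore keeps punctuation tokens such as ,_, and ._. intact, since their word and tag are the same character.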

Most modern markup languages are based on the SGML/XML formalism, since it allows the attribute set to be documented explicitly and separates the markup of a document's structure, its content and its presentation to the user. Below is an example of markup in XML syntax.

III. Example of markup from a corpus of Russian texts (markup language in the morphological encoding of the AOT web service):

<?xml version="1.0" encoding="windows-1251" ?>
<text><p>
<s><w>Звонили<ana lemma="ЗВОНИТЬ" pos="Г" gram="мн,нс,нп,дст,прш" /></w>
<w>к<ana lemma="К" pos="ПРЕДЛ" gram="" /></w>
<w>вечерне<ana lemma="ВЕЧЕРНЯ" pos="С" gram="жр,ед,дт,пр,но" /><ana lemma="ВЕЧЕРНИЙ" pos="П" gram="ср,ед,кр" /></w>
<pun>.</pun></s>
<s><w>Торжественный<ana lemma="ТОРЖЕСТВЕННЫЙ" pos="П" gram="мр,ед,им,вн" /></w>
<w>гул<ana lemma="ГУЛ" pos="С" gram="мр,ед,им,вн,но" /></w>
<w>колоколов<ana lemma="КОЛОКОЛ" pos="С" gram="мр,мн,рд,но" /><ana lemma="КОЛОКОЛОВ" pos="С" gram="мр,фам,ед,им,од" /></w>
<pun>.</pun></s>
</p></text>


The unit of morphological markup is the word (the <w> tag). The word form as used in the text is written right after this tag. The morphological analysis of the word is recorded in the <ana> element, which has the following attributes:

• lemma - the dictionary form, in upper case;

• pos - the part of speech;

• gram - the morphological features.

Each word can have several parallel analyses at once, represented as a sequence of <ana> elements. After disambiguation (manual or automated), usually only one analysis remains in the output.
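Reading such markup back is straightforward with an XML parser. The Python sketch below collects every <ana> alternative for each word and then applies a deliberately naive, hypothetical disambiguation rule (prefer a given part of speech). Real disambiguation, manual or automatic, relies on context. The sample reuses part of example III without its XML declaration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<text><p><s>
<w>вечерне<ana lemma="ВЕЧЕРНЯ" pos="С" gram="жр,ед,дт,пр,но" /><ana lemma="ВЕЧЕРНИЙ" pos="П" gram="ср,ед,кр" /></w>
<w>гул<ana lemma="ГУЛ" pos="С" gram="мр,ед,им,вн,но" /></w>
<pun>.</pun>
</s></p></text>"""

def analyses(xml_text):
    """Return (word form, [analysis attribute dicts]) for every <w>."""
    root = ET.fromstring(xml_text)
    result = []
    for w in root.iter("w"):
        form = (w.text or "").strip()  # the form is the text before the first <ana>
        result.append((form, [dict(a.attrib) for a in w.findall("ana")]))
    return result

def disambiguate(parses, preferred_pos):
    """Toy rule (hypothetical): keep the first analysis whose part of speech
    is in `preferred_pos`, otherwise keep the first analysis."""
    for parse in parses:
        if parse.get("pos") in preferred_pos:
            return parse
    return parses[0]

for form, parses in analyses(SAMPLE):
    print(form, len(parses), disambiguate(parses, {"П"})["lemma"])
```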

Definitive content standards for XML corpora have not yet been established. Among the competing standards, the most significant are EAGLES (Expert Advisory Group on Language Engineering Standards), TEI (Text Encoding Initiative) and XCES (Corpus Encoding Standard for XML).

The EAGLES guidelines (Recommendations ..., electronic version) set out general principles for creating and documenting corpora and their morphosyntactic markup, as well as a number of concrete solutions for marking up particular cases. In particular, they recommend lemmatization. EAGLES also offers two ways of storing morphological markup. Either each feature is represented by a separate attribute (POS="VV" number="sing"), or a compact morphological annotation is used in which digits correspond to features. For example, feats="V3011141101200" denotes a verb: 3rd person, singular, finite, indicative, past tense, active, main verb, non-phrasal, non-reflexive form (the list of recommended features and their values is part of the EAGLES recommendations). However, the EAGLES guidelines do not include a ready-made tag set for building a corpus.
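The two EAGLES storage options (one attribute per feature vs. a compact positional string) can be converted into each other once a position table is fixed. The table below is hypothetical and much shorter than the real EAGLES one, so this Python sketch does not decode the string V3011141101200 quoted above. It only illustrates the mechanism, with 0 marking a feature that does not apply.

```python
# Hypothetical positional scheme for verbs (NOT the actual EAGLES table):
# character 0 is the part of speech, then one digit per feature; "0" = not applicable.
VERB_POSITIONS = [
    ("person", {"0": None, "1": "1", "2": "2", "3": "3"}),
    ("number", {"0": None, "1": "sing", "2": "plur"}),
    ("finiteness", {"0": None, "1": "finite", "2": "non-finite"}),
    ("mood", {"0": None, "1": "indicative", "2": "subjunctive", "3": "imperative"}),
    ("tense", {"0": None, "1": "present", "2": "past"}),
]

def decode(feats):
    """Compact string -> one attribute per feature."""
    pos, codes = feats[0], feats[1:]
    if pos != "V" or len(codes) != len(VERB_POSITIONS):
        raise ValueError(f"unsupported tag: {feats}")
    attrs = {"pos": pos}
    for (name, table), code in zip(VERB_POSITIONS, codes):
        if table[code] is not None:
            attrs[name] = table[code]
    return attrs

def encode(attrs):
    """Inverse of decode: features that are absent are written as '0'."""
    digits = []
    for name, table in VERB_POSITIONS:
        inverse = {value: code for code, value in table.items()}
        digits.append(inverse[attrs.get(name)])
    return attrs["pos"] + "".join(digits)

print(decode("V31112"))
print(encode({"pos": "V", "finiteness": "non-finite"}))  # V00200
```

The positional form is compact but unreadable without its table. The attribute form is self-describing, which is one reason attribute-based XML encodings are easier to document.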

Existing corpora whose linguistic markup is based on SGML/XML use a wide variety of encoding systems. For example, the BNC uses CDIF, which is based on TEI; the American National Corpus, the Croatian National Corpus and AP use XCES; ICE (International Corpus of English), the Czech National Corpus and the Hungarian National Corpus use TEI, the most widely applied standard.


For Russian, the TEI standard was adapted by S. A. Sharov and S. O. Savchuk (Savchuk 2005) and used in building the Russian National Corpus.

Here are two examples of TEI markup at the morphological (1) and word-formation (2) levels:

(1) I didn't do it

<w lemma="i" feats="pp1">I</w> <w> <w lemma="do" feats="vvd">did</w> <m type="negation">n't</m> </w> <w lemma="do" feats="vv0">do</w> <w lemma="it" feats="pp3">it</w>

(2) comfortable

<w type="adjective"> <m type="prefix" baseform="con">com</m> <m type="root">fort</m> <m type="suffix">able</m> </w>

The most fully developed standard for the linguistic markup of texts as such is XCES (Ide, Romary 2002), which is planned to become an international standard within the ISO TC37/SC4 project. XCES defines an abstract metamodel that provides the means for building any reasonable model of linguistic markup that conforms to the EAGLES rules. To this end, it defines abstract tags for nodes (<struct>) and their features (<feat>). The type of each node must be specified, e.g. p-level, s-level, w-level and m-level for paragraphs, sentences, words and morphemes respectively. This makes it possible to represent multiwords as a single unit of analysis, e.g. as well as in English, or verbs with separable prefixes, e.g. zunehmen in German. A single word can also be decomposed within the markup, e.g. German zum as zu dem.
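The following sketch shows how the XCES node-and-feature model could be instantiated with Python's `xml.etree.ElementTree`. The element names <struct> and <feat> and the level names come from the text. The `name`/`value` attributes on <feat>, the feature names and the part-of-speech labels are assumptions, and nesting the two units of zum inside one w-level node is just one possible arrangement.

```python
import xml.etree.ElementTree as ET

def struct(parent, level, **feats):
    """Add a <struct type=level> node with one <feat> child per feature.
    The <feat> attribute names "name" and "value" are assumptions."""
    if parent is None:
        node = ET.Element("struct", type=level)
    else:
        node = ET.SubElement(parent, "struct", type=level)
    for name, value in feats.items():
        ET.SubElement(node, "feat", name=name, value=value)
    return node

sentence = struct(None, "s-level")
# A multiword treated as a single unit of analysis.
struct(sentence, "w-level", orth="as well as", pos="CONJ")
# One orthographic word decomposed into two units: German "zum" = "zu" + "dem".
zum = struct(sentence, "w-level", orth="zum")
struct(zum, "w-level", orth="zu", pos="PREP")
struct(zum, "w-level", orth="dem", pos="DET")
print(ET.tostring(sentence, encoding="unicode"))
```

Because every unit is the same generic <struct> node, multiwords and decomposed words need no special element types. Only the level and the features change.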

The MULTEXT-East Version 4 multilingual morphosyntactic specifications (http://nl.ijs.si/ME/V4/) should also be mentioned as one of the standards for morphological markup.


5.2. Formats for encoding syntactic relations

There is a fairly wide range of languages for the syntactic markup of texts. For example, the horizontal notation of the Penn Treebank corpus stores trees as LISP lists:

(S (NP-SB (PPH-HD He)) (VP-OC (VVD-HD studied))
(NP-OO (ART-ND the) (NN-HD problem)))
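The LISP-style notation can be read with a small recursive-descent parser. This Python sketch turns a bracketed string into nested (label, children) tuples and raises an error on unbalanced brackets; the function names are hypothetical.

```python
import re

def parse_tree(text):
    """Parse a bracketed tree such as '(S (NP-SB (PPH-HD He)) ...)' into nested
    (label, children) tuples; leaves are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: missing ')'")
        pos += 1  # consume ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("extra material after the tree")
    return tree

def leaves(tree):
    """Return the words of the tree from left to right."""
    label, children = tree
    out = []
    for child in children:
        out.extend([child] if isinstance(child, str) else leaves(child))
    return out

t = parse_tree("(S (NP-SB (PPH-HD He)) (VP-OC (VVD-HD studied)) (NP-OO (ART-ND the) (NN-HD problem)))")
print(leaves(t))
```

Applied to the tree above, `leaves` returns the words He, studied, the, problem in order. A missing closing bracket is reported instead of being silently accepted.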

TEI provides standard tags for syntactic relations:

• the <cl> tag (clause), used to encode compound and complex sentences; it has two attributes: type, which gives the syntactic features of the clause, and function, which gives its function;

• the <phr> tag (phrase, a group of words); analogously, the type attribute gives its type (nominal, prepositional, etc.) and function gives its function.

To represent syntax in terms of dependencies, special tags can be introduced, e.g. <dep>, with the attributes function and target; the latter refers to the identifier of the dependent word in the sentence.
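Here is one way to encode dependencies along these lines, sketched in Python. The tag proposed above is written here as <dep> and placed inside the governing word, and its target attribute points to the dependent word's identifier. The placement, the `id` attribute on <w> and the sample analysis are assumptions; this is not a standard TEI element.

```python
import xml.etree.ElementTree as ET

WORDS = ["He", "studied", "the", "problem"]
# Hypothetical analysis: (dependent position, head position or 0 for the root, function)
DEPS = [(1, 2, "subject"), (2, 0, "root"), (3, 4, "determiner"), (4, 2, "object")]

def to_markup(words, deps):
    s = ET.Element("s")
    w_elems = []
    for i, form in enumerate(words, start=1):
        w = ET.SubElement(s, "w", id=f"w{i}")
        w.text = form
        w_elems.append(w)
    for dep, head, function in deps:
        if head == 0:
            continue  # the root word is not governed by any other word
        # <dep> sits inside the governing word; target points to the dependent word.
        ET.SubElement(w_elems[head - 1], "dep", function=function, target=f"w{dep}")
    return s

def read_heads(s):
    """Recover {dependent id: (head id, function)} from the markup."""
    return {d.get("target"): (w.get("id"), d.get("function"))
            for w in s.iter("w") for d in w.findall("dep")}

sentence = to_markup(WORDS, DEPS)
print(ET.tostring(sentence, encoding="unicode"))
```

Since every word carries an identifier, the dependency structure can be recovered from the markup alone, as `read_heads` shows.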

Here is an example of morphosyntactic TEI markup of a sentence:

Nineteen fifty-four when I was eighteen years old

<p> <cl type="finite declarative" function="independent"> <phr type="NP" function="subject"> Nineteen fifty-four <cl type="finite relative declarative" function="appositive"> when <phr type="NP" function="subject">I</phr> <phr type="VP" function="predicate">was eighteen years old</phr> </cl> </phr> </cl> ...

6. Conclusion

In conclusion, both linguistic and extralinguistic markup should rest on sufficiently widespread and generally accepted principles for describing texts and language units. These principles are worked out most thoroughly in international standards, some of which were discussed above.

Unified data representation formats make it possible in many cases to use the same software and to exchange corpus data. The standardization of data representation formats can be discussed from two angles: their content on the one hand, and their structure on the other.

The markup parameters of corpora and their values should be sufficiently «natural», i.e. they should correspond to generally accepted scientific classifications. The linguistic support and software of corpus managers should handle typical queries and solve typical tasks.

Notes

1 TEI P4: An XML Version of TEI Guidelines. http://www.tei-c.org/P4X/AB.html

2 Guidelines for Electronic Text Encoding and Interchange, XML-compatible edition / ed. by C. M. Sperberg-McQueen, Lou Burnard. http://www.tei-c.org/P4X/index.html

3 The current version of this document can be found at http://www-tei.uic.edu/orgs/tei/intros/teiu5.tei or ftp://info.ox.ac.uk/pub/ota/TEI/doc/teiu5.tei

References

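Baranov A. N. Vvedenie v prikladnuyu lingvistiku [Introduction to Applied Linguistics]. Moscow, 2007.

Savchuk S. O. Metatekstovaya razmetka v Natsional'nom korpuse russkogo yazyka: bazovye printsipy i osnovnye funktsii [Metatextual markup in the Russian National Corpus: basic principles and main functions] // Natsional'nyi korpus russkogo yazyka: 2003-2005. Rezul'taty i perspektivy [Russian National Corpus: 2003-2005. Results and Prospects]. Moscow, 2005. P. 62-88.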

Recommendations for the morphosyntactic annotation of corpora, EAG-TC-WG-MAC/R. ftp://ftp.ilc.pi.cnr.it/pub/eagles/corpora/annotate.ps.gz

Ide N., Romary L. Standards for Language Resources // Proceedings of the Language Resources and Evaluation Conference (LREC02). Las Palmas (Spain), 2002. P. 59-65.

Guidelines for Electronic Text Encoding and Interchange / ed. by C. M. Sperberg-McQueen, L. Burnard. [S.l.], 2001. http://www.hcu.ox.ac.uk/TEI/P4X/index.html

Scholarly edition

STRUCTURAL AND APPLIED LINGUISTICS

Interuniversity collection

Issue 9

Editor L. A. Karpova

Computer layout by E. M. Voronkova

Signed for printing 13.07.12. Format 60x84 1/16.

Offset printing. Offset paper.

Conventional printed sheets: 20.69. Print run: 250 copies.

St. Petersburg University Press.

199004, St. Petersburg, V.O., 6th Line, 11/21.

Tel. (812) 328-96-17; fax (812) 328-44-22

www.unipress.ru

Printing house of St. Petersburg University Press:

199061, St. Petersburg, Sredny pr., 41.
