Technological approaches to linguistic documentation and meta-documentation


Transcript of Technological approaches to linguistic documentation and meta-documentation

12/2/2013


Technological Approaches to Linguistic Documentation and Metadocumentation

Pankaj Dwivedi

Gulab Chand

Somdev Kar

Indian Institute of Technology Ropar

Rupnagar, Punjab 140001

India

2 December 2013 1

Language Documentation

Principles and methods used for the recording and analysis of primary language and cultural materials, and metadata about them.

The revolution in information technology has made it possible to maintain organized and long-lasting linguistic and cultural records.

Why is documenting languages IMPORTANT?

Half of the world’s languages may no longer exist after a few more generations, as they are not being learnt by children as first languages (Austin & Sallabank, 2011).

Crystal (2002) claims that the rate of language disappearance is as high as two languages each month.

How?

• Creating Dictionaries

• Preparing Language Teaching Materials

• Archiving

• Language Corpora (Written & Spoken)

What is needed?

A lot of language data and the latest technology

Language data: text, audio and video

Technology: software and tools that can handle the language data, and platforms on which these data can be put to effective use.

What do we need?

• Language data (no problem)

• Platforms (will see later on)

• Latest TOOLS and SOFTWARE for:

1. Recording and Capturing

2. Analysis

3. Archiving

4. Mobilization

ONE MOMENT!!!

Is ‘Latest’ the best?

or

Old is gold?

CHOOSE CAREFULLY!!!

Is ‘TECHNOLOGY’ adoption always good?

• Languages may live on without orthography. But no language will be able to function as an administrative language in a modern society without a developed language technology (Trosterud, 2006).

• Technology changes quickly, and an uncritical adoption of new tools and technologies might compromise long-term sustainability, portability, usability and compatibility with other platforms (Bird & Simons, 2003).

Striking a balance

• Portability: operating systems, formats, software, encodings

• Sustainability: long-term preservation and usefulness

• Maintenance and Distribution: finances, space, tools and reach

• Access and protocols: paid or free, open or closed, research or business, full or partial

Capturing Audio Media

Why or Why not WAV?
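One point often raised in favour of WAV is that it typically stores uncompressed PCM audio behind a small, well-documented header, so the key recording parameters can be read back with very basic tooling. The sketch below is only an illustration of that idea, not part of the original slides: it writes a short synthetic tone into memory with Python's standard `wave` module and reads the header back. The function names `write_test_tone` and `describe_wav` and the tone itself are assumptions made for the example.

```python
import io
import math
import struct
import wave


def write_test_tone(buffer, sample_rate=44100, n_frames=4410, freq=440.0):
    """Write a short mono, 16-bit PCM sine tone into a file-like object."""
    frames = b"".join(
        struct.pack("<h", int(16383 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def describe_wav(buffer):
    """Read the technical parameters stored in the WAV header."""
    with wave.open(buffer, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }


buf = io.BytesIO()          # stands in for a file on disk
write_test_tone(buf)
buf.seek(0)
info = describe_wav(buf)
print(info)
```

The trade-off is file size: uncompressed audio takes far more space than lossy formats, which is the "why not" side of the question.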

Capturing Video Media

• CODECS

CONTAINERS

Capturing Digital Text

• Character Encoding: Unicode, ASCII, Windows/ANSI, Big5, Latin 5 etc.

• Data Encoding: XML, SGML, MSWord etc.

• File Encoding: plain-text, PDF, MSWord etc.
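To make the character-encoding choice concrete, here is a minimal sketch, assuming Python 3 and a single illustrative Devanagari word that is not taken from the slides. It shows that Unicode text serialized as UTF-8 round-trips cleanly, that the same bytes read with a legacy single-byte encoding come out garbled, and that ASCII cannot store the text at all.

```python
# Illustrative sample only: the Hindi word for "language".
text = "भाषा"

# Unicode (here serialized as UTF-8) round-trips the text losslessly.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Reading the same bytes with a legacy single-byte encoding (Latin-1)
# raises no error but yields unreadable text.
garbled = utf8_bytes.decode("latin-1")

# ASCII has no code points for Devanagari, so encoding fails outright.
try:
    text.encode("ascii")
    ascii_can_store_it = True
except UnicodeEncodeError:
    ascii_can_store_it = False

print(len(text), "code points ->", len(utf8_bytes), "bytes in UTF-8")
print("round trip intact:", round_trip == text)
print("misread as Latin-1:", repr(garbled))
print("ASCII can store it:", ascii_can_store_it)
```

A silent misreading like the Latin-1 case is arguably the more dangerous failure for an archive, because nothing signals that the record has become unreadable.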

Digital text: An overview

Analysis tools

• Transcription

• Annotation

• Translation

• Metadata Management
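As a rough illustration of what transcription, annotation and translation produce, the sketch below stores time-aligned segments and writes them out as XML using Python's standard library. The `Segment` class, the element names (`recording`, `segment`, `transcription`, `translation`) and the placeholder strings are all hypothetical; they do not reproduce the file format of any particular annotation tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-aligned stretch of a recording (hypothetical structure)."""
    start_ms: int
    end_ms: int
    transcription: str
    translation: str


def segments_to_xml(segments):
    """Serialize segments to a simple, tool-independent XML string."""
    root = ET.Element("recording")
    for seg in segments:
        node = ET.SubElement(
            root, "segment", start=str(seg.start_ms), end=str(seg.end_ms)
        )
        ET.SubElement(node, "transcription").text = seg.transcription
        ET.SubElement(node, "translation").text = seg.translation
    return ET.tostring(root, encoding="unicode")


# Placeholder content only; no real language data is implied.
segments = [
    Segment(0, 1500, "PLACEHOLDER TRANSCRIPTION", "PLACEHOLDER TRANSLATION"),
]
xml_text = segments_to_xml(segments)
parsed = ET.fromstring(xml_text)  # the output can be read back by any XML parser
print(xml_text)
```

Keeping the annotation in plain, explicitly structured XML is one way to avoid tying the data to a single piece of software.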

Popular Tools

Metadata Management

• Cataloguing: title, speakers, collectors, time and place, language name etc.

• Descriptive: information about content, relationship to other content etc.

• Structural: structures and patterns

• Technical: description of formats, encoding, required tools and software

• Administrative: work log, access protocol etc.

(Nathan & Austin, 2004)
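The five metadata categories listed above could be kept together in one record per recording. The following is a minimal sketch under that assumption: the field names inside each category, the `missing_categories` helper and every placeholder value are invented for illustration and are not prescribed by Nathan & Austin (2004) or by any archive.

```python
import json

# The five categories named on the slide.
CATEGORIES = (
    "cataloguing",
    "descriptive",
    "structural",
    "technical",
    "administrative",
)

# Hypothetical record; every value is a placeholder.
record = {
    "cataloguing": {
        "title": "PLACEHOLDER session title",
        "speakers": ["PLACEHOLDER speaker"],
        "collectors": ["PLACEHOLDER collector"],
        "time_and_place": "PLACEHOLDER",
        "language_name": "PLACEHOLDER language",
    },
    "descriptive": {"content": "PLACEHOLDER summary", "related_items": ["PLACEHOLDER"]},
    "structural": {"structure": "PLACEHOLDER description"},
    "technical": {"format": "WAV", "encoding": "PCM", "required_software": ["PLACEHOLDER"]},
    "administrative": {"work_log": ["PLACEHOLDER entry"], "access_protocol": "PLACEHOLDER"},
}


def missing_categories(rec):
    """Return the categories that are absent or empty in a record."""
    return [name for name in CATEGORIES if not rec.get(name)]


# Plain-text JSON keeps the record readable without special software.
serialized = json.dumps(record, ensure_ascii=False, indent=2)
print(serialized)
print("missing:", missing_categories(record))
```

A check like `missing_categories` can be run before deposit so that incomplete records are caught while the collector still remembers the details.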

Platforms

1. Online Language Archives

Examples: OLAC, ANLA, ELAR, CLA, The Language Archive, PARADISEC etc.

2. Social Media: Facebook, Twitter, Blogs, etc.

Examples: ‘Indigenous Tweets’ and ‘Facebook in your language’ by Prof. Kevin Scannell

Conclusion

In a generation when the rate of language death is at its peak, if we choose to use moribund technologies to create and preserve language data, then when those technologies die, unique heritage is also lost or encrypted (Bird & Simons, 2003).

We must keep in mind:

Purpose, Presentation, Portability and Preservation

References

• Austin, P., & Sallabank, J. (Eds.) (2011). The Cambridge handbook of endangered languages. Cambridge University Press.

• Bird, S., & Simons, G. (2003). Seven dimensions of portability for language documentation and description. Language, 79(3), 557-582.

• Crystal, D. (2002). Language death. Cambridge University Press.

• Nathan, D., & Austin, P. (2004). Reconceiving metadata: language documentation through thick and thin. Language Documentation and Description, 2, 179-187.

• Trosterud, T. (2006). Grammatically based language technology for minority languages. Trends in Linguistics Studies and Monographs, 175, 293.

Thank You!

Questions and Feedback.
