Technological approaches to linguistic documentation and meta-documentation
12/2/2013
Technological Approaches to Linguistic Documentation and Metadocumentation
Pankaj Dwivedi
Gulab Chand
Somdev Kar
Indian Institute of Technology Ropar
Rupnagar, Punjab 140001
India
2 December 2013 1
Language Documentation
Principles and methods used for the
recording and analysis of primary
language and cultural materials, and
metadata about them.
With the revolution in information technology, it is now
possible, unlike before, to maintain organized and
long-lasting linguistic and cultural records.
Why is documenting languages IMPORTANT?
Half of the world’s languages may no
longer exist after a few more
generations, as they are not being
learnt by children as first languages
(Austin & Sallabank, 2011).
Crystal (2002) claims that the rate of
language disappearance is as high as two
languages each month.
How?
• Creating Dictionaries
• Preparing Language Teaching Materials
• Archiving
• Language Corpora (Written & Spoken)
What is needed?
A lot of language data and the latest technology
Language data: text, audio and video
Technology: software and tools that can
handle the language data, and platforms
on which these data can be used
effectively.
What do we need?
• Language data (no problem)
• Platforms (will see later on)
• Latest TOOLS and SOFTWARE for:
1. Recording and Capturing
2. Analysis
3. Archiving
4. Mobilization
ONE MOMENT!!!
Is ‘Latest’ the best?
or
Old is gold?
CHOOSE CAREFULLY !!!
Is ‘TECHNOLOGY’ adoption always good?
• Languages may live on without an orthography.
But no language will be able to function as an
administrative language in a modern society
without a developed language technology
(Trosterud, 2006).
• Technology changes quickly, and an uncritical
adoption of new tools and technologies might
compromise long-term sustainability,
portability, usability and compatibility with
other platforms (Bird & Simons, 2003).
Striking a balance
• Portability: operating systems, formats, software, encodings
• Sustainability: long-term preservation and usefulness
• Maintenance and Distribution: finances, space, tools and reach
• Access and protocols: paid or free, open or closed, research or business, full or partial
Capturing Audio Media
Why or why not WAV?
Capturing Video Media
• CODECS
CONTAINERS
Capturing Digital Text
• Character Encoding: Unicode, ASCII, Windows/ANSI, Big5, Latin 5 etc.
• Data Encoding: XML, SGML, MS Word etc.
• File Encoding: plain text, PDF, MS Word etc.
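As an illustration of why character encoding matters for portability, text stored in a legacy encoding can be re-encoded as Unicode (UTF-8) once the source encoding is known. The following is only a minimal sketch using Python's standard library. The function names, the sample text and the choice of Latin 5 (ISO 8859-9) as the source encoding are assumptions made for the example and do not come from the slides.

```python
from pathlib import Path


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    # errors="strict" (the default) raises UnicodeDecodeError on byte sequences
    # that are invalid in source_encoding instead of silently dropping them.
    text = data.decode(source_encoding, errors="strict")
    return text.encode("utf-8")


def convert_file(src: Path, dst: Path, source_encoding: str) -> None:
    """Write a UTF-8 copy of src; the original file is left untouched."""
    dst.write_bytes(reencode_to_utf8(src.read_bytes(), source_encoding))


if __name__ == "__main__":
    # Hypothetical sample: Turkish text stored in Latin 5 (ISO 8859-9).
    legacy = "Türkçe sözlük".encode("iso8859_9")
    print(reencode_to_utf8(legacy, "iso8859_9").decode("utf-8"))
```

The source encoding has to be known or established beforehand. A single-byte encoding such as Latin 5 accepts any byte, so a wrong guess produces garbled characters rather than an error, and the converted text should be checked by someone who reads the language.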
Digital text: An overview
Analysis tools
• Transcription
• Annotation
• Translation
• Metadata Management
Popular Tools
Metadata Management
• Cataloguing: title, speakers, collectors, time and place, language name etc.
• Descriptive: information about content, relationship to other content etc.
• Structural: structures and patterns
• Technical: description of formats, encoding, required tools and software
• Administrative: work log, access protocol etc.
(Nathan & Austin, 2004)
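The five metadata types listed above can be kept together in one machine-readable record per recording. The following is a minimal sketch using only Python's standard library. The class name, field names and sample values are hypothetical and do not follow the schema of any particular archive.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SessionMetadata:
    """One record per recording, grouped by the five metadata types."""
    cataloguing: dict = field(default_factory=dict)
    descriptive: dict = field(default_factory=dict)
    structural: dict = field(default_factory=dict)
    technical: dict = field(default_factory=dict)
    administrative: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-Latin scripts readable in the output.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Hypothetical sample values, for illustration only.
record = SessionMetadata(
    cataloguing={"title": "Sample narrative", "speakers": ["Speaker A"],
                 "collectors": ["Collector B"], "language_name": "Example language"},
    descriptive={"content": "Traditional story", "related_to": []},
    structural={"segments": 3},
    technical={"format": "WAV", "encoding": "PCM", "required_tools": []},
    administrative={"access": "open", "work_log": ["recorded", "transcribed"]},
)

if __name__ == "__main__":
    print(record.to_json())
```

Storing the record as plain-text JSON (or XML) rather than in a proprietary format keeps the metadata readable even if the tool that created it is no longer available.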
Platforms
1. Online Language Archives
Examples: OLAC, ANLA, ELAR, CLA, The Language Archive, PARADISEC etc.
2. Social Media: Facebook, Twitter, blogs etc.
Examples: ‘Indigenous Tweets’ and ‘Facebook in your language’ by Prof. Kevin Scannell
Conclusion
In a generation when the rate of language death is at its peak, if we choose to use moribund technologies to create and preserve language data, then when those technologies die, unique heritage is also lost or encrypted (Bird & Simons, 2003).
We must keep in mind:
Purpose, Presentation, Portability
and
Preservation
References
• Austin, P., & Sallabank, J. (Eds.) (2011). The Cambridge handbook of endangered languages. Cambridge University Press.
• Bird, S., & Simons, G. (2003). Seven dimensions of portability for language documentation and description. Language, 79(3), 557-582.
• Crystal, D. (2002). Language death. Cambridge University Press.
• Nathan, D., & Austin, P. (2004). Reconceiving metadata: language documentation through thick and thin. Language documentation and description, 2, 179-187.
• Trosterud, T. (2006). Grammatically based language technology for minority languages. Trends in Linguistics Studies and Monographs, 175, 293.
Thank You!
Questions and Feedback.