What is a sign language corpus?
Adam Schembri, Jordan Fenlon, Kearsy Cormier, Ramas Rentelis, Rose Stamp, Bencie Woll & Trevor Johnston
Overview
• What is a sign language corpus?
• Why build one?
• And how?
• What can you do with a corpus?
• Future directions
Sign language corpora
• NGT (Dutch SL) • DGS (German SL) • BSL • Auslan (Australian SL)
• Also: Irish SL, Swedish SL and others currently being collected
What is a ‘corpus’?
• A collection of language data which is:
– representative of the language community
– machine-readable, i.e. it can be searched by computer
– fixed in size
– able to act as a standard reference
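A corpus is machine-readable in a useful sense only if its annotations can be queried. The minimal Python sketch below assumes a hypothetical record layout (an `Annotation` holding signer, onset, offset and ID gloss) and a hypothetical `concordance` helper; it is not the BSL Corpus file format, just an illustration of searching time-aligned gloss annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One time-aligned gloss annotation (hypothetical record layout)."""
    signer_id: str  # anonymised participant code
    start_ms: int   # onset of the sign in the video, in milliseconds
    end_ms: int     # offset of the sign, in milliseconds
    gloss: str      # ID gloss assigned to the sign


def concordance(annotations, gloss):
    """Return every token of `gloss`, ordered by signer and then by onset time."""
    hits = [a for a in annotations if a.gloss == gloss]
    return sorted(hits, key=lambda a: (a.signer_id, a.start_ms))


# Invented toy annotations, for illustration only.
corpus = [
    Annotation("S01", 1200, 1650, "GREEN"),
    Annotation("S02", 300, 700, "AMERICA"),
    Annotation("S01", 400, 900, "GREEN"),
]

for token in concordance(corpus, "GREEN"):
    print(token.signer_id, token.start_ms, token.end_ms)
```

Because each token keeps its time stamps, a search result can be traced back to the exact stretch of video it came from.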
Why do we need a sign language corpus of BSL?
• DOCUMENTATION – To address concerns in the British Deaf community about BSL variation and change. Heritage forms of BSL are not always passed on to the younger generation and need to be documented.
• DESCRIPTION – Need for more empirical work on BSL grammar, to build on Deuchar (1984), Kyle & Woll (1985), Brennan (1990) and Sutton-Spence & Woll (1999).
– Before 2014, there was only one dictionary organized along linguistic principles (Brien, 1992), and no lexical database for researchers.
• APPLICATION – To provide an evidence base for BSL teaching and BSL/English interpreter training
Aims of the BSL Corpus Project
• to create an online, open-access corpus of annotated BSL digital video data that will become a shared, peer-reviewable resource and standard reference for BSL researchers and the deaf community
• to conduct corpus-based investigations of sociolinguistic variation and change and of lexical frequency
• Project 1 timeline: January 2008-June 2011 • Project 2 timeline: January 2013-June 2014 • Project 3 underway
How did we do it?
• 8 cities
• At least 30 people from each city, 249 signers in total
• All 8 cities have a strong deaf community and have (or have had) a residential Deaf school
Who did we film?
• Deaf native or near-native signers (most learnt to sign before 7 years of age)
• Must have lived in the region for 10 years or more
• Balanced for age (16-35, 36-50, 51-65, 66+)
• Mixed for language background (roughly 25% native signers)
• Also balanced for gender and ethnicity
• However, issues in data collection force us to be flexible in applying these criteria
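The sampling targets above (at least 30 signers per city, spread across four age bands) lend themselves to a simple bookkeeping check during recruitment. This is a minimal Python sketch under stated assumptions: signers are recorded as hypothetical `(city, age)` pairs, and `age_band`, `cell_counts` and `under_target_cities` are illustrative helpers, not project tools.

```python
from collections import Counter


def age_band(age: int) -> str:
    """Map an age in years to the sampling bands listed above."""
    if age < 16:
        raise ValueError(f"age {age} is below the sampling range")
    if age <= 35:
        return "16-35"
    if age <= 50:
        return "36-50"
    if age <= 65:
        return "51-65"
    return "66+"


def cell_counts(signers):
    """Count signers per (city, age band) cell; `signers` holds (city, age) pairs."""
    return Counter((city, age_band(age)) for city, age in signers)


def under_target_cities(signers, target=30):
    """Return the cities that still have fewer than `target` signers, with their counts."""
    per_city = Counter(city for city, _ in signers)
    return {city: n for city, n in per_city.items() if n < target}


# Invented toy sample, for illustration only.
sample = [("Bristol", 22), ("Bristol", 70), ("Glasgow", 45)]
print(cell_counts(sample))
print(under_target_cities(sample))  # {'Bristol': 2, 'Glasgow': 1}
```

Flagging thin cells early is one way a team could see where the criteria have to be applied flexibly.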
How did we find everyone?
• Deaf community fieldworkers (one for each region) recruited 30 Deaf people who matched the project criteria
• Fieldworkers were themselves members of the deaf community, having grown up in the area or lived there a long time
• Filmed over 2-4 visits
• No hearing people present during filming
The filming session
• Signers were filmed in pairs, often matched according to their age group
• Pairs were of the same or different gender
• One high-definition video camera focused on each participant, and one on the pair
• Pairs were filmed in front of a blue background screen with additional lighting
• Participants were asked to wear plain-coloured clothing and were seated in chairs without arms
• As few long-term partners/spouses as possible were filmed together
Annotation
• Annotations (mainly glosses) completed to date are linked to specific studies on sociolinguistic variation and lexical frequency
– Phonological variation study: 2110 tokens annotated for sign gloss and handshape of the target, preceding and following signs.
– Lexical variation study: signs for countries, colors, numbers and place-names = 7332 glosses.
– Lexical frequency study (approx. 500 signs from 50 participants in Bristol & Birmingham = approximately 25,000 sign tokens).
– Grammatical variation study: 1680 directional indicating verb sign tokens coded for a range of factors, plus an additional 25,000 sign tokens from London and Manchester.
Why does annotation take so long?
• The corpus cannot be searched unless we use some system for annotation.
• We cannot rely on English translations, as the meanings of individual signs vary in context (see EXCITED below), and variation in the lexicon means that a single meaning may have many BSL equivalents (e.g., ‘green’ in English has many translations in BSL).
• There is no standard dictionary or lexical database of British Sign Language containing all the signs in the language, so we had to begin building our own from the corpus data.
• Each unique sign is assigned an ID gloss, which is in turn added to the project’s lexical database along with all its associated keywords
– EXCITED (ID gloss): interest, interesting, excite, exciting, motivate, motivation, enthusiasm, eager, eagerness, etc. (associated keywords)
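The ID-gloss approach can be pictured as a mapping from each ID gloss to its English keywords, plus an inverted index from keyword back to ID glosses. The minimal Python sketch below uses EXCITED and its keywords from the slide; `GREEN-1` and `GREEN-2` are hypothetical labels for two regional variants, and `keyword_index` is an illustrative helper, not the project's database code.

```python
from collections import defaultdict

# ID gloss -> associated English keywords.
# EXCITED is the slide's example; GREEN-1 and GREEN-2 are hypothetical variant labels.
LEXICON = {
    "EXCITED": ["interest", "interesting", "excite", "exciting", "motivate",
                "motivation", "enthusiasm", "eager", "eagerness"],
    "GREEN-1": ["green"],
    "GREEN-2": ["green"],
}


def keyword_index(lexicon):
    """Invert the lexicon: English keyword -> sorted list of ID glosses."""
    index = defaultdict(set)
    for id_gloss, keywords in lexicon.items():
        for keyword in keywords:
            index[keyword.lower()].add(id_gloss)
    return {keyword: sorted(glosses) for keyword, glosses in index.items()}


index = keyword_index(LEXICON)
print(index["green"])  # ['GREEN-1', 'GREEN-2']
print(index["eager"])  # ['EXCITED']
```

The output shows both problems named above: one sign maps to many English words, and one English word ('green') maps to several signs. That is why annotation uses the ID gloss rather than a translation.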
Research questions
– Is there any correlation between the sign variants used and social factors (e.g., men vs. women, Glasgow vs. London, old vs. young, native signer vs. early learner signer, BSL teacher vs. others etc.)?
– Is there any evidence of dialect levelling taking place in BSL?
• “…a process whereby differences between regional varieties are reduced, features which make varieties distinctive disappear, and new features emerge and are adopted by speakers over a wide geographical area.” Williams & Kerswill (1999)
Lexical elicitation task • Number signs 1-20
• Signs for colours: - BROWN - GREEN - GREY - PURPLE - YELLOW
• Signs for countries: - AMERICA - BRITAIN - CHINA - FRANCE - GERMANY - INDIA - ITALY - IRELAND
(Images: example variants of AMERICA and GREEN)
• Signs for UK place names:
- BELFAST - BIRMINGHAM - BRISTOL - CARDIFF - GLASGOW - LONDON - MANCHESTER - NEWCASTLE
Coding of sign variants
Colour, country and number signs were coded with the following information:
Dependent variable: • Traditional or non-traditional sign variant
Independent variables:
1. Age (16-39, 40-59, 60+)
2. Gender (male vs. female)
3. Language background (parents Deaf or hearing)
4. School location (local or non-local school)
5. Social class (working or middle)
6. Semantic category (number, country or colour)
For number sign data only:
1. Ethnicity (White, Asian, Afro-Caribbean, etc.)
2. Teacher of BSL
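One way to picture the coding scheme is as one record per elicited token, from which the share of traditional variants can be cross-tabulated against any factor. This is a minimal Python sketch; the `CodedToken` layout and `proportion_traditional` helper are hypothetical, and the number-only factors (ethnicity, BSL teacher) are left out to keep it short.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedToken:
    """One elicited sign token coded for the study's variables (hypothetical layout)."""
    traditional: bool    # dependent variable: traditional regional variant?
    age_group: str       # "16-39", "40-59" or "60+"
    gender: str          # "male" or "female"
    deaf_parents: bool   # language background
    local_school: bool   # school location
    social_class: str    # "working" or "middle"
    category: str        # "number", "country" or "colour"


def proportion_traditional(tokens, factor):
    """Share of traditional variants for each level of one factor (a simple cross-tab)."""
    counts = defaultdict(lambda: [0, 0])  # level -> [traditional count, total count]
    for token in tokens:
        level = getattr(token, factor)
        counts[level][0] += int(token.traditional)
        counts[level][1] += 1
    return {level: trad / total for level, (trad, total) in counts.items()}


# Invented toy tokens, for illustration only.
tokens = [
    CodedToken(True, "60+", "female", True, True, "working", "colour"),
    CodedToken(False, "16-39", "male", False, False, "middle", "country"),
    CodedToken(True, "16-39", "female", False, True, "working", "number"),
]
print(proportion_traditional(tokens, "age_group"))  # {'60+': 1.0, '16-39': 0.5}
```

A one-factor cross-tab like this cannot separate factors that are correlated with each other (for example, age and school location), which is why a multivariate analysis is used for the results that follow.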
Rbrul Results
• Results from 7332 tokens
• Four significant factors predict the use of non-traditional signs:
– Age: 40-59, 60+ favoured traditional regional variants, 16-39 disfavoured them
– School location: those educated in local schools favoured traditional regional variants, those from non-local schools disfavoured them
– Language background: those with Deaf parents favoured traditional signs, those with hearing parents disfavoured them
– Semantic category of the sign: signs for countries are changing fastest, followed by numbers and lastly colours
• Gender & social class were not significant factors in the full analysis
• For number sign data: teaching experience & ethnicity were not significant factors
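"Favoured" and "disfavoured" refer to factor weights. As we understand Rbrul's convention, a factor weight is the inverse logit of a sum-coded logistic-regression coefficient, so weights above 0.5 favour the application value and weights below 0.5 disfavour it. The minimal Python sketch below shows only that conversion; the coefficients are hypothetical, and significance testing (which Rbrul does by comparing nested models) is not shown.

```python
import math


def factor_weight(log_odds: float) -> float:
    """Convert a sum-coded logistic-regression coefficient (log-odds) to a 0-1 factor weight."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def describe(level: str, log_odds: float) -> str:
    """Label a factor level as favouring or disfavouring the application value."""
    weight = factor_weight(log_odds)
    if weight > 0.5:
        effect = "favours"
    elif weight < 0.5:
        effect = "disfavours"
    else:
        effect = "is neutral for"
    return f"{level}: weight {weight:.3f} ({effect})"


# Hypothetical coefficients, not results from the BSL Corpus study.
print(describe("Deaf parents", 0.4))      # Deaf parents: weight 0.599 (favours)
print(describe("hearing parents", -0.4))  # hearing parents: weight 0.401 (disfavours)
```

Under sum coding, the levels of a two-level factor get coefficients of equal size and opposite sign, so their weights sum to 1.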
Phonological variation and change in number signs
• We found a significant correlation between age, language background and gender and the use of two- versus one-handed variants of number signs in BSL.
• Older signers, those with Deaf parents, and males used more two-handed variants, whereas younger signers, those with hearing parents and females used fewer.
UK place names: In-group/out-group effect
In-group/out-group effect for the following UK place names: • Belfast • Glasgow • Manchester • Newcastle • Cardiff • Bristol • Birmingham
(Images: e.g. signs for BIRMINGHAM, BRISTOL and CARDIFF)
Elicitation task vs. conversational data
• 371 tokens from the conversational data were analysed
• 78 tokens (21%) were not the same sign variant as in the lexical elicitation task
• 61 tokens (of 78) were non-traditional sign variants e.g. GREEN
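The percentages above follow directly from the token counts. This short Python check uses only the figures given on the slide. The `percent` helper is illustrative. The second figure (the share of mismatches that were non-traditional) is derived here and is not stated on the slide.

```python
def percent(part: int, whole: int) -> float:
    """Percentage that `part` makes up of `whole`."""
    return 100.0 * part / whole


conversational = 371   # conversational tokens analysed
mismatched = 78        # tokens differing from the elicitation-task variant
non_traditional = 61   # mismatched tokens that were non-traditional variants

print(f"{percent(mismatched, conversational):.1f}%")   # 21.0%
print(f"{percent(non_traditional, mismatched):.1f}%")  # 78.2%
```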
Summary
• In this study, we found some evidence of declining use of traditional regional signs in BSL.
• Older signers, those educated at local schools, and those with Deaf parents favoured traditional regional variants.
• Younger signers, those educated elsewhere, and those with hearing parents favoured newer variants.
• The study reveals that some lexical variables show stability over signers' lifetimes, while others show change
Future studies
• Fingerspelling -B-
• In a recent study of 453 examples of this manual letter in the Auslan corpus, the two citation forms were the least frequent in the data.
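A claim such as "the citation forms were the least frequent" is a ranking over coded form variants. This minimal Python sketch assumes each example has been coded with a form label. The labels, the toy counts and both helper names are hypothetical. None of them come from the Auslan data.

```python
from collections import Counter


def rank_ascending(forms):
    """(form, count) pairs from rarest to most frequent; ties keep first-seen order."""
    return sorted(Counter(forms).items(), key=lambda item: item[1])


def citation_forms_rarest(forms, citation_forms):
    """True if every citation form is strictly rarer than every non-citation form."""
    counts = Counter(forms)
    citation = set(citation_forms)
    # Counter returns 0 for a form that never occurs, so unattested citation forms count as rarest.
    citation_max = max(counts[form] for form in citation)
    return all(citation_max < count
               for form, count in counts.items() if form not in citation)


# Hypothetical form labels and toy counts, for illustration only.
toy = ["B-var-a"] * 5 + ["B-var-b"] * 3 + ["B-cit-1"] + ["B-cit-2"] * 2
print(rank_ascending(toy))
print(citation_forms_rarest(toy, ["B-cit-1", "B-cit-2"]))  # True
```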
• A study of mouthings and mouth gestures in the Auslan Corpus, based on 17,000 tokens from 38 signers aged 15 to 80
• 57% of all signs occurred with mouthing, 22% with mouth gestures and 21% with no mouth action
• Lots of variation
– individual use of mouthing ranged from 6% to 84%
– fingerspelling, nouns and number signs occurred most often with mouthing
– classifier signs, interjections and negators occurred most often with mouth gestures
– multi-channel signs did not always occur with mouth gestures, but more work is needed
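The figures above combine two views of the same coding: an overall distribution of mouth actions across all tokens, and a per-signer mouthing rate whose range (6% to 84%) shows the individual variation. This minimal Python sketch computes both. It assumes tokens are hypothetical `(signer, category)` pairs coded with exactly one of three categories, and its helpers are illustrative, not the study's scripts.

```python
from collections import Counter, defaultdict

CATEGORIES = ("mouthing", "mouth gesture", "none")


def overall_distribution(tokens):
    """Percentage of all tokens in each mouth-action category."""
    counts = Counter(category for _, category in tokens)
    total = sum(counts.values())
    return {category: 100.0 * counts[category] / total for category in CATEGORIES}


def mouthing_rate_range(tokens):
    """Lowest and highest per-signer percentage of signs produced with mouthing."""
    per_signer = defaultdict(Counter)
    for signer, category in tokens:
        per_signer[signer][category] += 1
    rates = [100.0 * c["mouthing"] / sum(c.values()) for c in per_signer.values()]
    return min(rates), max(rates)


# Invented toy tokens, for illustration only.
tokens = [("A", "mouthing"), ("A", "none"),
          ("B", "mouthing"), ("B", "mouthing"), ("B", "mouth gesture"), ("B", "mouthing")]
print(overall_distribution(tokens))
print(mouthing_rate_range(tokens))  # (50.0, 75.0)
```

Reporting the per-signer range alongside the pooled percentages keeps one very frequent signer from masking how differently individuals use mouthing.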
Future directions
– A further 150,000 sign tokens are expected to be annotated over 5 years if current funding applications are successful.
– In the future, we hope to have 200,000 sign tokens from different genres.
Lexical database: SignBank
• The lexical database created for the BSLCP lexical frequency study (which stores all ID glosses, movie clips showing each sign, and related English keywords) was adapted into BSL SignBank, an online dictionary
• This is the first sign language dictionary based on linguistic/lexicographic principles to be developed from corpus data
The BSL Corpus online
• Casual viewing of the open-access corpus data (narrative and lexical elicitation) is now available via www.bslcorpusproject.org/data
• Annotations are now available for download.
• Translations will also become available in the future, in co-operation with Heriot-Watt University.
Acknowledgements
• Thanks to the following researchers whose work influenced our research design: Trevor Johnston (Australia), Onno Crasborn (The Netherlands), Ceil Lucas (USA), McKee & Kennedy (New Zealand)
• Thanks to the project co-investigators (Margaret Deuchar, Frances Elton, Donall O’Baoill, Rachel Sutton-Spence, Graham Turner, Bencie Woll) & Deaf Community Advisory Group members (Linda Day, Clark Denmark, Helen Foulkes, Melinda Napier, Tessa Padden, Gary Quinn, Kate Rowley & Lorna Allsop)
• Thanks to Sally Reynolds, Avril Hepner, Carolyn Nabarro, Dawn Marshall, Evelyn McFarland, Jackie Parker, Jeff Brattan-Wilson, Jenny Wilkins, Mark Nelson, Melinda Napier, Mischa Cooke, Sarah Lawrence and Rose Stamp
• Thanks to the British Deaf community & all the participants in the BSL corpus project