What is a sign language corpus?

Adam Schembri, Jordan Fenlon, Kearsy Cormier, Ramas Rentelis, Rose Stamp, Bencie Woll & Trevor Johnston


Overview

•  What is a sign language corpus?
•  Why build one?
•  And how?
•  What can you do with a corpus?
•  Future directions

Sign language corpora

•  NGT (Dutch SL)
•  DGS (German SL)
•  BSL
•  Auslan (Australian SL)
•  Also: Irish SL, Swedish SL and others currently being collected

What is a ‘corpus’?

•  A collection of language data which is:
–  representative of the language community
–  machine-readable: can be searched by computer
–  fixed in size
–  able to act as a standard reference
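To make "machine-readable" concrete, here is a minimal sketch of searching time-aligned gloss annotations by computer. It assumes a hypothetical flat record format (one record per sign token, loosely in the spirit of the tiered annotations produced by tools such as ELAN); the `Annotation` fields, signer codes and glosses are illustrative, not the project's actual file format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One time-aligned gloss annotation (hypothetical record format)."""
    signer: str    # hypothetical participant code
    start_ms: int  # onset in the video, in milliseconds
    end_ms: int    # offset in the video, in milliseconds
    gloss: str     # ID gloss assigned to the sign token


def find_tokens(corpus, gloss):
    """Return every annotation with exactly this gloss, grouped by signer, then in time order."""
    hits = [a for a in corpus if a.gloss == gloss]
    return sorted(hits, key=lambda a: (a.signer, a.start_ms))


def tokens_per_signer(corpus, gloss):
    """Count how many tokens of a gloss each signer produced."""
    return Counter(a.signer for a in find_tokens(corpus, gloss))


# Toy data for illustration only (not BSL Corpus annotations).
corpus = [
    Annotation("S01", 1200, 1650, "GREEN"),
    Annotation("S01", 1650, 2100, "HOUSE"),
    Annotation("S02", 500, 900, "GREEN"),
    Annotation("S01", 8000, 8400, "GREEN"),
]

print(tokens_per_signer(corpus, "GREEN"))  # Counter({'S01': 2, 'S02': 1})
```

An exact-match search like this only works if every token of the same sign carries the same label, which is why consistent annotation matters so much.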

Why do we need a corpus of BSL?

•  DOCUMENTATION
–  To address concerns in the British Deaf community about BSL variation and change. Heritage forms of BSL are not always passed on to the younger generation, and need to be documented.

•  DESCRIPTION
–  Need for more empirical work on BSL grammar, to build on Deuchar (1984), Kyle & Woll (1985), Brennan (1990) and Sutton-Spence & Woll (1999).
–  Before 2014, there was only one dictionary organized along linguistic principles (Brien, 1992), and no lexical database for researchers.

•  APPLICATION
–  To provide an evidence base for BSL teaching and BSL/English interpreter training

Aims of the BSL Corpus Project

•  To create an on-line, open-access corpus of annotated BSL digital video data that will become a shared, peer-reviewable resource and standard reference for BSL researchers and the deaf community
•  To conduct corpus-based investigations of sociolinguistic variation and change and lexical frequency

•  Project 1 timeline: January 2008-June 2011
•  Project 2 timeline: January 2013-June 2014
•  Project 3 underway

How did we do it?

•  Cross-institutional co-operation

How did we do it?

•  8 cities
•  At least 30 people from each city, 249 signers in total
•  All 8 cities have a strong deaf community and have (or have had) a residential Deaf school

Who did we film?

•  Deaf native or near-native signers (most learnt to sign before 7 years of age)
•  Must have lived in the region for 10 years or more
•  Balanced for age (16-35, 36-50, 51-65, 66+)
•  Mixed for language background (roughly 25% native signers)
•  Also balanced for gender and ethnicity

•  However, issues in data collection forced us to be flexible in applying these criteria
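The sampling criteria above amount to a grid of age bands plus a residence requirement. Below is a minimal sketch of checking recruits against that grid. It assumes hypothetical participant records with `age` and `years_in_region` fields; the field names and the example people are invented for illustration.

```python
from collections import Counter

# Age bands from the recruitment criteria above; None marks an open-ended band.
AGE_BANDS = [(16, 35, "16-35"), (36, 50, "36-50"), (51, 65, "51-65"), (66, None, "66+")]
MIN_YEARS_IN_REGION = 10


def age_band(age):
    """Return the recruitment age band label, or None if the person is under 16."""
    for low, high, label in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            return label
    return None


def eligible(person):
    """Apply the residence and minimum-age criteria to one hypothetical record."""
    return (person["years_in_region"] >= MIN_YEARS_IN_REGION
            and age_band(person["age"]) is not None)


def band_counts(people):
    """Count eligible recruits per age band, to see which cells still need filling."""
    return Counter(age_band(p["age"]) for p in people if eligible(p))


# Invented recruits for illustration only.
recruits = [
    {"id": "P1", "age": 22, "years_in_region": 22},
    {"id": "P2", "age": 47, "years_in_region": 8},   # fails the residence criterion
    {"id": "P3", "age": 70, "years_in_region": 40},
    {"id": "P4", "age": 35, "years_in_region": 12},
]
print(band_counts(recruits))  # Counter({'16-35': 2, '66+': 1})
```

Because the criteria had to be applied flexibly in practice, a real check would more likely flag borderline recruits than exclude them outright.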

How did we find everyone?

•  Deaf community fieldworkers (one for each region) recruited 30 Deaf people who matched the project criteria

•  Fieldworkers were members of the deaf community themselves, having grown up in the area or lived there for a long time

•  Filmed over 2-4 visits
•  No hearing people present during filming

The filming session

•  Signers were filmed in pairs and often matched according to their age group
•  Pairs were of the same or different gender
•  One high-definition video camera focused on each participant, and one on the pair
•  Pairs were filmed in front of a blue background screen with additional lighting
•  Participants were asked to wear plain-colored clothing and were seated in chairs without arms
•  As few long-term partners/spouses as possible were filmed together

What did we film?

1. 5-minute story (warm-up)

2. 30 minutes of free conversation

3. 20-minute interview

4. Vocabulary task

Annotation

•  Annotations (mainly glosses) completed to date are linked to specific studies on sociolinguistic variation and lexical frequency

–  Phonological variation study: 2110 tokens annotated for sign gloss and handshape of the target, preceding and following signs.

–  Lexical variation study: signs for countries, colors, numbers and place-names = 7332 glosses.

–  Lexical frequency study: approx. 500 signs from each of 50 participants in Bristol & Birmingham = approximately 25,000 sign tokens.

–  Grammatical variation study: 1680 directional indicating verb sign tokens coded for a range of factors, plus an additional 25,000 sign tokens from London and Manchester.

Why does annotation take so long?

•  The corpus cannot be searched unless we use some system for annotation.

•  We cannot rely on English translations, as the meanings of individual signs vary in context (see EXCITED below), and variation in the lexicon means that a single meaning may have many BSL equivalents (e.g., ‘green’ in English has many translations in BSL).

•  There is no standard dictionary or lexical database of British Sign Language containing all the signs in the language, so we had to begin building our own using the corpus data.

•  Each unique sign is assigned an ID gloss, which is in turn added to the project’s lexical database along with all its associated keywords:

–  EXCITED (ID gloss): interest, interesting, excite, exciting, motivate, motivation, enthusiasm, eager, eagerness, etc. (associated keywords)
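The ID-gloss approach can be pictured as two mappings: each ID gloss lists its English keywords, and an inverted index maps each keyword back to every ID gloss that can express it. This is a minimal in-memory sketch, not the project's database. The EXCITED keywords follow the example above; `GREEN1`, `GREEN2` and `PURPLE1` are hypothetical ID glosses used only for illustration.

```python
from collections import defaultdict

# ID gloss -> associated English keywords.
# EXCITED and its keywords follow the example above; GREEN1, GREEN2 and
# PURPLE1 are hypothetical ID glosses used only for illustration.
LEXICON = {
    "EXCITED": ["interest", "interesting", "excite", "exciting", "motivate",
                "motivation", "enthusiasm", "eager", "eagerness"],
    "GREEN1": ["green"],
    "GREEN2": ["green"],
}


def build_keyword_index(lexicon):
    """Invert the lexicon: each English keyword maps to the set of ID glosses listing it."""
    index = defaultdict(set)
    for id_gloss, keywords in lexicon.items():
        for keyword in keywords:
            index[keyword.lower()].add(id_gloss)
    return index


def add_sign(lexicon, index, id_gloss, keywords):
    """Register a newly identified sign under a unique ID gloss and update the index."""
    if id_gloss in lexicon:
        raise ValueError(f"ID gloss {id_gloss!r} is already in the lexicon")
    lexicon[id_gloss] = list(keywords)
    for keyword in keywords:
        index[keyword.lower()].add(id_gloss)


index = build_keyword_index(LEXICON)
add_sign(LEXICON, index, "PURPLE1", ["purple"])

print(sorted(index["green"]))       # ['GREEN1', 'GREEN2']  (one English word, several signs)
print(sorted(index["motivation"]))  # ['EXCITED']  (several English words, one sign)
```

Annotators then label tokens with the ID gloss rather than with a context-dependent English translation, so all tokens of one sign can be retrieved together.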

What are we finding so far?

Regional variation in BSL

e.g. SIX (variants from London, Bristol and Manchester)

Research questions

–  Is there any correlation between the sign variants used and social factors (e.g., men vs. women, Glasgow vs. London, old vs. young, native signer vs. early learner signer, BSL teacher vs. others etc.)?

–  Is there any evidence of dialect levelling taking place in BSL?

•  “…a process whereby differences between regional varieties are reduced, features which make varieties distinctive disappear, and new features emerge and are adopted by speakers over a wide geographical area.” Williams & Kerswill (1999)

Lexical elicitation task

•  Number signs 1-20
•  Signs for colours: BROWN, GREEN, GREY, PURPLE, YELLOW
•  Signs for countries: AMERICA, BRITAIN, CHINA, FRANCE, GERMANY, INDIA, ITALY, IRELAND

(figures: variants of AMERICA and GREEN)

•  Signs for UK place names: BELFAST, BIRMINGHAM, BRISTOL, CARDIFF, GLASGOW, LONDON, MANCHESTER, NEWCASTLE

Coding of sign variants

Colour, country and number signs were coded with the following information:

Dependent variable:
•  Traditional or non-traditional sign variants

Independent variables:
1.  Age (16-39, 40-59, 60+)
2.  Gender (male vs. female)
3.  Language background (parents Deaf or hearing)
4.  School location (local or non-local school)
5.  Social class (working or middle)
6.  Semantic category (number, country or color)

For number sign data only:
1.  Ethnicity (White, Asian, Afro-Caribbean, etc.)
2.  Teacher of BSL
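Before any multivariate analysis, coded tokens can be cross-tabulated one factor at a time. Below is a minimal sketch, assuming each token is a record whose hypothetical field names (`age_group`, `local_school`, `deaf_parents`, etc.) mirror a subset of the coding scheme above; the tokens are invented for illustration, not corpus data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedToken:
    """One elicited sign token coded for some of the factors above (hypothetical field names)."""
    signer: str
    age_group: str       # "16-39", "40-59" or "60+"
    local_school: bool   # school location: local (True) or non-local (False)
    deaf_parents: bool   # language background: Deaf (True) or hearing (False) parents
    category: str        # semantic category: "number", "country" or "color"
    traditional: bool    # dependent variable: traditional regional variant?


def traditional_rate_by(tokens, factor):
    """Proportion of traditional variants at each level of one coded factor."""
    totals, traditional = Counter(), Counter()
    for token in tokens:
        level = getattr(token, factor)
        totals[level] += 1
        traditional[level] += int(token.traditional)
    return {level: traditional[level] / totals[level] for level in totals}


# Toy tokens for illustration only (not corpus data).
tokens = [
    CodedToken("S01", "16-39", False, False, "country", False),
    CodedToken("S01", "16-39", False, False, "color", True),
    CodedToken("S02", "60+", True, True, "country", True),
    CodedToken("S02", "60+", True, True, "number", True),
    CodedToken("S03", "40-59", True, False, "country", False),
]

print(traditional_rate_by(tokens, "age_group"))
```

One-factor proportions like these cannot separate overlapping effects (for example, if older signers were also more often educated at local schools), which is why a multivariate analysis such as Rbrul is used for the actual results.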

Rbrul Results

•  Results from 7332 tokens
•  Four significant factors predict the use of non-traditional signs:

–  Age: signers aged 40-59 and 60+ favoured traditional regional variants; signers aged 16-39 disfavoured them

–  School location: those educated in local schools favoured traditional regional variants; those from non-local schools disfavoured them

–  Language background: those with Deaf parents favoured traditional signs; those with hearing parents disfavoured them

–  Semantic category of the sign: signs for countries are changing fastest, followed by numbers and lastly colours

•  Gender & social class were not significant factors in the full analysis

•  For number sign data: teaching experience & ethnicity were not significant factors
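In variationist output such as Rbrul's, "favoured" and "disfavoured" are usually read off factor weights: under sum contrasts, each level's logistic-regression coefficient (in log-odds) is passed through the inverse logit, so a weight above 0.5 favours the application value and one below 0.5 disfavours it. A minimal sketch of that conversion follows; the coefficients are invented to mirror the direction of the age result above and are not the study's estimates. Taking "traditional variant" as the application value is also an assumption made here.

```python
import math


def factor_weight(log_odds):
    """Inverse logit: convert a sum-coded coefficient (in log-odds) to a factor weight."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def describe(coefficients):
    """Label each factor level as favouring (> 0.5) or disfavouring (< 0.5) the application value."""
    result = {}
    for level, coef in coefficients.items():
        weight = factor_weight(coef)
        if weight > 0.5:
            effect = "favours"
        elif weight < 0.5:
            effect = "disfavours"
        else:
            effect = "neutral"
        result[level] = (round(weight, 3), effect)
    return result


# Hypothetical sum-coded coefficients for age; application value = traditional variant.
# Under sum contrasts the coefficients for one factor add up to zero.
age_coefs = {"16-39": -0.6, "40-59": 0.2, "60+": 0.4}
assert abs(sum(age_coefs.values())) < 1e-9

for level, (weight, effect) in describe(age_coefs).items():
    print(f"{level}: factor weight {weight} ({effect} traditional variants)")
```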

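Age: language change?
•  Apparent time hypothesis (Bailey, 2002): differences between age groups observed at a single point in time are taken as a proxy for language change over time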

Phonological variation and change in number signs

•  We found a significant correlation between age, language background and gender and the use of two- versus one-handed variants of number signs in BSL.

•  Older signers, those with Deaf parents, and males used more two-handed variants, whereas younger signers, those with hearing parents and females used fewer.

UK place names: In-group/out-group effect

In-group/out-group effect for the following UK place names:
•  Belfast
•  Glasgow
•  Manchester
•  Newcastle
•  Cardiff
•  Bristol
•  Birmingham

e.g. Birmingham, Bristol, Cardiff

Elicitation task vs. conversational data

•  371 tokens from the conversational data were analysed

•  78 tokens (21%) were not the same sign variant as in the lexical elicitation task

•  61 of these 78 tokens were non-traditional sign variants, e.g. GREEN
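The proportions on this slide can be checked directly from the reported token counts. The short sketch below uses those counts; the helper name `percent` is illustrative.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


conversational_tokens = 371  # conversational tokens analysed
mismatches = 78              # not the same variant as in the elicitation task
non_traditional = 61         # of those 78, non-traditional variants

print(percent(mismatches, conversational_tokens))       # 21.0
print(percent(non_traditional, mismatches))             # 78.2
print(percent(non_traditional, conversational_tokens))  # 16.4
```

So roughly four in five of the mismatches went towards a non-traditional variant, about one in six of all conversational tokens analysed.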

Summary

•  In this study, we found some evidence of declining use of traditional regional signs in BSL.

•  Older signers, those educated at local schools, and those with Deaf parents favoured traditional regional variants.

•  Younger signers, those educated elsewhere, and those with hearing parents favoured newer variants.

•  The study reveals that some lexical variables show stability over signers' lifetimes, while others show changes.

Future studies

•  Fingerspelling -B-
•  In a recent study of 453 examples of this manual letter in the Auslan corpus, the two citation forms were the least frequent in the data.

 

Future studies

•  A study of mouthings and mouth gestures in 17,000 tokens from 38 signers aged 15 to 80 in the Auslan Corpus

•  57% of all signs occurred with mouthing, 22% with mouth gestures, 21% with no mouth action

•  Lots of variation:
–  individual use of mouthing ranged from 6% to 84%
–  fingerspelling, nouns and number signs occurred most often with mouthing
–  classifier signs, interjections and negators occurred most often with mouth gestures
–  multi-channel signs did not always occur with mouth gestures, but more work is needed
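Figures like the 57/22/21% split and the 6%-84% individual range above come from tallying one mouth-action code per sign token, overall and per signer. A minimal sketch follows, assuming a hypothetical three-way code (`mouthing`, `mouth_gesture`, `none`) on each token; the toy tokens are invented, not Auslan Corpus data.

```python
from collections import Counter, defaultdict

# Hypothetical three-way coding of the mouth action on each sign token.
MOUTH_CODES = ("mouthing", "mouth_gesture", "none")


def proportions(codes):
    """Share of each mouth-action code in a list of per-token codes."""
    if not codes:
        raise ValueError("no tokens to tally")
    counts = Counter(codes)
    return {code: counts[code] / len(codes) for code in MOUTH_CODES}


def mouthing_rate_per_signer(tokens):
    """Proportion of each signer's tokens that occur with mouthing."""
    by_signer = defaultdict(list)
    for signer, code in tokens:
        by_signer[signer].append(code)
    return {signer: proportions(codes)["mouthing"] for signer, codes in by_signer.items()}


# Toy (signer, code) pairs for illustration only (not Auslan Corpus data).
tokens = [
    ("A", "mouthing"), ("A", "mouthing"), ("A", "none"), ("A", "mouth_gesture"),
    ("B", "mouth_gesture"), ("B", "none"), ("B", "mouthing"), ("B", "none"),
]

print(proportions([code for _, code in tokens]))
print(mouthing_rate_per_signer(tokens))  # {'A': 0.5, 'B': 0.25}
```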

Future directions

–  A further 150,000 sign tokens are expected to be annotated over 5 years if current funding applications are successful.

–  In the future, we hope to have 200,000 sign tokens from different genres.

Lexical database: SignBank

•  The lexical database created for the BSLCP lexical frequency study (which stores all ID glosses, movie clips showing each sign, and related English keywords) was adapted into BSL SignBank, an online dictionary

•  This is the first sign language dictionary based on linguistic/lexicographic principles to be developed from corpus data

The BSL Corpus online

•  Casual viewing of the open-access corpus data (narrative and lexical elicitation) is now available via www.bslcorpusproject.org/data

•  Annotations are now available for download.
•  Translations will also become available in the future, in co-operation with Heriot-Watt University.

Acknowledgements

•  Thanks to the following researchers whose work influenced our research design: Trevor Johnston (Australia), Onno Crasborn (The Netherlands), Ceil Lucas (USA), McKee & Kennedy (New Zealand)

•  Thanks to the project co-investigators (Margaret Deuchar, Frances Elton, Donall O’Baoill, Rachel Sutton-Spence, Graham Turner, Bencie Woll) & Deaf Community Advisory Group members (Linda Day, Clark Denmark, Helen Foulkes, Melinda Napier, Tessa Padden, Gary Quinn, Kate Rowley & Lorna Allsop)

•  Thanks to Sally Reynolds, Avril Hepner, Carolyn Nabarro, Dawn Marshall, Evelyn McFarland, Jackie Parker, Jeff Brattan-Wilson, Jenny Wilkins, Mark Nelson, Melinda Napier, Mischa Cooke, Sarah Lawrence and Rose Stamp

•  Thanks to the British Deaf community & all the participants in the BSL corpus project