Towards a corpus of French Belgian Sign Language (LSFB) discourses


Meurant, Laurence & Aurélie Sinte. 2013. Towards a corpus of French Belgian Sign Language (LSFB) discourses.

In Catherine Bolly & Liesbeth Degand (eds), Across the Line of Speech and Writing Variation. Corpora and

Language in Use – Proceedings 2. Louvain-la-Neuve: Presses universitaires de Louvain, 199-212.


Laurence Meurant & Aurélie Sinte

FRS-FNRS and NAmur Research College & University of Namur

Abstract

Recent advances in technology, tools and methodologies, which have facilitated the gathering and

annotating of an extensive body of digital film footage, have opened the way to the study of discourse

in sign languages (SL), which has remained relatively unexplored so far. This paper gives an account of

what previous experience has revealed about the specificities and issues of collecting SL corpora, with

regard to data, participants and the annotation process. It then presents the project that has recently been

initiated for the collection of a large-scale corpus of French Belgian Sign Language (LSFB) discourses.

Keywords: sign language, French Belgian Sign Language, corpus linguistics, discourse

1. Introduction

The linguistic study of signed languages (SLs) is generally considered to have emerged with the

phonological analyses of Tervoort (1953) and, even more so, of the much better-known

Stokoe (1960). From then on, the short history of SL linguistics has been closely linked

to the history of computers and video recording tools, which are essential to the

researcher because of the lack of a written tradition of SLs and of their specific use of

the visual-gestural modality.

Before the digital revolution, the tasks of collecting, storing and making accessible large

amounts of video-recorded data were difficult and expensive. Researchers gathered

videotapes and described and transcribed them on paper up to the 1980s; then, progressively,

from the 1990s, word-processing and spreadsheet software began to be used.

Nevertheless, all these described and transcribed data, spread over many archived tapes,

remained difficult to search; making systematic comparisons between similar linguistic

sequences or between different signers¹ across the whole collection was a lengthy, if not

impossible, task. The videotaped data were frequently supplemented by the intuitions of

native signers with whom the researchers were in contact. Until recently, the only means of

including a selection of signed examples to illustrate the analyses was to transcribe them,

first by glossing the manual components of the signs in a written language and

accompanying the glosses with symbols for the non-manual elements (movements of

the lips, the head, the upper body, the facial expression, etc.), and later by pairing

this kind of transcription with several key frames extracted from the videos. All these

technical constraints made it difficult for the researchers to have their analyses validated

by their peers.

1 The term “signer” designates an individual who uses a sign language.

The early 2000s marked a turning point in the development of knowledge about SLs.

Digital technologies made it possible to record and store vast amounts of data in an

accessible form, to annotate them in a machine-readable format and then to search

automatically through the available database as a whole, regardless of its size. The variety of

documented SLs (in the West and beyond) and the quantity of data collected have

increased significantly over the last decade². Several research teams around the world

have begun collecting large-scale corpora of the SL of their country or region; some

have even completed the process. These pioneering projects have actually paved the

way for a new age in SL linguistics, namely the age of SL corpus and discourse

linguistics.

This paper gives an account of what recent experience has revealed about the

specificities and issues of collecting SL corpora (section 2), with regard to data,

participants and the annotation process. The next section (section 3) presents the project

that has recently been initiated for the collection of a large-scale corpus of French

Belgian Sign Language (LSFB) discourses.

2. Sign language corpora

The corpus linguistics of SLs is an emerging field of research and most SL corpora are

still under construction. The majority of the existing SL data are private and small

collections of video clips collected by and for SL researchers, designed for a specific

purpose or for investigating a particular aspect of the language. These data feature a

small number of native signers, in some cases only deaf children of deaf

parents.

More recently, fuelled by the easy access to and processing of digital video material,

there has been a flood of data collection documenting SL use, with the aim of making the data exploitable for

linguistic research and accessible to researchers and to a wider public.

The first large-scale SL corpus was collected in order to document

sociolinguistic variation in American SL (ASL). Ceil Lucas, Robert Bayley and their

team recorded, in 1995, the productions of more than 200 native or early (Afro-

American and white) signers from the major areas of the United States (Lucas et al.

2001). The European Cultural Heritage Online (ECHO) project (Case study 4: Sign

Languages) opened up this new avenue of SL corpus research in Europe. The corpus

consists of linguistically annotated SL data from three SLs (Sign Language of the

Netherlands – NGT, British Sign Language – BSL, and Swedish Sign Language – SSL)

collected in 2003 and 2004 (Bergman & Mesch 2004; Crasborn et al. 2004; Woll et al.

2004). This corpus is published as an open archive³. Two major corpora have since been

collected in the same vein. The Corpus NGT (2006-2008) is an online open archive

corpus including 72 hours of (partially) glossed and annotated data from 92 adult native

signers from all over the Netherlands (Crasborn & Zwitserlood 2008). The corpus of

Australian SL (Auslan) is part of the wider Endangered Languages Archive (2004-2007).

It includes 300 hours of data from 100 adult native signers (Johnston & Schembri 2006).

The data are transcribed, glossed and partially annotated, and the corpus will be

published online in 2012⁴. Many other large-scale corpora are currently being gathered

in order to provide the SLs concerned with a representative picture of their use. These

ongoing projects include the German Sign Language (DGS) Corpus Project (2009-2023)⁵,

which seeks to collect 350 to 400 hours of data from more than 300 fluent signers

and to publish about 50 hours online, as well as the British Sign Language (BSL) Corpus

project⁶ and the CREAGEST project on French Sign Language (LSF)⁷. More projects

have been initiated in Ireland, China (Hong Kong), Italy and Sweden, and new projects

are expected to begin in other countries or regions. The thesis of Konrad (2009) includes

a complete survey of and substantial information on all the ongoing and finished SL

corpus projects⁸.

2 See the Sign Language Corpora Survey (R. Konrad) published by the University of Hamburg at

http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/sl-corpora.html

3 http://www.let.ru.nl/sign-lang/echo/

In order to be usable and exploitable for linguistic purposes, an SL corpus must be made

up of the video data, the metadata and the annotation files of each video clip. The next

subsections will introduce the specificities and the issues related to each of these

components of SL corpora.

2.1. Data

SLs illustrate the visual-gestural modality of the human capacity for language. They

involve the use of several articulators that work simultaneously and/or sequentially: the

hands, the upper body, the head, the face, the eyes, the eyebrows, the mouth, and the

cheeks. Video cameras are thus necessary to record SL samples, and high-definition

cameras are needed to capture the detailed movements of small articulators

such as the eyes, the eyebrows, the mouth and the cheeks. These specificities have

implications for corpus construction in several respects.

2.1.1. Recording conditions

As the signer’s face fully participates in the linguistic expression, the participants cannot

be recorded anonymously. Moreover, video recording requires

a minimal technical setup (cameras at least, and possibly tripods, lights and

backdrops) that inevitably diminishes the signer’s spontaneity, at least at the start

of the recording session. Researchers must strike the best balance between the

spontaneity of the productions and the quality of the recordings, in light of their

objectives. But in any case, SL corpora do not include anonymous data captured in

spontaneous situations, which could be equivalent to anonymous spoken language data

recorded with a small sound recorder. It is therefore necessary to carefully set up the

recording conditions (including the place, the tasks and the participants) in order to

compensate for these limitations as much as possible.

4 http://www.auslan.org.au/about/corpus/.

5 http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/corpus.html.

6 http://www.bslcorpusproject.org/.

7 http://www.umr7023.cnrs.fr/-Realisationde-corpus-de-donnes-.html.

8 http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/sl-corpora.html.

The announcement of the project to the Deaf community is a crucial stage in establishing

the trust and confidence of the potential participants. In addition to announcements on

Deaf websites and flyers, face-to-face information in Deaf clubs and other meeting

places remains indispensable. The main completed and ongoing SL corpora that include

regional variation have been collected in various schools, deaf clubs or other places familiar

to the participants. This mobile way of working facilitates the participation of

informants living in large countries and certainly contributes to the relative naturalness

of the recording setting. In addition, researchers can try to reduce the recording

equipment as much as possible, for example by using small cameras and reduced or no lighting.

But again, researchers have to find the right compromise between the conflicting

constraints of naturalness and adequate quality of the data obtained; the team of the

Corpus NGT project, for example, chose not to use lighting systems but mounted

cameras on the ceiling in order to collect views of each participant from above

(Crasborn & Zwitserlood 2008: 44-45).

2.1.2. Tasks and communication setting

The spontaneity and the quality of the data are also strongly determined by the type of

tasks used to elicit the signers’ productions. Translation tasks, which tended to be

used in the early years of SL research, were certainly easy to implement, but are now

considered to have induced interference between the source and the target language,

and therefore to yield invalid data on the language under scrutiny. Story retelling

tasks, by contrast, are just as easy to set up but generate much more

natural productions, and avoid language interference when they are based on visual

materials such as picture or film stories. Story retelling tasks have been commonly used

in data collection. As a consequence, a substantial part of SL descriptions is based on

narrative productions, which represent only a restricted part of the signers’ productions.

Although data collection has commonly been carried out by the researcher him/herself – in

most cases a hearing researcher – using various introspection techniques (error

recognition and correction, grammaticality judgments, semantic judgments or other

judgment tasks) and elicitation tasks (free and guided composition, story retelling, video

or picture descriptions)⁹, it is now recognized that it is very important not to have any

hearing researcher present during the recording sessions. More generally, it has been

shown (Lucas et al. 1992) that signers are highly sensitive to the linguistic status of their

interviewer, whether he/she be hearing or even using a different SL variant (due to

geographical, age, gender or ethnicity differences). In order to control the language

contact effect, an interesting solution adopted in most of the current main corpus projects

is to compose dyads (or triads) of deaf participants and to engage them in dialogue. The

dyads (or triads) may be composed of two naïve participants, that is, individuals not

related to the research team (as in the Corpus NGT project), or of one or more naïve

participant(s) guided by a deaf researcher or assistant in a semi-directed interview (as

in parts of the CREAGEST project). This conversational condition has the advantage of

allowing various types of tasks and types of discourse: the partners can be led to produce

narratives, but also descriptive, argumentative and explicative discourses in a quite informal

and natural recording situation.

9 See Van Herreweghe & Vermeerbergen (2013) for a comprehensive overview of the data elicitation

techniques and materials that are more or less frequently used in SL corpus collection.

2.1.3. Participant selection

Selecting the participants, of course, plays an important role in the relevance of a corpus.

But in the context of an SL corpus, this issue takes on a particular dimension. The

concept of native-like signing has to be handled with even more caution than in the

context of spoken languages (Davies 2003), given the peculiar transmission pattern of

SLs. Only a small minority of signers (5 to 10 percent) has grown up in a signing family.

And an even smaller proportion of them have parents who are native signers themselves

(Vermeerbergen & Van Herreweghe 2013). It follows that native signers with native

parents represent isolated cases in the landscape of SL users. Therefore, an SL corpus

that included only (second-generation) deaf-of-deaf signers would be representative of

only a very small proportion of the users of any SL. In the case of small SL communities,

such a corpus would be constrained to include a very limited number of informants

(Costello et al. 2008: 78). And since in all SL communities there are many more non-

native signers than native ones, a corpus of native signers could not be expected to be

free of any production influenced by the SL use of non-native signers, who are,

statistically, the most frequent interlocutors of native signers. A common option taken

by the researchers is to select native and near-native signers. A near-native signer may

be defined as a signer who has acquired SL at an early age (3-7, depending on the case),

who has been educated in a school for the deaf, who uses the studied SL daily, or who

is a long-time member of the Deaf community (Van Herreweghe & Vermeerbergen

2013).

2.2. Metadata

The specific issues presented under 2.1.3. regarding the selection of informants clearly

show the relevance of collecting metadata on the filmed informants and the recording

sessions. In most SL corpora, metadata are encoded with the IMDI editor developed at

the Max Planck Institute for Psycholinguistics in Nijmegen¹⁰.

The metadata related to the informants may include information about their region, sex

and age at the time of recording, their hearing status, parents’ and siblings’ hearing

status, their position among their siblings (if any), the type of hearing aid

they use (if any), their age of exposure to SL and the place and context of this exposure,

the type of school they attended, the primary language of communication within the

family, etc. The metadata related to the recording sessions may include the type of

discourse produced (narrative, descriptive, explicative, argumentative, etc.), the

communication setting (monologue, dialogue, with naïve participants or with a member

of the research team, etc.), the tasks and the materials used for elicitation, the degree of

formality, the place, etc.

10 The IMDI Metadata editor, browser and organizer tools are now gathered in a single tool called

Arbil, which is also freely accessible on the MPI page: http://www.lat-mpi.eu/tools/arbil/.

While this information is crucial for purposes of research, it should not be published, in

order to protect the privacy of the participants. For the Corpus NGT project, public

access to the majority of the metadata is limited, and no mention or reference is made

to the names or the initials of the signers. Only researchers who sign a license form

undertaking not to publish information on individuals are granted access to personal

information about the family background and signing experience of the participants. But

these precautions alone do not resolve the issue of privacy related to the collection of

SL data, since the video clips cannot be anonymous and the signers’ productions may

reveal a lot of personal information or statements (for a more in-depth reflection on this

topic, see Crasborn 2008).
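
The split between public and licence-restricted metadata described above can be sketched in a few lines of Python. This is only an illustration: the field names, the choice of which fields count as public, and the function `metadata_view` are assumptions made for the example, not the IMDI schema or the Corpus NGT access policy.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SignerMetadata:
    # Hypothetical field names, loosely following the categories listed in 2.2.
    signer_code: str            # opaque code; never a name or initials
    region: str
    age_at_recording: int
    hearing_status: str
    parents_hearing_status: str
    age_of_sl_exposure: int
    school_type: str


# Assumption for this sketch: only these fields are released to the public.
PUBLIC_FIELDS = frozenset({"signer_code", "region", "age_at_recording"})


def metadata_view(record: SignerMetadata, licence_signed: bool) -> dict:
    """Return the full record for licensed researchers, a reduced one otherwise."""
    full = asdict(record)
    if licence_signed:
        return full
    return {key: value for key, value in full.items() if key in PUBLIC_FIELDS}


record = SignerMetadata("S001", "Namur", 34, "deaf", "hearing", 5, "school for the deaf")
print(metadata_view(record, licence_signed=False))
print(metadata_view(record, licence_signed=True))
```

As the text notes, filtering of this kind protects only the metadata; the video clips themselves remain non-anonymous.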

2.3. Annotation

Since SLs are unwritten languages, SL corpora share several features with spoken

language corpora. SLs involve face-to-face communication, and their discourses share

the (relative) spontaneity of oral communication in spoken languages and the

impossibility of post-hoc editing. From a more technical point of view, SL corpora, like

spoken language corpora, need to be annotated in order to be machine-readable. The

fact that SLs use multiple and simultaneous articulators explains why SL research has

fostered the integration of digital video into corpus annotation software. Several tools

are available for the annotation and analysis of sign language data, notably

SyncWRITER (Hanke 2001), SignStream™ (Neidle et al. 2001), ELAN¹¹ and iLex

(Hanke & Storz 2008). But annotating SL data is not identical to annotating spoken

language data. Again, the specificities are related to the visual-gestural modality of SLs

and the resulting impossibility of writing them with an existing graphic system that is

machine-readable¹².

2.3.1. Glosses

Annotating the individual elements of an SL production implies glossing them with

words from a written language; that is to say, glossing them with words from another

language. But, as is the case in all language pairs, the lexicon of the SL does not

correspond exactly with the lexicon of the written language used in the glosses. One

single sign may correspond to various English or French words, depending on the

context of its use, and the opposite can be true, namely different signs of an SL may

correspond to one single word in English or in French. For example, there are various

ways to express the meaning ‘before’ in LSFB (see Figure 1¹³). The three variants

presented in Figure 1 are related to the use of different kinds of representation of time

in space (or “time-lines”, Sinte 2010), involving different types of (deictic or anaphoric)

reference points (Meurant 2008). But neither English nor French encodes the same

distinction. Therefore, using the gloss BEFORE¹⁴ (or AVANT in French) for all three of

these LSFB signs would amount to ignoring the difference between them and would render

any further automatic search for one of them impossible. This would make the

(time-consuming) glossing work completely useless.

11 ELAN is the tool that will be used for the annotation of the LSFB corpus project.

http://tla.mpi.nl/tools/tla-tools/elan/elan-description/.

12 The “Sign Writing” system developed by V. Sutton in 1974 (http://www.signwriting.org/), which is

unevenly distributed within the Deaf community, is not a machine-readable system that could be used

to transcribe SL data.

Figure 1. Three different signs in LSFB for the English meaning ‘before’

2.3.2. ID-glosses and lemmatisation

In order for an SL corpus to be a machine-readable and searchable corpus rather than a

simple archive, a unique gloss must be attributed to each identified sign type, called

“ID-gloss” (Johnston 2010). The ID-gloss is then the written word (in English, French,

etc.) that designates the same sign through all its occurrences (tokens) within the corpus,

regardless of the meaning of this sign in each particular context. For example, the ID-

gloss DRAINED (ÉPUISÉ in French glossing) within an LSFB corpus may correspond in

one context to the meaning ‘exhausted’ (‘fatigué’ in French) but in another context to

the meaning ‘unavailable’ (‘indisponible’ in French). The consistent use of a single ID-

gloss signifies that this meaning variation is not lexically distinctive in LSFB. Matching

a sign articulation with its ID-gloss or lemma often implies deciding on the status of

non-manual components and variations of a sign. For example, the third sign of Figure

1 should be associated with the same ID-gloss (say BEFORE_3) as another

occurrence that would not show puffed cheeks and would therefore not convey the same

meaning of intensity (‘before’ vs. ‘long before’). Both occurrences should be identified

as a single lexical sign undergoing non-manual modification.

This ID-glossing or lemmatisation process is essential to ensure the consistency of data

glossed by multiple researchers or even by the same researcher at different moments. It

is only in this way that it will be possible to search through a large number of annotation

files to know how a given sign behaves in the variety of available contexts.
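
To make the role of ID-glosses in corpus search concrete, here is a minimal Python sketch. It assumes that annotation tokens have already been exported into simple records; the `Token` structure, the clip names and the time stamps are invented for illustration, and `BEFORE_1` is a hypothetical label modelled on the `BEFORE_3` mentioned above. It does not reproduce the data model of ELAN or iLex.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    clip: str              # hypothetical clip identifier
    start_ms: int          # hypothetical time stamp
    id_gloss: str          # one ID-gloss per sign type
    context_meaning: str   # meaning of this occurrence in context
    non_manual: str = ""   # optional non-manual modification


TOKENS = [
    Token("clip_01", 1200, "DRAINED", "exhausted"),
    Token("clip_02", 5400, "DRAINED", "unavailable"),
    Token("clip_02", 9100, "BEFORE_3", "before"),
    Token("clip_03", 300, "BEFORE_3", "long before", non_manual="puffed cheeks"),
    Token("clip_03", 2500, "BEFORE_1", "before"),
]


def build_index(tokens):
    """Group all tokens of the corpus by their ID-gloss."""
    index = defaultdict(list)
    for token in tokens:
        index[token.id_gloss].append(token)
    return dict(index)


def meanings(index, id_gloss):
    """List the contextual meanings attested for one sign type."""
    return sorted({token.context_meaning for token in index.get(id_gloss, [])})


index = build_index(TOKENS)
print(meanings(index, "DRAINED"))
print([(t.clip, t.non_manual) for t in index["BEFORE_3"]])
```

Because each sign type keeps a single ID-gloss, one lookup retrieves every occurrence regardless of its contextual meaning or non-manual modification, while the distinct `BEFORE` variants remain separately searchable.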

Lemmatisation should ideally precede the annotation process, so that when glossing the

corpus files, one might refer to the pre-existing identification of signs. However, in the

case of SL data, annotation and lemmatisation need to be conducted in parallel, since

the lexicographic analysis of SLs – because of the lack of a writing system, of a written

tradition and therefore of community-wide standards – is still to be done and is

dependent on the corpus analysis work. In other words, the lemmatisation process

presupposes the existence of a representative corpus of the language, which itself needs to be

lemmatised in order to be searchable. The recent development of links between SL

annotation tools and lexicon databases (e.g. the link between ELAN and Lexus¹⁵, or the

integration of both tools within iLex – meaning “integrated lexicon”) can certainly be

seen as facilitating the parallel and consistent development of the lemmatisation and the

annotation of a studied SL¹⁶.

13 Figure 1 presents three different forms. But an in-progress PhD study on time in LSFB reveals that

even more possibilities do exist (Sinte forthcoming).

14 By convention in SL linguistics, the sign glosses are written in small capitals.

3. The LSFB corpus project

Linguistic research on LSFB has emerged very recently as an academic discipline. The

first doctoral research on this language was presented in 2006 (Meurant 2008). But the

lateness with which LSFB research emerged may be somewhat compensated for by

the fact that it has come onto the scientific scene at a fruitful moment, namely when the

recent advances in technology, tools and methodologies have facilitated the gathering

of extensive bodies of digital film footage, and have thus opened the way for discourse

analyses. The existing studies of LSFB have so far been mostly devoted to micro-textual

aspects of the language structure (at the level of morphology and syntax)¹⁷. Corpus-based

linguistics applied to LSFB is expected to provide a new and interesting

interaction between theory and practice, giving rise to useful tools for bilingual

education¹⁸ and LSFB-French interpretation in the French Community of Belgium¹⁹.

This section will present the main specificities of the LSFB corpus project and the first

research axes it will support.

3.1. Discourse and variation corpus

The aim of the LSFB corpus project²⁰ is to gather a representative sample of this

language’s current uses, including the variety of its uses. The variety will concern

discourse genres, signers and interlocutors, and registers.

15 http://www.lat-mpi.eu/tools/lexus/.

16 The 5th workshop on the Representation and Processing of Sign Languages, held during LREC

2012, the 8th ELRA Conference on Language Resources and Evaluation (Istanbul, May 2012), was

devoted to “Interaction between Corpus and Lexicon” (http://www.sign-lang.uni-hamburg.de/lrec2012/cfp.htlm).

17 In 2010 and 2011, we collected 20 hours of LSFB data consisting of semi-directed and spontaneous

dialogs between deaf signers. These data have already made more macro-textual analyses possible.

18 Since 2000, the association “École et Surdité” (School and Deafness) has run a

bilingual (LSFB and French) teaching structure in Namur. Deaf pupils are grouped within hearing classes and are

taught in LSFB, including the lessons on French reading and writing. LSFB is the language

used for all face-to-face communication and written French is the language used for all written

situations. École et Surdité and the University of Namur have been collaborating since the beginning of

this project (Meurant 2012).

19 Currently, there is no LSFB-French interpretation training in Belgium. This has many negative

consequences for the Deaf community, including for the education of Deaf children and young people.

20 A bilingual (LSFB with French subtitles) film has been created to inform the Deaf community about

the project, its aims and its issues. It is available at http://www.corpus-lsfb.be.

3.1.1. Genres

As mentioned above, narrative productions have occupied an important place in SL

research so far. We are however constantly reminded, mainly by our experience of

working with the teachers of the bilingual classes of “École et Surdité”, that the

linguistic structures present within narratives do not have the same frequency, and are

not realized in exactly the same way, as in other discourse genres. In order to learn

more about these differences, we wanted to provide the LSFB corpus with a variety of

genres, including – besides a number of narratives – descriptive, explicative and

argumentative discourse productions.

3.1.2. Signers and interlocutors

The aim of the project is to collect the productions of approximately 70 signers, for a

total of approximately 280 hours of recordings. It has been reported above that native

signers represent a very small proportion of any SL community. This is of course also true

of the LSFB community. This rarity probably contributes to the

idealization of the supposed native-like “visual signing”, especially when reference is

essentially made to story-telling productions. Although SL grammar involves the linguistic use of

space and multiple articulators, the actual (quantitative and qualitative) use of modality-specific

devices (such as highly iconic structures) by native signers vs. non-native

signers remains comparatively unexplored. A first contrastive look at signing data from

BSL native vs. non-native pairs has brought to light certain nuances regarding the

frequency and the extent of the spatial and iconic devices used by native signers (Rentelis

2009). This is why we plan to include native, near-native (see 2.1.3.) and late signers

within the corpus. We also would like to include CODAs (hearing Children Of Deaf

Adults), as a particular group of hearing native signers. This variety of signers’ profiles

will make it possible to investigate the impact of the conditions of LSFB acquisition

(age, school and social context) as well as the effect of the contact between LSFB and

French in the signing productions. Participants will be selected in different regions of

the French Community of Belgium for their relevance regarding LSFB variants

(Brussels – Uccle, Berchem, Woluwé, Mons, Namur, Liège). Different ages (18 and up)

as well as both sexes will be represented.

In accordance with the objectives mentioned above, the metadata of each video clip will

necessarily include information on sign competence (e.g. age of acquisition, type of use,

regional attachment), bilingual status (e.g. writing and reading competences, type of use

of each language, hearing status) and education (e.g. deaf/oral/hearing school). Thanks

to this information, it should then be possible to investigate the variations and the

language contact effects that may distinguish the different groups of signers through the

variety of discourse genres and the variety of interactions (monologues, dialogues,

group interaction, public presentations, etc.). As far as possible, for the elicited dialog

tasks within the SL lab, the dyads will be composed homogeneously and

heterogeneously regarding age and/or age of acquisition of LSFB: for example, one

native signer will first interact with another native signer, but also with a late signer.
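The metadata fields and the dyad composition described above can be illustrated with a minimal Python sketch. It is not the project's actual schema: the class `SignerMetadata`, its field names, the function `plan_dyads` and the age thresholds used to label a signer as native, near-native or late are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SignerMetadata:
    """Hypothetical per-signer record; field names are illustrative only."""
    signer_id: str
    age: int
    age_of_acquisition: int  # age (in years) at which LSFB acquisition began
    region: str              # regional attachment, e.g. "Namur"
    hearing_status: str      # e.g. "deaf" or "hearing" (CODA)
    school_type: str         # "deaf", "oral" or "hearing"

    @property
    def profile(self) -> str:
        # Assumed thresholds, NOT taken from the paper.
        if self.age_of_acquisition == 0:
            return "native"
        if self.age_of_acquisition <= 6:
            return "near-native"
        return "late"


def plan_dyads(signers):
    """Split all possible pairs into homogeneous and heterogeneous dyads."""
    homogeneous, heterogeneous = [], []
    for a, b in combinations(signers, 2):
        target = homogeneous if a.profile == b.profile else heterogeneous
        target.append((a.signer_id, b.signer_id))
    return homogeneous, heterogeneous


# Invented example signers, for illustration only.
signers = [
    SignerMetadata("S01", 34, 0, "Namur", "deaf", "deaf"),
    SignerMetadata("S02", 52, 0, "Liège", "deaf", "deaf"),
    SignerMetadata("S03", 27, 14, "Mons", "deaf", "oral"),
]
homogeneous, heterogeneous = plan_dyads(signers)
print(homogeneous)    # [('S01', 'S02')]
print(heterogeneous)  # [('S01', 'S03'), ('S02', 'S03')]
```

Here the two native signers form a homogeneous dyad, and each of them is also paired with the late signer. The same grouping could be done on age rather than on age of acquisition.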

3.1.3. Registers

The LSFB corpus project will also include register variations, from the most formal to

the most informal, including spontaneous as well as prepared speech productions. We

expect to foster register variation by varying the recording places and contexts

(conference rooms, the SL lab at the university, schools, social places, etc.), as well as

varying the interaction settings (monologues, dialogues, group interactions, training

interaction, public presentations, etc.). For example, the recordings made outside the SL

lab will include prepared testimonies presented in public, spontaneous comments

immediately following a conference, spontaneous discussions during the breaks of a training

session and trainer-group interactions during SL teacher training sessions. The elicitation

material for the SL lab sessions will include prepared and non-prepared tasks such as

story-telling, dialogs about informants’ lives, tastes and experiences, role plays on hot topics

related to the Deaf community, conversations about daily life as well as more reflective

exercises (e.g. explaining the rules of a game or mathematical principles).

3.2. Research avenues

The LSFB corpus project aims to provide the scientific community – as well as the

teachers and the whole Deaf community21 – with a referential corpus designed to be a

resource for a wide variety of future research projects. The research axes currently under

investigation or scheduled for the near future are all connected with the concerns and

needs stemming from the everyday practices of the LSFB-French interpreters and the

bilingual teachers we are in regular contact with. Four of them are briefly introduced

below as illustrations.

3.2.1. Discursive structures

The conventions governing the organisation of discourse vary from language to

language and from culture to culture. The visual-gestural nature of SLs seems to

contribute to the specificity of their structure (Janzen 2005). The LSFB corpus will allow

us to extend the knowledge in the field by including types of discourse that have

heretofore rarely been studied (descriptive, explicative, argumentative, as well as the

narrative type most frequently used in the literature). This will involve identifying

discursive tags and markers and describing their uses. We expect to find, for example,

that breaking or shifting eye gaze, as well as nods and head movements, act as discourse markers.

One underexplored phenomenon that we expect to play a role in the marking of

discourse structure is a kind of bracketing structure based on the repetition of a sign that

frames a (short or long) sequence of signs. The resulting structure can be represented as an

“A – (b c d) – A” structure, where A is the repeated and bracketing sign and (b c d)

represents the bracketed sign sequence. An example is given in figure 2: in this case, the

bracketing structure isolates the term RETOUCH from the metalinguistic (explicative)

comment on this term.
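As an illustration of the “A – (b c d) – A” pattern, the following Python sketch searches a sequence of glosses for a sign that recurs after framing at least one other sign. It is a naive heuristic written for this example, not the annotation or detection procedure of the LSFB corpus project: the function name `find_brackets`, the `max_span` parameter and the plain list-of-glosses input format are all assumptions.

```python
def find_brackets(glosses, max_span=8):
    """Return (sign, start, end) triples for candidate bracketing structures.

    A candidate is a gloss that recurs after framing between 1 and
    `max_span` other signs. Only the nearest repetition is reported.
    """
    found = []
    for start, sign in enumerate(glosses):
        # end - start - 1 is the number of framed signs; keep it <= max_span.
        stop = min(len(glosses), start + max_span + 2)
        for end in range(start + 2, stop):
            if glosses[end] == sign:
                found.append((sign, start, end))
                break
    return found


# The abstract pattern from the text: A frames the sequence (b c d).
sequence = ["A", "b", "c", "d", "A"]
print(find_brackets(sequence))  # [('A', 0, 4)]
```

A purely formal search of this kind would over-generate, since not every repeated sign has a discourse-structuring function. Its output could at most serve as a list of candidates for manual inspection.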

21 The data (videos and annotations) will be stored by the University of Namur. They will be published

at the end of the project as open access data with specific copyright rules. The practical details of this

publication and accessibility are still under study.

It is important to underline that no specific task will be built in order to elicit this or that

specific structure. The instructions given to the informants will only be aimed at

fostering one specific discourse genre (e.g. can you tell me…, can you explain to him…,

can you describe…, can you play the role of…), but the expression itself will not be

controlled by ad hoc tasks.

Figure 2. Bracketing structure

3.2.2. Voices and perspectives

At the level of syntax and at least within narrative discourses, LSFB makes frequent

use of constructions alternating between external and internal perspectives on

statements. These structures are made up of the combination of so-called “constructed

action” (Metzger 1995) or “role shift” (Padden 1986) structures, which express an

internal perspective on the action, and “classifier” constructions (Schembri 2003),

which express an external perspective. Figure 3 gives an example of this kind of

alternation, within a “scale alternation” structure (Meurant 2008). The LSFB corpus will

make it possible to investigate whether these voice and perspective alternations, very

frequent in story telling, occur with the same frequency and in the same forms within

explicative, descriptive and argumentative discourses.

Figure 3. Alternation of perspectives within a “scale alternation” structure

3.2.3. Paraphrasing

As a result of having been excluded from the educational and intellectual spheres for a

century (1880-1980) and considering the difficulty deaf people have, even today, in

gaining access to higher education, SLs have been stigmatized for their lack of

specialized terminology. It is essential not to associate the problems rooted in

sociolinguistic shortcomings with any lack within the SLs’ linguistic systems. But it is also

vital (especially for teachers and interpreters) to expose those devices which are specific

to SLs and to their visual-gestural character, and which compensate for or explain the

comparative lack of specialized terminology. Investigating the LSFB corpus, we will

seek to inventory the devices (fingerspelling, iconic rephrasing, mouthings22, etc.) used

to refer to objects, events and concepts which do not have an established lexical sign in

LSFB, to describe the ways in which these devices are integrated into the discourse, and

to study the variation of their use from one type of discourse to another.

22 The term ‘mouthings’ refers to the mouth movements that accompany the hand movements in SL

productions (van de Sande & Crasborn 2009).

[Glosses and translation for Figure 2: RETOUCH SO-CALLED SAME CLEAN-FACE CLEAN R. (E.T.O.U.C.) H. RETOUCH (…): ‘To retouch (photographs), what is written R.E.T.O.U.C.H., namely clean the face’]

[Glosses and translation for Figure 3: PLAY-MUSIC a-MOVE PLAY-MUSIC: ‘He comes up playing music’]

3.2.4. Fluency and disfluency markers

Little attention has been given to fluency per se in SLs (Lupton 1998; Nicodemus

2011). Yet the criterion of fluency vs. non-fluency is frequently referred to when it

comes to defining the profile of participants for SL studies or corpora (Van Herreweghe

& Vermeerbergen 2012), or the linguistic proficiency of an interpreter or of the relatives

and teachers who are the linguistic models for deaf children. The LSFB corpus will be

called upon in order to investigate the features (at the level of prosody, lexicon and

syntax) and combinations of features that contribute to fluency and disfluency in LSFB.

These non-exclusive research axes all profile forthcoming studies on LSFB as

discourse-oriented and sensitive to language variation.

4. Conclusion

Collecting a large-scale corpus of LSFB is indispensable to the pursuit of research on

this language. Knowledge of LSFB, as well as the quality and the validity of this

knowledge, its relevance to the education of the Deaf, and LSFB-French interpretation

training directly depend on the success of this project. The LSFB corpus project will

greatly benefit from the previous experience of the other SL research teams with respect

to the data collection, the management of metadata and the annotation process.

Acknowledgements

The LSFB corpus project is supported by an Incentive Grant for Scientific Research

(2012-2014) from the F.R.S.-FNRS (Foundation for Scientific Research in Brussels and

Wallonia, Belgium) and by the University of Namur, Belgium.

References

Bergman, Brita & Johanna Mesch. 2004. ECHO data set for Swedish Sign Language (SSL). Stockholm: Department of Linguistics, University of Stockholm. http://www.let.ru.nl/sign-lang/echo

Costello, Brendan, Javier Fernandez & Alazne Landa. 2008. The non-(existent) native signer: sign language research in a small deaf population. In Ronice Muller de Quadros (ed.), Sign Languages: Spinning and Unraveling the Past, Present and Future. TISLR 9: forty five papers and three posters from the 9th Theoretical Issues in Sign Language Research Conference, Florianopolis, Brazil, December 2006. Florianopolis, Brazil: Editora Arara Azul, 77-94. http://www.editora-arara-azul.com.br/EstudosSurdos.php

Crasborn, Onno. 2008. Open access to sign language corpora. In Onno Crasborn, Thomas Hanke, Eleni Efthimiou, Inge Zwitserlood & Ernst Thoutenhoofd (eds), Construction and Exploitation of Sign Language Corpora. 3rd Workshop on the Representation and Processing of Sign Languages. Paris: ELDA, 33-38. http://www.lrec-conf.org/proceedings/lrec2008/workshops/W25_Proceedings.pdf

Crasborn, Onno, Els van der Kooij, Annika Nonhebel & Wim Emmerik. 2004. ECHO data set for Sign Language of the Netherlands (NGT). Nijmegen: Department of Linguistics, Radboud University Nijmegen. http://www.let.ru.nl/sign-lang/echo

Crasborn, Onno & Inge Zwitserlood. 2008. The Corpus NGT: an online corpus for professionals and laymen. In Onno Crasborn, Thomas Hanke, Eleni Efthimiou, Inge Zwitserlood & Ernst Thoutenhoofd (eds), Construction and Exploitation of Sign Language Corpora. 3rd Workshop on the Representation and Processing of Sign Languages. Paris: ELDA, 44-49. http://www.lrec-conf.org/proceedings/lrec2008/workshops/W25_Proceedings.pdf

Davies, Alan. 2003. The Native Speaker: Myth and Reality. Clevedon: Multilingual Matters.

Hanke, Thomas. 2001. Sign language transcription with syncWRITER. Sign Language and Linguistics 4, 267-275.

Hanke, Thomas & Jakob Storz. 2008. iLex — A database tool integrating sign language corpus linguistics and sign language lexicography. Paper presented to the 3rd Workshop on the Representation and Processing of Sign Languages at International Conference on Language Resources and Evaluation, Marrakech, Morocco. http://www.sign-lang.uni-hamburg.de/ilex/lrec2008_hanke.pdf

Janzen, Terry. 2005. Interpretation and language use: ASL and English. In Terry Janzen (ed.), Topics in Signed Language Interpreting. Amsterdam: John Benjamins, 69-105.

Johnston, Trevor. 2010. From archive to corpus: transcription and annotation in the creation of signed language corpora. International Journal of Corpus Linguistics 15(1), 106-131.

Johnston, Trevor & Adam Schembri. 2006. Issues in the Creation of a Digital Archive of a Signed Language. In Linda Barwick & Nicholas Thieberger (eds), Sustainable Data from Digital Fieldwork. Sydney: University of Sydney, 7-16.

Konrad, Reiner. 2009. Die lexikalische Struktur der DGS im Spiegel empirischer Fachgebärdenlexikographie. Zur Integration der Ikonizität in ein korpusbasiertes Lexikonmodell. Unpublished dissertation, Universität Hamburg.

Lucas, Ceil & Clayton Valli. 1992. Language Contact in the American Deaf Community. New York, NY: Academic Press, Inc.

Lucas, Ceil, Robert Bayley & Clayton Valli. 2001. Sociolinguistic variation in American Sign Language. Washington, DC: Gallaudet University Press.

Lupton, Linda. 1998. Fluency in American Sign Language. Journal of Deaf Studies and Deaf Education 3(4), 320-328.

Metzger, Melanie. 1995. Constructed dialogue and constructed action in American Sign Language. In Ceil Lucas (ed.), Sociolinguistics in Deaf Communities. Washington DC: Gallaudet University Press, 255-271.

Meurant, Laurence. 2008. Le regard en langue des signes. Anaphore en langue des signes de Belgique francophone (LSFB). Morphologie, syntaxe, énonciation. Rennes, Namur: Presses Universitaires de Rennes, Presses Universitaires de Namur.

Meurant, Laurence. 2012. In search of the ideal partnership between sign linguistics research and a bilingual teaching project. The case of Namur, Belgium. In Lorraine Leeson & Myriam Vermeerbergen (eds), Working with the Deaf Community. Education, Mental Health and Interpreting. Dublin: Interesource Group (Ireland) Limited.

Neidle, Carol, Stan Sclaroff & Vassilis Athitsos. 2001. SignStream™: a tool for linguistic and computer vision research on visual-gestural language data. Behavior Research Methods, Instruments, and Computers 33, 311-320.

Nicodemus, Brenda. 2011. Disfluencies in American Sign Language and English. Paper presented at the 33rd Annual Conference of the German Linguistic Society (DGfS). Theme session 6: Sign language discourse, Georg August University, Göttingen.

Padden, Carol. 1986. Verbs and role-shifting in American Sign Language. In Carol Padden (ed.), Proceedings of the Fourth National Symposium on Sign Language Research and Teaching. Silver Spring, MD: National Association of the Deaf, 44-57.

Rentelis, Ramas. 2009. Processing of British Sign Language in native and non-native Deaf signers. Paper presented at the Colloque International sur les Langues des Signes (CILS), Namur: University of Namur.

Schembri, Adam. 2003. Rethinking “classifiers” in signed languages. In Karen Emmorey (ed.), Perspectives on classifier constructions in sign languages. Mahwah, New Jersey: Lawrence Erlbaum Associates, 3-34.

Sinte, Aurélie. 2010. Français - Langue des signes française de Belgique (LSFB): quelques éléments d'analyse contrastive des temps verbaux. Cahiers de l'AFLS 16.

Sinte, Aurélie. forthcoming. Expression(s) de temps en LSFB. Références, ancrages énonciatifs, structures discursives. PhD Thesis, University of Namur.

Stokoe, William C. 1960. Sign Language structure: an outline of the visual communication systems of the American deaf. Studies in Linguistics: Occasional papers 8.

Tervoort, Bernard. 1953. Structurele analyse van visueel taalgebruik binnen een groep dove kinderen. Amsterdam: Noord-Hollandsche Uitgevers Maatschappij.

van de Sande, Inge & Onno Crasborn. 2009. Lexically bound mouth actions in Sign Language of the Netherlands: a comparison between different registers and age groups. Linguistics in the Netherlands 26, 78-90.

Van Herreweghe, Mieke & Myriam Vermeerbergen. 2012. Data collection. In Roland Pfau, Markus Steinbach & Bencie Woll (eds), Sign Language. An International Handbook. Handbooks of Linguistics and Communication Science (HSK) series, n°37. Berlin: Mouton de Gruyter, 1023-1045.

Woll, Bencie, Rachel Sutton-Spence & Dafydd Waters. 2004. ECHO data set for British Sign Language (BSL). London: Department of Language and Communication Science, City University. http://www.let.ru.nl/sign-lang/echo