Towards a corpus of French Belgian Sign Language (LSFB) discourses
Meurant, Laurence & Aurélie Sinte. 2013. Towards a corpus of French Belgian Sign Language (LSFB) discourses.
In Catherine Bolly & Liesbeth Degand (eds), Across the Line of Speech and Writing Variation. Corpora and
Language in Use – Proceedings 2. Louvain-la-Neuve: Presses universitaires de Louvain, 199-212.
Laurence Meurant & Aurélie Sinte
FRS-FNRS and NAmur Research College & University of Namur
Abstract
Recent advances in technology, tools and methodologies, which have facilitated the gathering and
annotating of an extensive body of digital film footage, have opened the way to the study of discourse
in sign languages (SL), which has remained relatively unexplored so far. This paper gives an account of
what the previous experiences revealed about the specificities and issues of collecting SL corpora, with
regard to data, participants and the annotation process. It then presents the project that has recently been
initiated for the collection of a large-scale corpus of French Belgian Sign Language (LSFB) discourses.
Keywords: sign language, French Belgian Sign Language, corpus linguistics, discourse
1. Introduction
The linguistic study of signed languages (SLs) is generally considered to have emerged with the
phonological analyses of Tervoort (1953) and, even more so, of the much better-known
Stokoe (1960). From then on, the short history of SL linguistics has been closely linked
to the history of computers and video recording tools, which are essential to the
researcher because of the lack of a written tradition of SLs and of their specific use of
the visual-gestural modality.
Before the digital revolution, the tasks of collecting, storing and making accessible large
amounts of video-recorded data were difficult and expensive. Researchers gathered
videotapes and described and transcribed them on paper up to the 1980s; then,
from the 1990s, word-processing and spreadsheet software progressively came into use.
Nevertheless, all these described and transcribed data, spread over many archived tapes,
remained difficult to search; making systematic comparisons between similar linguistic
sequences or between different signers1 from the whole collection was a lengthy, if not
impossible, task. The videotaped data were frequently supplemented by the intuitions of
native signers the researchers were in contact with. Until recently the only means of
including a selection of signed examples to illustrate the analyses was to transcribe them,
first by glossing the manual components of the signs in a written language and
accompanying the glosses with symbols for the non-manual elements (movements of
the lips, the head, the upper body, the facial expression, etc.), and later by adding
several key frames extracted from the videos to this kind of transcription. All these
technical constraints made it difficult for the researchers to have their analyses validated
by their peers.
1 The term “signer” designates an individual who uses a sign language.
The early 2000s marked a turning point in the development of knowledge about SLs.
The digital technologies made it possible to videotape and store vast amounts of data
accessibly, to annotate them in a machine-readable format and then, to automatically
search through the available database as a whole, regardless of its extent. The variety of
documented SLs (in the West and beyond) as well as the quantity of data collected has
increased significantly over the last decade2. Several research teams around the world
have begun collecting large-scale corpora of the SL of their country or region; some
have even completed the process. These pioneering projects have actually paved the
way for a new age in SL linguistics, namely the age of SL corpus and discourse
linguistics.
This paper gives an account of what the recent experiences have revealed about the
specificities and issues of collecting SL corpora (section 2), with regard to data,
participants and the annotation process. The next section (section 3) presents the project
that has recently been initiated for the collection of a large-scale corpus of French
Belgian Sign Language (LSFB) discourses.
2. Sign language corpora
The corpus linguistics of SLs is an emerging field of research and most SL corpora are
still under construction. The majority of the existing SL data are private and small
collections of video clips collected by and for SL researchers, designed for a specific
purpose or for investigating a particular aspect of the language. These data involve a
small number of native signers, in some cases only deaf children of deaf
parents.
More recently, fuelled by the easy access to and processing of digital video material,
there has been a flood of data collected to document SL use, to make it exploitable for
linguistic research, and to make it accessible to researchers and to a wider public.
The first large-scale SL corpus project was collected in order to document
sociolinguistic variation of American SL (ASL). Ceil Lucas, Robert Bayley and their
team recorded, in 1995, the productions of more than 200 native or early (Afro-
American and white) signers from the major areas of the United States (Lucas et al.
2001). The European Cultural Heritage Online (ECHO) project (Case study 4: Sign
Languages) has opened this new way of SL corpus research in Europe. The corpus
consists of linguistically annotated SL data from three SLs (Sign Language of the
Netherlands – NGT, British Sign Language – BSL, and Swedish Sign Language – SSL)
collected in 2003 and 2004 (Bergman & Mesch 2004; Crasborn et al. 2004; Woll et al.
2004). This corpus is published as an open archive3. Two major corpora have since been
collected in the same vein. The Corpus NGT (2006-2008) is an online open archive
corpus including 72 hours of (partially) glossed and annotated data from 92 adult native
2 See the Sign Language Corpora Survey (R. Konrad) published by the University of Hamburg on
http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/sl-corpora.html
3 http://www.let.ru.nl/sign-lang/echo/
signers from all over the Netherlands (Crasborn & Zwitserlood 2008). The corpus of
Australian SL (Auslan) is part of the wider Endangered Languages Archive (2004-
2007). It includes 300 hours of data from 100 adult native signers (Johnston & Schembri
2006). The data are transcribed, glossed and partially annotated, and the corpus will be
published online in 20124. Many other large-scale corpora are currently being gathered
in order to provide the SLs concerned with a representative picture of their use. These
ongoing projects include the German Sign Language (DGS) Corpus Project (2009-
2023)5, which seeks to collect 350 to 400 hours of data from more than 300 fluent signers
and to publish about 50 hours online, but also the British Sign Language (BSL) Corpus
project6 and the CREAGEST project on French Sign Language (LSF)7. More projects
have been initiated in Ireland, China (Hong Kong), Italy and Sweden, and new projects
are expected to begin in other countries or regions. The thesis of Konrad (2009) includes
a complete survey of and substantial information on all the ongoing and finished SL
corpus projects8.
In order to be usable and exploitable for linguistic purposes, an SL corpus must be made
up of the video data, the metadata and the annotation files of each video clip. The next
subsections will introduce the specificities and the issues related to each of these
components of SL corpora.
2.1. Data
SLs illustrate the visual-gestural modality of the human capacity for language. They
involve the use of several articulators that work simultaneously and/or sequentially: the
hands, the upper body, the head, the face, the eyes, the eyebrows, the mouth, and the
cheeks. Video cameras are thus necessary to record SL samples, and high-definition
cameras are needed to capture the detailed movements of the small articulators
like the eyes, the eyebrows, the mouth and the cheeks. These specificities have
implications for the corpus construction in a certain number of respects.
2.1.1 Recording conditions
As the signer’s face fully participates in the linguistic expression, the participants cannot
be recorded anonymously. Moreover, the requirements of the video recording result in
a minimal technical setting (the cameras at least, and also possible tripods, lights and
backdrops) that inevitably diminishes the signer’s spontaneity, at least at the start
of the recording session. The researchers must choose the best balance between the
spontaneity of the productions and the quality of the recordings, regarding their
objectives. But in any case, SL corpora do not include anonymous data captured in
spontaneous situations, which could be equivalent to anonymous spoken language data
recorded with a small sound recorder. It is therefore necessary to carefully set up the
4 http://www.auslan.org.au/about/corpus/.
5 http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/corpus.html.
6 http://www.bslcorpusproject.org/.
7 http://www.umr7023.cnrs.fr/-Realisationde-corpus-de-donnes-.html.
8 http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/sl-corpora.html.
recording conditions (including the place, the tasks and the participants) in order to
compensate for these limitations as much as possible.
The announcement of the project to the Deaf community is a crucial stage in establishing
the trust and confidence of the potential participants. In addition to announcements on
Deaf websites and flyers, face-to-face information in Deaf clubs and other meeting
places remains indispensable. The main achieved and ongoing SL corpora that include
regional variations have been collected in different schools, deaf clubs or other familiar
places for the participants. This mobile way of working facilitates the participation of
informants living in large countries and certainly contributes to the relative naturalness
of the recording setting. In addition, researchers can try to reduce the recording
equipment as much as possible, for example by using small cameras and reduced or no lighting.
But again, the researchers have to find the right compromise within the conflicting
constraints of naturalness and relevant quality of the data obtained; the team of the
Corpus NGT project, for example, has chosen not to use light systems, but they hooked
cameras up to the ceiling in order to collect views from above of each participant
(Crasborn & Zwitserlood 2008: 44-45).
2.1.2. Tasks and communication setting
The spontaneity and the quality of the data are also strongly determined by the type of
tasks used to elicit the signers’ productions. The translation tasks, which tended to be
used in the early years of SL research, were certainly easy to implement, but are now
considered to have induced interferences between the source and the target language,
and therefore constitute invalid data on the language under scrutiny. Story retelling
tasks, on the contrary, while at the same time being easy to set up, generate much more
natural productions, and avoid language interference when they are based on visual
materials such as picture or film stories. Story retelling tasks have been commonly used
in data collection. As a consequence, a substantial part of SL descriptions are based on
narrative productions, which only represents a restricted part of the signers’ productions.
Although data collection has commonly been carried out by the researcher him/herself – in
most cases a hearing researcher – using various introspection techniques (error
recognition and correction, grammaticality judgments, semantic judgments or other
judgment tasks) and elicitation tasks (free and guided composition, story retelling, video
or picture descriptions)9, it is now recognized that it is very important not to have any
hearing researcher present during the recording sessions. More generally, it has been
shown (Lucas et al. 1992) that signers are highly sensitive to the linguistic status of their
interviewer, whether he/she be hearing or even using a different SL variant (due to
geographical, age, gender or ethnicity differences). In order to control the language
contact effect, an interesting solution adopted in most of the current main corpus projects
is to compose dyads (or triads) of deaf participants and to engage them in dialogue. The
dyads (or triads) may be composed of two naïve participants, that is, individuals not
9 See Van Herreweghe & Vermeerbergen (2013) for a comprehensive overview on data elicitation
techniques and materials frequently or less frequently used in SL corpus collection.
related to the research team (as in the Corpus NGT project), or of one or more naïve
participant(s) guided by a deaf researcher or assistant in a semi-directed interview (as
in parts of the CREAGEST project). This conversational condition has the advantage of
allowing various types of tasks and types of discourse: the partners can be led to produce
narratives, but also descriptive, argumentative, explicative discourses in a quite informal
and natural recording situation.
2.1.3. Participant selection
Selecting the participants, of course, plays an important role in the relevance of a corpus.
But in the context of an SL corpus, this issue takes on a particular dimension. The
concept of native-like signing has to be handled with even more caution than in the
context of spoken languages (Davies 2003), given the peculiar transmission pattern of
SLs. Only a small minority of signers (5 to 10 percent) has grown up in a signing family.
And an even smaller proportion of them have parents who are native signers themselves
(Vermeerbergen & Van Herreweghe 2013). It follows that native signers with native
parents represent isolated cases in the landscape of SL users. Therefore, an SL corpus
that included only (second-generation) deaf-of-deaf signers would be representative of
only a very small proportion of the users of any SL. In the case of small SL communities,
such a corpus would be constrained to include a very limited number of informants
(Costello et al. 2008: 78). And since in all SL communities there are many more non-
native signers than native ones, a corpus of native signers could not be expected to be
free of any production influenced by the SL use of non-native signers, who are,
statistically, the most frequent interlocutors of native signers. A common option taken
by the researchers is to select native and near-native signers. A near-native signer may
be defined as a signer who has acquired SL at an early age (3-7, depending on the case),
who has been educated in a school for the deaf, who uses the studied SL daily, or who
is a long-time member of the Deaf community (Van Herreweghe & Vermeerbergen
2013).
2.2. Metadata
The specific issues presented under 2.1.3. regarding the selection of informants clearly
show the relevance of collecting metadata on the filmed informants and the recording
sessions. In most SL corpora, metadata are encoded with the IMDI editor developed at
the Max Planck Institute for Psycholinguistics in Nijmegen10.
The metadata related to the informants may include information about their region, sex
and age at the time of recording, their hearing status, parents’ and siblings’ hearing
status, their position within the brothers and sisters (if any), and the type of hearing aid
they use (if any), their age of exposure to SL and the place and context of this exposure,
the type of school they attended, the primary language of communication within the
family, etc. The metadata related to the recording sessions may include the type of
discourse produced (narrative, descriptive, explicative, argumentative, etc.), the
10 The IMDI Metadata editor, browser and organizer tools are now gathered in a unique tool called
Arbil, which is also freely accessible on the MPI page: http://www.lat-mpi.eu/tools/arbil/.
communication setting (monologue, dialogue, with naïve participants or with a member
of the research team, etc.), the tasks and the materials used for elicitation, the degree of
formality, the place, etc.
While this information is crucial for purposes of research, it should not be published, in
order to protect the privacy of the participants. For the Corpus NGT project, public
access to the majority of the metadata is limited, and no mention or reference is made
to the names or the initials of the signers. Only researchers who sign a licence form
undertaking not to publish information on individuals are granted access to personal
information about the family background and signing experience of the participants. But
these precautions alone do not resolve the issue of privacy related to the collection of
SL data, since the video clips cannot be anonymous and the signers’ productions may
reveal a lot of personal information or statements (for a more in-depth reflection on this
topic, see Crasborn 2008).
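The metadata categories and the restricted-access policy described in this subsection can be illustrated with a minimal sketch. The field names, the `PUBLIC_FIELDS` set and the `public_view` helper are all hypothetical; they do not reproduce the IMDI schema or the actual access rules of any existing corpus.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class Informant:
    """Hypothetical informant record; field names are illustrative only."""
    code: str                          # pseudonymous identifier, never a name or initials
    region: str
    sex: str
    age_at_recording: int
    hearing_status: str                # e.g. "deaf", "hard of hearing", "hearing"
    parents_hearing_status: str
    age_of_sl_exposure: Optional[int] = None
    school_type: Optional[str] = None
    family_language: Optional[str] = None


@dataclass
class Session:
    """Hypothetical recording-session record."""
    session_id: str
    discourse_type: str                # narrative, descriptive, explicative, argumentative
    setting: str                       # monologue, dialogue, ...
    task: str
    place: str
    informant_codes: Tuple[str, ...]


# Assumption for illustration: only these fields would be openly published.
PUBLIC_FIELDS = {"code", "region", "sex"}


def public_view(informant: Informant) -> dict:
    """Return only the fields assumed safe for open publication."""
    return {k: v for k, v in asdict(informant).items() if k in PUBLIC_FIELDS}


example = Informant(
    code="S01", region="Namur", sex="F", age_at_recording=34,
    hearing_status="deaf", parents_hearing_status="hearing",
    age_of_sl_exposure=4,
)
print(public_view(example))
```

Keeping the full record and the public view separate means the restricted fields stay available to licensed researchers. As noted above, this does not address the identifiability of the video itself.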
2.3. Annotation
Since SLs are unwritten languages, SL corpora share several features with spoken
language corpora. SLs involve face-to-face communication, and their discourses share
the (relative) spontaneity of oral communication in spoken languages and the
impossibility of post-hoc editing. From a more technical point of view, SL corpora, like
spoken language corpora, need to be annotated in order to be machine-readable. The
fact that SLs use multiple and simultaneous articulators explains why SL research has
fostered the integration of digital video into corpus annotation software. Several tools
are available for the annotation and analysis of sign language data, notably
SyncWRITER (Hanke 2001), SignStream™ (Neidle et al. 2001), ELAN11 and iLex
(Hanke & Storz 2008). But annotating SL data is not identical to annotating spoken
language data. Again, the specificities are related to the visual-gestural modality of SLs
and the subsequent impossibility of writing them using an existing graphic system that can
be machine-readable12.
2.3.1. Glosses
Annotating each of the single elements of an SL production implies glossing them with
words from a written language; that is to say, glossing them with words from another
language. But, as is the case in all language pairs, the lexicon of the SL does not
correspond exactly with the lexicon of the written language used in the glosses. One
single sign may correspond to various English or French words, depending on the
context of its use, and the opposite can be true, namely different signs of an SL may
correspond to one single word in English or in French. For example, there are various
11 ELAN is the tool that will be used for the annotation of the LSFB corpus project.
http://tla.mpi.nl/tools/tla-tools/elan/elan-description/.
12 The “Sign Writing” system developed by V. Sutton in 1974 (http://www.signwriting.org/), which is
unevenly distributed within the Deaf community, is not a machine-readable system that could be used
to transcribe SL data.
ways to express the meaning ‘before’ in LSFB (see Figure 113). The three variants
presented in Figure 1 are related to the use of different kinds of representation of time
in space (or “time-lines”, Sinte 2010), involving different types of (deictic or anaphoric)
reference points (Meurant 2008). But neither English nor French encodes the same
distinction. Therefore, using the gloss BEFORE14 (or AVANT in French) for all three of
these LSFB signs would amount to ignoring the difference between them and render
any further automatic research on one of them impossible. This would make the (time-
consuming) glossing work completely useless.
Figure 1. Three different signs in LSFB for the English meaning ‘before’
2.3.2. ID-glosses and lemmatisation
In order for an SL corpus to be a machine-readable and searchable corpus rather than a
simple archive, a unique gloss must be attributed to each identified sign type, called
“ID-gloss” (Johnston 2010). The ID-gloss is then the written word (in English, French,
etc.) that designates the same sign through all its occurrences (tokens) within the corpus,
regardless of the meaning of this sign in each particular context. For example, the ID-
gloss DRAINED (ÉPUISÉ in French glossing) within an LSFB corpus may correspond in
one context to the meaning ‘exhausted’ (‘fatigué’ in French) but in another context to
the meaning ‘unavailable’ (‘indisponible’ in French). The consistent use of a single ID-
gloss signifies that this meaning variation is not lexically distinctive in LSFB. Matching
a sign articulation with its ID-gloss or lemma often implies deciding on the status of
non-manual components and variations of a sign. For example, the third sign of Figure
1 should be associated with the same ID-gloss (say BEFORE_3) as another
occurrence that would not show puffed cheeks and would then not involve the same
meaning of intensity (‘before’ vs. ‘long before’). Both occurrences should be identified
as a unique lexical sign undergoing non-manual modification.
This ID-glossing or lemmatisation process is essential to ensure the consistency of data
glossed by multiple researchers or even by the same researcher at different moments. It
is only in this way that it will be possible to search through a large number of annotation
files to know how a given sign behaves in the variety of available contexts.
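The benefit of consistent ID-glossing for searching can be sketched as follows. The token layout (clip name, start time, ID-gloss, contextual meaning), the clip names and the timestamps are hypothetical; only the ID-glosses and meanings are taken from the examples discussed above. This is not the export format of ELAN or of any other annotation tool.

```python
from collections import defaultdict

# Hypothetical annotation tokens: (clip, start_ms, id_gloss, contextual_meaning).
tokens = [
    ("clip01", 1200, "DRAINED", "exhausted"),
    ("clip01", 5400, "BEFORE_3", "before"),
    ("clip02", 800, "DRAINED", "unavailable"),
    ("clip02", 3100, "BEFORE_3", "long before"),
]


def build_index(tokens):
    """Group every token under its ID-gloss, whatever its meaning in context."""
    index = defaultdict(list)
    for clip, start_ms, id_gloss, meaning in tokens:
        index[id_gloss].append((clip, start_ms, meaning))
    return dict(index)


def meanings(index, id_gloss):
    """List the distinct contextual meanings attested for one sign type."""
    return sorted({meaning for _, _, meaning in index.get(id_gloss, [])})


index = build_index(tokens)
print(meanings(index, "DRAINED"))
```

Because both occurrences carry the single ID-gloss DRAINED, one lookup retrieves them across files. Had they been glossed by contextual meaning instead, the two tokens of the same sign would not be found together.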
Lemmatisation should ideally precede the annotation process, so that when glossing the
corpus files, one might refer to the pre-existing identification of signs. However, in the
case of SL data, annotation and lemmatisation need to be conducted in parallel, since
the lexicographic analysis of SLs – because of the lack of writing system, of written
13 Figure 1 presents three different forms. But an in-progress PhD study on time in LSFB reveals that
even more possibilities do exist (Sinte forthcoming).
14 By convention in SL linguistics, the sign glosses are written with small caps words.
tradition and therefore of community-wide standards – is still to be done and is
dependent on the corpus analysis work. In other words, the lemmatisation process
presupposes the existence of a representative corpus of the language, which needs to be
lemmatised in order to be searchable. The recent developments of the links between SL
annotation tools and lexicon databases (e.g. the link between ELAN and Lexus15, or the
integration of both tools within iLex – meaning “integrated lexicon”) can certainly be
seen as facilitating the parallel and consistent development of the lemmatisation and the
annotation of a studied SL16.
3. The LSFB corpus project
Linguistic research on LSFB has emerged very recently as an academic discipline. The
first doctoral research on this language was presented in 2006 (Meurant 2008). But the
tardiness with which LSFB research emerged could be compensated for somewhat by
the fact it has emerged onto the scientific scene at a fruitful moment, namely when the
recent advances in technology, tools and methodologies have facilitated the gathering
of extensive bodies of digital film footage, and have thus opened the way for discourse
analyses. The existing studies of LSFB have so far been mostly devoted to micro-textual
aspects of the language structure (at the level of morphology and syntax)17. Corpus-
based linguistics applied to LSFB is expected to provide an interesting and new
interaction between theory and practice, giving rise to useful tools for bilingual
education18 and LSFB-French interpretation in the French Community of Belgium19.
This section will present the main specificities of the LSFB corpus project and the first
research axes it will support.
3.1. Discourse and variation corpus
The aim of the LSFB corpus project20 is to gather a representative sample of this
language’s current uses, including the variety of its uses. The variety will concern
discourse genres, signers and interlocutors, and registers.
15 http://www.lat-mpi.eu/tools/lexus/.
16 The 5th workshop on the Representation and processing of Sign Languages, held during the LREC
2012, 8th ELRA Conference on Language Resources and Evaluation (Istanbul, May 2012), has been
devoted to “Interaction between Corpus and Lexicon” (http://www.sign-lang.uni-hamburg.de/lrec2012/cfp.htlm).
17 In 2010 and 2011, we collected 20 hours of LSFB data made of semi-directed and spontaneous
dialogs between deaf signers. These data have already made more macro-textual analyses possible.
18 Since 2000, the association “École et Surdité” (School and Deafness) has implemented in Namur a
bilingual (LSFB and French) teaching structure. Deaf pupils are grouped within hearing classes and are
given the classes in LSFB, including the lessons on French reading and writing. LSFB is the language
used for all face-to-face communication and written French is the language used for all written
situations. École et Surdité and the University of Namur have been collaborating since the beginning of
this project (Meurant 2012).
19 Currently, there is no LSFB-French interpretation training in Belgium. This has many negative
consequences for the Deaf community, including for the education of Deaf children and young people.
20 A bilingual (LSFB with French subtitles) film has been created to inform the Deaf community about
the project, its aims and its issues. It is available at http://www.corpus-lsfb.be.
3.1.1. Genres
As mentioned above, narrative productions have occupied an important place in SL
research so far. We are however constantly reminded, mainly by our experience of
working with the teachers of the bilingual classes of “Ecole et Surdité”, that the
linguistic structures present within narratives do not have the same frequency, and are
not achieved in exactly the same way as within other discourse genres. In order to know
more about these differences, we wanted to provide the LSFB corpus with a variety of
genres, including – besides a number of narratives – descriptive, explicative and
argumentative discourse productions.
3.1.2. Signers and interlocutors
The aim of the project is to collect the productions of approximately 70 signers, for a
total of approximately 280 hours of recordings. It has been reported above that native
signers represent a very small proportion of any SL community. This is of course also true
of the LSFB community. This exceptional nature probably contributes to the
idealization of the supposed native-like “visual signing”, especially when reference is
essentially made to story telling productions. If SL grammar involves linguistic use of
space and multiple articulators, the real (quantitative and qualitative) use of modality-
specific devices (such as highly iconic structures) by native-signers vs. non-native
signers remains comparatively unexplored. A first contrastive look at signing data from
BSL native vs. non-native pairs has brought to light certain nuances regarding the
frequency and the extent of spatial and iconic devices used by native signers (Rentelis
2009). This is why we plan to include native, near-native (see 2.1.3.) and late signers
within the corpus. We also would like to include CODAs (hearing Children Of Deaf
Adults), as a particular group of hearing native signers. This variety of signers’ profiles
will make it possible to investigate the impact of the conditions of LSFB acquisition
(age, school and social context) as well as the effect of the contact between LSFB and
French in the signing productions. Participants will be selected in different regions of
the French Community of Belgium for their relevance regarding LSFB variants
(Brussels – Uccle, Berchem, Woluwé, Mons, Namur, Liège). Different ages (18 and up)
as well as both sexes will be represented.
In accordance with the objectives mentioned above, the metadata of each video clip will
necessarily include information on sign competence (e.g. age of acquisition, type of use,
regional attachment), bilingual status (e.g. writing and reading competences, type of use
of each language, hearing status) and education (e.g. deaf/oral/hearing school). Thanks
to this information, it should then be possible to investigate the variations and the
language contact effects that may distinguish the different groups of signers through the
variety of discourse genres and the variety of interactions (monologues, dialogues,
group interaction, public presentations, etc.). As far as possible, for the elicited dialog
tasks within the SL lab, the dyads will be composed homogeneously and
heterogeneously regarding age and/or age of acquisition of LSFB: for example, one
native signer will first interact with another native-signer, but also with a late-signer.
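The dyad composition described above can be sketched as a simple pairing step. The informant codes and the three profile labels are hypothetical, and the sketch considers only the acquisition profile, whereas the project also takes age into account.

```python
from itertools import combinations

# Hypothetical informant codes mapped to an acquisition profile.
signers = {
    "S01": "native",
    "S02": "native",
    "S03": "late",
    "S04": "near-native",
}


def classify_dyads(signers):
    """Split all possible pairs into same-profile and mixed-profile dyads."""
    dyads = {"homogeneous": [], "heterogeneous": []}
    for a, b in combinations(sorted(signers), 2):
        kind = "homogeneous" if signers[a] == signers[b] else "heterogeneous"
        dyads[kind].append((a, b))
    return dyads


dyads = classify_dyads(signers)
print(dyads["homogeneous"])
print(dyads["heterogeneous"])
```

In practice, a recording schedule would pick from both lists so that a given signer appears in at least one dyad of each kind.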
3.1.3. Registers
The LSFB corpus project will also include register variations, from the most formal to
the most informal, including spontaneous as well as prepared speech productions. We
expect to foster register variation by varying the recording places and contexts
(conference rooms, the SL lab at the university, schools, social places, etc.), as well as
varying the interaction settings (monologues, dialogues, group interactions, training
interaction, public presentations, etc.). For example, the recordings made outside the SL
lab will include prepared testimonies presented in public, spontaneous comments
immediately following a conference, spontaneous discussions at the breaks of a training
session and trainer-group interactions during SL teachers’ trainings. The elicitation
material for the SL lab sessions will include prepared and non-prepared tasks such as
storytelling, dialogs about informants’ lives, tastes and experiences, role plays on hot topics
related to the Deaf community, conversations about daily life as well as more reflective
exercises (e.g. explaining the rules of a game or mathematical principles).
3.2. Research avenues
The LSFB corpus project aims to provide the scientific community – as well as the
teachers and the whole Deaf community21 – with a referential corpus designed to be a
resource for a wide variety of future research projects. The research axes currently under
investigation or scheduled for the near future are all connected with the concerns and
needs stemming from the everyday practices of the LSFB-French interpreters and the
bilingual teachers we are in regular contact with. Four of them are briefly introduced
below as illustrations.
3.2.1. Discursive structures
The conventions governing the organisation of discourse vary from language to
language and from culture to culture. The visual-gestural nature of SLs seems to
contribute to the specificity of their structure (Janzen 2005). The LSFB corpus will allow
us to extend the knowledge in the field by including types of discourse that have
heretofore rarely been studied (descriptive, explicative, argumentative, as well as the
narrative type most frequently used in literature). This will involve identifying
discursive tags and markers and describing their uses. We expect to find, for example,
that breaking or changing the gaze, as well as nods and head movements, act as discourse markers.
One underexplored phenomenon that we expect to play a role in the marking of
discourse structure is a sort of bracketing structure based on the repetition of a sign that
frames a (short or long) sequence of signs. The produced structure can be figured as an
“A – (b c d) – A” structure, where A is the repeated and bracketing sign and (b c d)
represents the bracketed sign sequence. An example is given in figure 2: in this case, the
bracketing structure isolates the term RETOUCH from the metalinguistic (explicative)
comment on this term.
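As an illustration only, the “A – (b c d) – A” pattern can be sketched as a search over a sequence of gloss labels. The following Python fragment is a minimal sketch and not part of the LSFB corpus tooling: the function name `find_brackets`, the `Bracket` record and the `max_inner` limit are hypothetical, and the input is assumed to be a plain list of gloss strings. A purely formal search of this kind flags every repetition, so its output would only be a list of candidates to be checked manually against the video.

```python
from typing import List, NamedTuple


class Bracket(NamedTuple):
    """One candidate A - (b c d) - A structure (hypothetical record type)."""
    sign: str          # the repeated, bracketing gloss A
    start: int         # index of the first occurrence of A
    end: int           # index of the closing occurrence of A
    inner: List[str]   # the bracketed glosses (b c d)


def find_brackets(glosses: List[str], max_inner: int = 10) -> List[Bracket]:
    """Return candidate bracketing structures in a gloss sequence.

    A candidate is a gloss that is repeated after at least one and at most
    `max_inner` intervening glosses. Only the nearest repetition is kept.
    """
    found = []
    for i, sign in enumerate(glosses):
        # j starts at i + 2 so that at least one gloss is bracketed.
        last = min(len(glosses), i + max_inner + 2)
        for j in range(i + 2, last):
            if glosses[j] == sign:
                found.append(Bracket(sign, i, j, glosses[i + 1:j]))
                break
    return found


if __name__ == "__main__":
    # Abstract example taken from the description in the text.
    for bracket in find_brackets(["A", "b", "c", "d", "A"]):
        print(bracket)
```

For the abstract sequence A b c d A, the function returns a single candidate: sign A at positions 0 and 4, enclosing b, c and d. Whether such a repetition actually functions as a discourse marker remains a matter for linguistic analysis.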
21 The data (videos and annotations) will be stored by the University of Namur. They will be published
at the end of the project as open access data with specific copyright rules. The practical details of this
publication and accessibility are still under study.
It is important to underline that no specific task will be built in order to elicit this or that
specific structure. The instructions given to the informants will only be aimed at
fostering one specific discourse genre (e.g. can you tell me…, can you explain to him…,
can you describe…, can you play the role of…), but the expression itself will not be
controlled by ad hoc tasks.
Figure 2. Bracketing structure
3.2.2. Voices and perspectives
At the level of syntax and at least within narrative discourses, LSFB makes frequent
use of constructions alternating between external and internal perspectives on
statements. These structures are made up of the combination of so-called “constructed
action” (Metzger 1995) or “role shift” (Padden 1986) structures, which express an
internal perspective on the action, and “classifier” constructions (Schembri 2003),
which express an external perspective. Figure 3 gives an example of this kind of
alternation, within a “scale alternation” structure (Meurant 2008). The LSFB corpus will
make it possible to investigate whether these voice and perspective alternations, very
frequent in storytelling, occur with the same frequency and in the same forms within
explicative, descriptive and argumentative discourses.
Figure 3. Alternation of perspectives within a “scale alternation” structure
3.2.3. Paraphrasing
As a result of having been excluded from the educational and intellectual spheres for a
century (1880-1980) and considering the difficulty deaf people have, even today, in
gaining access to higher education, SLs have been stigmatized for their lack of
specialized terminology. It is essential not to associate the problems rooted in
sociolinguistic shortcomings with any lack within the SLs’ linguistic systems. But it is also
vital (especially for teachers and interpreters) to expose those devices which are specific
to SLs and to their visual-gestural character and which compensate for or explain the
comparative lack of specialized terminology. Investigating the LSFB corpus, we will
seek to inventory the devices (fingerspelling, iconic rephrasing, mouthings22, etc.) used
22 The term ‘mouthings’ refers to the mouth movements that accompany the hand movements in SL
productions (van de Sande & Crasborn 2009).
[Figure 2, glosses: RETOUCH SO-CALLED SAME CLEAN-FACE CLEAN R. (E.T.O.U.C.) H. RETOUCH (…). Translation: ‘To retouch (photographs), what is written R.E.T.O.U.C.H., namely clean the face’]
[Figure 3, glosses: PLAY-MUSIC a-MOVE PLAY-MUSIC. Translation: ‘He comes up playing music’]
to refer to objects, events and concepts which do not have an established lexical sign in
LSFB, to describe the ways in which these devices are integrated into the discourse, and
to study the variation of their use from one type of discourse to another.
3.2.4. Fluency and disfluency markers
Little attention has been given to fluency per se among SLs (Lupton 1998; Nicodemus
2011). Yet the criterion of fluency vs. non-fluency is frequently referred to when it
comes to defining the profile of participants for SL studies or corpora (Van Herreweghe
& Vermeerbergen 2012), or the linguistic proficiency of an interpreter or of the relatives
and teachers who are the linguistic models for deaf children. The LSFB corpus will be
called upon in order to investigate the features (at the level of prosody, lexicon and
syntax) and combinations of features that contribute to fluency and disfluency in LSFB.
These non-exclusive research axes all profile forthcoming studies on LSFB as
discourse-oriented and sensitive to language variation.
4. Conclusion
Collecting a large-scale corpus of LSFB is indispensable to the pursuit of research on
this language. Knowledge of LSFB, as well as the quality and the validity of this
knowledge, its relevance to the education of the Deaf, and LSFB-French interpretation
training directly depend on the success of this project. The LSFB corpus project will
greatly benefit from the previous experience of the other SL research teams with respect
to the data collection, the management of metadata and the annotation process.
Acknowledgements
The LSFB corpus project is supported by an Incentive Grant for Scientific Research
(2012-2014) from the F.R.S.-FNRS (Foundation for Scientific Research in Brussels and
Wallonia, Belgium) and by the University of Namur, Belgium.
References
Bergman, Brita & Johanna Mesch. 2004. ECHO data set for Swedish Sign Language (SSL). Stockholm: Department of Linguistics, University of Stockholm. http://www.let.ru.nl/sign-lang/echo
Costello, Brendan, Javier Fernandez & Alazne Landa. 2008. The non-(existent) native signer: sign language research in a small deaf population. In Ronice Muller de Quadros (ed.), Sign Languages: Spinning and Unraveling the Past, Present and Future. TISLR 9: forty five papers and three posters from the 9th Theoretical Issues in Sign Language Research Conference, Florianopolis, Brazil, December 2006. Florianopolis, Brazil: Editora Arara Azul, 77-94. http://www.editora-arara-azul.com.br/EstudosSurdos.php
Crasborn, Onno. 2008. Open access to sign language corpora. In Onno Crasborn, Thomas Hanke, Eleni Efthimiou, Inge Zwitserlood & Ernst Thoutenhoofd (eds), Construction and Exploitation of Sign Language Corpora. 3rd Workshop on the Representation and Processing of Sign Languages. Paris: ELDA, 33-38. http://www.lrec-conf.org/proceedings/lrec2008/workshops/W25_Proceedings.pdf
Crasborn, Onno, Els van der Kooij, Annika Nonhebel & Wim Emmerik. 2004. ECHO data set for Sign Language of the Netherlands (NGT). Nijmegen: Department of Linguistics, Radboud University Nijmegen. http://www.let.ru.nl/sign-lang/echo
Crasborn, Onno & Inge Zwitserlood. 2008. The Corpus NGT: an online corpus for professionals and laymen. In Onno Crasborn, Thomas Hanke, Eleni Efthimiou, Inge Zwitserlood & Ernst Thoutenhoofd (eds), Construction and Exploitation of Sign Language Corpora. 3rd Workshop on the Representation and Processing of Sign Languages. Paris: ELDA, 44-49. http://www.lrec-conf.org/proceedings/lrec2008/workshops/W25_Proceedings.pdf
Davies, Alan. 2003. The Native Speaker: Myth and Reality. Clevedon: Multilingual Matters.
Hanke, Thomas. 2001. Sign language transcription with syncWRITER. Sign Language and Linguistics 4, 267-275.
Hanke, Thomas & Jakob Storz. 2008. iLex — A database tool integrating sign language corpus linguistics and sign language lexicography. Paper presented to the 3rd Workshop on the Representation and Processing of Sign Languages at International Conference on Language Resources and Evaluation, Marrakech, Morocco. http://www.sign-lang.uni-hamburg.de/ilex/lrec2008_hanke.pdf
Janzen, Terry. 2005. Interpretation and language use: ASL and English. In Terry Janzen (ed.), Topics in Signed Language Interpreting. Amsterdam: John Benjamins, 69-105.
Johnston, Trevor. 2010. From archive to corpus: transcription and annotation in the creation of signed language corpora. International Journal of Corpus Linguistics 15(1), 106-131.
Johnston, Trevor & Adam Schembri. 2006. Issues in the Creation of a Digital Archive of a Signed language. In Linda Barwick & Nicholas Thieberger (eds), Sustainable Data from Digital Fieldwork. Sydney: University of Sydney, 7-16.
Konrad, Reiner. 2009. Die lexikalische Struktur der DGS im Spiegel empirischer Fachgebärdenlexikographie. Zur Integration der Ikonizität in ein korpusbasiertes Lexikonmodell. Unpublished dissertation, Universität Hamburg.
Lucas, Ceil & Clayton Valli. 1992. Language Contact in the American Deaf Community. New York, NY: Academic Press, Inc.
Lucas, Ceil, Robert Bayley & Clayton Valli. 2001. Sociolinguistic variation in American Sign Language. Washington, DC: Gallaudet University Press.
Lupton, Linda. 1998. Fluency in American Sign Language. Journal of Deaf Studies and Deaf Education 3(4), 320-328.
Metzger, Melanie. 1995. Constructed dialogue and constructed action in American Sign Language. In Ceil Lucas (ed.), Sociolinguistics in Deaf Communities. Washington DC: Gallaudet University Press, 255-271.
Meurant, Laurence. 2008. Le regard en langue des signes. Anaphore en langue des signes de Belgique francophone (LSFB). Morphologie, syntaxe, énonciation. Rennes, Namur: Presses Universitaires de Rennes, Presses Universitaires de Namur.
Meurant, Laurence. 2012. In search of the ideal partnership between sign linguistics research and a bilingual teaching project. The case of Namur, Belgium. In Lorraine Leeson & Myriam Vermeerbergen (eds), Working with the Deaf Community. Education, Mental Health and Interpreting. Dublin: Interesource Group (Ireland) Limited.
Neidle, Carol, Stan Sclaroff & Vassilis Athitsos. 2001. SignStream™: a tool for linguistic and computer vision research on visual-gestural language data. Behavior Research Methods, Instruments, and Computers 33, 311-320.
Nicodemus, Brenda. 2011. Disfluencies in American Sign Language and English. Paper presented at the 33rd Annual Conference of the German Linguistic Society (DGfS). Theme session 6: Sign language discourse, Georg August University, Göttingen.
Padden, Carol. 1986. Verbs and role-shifting in American Sign Language. In Carol Padden (ed.), Proceedings of the Fourth National Symposium on Sign Language Research and Teaching. Silver Spring, MD: National Association of the Deaf, 44-57.
Rentelis, Ramas. 2009. Processing of British Sign language in native and non-native Deaf signers. Paper presented at the Colloque International sur les Langues des Signes (CILS), Namur: University of Namur.
Schembri, Adam. 2003. Rethinking “classifiers” in signed languages. In Karen Emmorey (ed.), Perspectives on classifier constructions in sign languages. Mahwah, New Jersey: Lawrence Erlbaum Associates, 3-34.
Sinte, Aurélie. 2010. Français - Langue des signes française de Belgique (LSFB): quelques éléments d'analyse contrastive des temps verbaux. Cahiers de l'AFLS 16.
Sinte, Aurélie. forthcoming. Expression(s) de temps en LSFB. Références, ancrages énonciatifs, structures discursives. PhD Thesis, University of Namur.
Stokoe, William C. 1960. Sign Language structure: an outline of the visual communication systems of the American deaf. Studies in Linguistics: Occasional papers 8.
Tervoort, Bernard. 1953. Structurele analyse van visueel taalgebruik binnen een groep dove kinderen. Amsterdam: Noord-Hollandsche Uitgevers Maatschappij.
van de Sande, Inge & Onno Crasborn. 2009. Lexically bound mouth actions in Sign Language of the Netherlands: a comparison between different registers and age groups. Linguistics in the Netherlands 26, 78-90.
Van Herreweghe, Mieke & Myriam Vermeerbergen. 2012. Data collection. In Roland Pfau, Markus Steinbach & Bencie Woll (eds), Sign Language. An International Handbook. Handbooks of Linguistics and Communication Science (HSK) series, n°37. Berlin: Mouton de Gruyter, 1023-1045.
Woll, Bencie, Rachel Sutton-Spence & Dafydd Waters. 2004. ECHO data set for British Sign Language (BSL). London: Department of Language and Communication Science, City University. http://www.let.ru.nl/sign-lang/echo