Conference Presentation: Making Sign Language Resources Findable and Comparable: The Sign Language Dataset Compendium


Maria Kopf, Marc Schulder, Thomas Hanke

Date

5 October 2022 12:30 — 13:00

Event

Language Documentation Archiving Conference

Location

Berlin, Germany

Presentation

International Sign Interpreter: Razaq Fakir

Abstract

Recent decades have seen a marked increase in digital research resources for sign languages (SLs). Even so, given the number and diversity of SLs, data is still very scarce. Finding and comparing resources is challenging, as information on datasets is scattered across publications, data repositories and (potentially defunct) project websites, and the amount and kind of information provided can vary widely. We therefore introduce the Sign Language Dataset Compendium, which helps make datasets findable, comparable and accessible.

The compendium provides an extensive overview of linguistic resources for SLs. It covers both corpora of (semi-)spontaneous language production by L1 signers and lexical resources such as dictionaries and sign banks. In addition, an index of commonly used corpus data collection tasks helps users find comparable content across corpora. Entries provide structured information and metadata that make them comparable, as well as key references to literature, documentation and information on where to obtain the data. The curation criteria are designed to favour the inclusion of resources for less-resourced languages while applying stricter requirements to better-resourced languages. We hope that the compendium will also encourage more interest in less-resourced SLs.

The compendium is available at www.sign-lang.uni-hamburg.de/lr/compendium/. At the time of writing, it contains 41 corpora, 71 lexical resources and 27 data collection tasks, spanning 76 different SLs. The compendium is intended as a growing resource and will be updated regularly with new entries and features. A comparative overview of annotation standards is also in preparation.

Marc Schulder

Research Associate in Computational Linguistics

My research interests include sign languages, natural language processing, and open science.