Conference talk: Making Sign Language Resources Findable and Comparable: The Sign Language Dataset Compendium

Start: 2022-10-05T12:30:00+02:00
End: 2022-10-05T13:00:00+02:00
Location: Berlin, Germany

Maria Kopf, Marc Schulder, Thomas Hanke

Event

Language Documentation Archiving Conference

This talk is only available in English.

Presentation

International Sign interpreter: Razaq Fakir

Abstract

Recent decades have seen a marked increase in digital research resources for sign languages (SLs). Even so, compared to the number and diversity of SLs, data is still very scarce. Finding and comparing resources is challenging, because information on datasets is scattered across publications, data repositories and (potentially defunct) project websites, and the amount and kind of information provided can vary widely. We therefore introduce the Sign Language Dataset Compendium, which helps make datasets findable, comparable and accessible.

The compendium provides an extensive overview of linguistic resources for SLs. It covers both corpora of (semi-)spontaneous language production by L1 signers and lexical resources such as dictionaries and sign banks. Additionally, an index of commonly used corpus data collection tasks helps users find comparable content across corpora. Entries provide structured information and metadata that make them comparable, as well as key references to the literature and documentation and information on where to obtain the data. The curation criteria for the compendium are designed to favour the inclusion of resources for less-resourced languages while imposing stricter requirements on better-resourced languages. We hope that the compendium will also lead to more interest in less-resourced SLs.

The compendium can be found at www.sign-lang.uni-hamburg.de/lr/compendium/. At the time of writing, it contains 41 corpora, 71 lexical resources and 27 data collection tasks, covering 76 different SLs. The compendium is intended as a growing resource that will be updated regularly with new entries and features. A comparative overview of annotation standards is also in preparation.

Marc Schulder

Research Associate in Computational Linguistics

My research interests include sign languages, computational linguistics and Open Science.