The Sign Language Interchange Format: Harmonising Sign Language Datasets for Computational Processing

Publication
In 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops: Sign Language Translation and Avatar Technology (SLTAT 2023)
This article is ©2023 IEEE. I recommend using the accepted version of the paper, which is open access. For the closed-access published version, follow the URL in the citation below, but note that most hyperlinks are broken in that version. The accepted version does not have this problem.

Abstract

We introduce the Sign Language Interchange Format, a new format for representing annotations and lexical inventories of sign language datasets. The format is designed as an intermediate step in data preparation for language technologies, unifying the annotation conventions of different corpora for further use. Complex gloss notations and implicit relations between tiers are made explicit through a hierarchy of machine-readable container structures. Sample implementations for converting to and from the new format are provided.

Cite as

M. Schulder, S. Bigeard, T. Hanke and M. Kopf, “The Sign Language Interchange Format: Harmonising Sign Language Datasets For Computational Processing”, 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops: Sign Language Translation and Avatar Technology, Rhodes Island, Greece, 2023, doi: 10.1109/ICASSPW59220.2023.10193022. URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10193022&isnumber=10192577

Marc Schulder
Research Associate in Computational Linguistics

My research interests include sign languages, natural language processing, and open science.