OpenPose in the Public DGS Corpus (v2)

Marc Schulder, Thomas Hanke

May 2020

Type

Report

Publication

Working paper AP06-2019-01

This publication is only available in English.

Versions

Latest version:
Version 2:
Version 1:

Abstract

This project note gives an overview of how pose information was created for the Public DGS Corpus using OpenPose. Pose information is machine-readable data that describes where people are located in an image by providing the coordinates of various points on each body, such as joints, eyes or ears. The data we generated consists of body, face and hand models for the informants in every camera perspective of all published transcripts.

Several postprocessing steps were applied to the data before publication. These include a) the correction of errors, such as false positive detections of bodies and the detection of a single person as two distinct bodies, b) ensuring a consistent order of people across frames in multi-person perspectives, c) the removal of pose information for the moderator, and d) the anonymisation of utterances containing sensitive information.
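To illustrate what step b) involves, the following is a minimal sketch of one possible way to keep people in a consistent order across frames. It is not the procedure used for the corpus, which this summary does not describe. It assumes that people sit side by side and do not swap places, so sorting each frame's people by their mean horizontal position is enough. The key `pose_keypoints_2d` and its flat x, y, confidence layout follow OpenPose's default JSON output. The function names and the `min_conf` threshold are hypothetical.

```python
def mean_x(person, min_conf=0.1):
    """Mean x-coordinate of the body keypoints detected with enough confidence."""
    flat = person["pose_keypoints_2d"]  # flat list: x0, y0, c0, x1, y1, c1, ...
    xs = [flat[i] for i in range(0, len(flat) - 2, 3) if flat[i + 2] >= min_conf]
    # A person without any confident keypoint is sorted to the end.
    return sum(xs) / len(xs) if xs else float("inf")


def order_left_to_right(frames):
    """Sort the people of every frame from left to right in the image.

    Assumption (not from the source): people never swap sides, so the
    left-to-right order identifies the same person in every frame.
    """
    return [sorted(people, key=mean_x) for people in frames]


# Two frames in which OpenPose listed the same two people in a different order.
frames = [
    [{"pose_keypoints_2d": [100.0, 50.0, 0.9]}, {"pose_keypoints_2d": [20.0, 50.0, 0.8]}],
    [{"pose_keypoints_2d": [22.0, 51.0, 0.7]}, {"pose_keypoints_2d": [98.0, 49.0, 0.9]}],
]
ordered = order_left_to_right(frames)
```

After sorting, index 0 refers to the person on the left and index 1 to the person on the right in both frames.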

The resulting data is stored in one JSON file per transcript. Each file contains the pose information of the three published camera perspectives. The data format is designed to collect the default single-frame outputs of OpenPose in a single file and to provide additional relevant metadata for each camera perspective.
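As a rough illustration of how such a file might be read, the sketch below walks a small in-memory example. The wrapper layout (a list of per-camera objects with the hypothetical keys `camera` and `frames`) is an assumption made for this example and should be checked against the full project note. The inner `people` list and the `pose_keypoints_2d` key, whose values are a flat list of x, y, confidence triples, follow OpenPose's default single-frame JSON output.

```python
import json


def split_keypoints(flat):
    """Group a flat OpenPose keypoint list into (x, y, confidence) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def iter_body_keypoints(recordings):
    """Yield (camera, frame number, body keypoints) for every detected person."""
    for rec in recordings:
        for frame_id, frame in rec["frames"].items():
            for person in frame["people"]:
                yield rec["camera"], int(frame_id), split_keypoints(person["pose_keypoints_2d"])


# Hypothetical miniature file: one camera perspective, one frame, one person
# with two body keypoints. Real files hold far more keypoints per person.
example = json.loads("""
[
  {
    "camera": "a1",
    "frames": {
      "0": {"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.8]}]}
    }
  }
]
""")

rows = list(iter_body_keypoints(example))
```

The same loop could be pointed at `face_keypoints_2d`, `hand_left_keypoints_2d` or `hand_right_keypoints_2d`, which OpenPose writes in the same flat triple layout.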

Marc Schulder

Research associate in computational linguistics

My research interests include sign languages, computational linguistics and open science.