OpenPose in the Public DGS Corpus (v2)

Marc Schulder, Thomas Hanke

May 2020

Type

Report

Publication

Project Note AP06-2019-01

Abstract

This project note gives an overview of how pose information was created for the Public DGS Corpus using OpenPose. Pose information is machine-readable data that describes where people are located in an image by providing the coordinates of various points on each body, such as joints, eyes or ears. The data we generated consists of body, face and hand models for the informants in every camera perspective of all published transcripts.

Several postprocessing steps were applied to the data before publication. These include a) the correction of errors, such as false-positive detections of bodies and the recognition of one actual person as two distinct bodies, b) ensuring a consistent order of people across frames in multi-person perspectives, c) the removal of pose information for the moderator, and d) the anonymisation of utterances containing sensitive information.
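To illustrate what step b) involves, the following is a minimal sketch of one possible way to put people in a consistent order, namely sorting the people in each frame from left to right by the mean x-coordinate of their confidently detected body keypoints. This is an assumption made for illustration, not the procedure used for the corpus. The key "pose_keypoints_2d" and its flat x, y, confidence layout follow OpenPose's default output; the "label" key, the confidence threshold and the function names are hypothetical.

```python
from statistics import mean


def mean_x(person, min_conf=0.1):
    """Mean x-coordinate of the body keypoints whose confidence exceeds min_conf."""
    flat = person["pose_keypoints_2d"]  # flat list: x0, y0, c0, x1, y1, c1, ...
    xs = [flat[i] for i in range(0, len(flat) - 2, 3) if flat[i + 2] > min_conf]
    # People without any confident keypoint are sorted to the end.
    return mean(xs) if xs else float("inf")


def order_people(frames):
    """Sort the people of every frame from left to right (illustrative heuristic)."""
    return [sorted(people, key=mean_x) for people in frames]


# Two frames in which the detector returned the same two people in different order.
# The "label" key is hypothetical and only makes the result easy to inspect.
frames = [
    [
        {"label": "left", "pose_keypoints_2d": [100, 50, 0.9, 120, 80, 0.8]},
        {"label": "right", "pose_keypoints_2d": [400, 55, 0.9, 420, 85, 0.7]},
    ],
    [
        {"label": "right", "pose_keypoints_2d": [402, 56, 0.9, 421, 86, 0.7]},
        {"label": "left", "pose_keypoints_2d": [101, 51, 0.9, 119, 81, 0.8]},
    ],
]

ordered = order_people(frames)
```

A positional heuristic like this only works while people stay on their side of the image, which is why it should be read as a sketch of the problem rather than a general solution.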

The resulting data is stored in one JSON file per transcript. Each file contains the pose information of the three published camera perspectives. The data format is designed to collect the default single-frame outputs of OpenPose in a single file and to provide additional relevant metadata for each camera perspective.
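As a rough illustration of how such a file might be read, the sketch below walks through a small in-memory example and regroups the flat keypoint lists into (x, y, confidence) triples. The "people" list and the "pose_keypoints_2d" key follow OpenPose's default single-frame output. The wrapper around them (a list with one entry per camera perspective, with "camera" and "frames" keys) and the camera name "a1" are assumptions made for this example; the project note itself defines the actual layout.

```python
import json


def triples(flat):
    """Regroup a flat OpenPose keypoint list into (x, y, confidence) triples."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def iter_body_keypoints(doc):
    """Yield (camera, frame index, person index, body keypoints) for every person.

    The "camera" and "frames" keys are assumed for illustration only.
    """
    for perspective in doc:
        for frame_no, frame in perspective["frames"].items():
            for person_no, person in enumerate(frame["people"]):
                yield (
                    perspective["camera"],
                    int(frame_no),
                    person_no,
                    triples(person["pose_keypoints_2d"]),
                )


# Hypothetical miniature file: one camera perspective, one frame, one person.
sample_text = """
[
  {
    "camera": "a1",
    "frames": {
      "0": {
        "people": [
          {"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.5]}
        ]
      }
    }
  }
]
"""

doc = json.loads(sample_text)
rows = list(iter_body_keypoints(doc))
```

The same regrouping applies to the face and hand models, which OpenPose stores under "face_keypoints_2d", "hand_left_keypoints_2d" and "hand_right_keypoints_2d".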
