Editing and Integration of Multimodal Resources in CLARIN-D (WG 6)

Project content

The curation project for the integration of multimodal resources into CLARIN-D was accepted in April 2012. The project is the responsibility of WG 6 “Sprache und andere Modalitäten” (Language and other modalities), led by Prof. Dr.-Ing. Stefan Kopp (Universität Bielefeld). It will be implemented by Farina Freigang, M.Sc. (Universität Bielefeld) in cooperation with student assistants at the other participating institutions.

The project can build on technical advice and assistance from two CLARIN-D centers: the Bayerisches Archiv für Sprachsignale (BAS; PD Dr. Christoph Draxler and PD Dr. Florian Schiel) and the Max-Planck-Institut für Psycholinguistik (MPI; Han Sloetjes and Sebastian Drude).

WG 6 “Sprache und andere Modalitäten” (Language and other modalities) aims to establish and strengthen multimodal language resources in CLARIN-D. As a first step, multimodal resources are to be integrated into the CLARIN-D infrastructure in this curation project. A review within the WG concluded that the multimodal resources collected and used today are highly heterogeneous. The WG therefore decided that the most meaningful way forward is to integrate parts of several existing corpora into CLARIN-D rather than one large corpus. The emphasis of the curation project is not on integrating large amounts of data but on building a foundation that will facilitate future data integration. Furthermore, this exemplary data preparation will establish metadata standards, quality assurance methods, and annotation methods and guidelines. In this way, the project covers a cross-section of the resources used in the multimodal research community with regard to multimodal primary and secondary data as well as metadata.

To achieve good coverage of the data used in the research community, the WG has chosen three representative corpora. Two of them are large corpora that are mostly annotated manually: (1) the Bielefeld Speech and Gesture Alignment Corpus (University of Bielefeld), an extensively annotated corpus of natural language together with its accompanying gestures, and (2) the Dicta-Sign Corpus (University of Hamburg), which contains dialogues on different topics in German Sign Language. These two are augmented by (3) motion capture data on natural gesticulation (RWTH Aachen, Human Technologies Centre, research group “Natural Media”).

Duration

  • 01.08.2012 – 31.01.2014

Applicants

  • WG 6 “Sprache und andere Modalitäten” (Language and other modalities), represented by Prof. Dr.-Ing. Stefan Kopp, research group “Sociable Agents”, CITEC, Faculty of Technology, Bielefeld University

Responsible Institution

  • Research group “Sociable Agents”, CITEC, Faculty of Technology, Bielefeld University

Executive Staff

  • Farina Freigang, M.Sc. (50%)
  • Research assistants at Bielefeld University, the University of Hamburg and RWTH Aachen