WebMAUS-Pipeline: long video interviews with interlocutor speech, noise, long silence intervals etc.

Interviews and conversations are often recorded and later transcribed so that their contents can be analyzed and searched for sounds, words, phrases, etc. The web service WebMAUS, available in the CLARIN infrastructure, provides tools that combine audio recordings and transcriptions so that the words and the audio signal are time-aligned.

Very long recordings (typical in video interviews: several hours) are difficult to time-align. Therefore the BAS offers a web service that automatically splits long recordings into so-called chunks, segments them individually, and combines the results into a common file, as demonstrated in this use case.
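
The chunk-wise strategy can be illustrated with a small sketch. This is not the BAS implementation: the `Segment` type, the function `merge_chunk_results`, and the example times are hypothetical. It assumes that each chunk is segmented on its own with times relative to the chunk start, and that combining the results means shifting every segment by the offset of its chunk in the full recording.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    label: str
    start: float  # seconds
    end: float    # seconds


def merge_chunk_results(chunks: List[Tuple[float, List[Segment]]]) -> List[Segment]:
    """Shift chunk-relative segments to absolute time and concatenate them.

    `chunks` is a list of (chunk_offset_in_seconds, segments_in_chunk) pairs.
    """
    merged: List[Segment] = []
    # Process chunks in the order in which they occur in the recording.
    for offset, segments in sorted(chunks, key=lambda chunk: chunk[0]):
        for seg in segments:
            merged.append(Segment(seg.label, seg.start + offset, seg.end + offset))
    return merged


# Hypothetical example: two chunks starting at 0 s and at 3600 s.
chunks = [
    (3600.0, [Segment("world", 0.5, 1.0)]),
    (0.0, [Segment("hello", 1.0, 1.5)]),
]
result = merge_chunk_results(chunks)
for seg in result:
    print(f"{seg.label}\t{seg.start:.2f}\t{seg.end:.2f}")
```

Because every chunk is short, the alignment within a chunk stays tractable, while the offsets restore the position of each segment in the hours-long recording.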

Especially relevant for

everybody who works with very long videos, such as researchers from:

linguistics
phonetics
anthropology
ethnology
history
psychology

Starting point:

A long sound track (*.wav) and an orthographic transcript of the target speaker with 'chunk segmentation' (*.TextGrid) are the essential inputs. Only the speech of the target speaker has been orthographically transcribed.

Task:

Complete segmentation of target speaker (words, phones, syllables)

Solution:

Web interfaces 'Chunk Preparation' + 'WebMAUS Pipeline'

Related CLARIN-D tools and services

WebMAUS-Basic: Automatic phonetic labelling & segmentation of a single German recording with text
WebMAUS-Multiple

Short guide on using the Web Interface for Chunk-Preparation:

Preparation:

go to http://clarin.phonetik.uni-muenchen.de/BASWebServices
go to "Pipeline Help + FAQs"
download the file linked under "Long video interviews[...]" ftp://clarin.phonetik.uni-muenchen.de/BASWebServices/useCases/longFiles.zip
unzip the folder "longFiles" somewhere in your filesystem, e.g. on your desktop
click on 'BASWebServices'
select BAS webservice 'Chunk Preparation'
open the directory 'longFiles' on your desktop; select the file 'UniMuenster.TextGrid' and drag&drop it to the designated drop area of the service.
press the button 'Upload'; after the upload is completed you can inspect the uploaded *.TextGrid file by clicking on the file link in the drop area. Note that the tier containing the orthographic transcription is labelled 'Transcription 1'.
execute service with the following options
- Language = English - Great Britain
- Input tier name = Transcription 1 (note the blank here!)
- Sampling rate = 16000 (this is the sampling rate of the *.wav file we want to segment; if you do not know it, you can find it out with, for example, the command 'file *.wav' in a terminal or with Audacity)
- Output format = TextGrid
after completion the intermediate result file 'UniMuenster.par' is shown below. This file contains the word-tokenized transcript, the most likely phonological pronunciation, and the chunking information.
download this intermediate result 'UniMuenster.par' to the directory 'longFiles' by right-clicking the file name and selecting "save link as"
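
As an alternative to 'file *.wav' or Audacity, the sampling rate can also be read with a few lines of Python. The following is a minimal sketch using the standard `wave` module, which only handles uncompressed PCM *.wav files; the helper name `sampling_rate` and the temporary demo file are made up for illustration.

```python
import os
import tempfile
import wave


def sampling_rate(wav_path: str) -> int:
    """Return the sampling rate in Hz of an uncompressed PCM *.wav file."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate()


# Self-contained demonstration: write one second of 16 kHz mono silence
# to a temporary file and read its sampling rate back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 16-bit samples
    out.setframerate(16000)   # 16 kHz
    out.writeframes(b"\x00\x00" * 16000)

demo_rate = sampling_rate(demo_path)
print(demo_rate)
```

For the file of this use case one would call `sampling_rate("UniMuenster.wav")` from within the directory 'longFiles' and enter the returned value as the 'Sampling rate' option.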

Web Interface 'WebMAUS Pipeline':

go to http://clarin.phonetik.uni-muenchen.de/BASWebServices
select BAS webservice 'WebMAUS Pipeline'
click on the button 'Signal-file-upload' and select the file 'UniMuenster.wav' from the directory 'longFiles' for upload
click on the button 'Par-file-upload' and select the file 'UniMuenster.par' from the directory 'longFiles' for upload
You can see the *.par file and listen to the uploaded signal
execute WebMAUS with the following options
- Language = English - Great Britain
- KAN tier in TextGrid = true
- ORT tier in TextGrid = true
- Chunk segmentation = true
Confirm the terms-of-usage and press the button 'Run Web Service'
after completion the resulting TextGrid file is shown below; click on "Save result" to download it to your local system
alternatively, click on the button 'Segmentation Preview' below to open a new window in your browser and inspect the segmentation result directly using the EmuLabeller.
Suggestion: zoom in, then mark a segment with the mouse and then press 'c' on your keyboard to play the sound.
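
After downloading the result, it can be useful to check which tiers the TextGrid actually contains. The sketch below lists tier names with a regular expression. It assumes the file is in Praat's long text format, where every tier has a line of the form `name = "..."`; the helper `tier_names` and the sample text (including its tier names 'ORT' and 'KAN') are made up for illustration and are not taken from a real WebMAUS result.

```python
import re
from typing import List

# In Praat's long text format each tier carries a line such as: name = "ORT"
TIER_NAME = re.compile(r'^\s*name\s*=\s*"(.*)"\s*$', re.MULTILINE)


def tier_names(textgrid_text: str) -> List[str]:
    """Return the tier names found in the text of a long-format TextGrid."""
    return TIER_NAME.findall(textgrid_text)


# Made-up miniature TextGrid with two interval tiers.
sample = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "ORT"
        xmin = 0
        xmax = 1.5
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 1.5
            text = "hello"
    item [2]:
        class = "IntervalTier"
        name = "KAN"
        xmin = 0
        xmax = 1.5
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 1.5
            text = "h@l@U"
'''

names = tier_names(sample)
print(names)
```

For a downloaded file, the text would first be read from disk, for example with `open(path, encoding="utf-8").read()`; the encoding is an assumption and may differ for a given TextGrid. The same check can be applied to an input TextGrid to confirm the exact tier name, including any blanks.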