Training data for the avatar was acquired using a markerless motion capture system (The Captury). The recording took place at Motion Bank, Mainz University of Applied Sciences. The recorded subjects were professional dancers specializing in contemporary dance. The recording used for training was taken from a single male dancer freely improvising to musical excerpts including experimental electronic music, free jazz, and contemporary classical music.
To record multiple sensors together with video and audio in a time-synchronized manner, custom software was developed. This software, named MultiModal recorder, employs the Matroska container format in combination with the FFmpeg library to create a single media file in which all data streams are stored as multi-channel audio and/or video. In its current version, the software records and plays back sensor data, video images, and point clouds as uncompressed video channels. Audio, in contrast, is handled by separate software based on the CSound library.
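The idea of storing non-audio sensor data as uncompressed video channels can be illustrated with a minimal sketch; the frame layout below is hypothetical and not the actual MultiModal recorder format. Each sensor sample is serialized into a fixed-size byte buffer, which a muxer such as FFmpeg could then treat as one raw video frame in a Matroska stream.

```python
import struct

# Hypothetical layout: one IMU sample (timestamp plus 3-axis accelerometer
# and 3-axis gyroscope readings) packed as little-endian values into a
# fixed-size "frame" buffer, so that every frame has a constant size.
FRAME_SIZE = 64          # bytes per frame (assumed; padded to constant size)
SAMPLE_FMT = "<d6f"      # 1 float64 timestamp + 6 float32 channel values

def pack_frame(timestamp, accel, gyro):
    """Serialize one sensor sample into a fixed-size frame buffer."""
    payload = struct.pack(SAMPLE_FMT, timestamp, *accel, *gyro)
    return payload.ljust(FRAME_SIZE, b"\x00")  # zero-pad to FRAME_SIZE

def unpack_frame(frame):
    """Recover timestamp, accel, and gyro values from a frame buffer."""
    values = struct.unpack_from(SAMPLE_FMT, frame)
    return values[0], values[1:4], values[4:7]

# Round trip: pack one sample and read it back.
frame = pack_frame(0.25, (0.0, 9.81, 0.0), (0.1, 0.0, -0.1))
ts, accel, gyro = unpack_frame(frame)
print(len(frame), ts)
```

Keeping every frame the same size is what makes the stream compatible with a raw-video channel: the demuxer can address sample *n* by a simple byte offset without any per-sample index.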
To train a machine-learning model on the correlation between a dancer's movements and the music to which the dancer is improvising, multi-modal recordings were obtained using our own recording software. These recordings combine data from wearable sensors (pressure-sensitive shoes and inertial measurement units), an Azure Kinect camera, and a regular microphone.