Recording protocol

Procedure

The recording process consists of four phases:

Speak in Silence: For 15 minutes, the participant reads sentences sourced from the French Wikipedia. Each utterance is saved as a separate recording, and its transcription is kept.
Quiet in Noise: For 2 minutes and 24 seconds, the participant remains silent in a noisy environment built from AudioSet samples. These samples were selected from relevant classes, loudness-normalized, pseudo-spatialized, and played from random directions through a spatialization sphere equipped with 56 loudspeakers. The objective of this phase is to gather realistic background noise that can later be mixed with the Speak in Silence recordings, so that every noisy mixture keeps a clean reference signal.
Quiet in Silence: The participant again remains silent, this time for 54 seconds in complete silence, so that only physiological and microphone noise is recorded. These samples can be useful for tasks such as heart rate tracking, or simply for analyzing the noise properties of the various microphones.
Speak in Noise: The final phase (54 seconds) primarily serves to test the different systems (Speech Enhancement, Automatic Speech Recognition, Speaker Identification) developed from the recordings of the first three phases. Because the participant speaks in real noise, this phase shows how these systems perform in practical conditions. The noise for this phase was recorded from spatialized scenes with the ZYLIA ZR-1 Portable Recorder and replayed in the spatialization sphere with ambisonic processing.
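
To illustrate how Quiet in Noise recordings might be combined with Speak in Silence recordings while keeping a clean reference, here is a minimal sketch in Python. It assumes the mixing is done additively at a target signal-to-noise ratio, which the protocol above does not specify. The function names (rms, mix_at_snr), the 16 kHz sample rate, the 5 dB SNR, and the synthetic signals are all hypothetical and stand in for real recordings.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Returns (noisy, scaled_noise). The clean speech is left untouched and
    remains available as the reference signal.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Assumption: loop the noise until it covers the speech, then cut to length.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[: len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise is silent; SNR is undefined")
    # Gain that brings the noise level to rms(speech) / 10^(snr_db / 20).
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


# Toy stand-ins for real recordings (hypothetical data, 16 kHz assumed).
rng = random.Random(0)
sample_rate = 16_000
clean = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate)
         for t in range(sample_rate)]          # 1 s "speech" (a sine tone)
background = [rng.uniform(-1.0, 1.0)
              for _ in range(sample_rate // 2)]  # 0.5 s "noise"

noisy, scaled_noise = mix_at_snr(clean, background, snr_db=5.0)

# Check the SNR actually obtained from the clean signal and the scaled noise.
achieved_snr = 20 * math.log10(rms(clean) / rms(scaled_noise))
print(f"achieved SNR: {achieved_snr:.2f} dB")
```

Because the clean signal and the scaled noise are both kept, a speech enhancement model could be trained on the pair (noisy, clean). Looping the noise is only one option; a random crop of a longer noise recording would serve equally well.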