Audio Tasks
Discover the tasks and baselines offered with the Vibravox dataset.
This task focuses on denoising and bandwidth extension (also known as audio super-resolution), both of which are needed to improve the quality of speech captured by body-conduction sensors. The model is presented with a pair of audio clips (a body-conducted recording and the corresponding clean, full-bandwidth airborne recording) and asked to enhance the body-conducted signal by denoising it and regenerating the mid and high frequencies from the low-frequency content alone.
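To make the "low-frequency content only" constraint concrete, here is a minimal toy sketch in plain Python. It is not the EBEN model and not a physical model of body conduction: a two-tone signal stands in for airborne speech, and a moving-average filter (an assumption chosen purely for illustration) stands in for the loss of high frequencies. The 16 kHz sample rate, the tone frequencies, and all function names are hypothetical.

```python
import math

SAMPLE_RATE = 16_000  # Hz, assumed for this toy example


def tone_mix(freqs, n_samples, sample_rate=SAMPLE_RATE):
    """Sum of unit-amplitude sinusoids, standing in for a speech signal."""
    return [
        sum(math.sin(2 * math.pi * f * n / sample_rate) for f in freqs)
        for n in range(n_samples)
    ]


def moving_average(signal, width):
    """Crude low-pass filter: mean of the current and previous samples."""
    out = []
    running = 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= width:
            running -= signal[i - width]
        out.append(running / width)
    return out


def amplitude_at(signal, freq, sample_rate=SAMPLE_RATE):
    """Amplitude of one frequency component, via a single-bin DFT."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


# One second of a "full-bandwidth" reference: a low tone and a high tone.
reference = tone_mix([200, 3000], SAMPLE_RATE)
# Toy "body-conducted" input: same signal with high frequencies attenuated.
band_limited = moving_average(reference, width=8)

for name, sig in [("reference", reference), ("band-limited", band_limited)]:
    low = amplitude_at(sig, 200)
    high = amplitude_at(sig, 3000)
    print(f"{name:>12}: 200 Hz amplitude={low:.2f}, 3000 Hz amplitude={high:.2f}")
```

In this toy setting, the band-limited signal keeps most of its 200 Hz component while the 3000 Hz component is strongly attenuated. An enhancement model has to regenerate that missing content, using the reference clip as its target.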
Please refer to the Vibravox paper for more information.
Please follow this link to go to the card of our EBEN models: https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models
Please follow this link to get the training code of our models: https://github.com/jhauret/vibravox
Audio examples (players not reproduced here): for each sensor (Forehead, In-ear Rigid, In-ear Soft, Temple, Throat), the input signal and the signal enhanced by EBEN, alongside the reference audio.
The model is presented with an audio file and asked to transcribe it to written text (either normalized text or phonemized text). The most common evaluation metrics are the word error rate (WER), character error rate (CER), and phoneme error rate (PER).
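The three metrics share one definition: the edit (Levenshtein) distance between the reference and the hypothesis, divided by the reference length, computed over words, characters, or phonemes respectively. The sketch below illustrates this in plain Python. The function names and example strings are hypothetical, and the Vibravox evaluation code may apply text normalization that is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref, hyp):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    """Character error rate: tokens are single characters."""
    return error_rate(list(ref), list(hyp))


def per(ref_phonemes, hyp_phonemes):
    """Phoneme error rate: tokens are phoneme symbols."""
    return error_rate(list(ref_phonemes), list(hyp_phonemes))


reference = "the cat sat on the mat"
hypothesis = "the cat sat on mat"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")

# Hypothetical phoneme sequences, with one phoneme missing in the hypothesis.
ref_ph = ["b", "ɔ̃", "ʒ", "u", "ʁ"]
hyp_ph = ["b", "ɔ̃", "u", "ʁ"]
print(f"PER = {per(ref_ph, hyp_ph):.3f}")
```

Because the distance is normalized by the reference length only, an error rate can exceed 1.0 when the hypothesis contains many insertions.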
Please refer to the Vibravox paper for more information.
Please follow this link to go to the card of our phonemizers: https://huggingface.co/Cnam-LMSSC/vibravox_phonemizers
Please follow this link to get the training code of our models: https://github.com/jhauret/vibravox
Given an input audio clip and a reference audio clip of a known speaker, the model’s objective is to compare the two clips and verify if they are from the same individual. This often involves extracting embeddings from a deep neural network trained on a large dataset of voices. The model then measures the similarity between these feature sets using techniques like cosine similarity or a learned distance metric. This task is crucial in applications requiring secure access control, such as biometric authentication systems, where a person’s voice acts as a unique identifier.
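The comparison step described above can be sketched as follows, assuming the speaker embeddings have already been extracted. The three-dimensional vectors and the 0.5 decision threshold are hypothetical placeholders: real embeddings have hundreds of dimensions, and the threshold is normally tuned on held-out trials.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(enrolled, probe, threshold=0.5):
    """Accept the identity claim if the similarity reaches the threshold.

    The default threshold is an arbitrary value chosen for illustration.
    """
    return cosine_similarity(enrolled, probe) >= threshold


# Toy embeddings (hypothetical values, not produced by any real model).
enrolled = [0.9, 0.1, 0.4]      # reference clip of the known speaker
probe_same = [0.8, 0.2, 0.5]    # input clip, same speaker
probe_other = [-0.3, 0.9, -0.2]  # input clip, different speaker

for name, probe in [("same", probe_same), ("other", probe_other)]:
    score = cosine_similarity(enrolled, probe)
    print(f"{name}: score={score:.3f}, accepted={same_speaker(enrolled, probe)}")
```

Raising the threshold reduces false acceptances at the cost of more false rejections, which is why access-control systems choose it according to their security requirements.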
Please refer to the Vibravox paper for more information.
Please follow this link to get the testing code of our model: https://github.com/jhauret/vibravox