Speech Enhancement

Task

This task focuses on denoising and bandwidth extension (also known as audio super-resolution), both of which are needed to improve the quality of speech captured by body-conduction sensors. The model is presented with a pair of audio clips, one captured by a body-conduction sensor and the other being the corresponding clean, full-bandwidth speech captured by an airborne microphone. It is asked to enhance the body-conducted clip by removing noise and regenerating the mid and high frequencies from the low-frequency content alone.
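
To make the "low-frequency content only" constraint concrete, here is a toy sketch in pure Python. It is not the EBEN model and does not use Vibravox data: the sample rate, the two test tones, and the moving-average filter are all assumptions chosen for illustration, and the low-pass filter is only a crude stand-in for the bandwidth loss of a body-conduction sensor (real Vibravox pairs are recordings, not simulations). The sketch builds an (input, reference) pair and measures how much of a high-frequency component survives in the input.

```python
import math

SAMPLE_RATE = 16_000  # Hz; assumed value for this toy example


def tone_mix(freqs, n_samples, sample_rate=SAMPLE_RATE):
    """Sum of unit-amplitude sine tones, standing in for a clean airborne clip."""
    return [
        sum(math.sin(2 * math.pi * f * n / sample_rate) for f in freqs)
        for n in range(n_samples)
    ]


def moving_average(signal, width):
    """Crude causal low-pass filter: running mean over the last `width` samples."""
    out = []
    acc = 0.0
    for i, x in enumerate(signal):
        acc += x
        if i >= width:
            acc -= signal[i - width]  # drop the sample that left the window
        out.append(acc / width)
    return out


def tone_amplitude(signal, freq, sample_rate=SAMPLE_RATE):
    """Estimate the amplitude of one frequency by correlating with cosine and sine."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / sample_rate) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / sample_rate) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


# Hypothetical pair: `clean` plays the role of the reference audio,
# `body_like` the role of the band-limited input.
clean = tone_mix([200, 4000], 1600)      # 0.1 s with a low and a high tone
body_like = moving_average(clean, 8)     # high tone is strongly attenuated

low_kept = tone_amplitude(body_like, 200)
high_kept = tone_amplitude(body_like, 4000)
```

In this toy pair the 200 Hz tone passes almost unchanged while the 4000 Hz tone is nearly removed, so an enhancement model has to infer the missing band from what remains rather than simply amplify it.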

Please refer to the Vibravox paper for more information.

Pre-trained models on HuggingFace

Please follow this link to go to the card of our EBEN models: https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_models

Training code

Please follow this link to get the training code of our models: https://github.com/jhauret/vibravox

Audio Samples

[Audio sample table: rows are Input, Enhanced by EBEN, and Reference audio; columns are the sensors Forehead, In-ear Rigid, In-ear Soft, Temple, and Throat.]

Vibravox enhanced by EBEN

The entire test set enhanced by the EBEN models can also be explored.

Speech Recognition

Task

The model is presented with an audio file and asked to transcribe it to written text (either normalized text or phonemized text). The most common evaluation metrics are the word error rate (WER), character error rate (CER), and phoneme error rate (PER).
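
WER, CER, and PER share one definition: the edit distance (substitutions, deletions, and insertions) between the reference and the hypothesis, divided by the length of the reference, computed over words, characters, or phonemes respectively. The following is a minimal sketch of that definition in pure Python. The function names are illustrative and are not taken from the Vibravox code base, which may rely on a dedicated metrics library.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate on whitespace-separated text."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate on raw strings."""
    return error_rate(list(reference), list(hypothesis))


def per(ref_phonemes, hyp_phonemes):
    """Phoneme error rate on sequences of phoneme symbols."""
    return error_rate(list(ref_phonemes), list(hyp_phonemes))
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. Because insertions are counted, any of these rates can exceed 1.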

Please refer to the Vibravox paper for more information.

Pre-trained models on HuggingFace

Please follow this link to go to the card of our phonemizers: https://huggingface.co/Cnam-LMSSC/vibravox_phonemizers

Training code

Please follow this link to get the training code of our models: https://github.com/jhauret/vibravox

Speaker Verification

Task

Given an input audio clip and a reference audio clip of a known speaker, the model’s objective is to compare the two clips and verify whether they come from the same individual. This typically involves extracting an embedding from each clip with a deep neural network trained on a large dataset of voices, then measuring the similarity between the two embeddings with cosine similarity or a learned distance metric. This task is crucial in applications requiring secure access control, such as biometric authentication systems, where a person’s voice acts as a unique identifier.
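
The comparison step can be sketched as follows, assuming the embeddings have already been extracted by some speaker model. The function names and the threshold value of 0.5 are hypothetical: in practice the decision threshold is tuned on held-out trials, and this sketch does not reproduce the Vibravox testing code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(test_embedding, reference_embedding, threshold=0.5):
    """Accept the identity claim when the similarity reaches the threshold.

    The default threshold is an arbitrary placeholder, not a tuned value.
    """
    return cosine_similarity(test_embedding, reference_embedding) >= threshold
```

Raising the threshold reduces false acceptances at the cost of more false rejections, which is why verification systems are usually compared at an operating point such as the equal error rate rather than at a fixed threshold.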

Please refer to the Vibravox paper for more information.

Testing code

Please follow this link to get the testing code of our model: https://github.com/jhauret/vibravox
