Speech recognition

Task

The model is given an audio file and asked to transcribe it into written text (either normalized text or phonemized text). The most common evaluation metrics are the word error rate (WER), the character error rate (CER), and the phoneme error rate (PER).
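
All three metrics are edit-distance ratios, (S + D + I) / N. Here S, D and I are the substitutions, deletions and insertions in a minimum-cost alignment of the hypothesis against the reference, and N is the reference length counted in words (WER), characters (CER) or phonemes (PER). The Python sketch below illustrates that definition only. It is not the evaluation code used for Vibravox. The function names are our own, and the tokenization is an assumption: words are split on whitespace, CER counts every character including spaces, and PER expects phonemes that have already been segmented into a list of symbols.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def edit_distance(ref: Sequence[T], hyp: Sequence[T]) -> int:
    """Levenshtein distance: fewest substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] is the distance between the first i-1 reference tokens and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[T], hyp: Sequence[T]) -> float:
    """(S + D + I) / N, where N is the reference length; can exceed 1 because of insertions."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref_text: str, hyp_text: str) -> float:
    # Assumption: words are whitespace-separated tokens of already-normalized text
    return error_rate(ref_text.split(), hyp_text.split())


def cer(ref_text: str, hyp_text: str) -> float:
    # Assumption: every character counts, spaces included (conventions vary)
    return error_rate(list(ref_text), list(hyp_text))


def per(ref_phonemes: Sequence[str], hyp_phonemes: Sequence[str]) -> float:
    # Assumption: phonemes are pre-segmented, one symbol per list element
    return error_rate(ref_phonemes, hyp_phonemes)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("abc", "abd"))
    print(per(["b", "ɔ̃", "ʒ", "u", "ʁ"], ["b", "ɔ̃", "ʒ", "u"]))
```

Running the script gives a WER of 2/6 for the first pair: "sat" is substituted by "sit" and the second "the" is deleted. In practice, established toolkits are usually preferred for reporting results. When comparing against published numbers, it matters that the normalization and tokenization match the ones used for those numbers.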

Please refer to the Vibravox paper for more information.

Pre-trained models on HuggingFace

Please follow this link to view the model card of our phonemizers: https://huggingface.co/Cnam-LMSSC/vibravox_phonemizers

Training code

Please follow this link for the code used to train our models: https://github.com/jhauret/vibravox