Speech recognition
Task
The model is presented with an audio file and asked to transcribe it into written text (either normalized text or phonemized text). The most common evaluation metrics are the word error rate (WER), the character error rate (CER), and the phoneme error rate (PER).
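All three metrics share one definition and differ only in the unit being compared (words, characters, or phonemes). The error rate is (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-cost (Levenshtein) alignment of the hypothesis to the reference, and N is the number of reference tokens. The sketch below is a from-scratch illustration of that definition, not the evaluation code used by the Vibravox project, which may rely on a library implementation and apply its own text normalization first. The helper names `edit_distance`, `error_rate`, `wer`, `cer`, and `per` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum number of substitutions, deletions,
    and insertions (unit cost each) that turn ref into hyp."""
    # prev[j] = distance between the first i-1 reference tokens and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(S + D + I) / N with N = number of reference tokens; can exceed 1."""
    if len(ref) == 0:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    # Word error rate: tokens are whitespace-separated words
    return error_rate(ref.split(), hyp.split())


def cer(ref: str, hyp: str) -> float:
    # Character error rate: tokens are individual characters (spaces included)
    return error_rate(list(ref), list(hyp))


def per(ref: Sequence[str], hyp: Sequence[str]) -> float:
    # Phoneme error rate: phonemes are passed as token lists, because a
    # single phoneme symbol may span several characters
    return error_rate(ref, hyp)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
    print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator counts only reference tokens, insertions can push the error rate above 1, which is why these metrics are usually reported as rates rather than accuracies.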
Please refer to the Vibravox paper for more information.
Pre-trained models on HuggingFace
Please follow this link to access the card of our phonemizers: https://huggingface.co/Cnam-LMSSC/vibravox_phonemizers
Training code
Please follow this link to get the training code of our models: https://github.com/jhauret/vibravox