Over the past year, Mozilla has worked to expand its Common Voice initiative to include open source voice recognition datasets in more languages. Now, the organization has released the largest collection of human voices available for use, covering 18 languages, including Dutch, Hakha-Chin, Esperanto, Farsi, Basque, Spanish, French, German, Mandarin Chinese (Traditional), Welsh and Kabyle. The collection comprises 1,400 hours of recorded voice clips from 42,000 contributors. Some contributors are volunteers who just wanted to help out, while others are linguists and professionals working in voice technologies.
Mozilla's Common Voice project aims to make it easier for developers who lack the resources of a bigger company, such as Apple or Google, to create voice-enabled products. It will also give researchers free access to a large collection of voices. The organization itself plans to use the clips it collects to improve its Speech-to-Text, Text-to-Speech and DeepSpeech engines. Anyone who could use a massive collection of voice clips in 18 languages can download the set from the Common Voice website.