A team of researchers has launched WAXAL, a large-scale speech dataset designed to improve speech technology for African languages and bridge the digital divide affecting millions of speakers across the continent, according to a new research paper and an article published on the Google Research blog.
The project, a collaboration between Google Research and African institutions including Makerere University, the University of Ghana, and Addis Ababa University, provides a large collection of speech recordings in 24 African languages spoken by more than 100 million people. The dataset aims to support the development of voice technologies such as speech recognition and text-to-speech systems that can understand and speak African languages.
According to the researchers, modern voice technologies have largely focused on a few global languages, leaving many African communities underserved.
“The advancement of speech technology has predominantly favoured high-resource languages, creating a significant digital divide for speakers of most Sub-Saharan African languages,” the researchers noted.
The WAXAL dataset contains two main components. The first is an Automatic Speech Recognition (ASR) dataset with approximately 1,250 hours of natural speech recordings collected from a diverse group of speakers. The second is a Text-to-Speech (TTS) dataset with over 235 hours of high-quality recordings captured in studio environments to help computers generate natural-sounding voices.
“The collection consists of two main components: an Automated Speech Recognition (ASR) dataset containing approximately 1,250 hours of transcribed, natural speech from a diverse range of speakers, and a Text-to-Speech (TTS) dataset with over 235 hours of high-quality recordings,” the researchers explained.
The project was carried out between 2021 and 2024 and involved community members, linguists, and academic institutions across Africa. In Uganda, Makerere University played a role in collecting data for several languages as part of the collaboration.
To collect natural speech, participants were shown images covering different topics and asked to describe what they saw in their native languages. This approach helped capture spontaneous and everyday speech patterns, which are important for training speech recognition systems.
“Participants were shown a diverse set of images covering at least 50 topics and asked to describe them in their native language,” the research paper states.
The dataset also includes additional information such as speaker age, gender, and recording environment, helping researchers develop more accurate language technologies.
Developers and researchers can now access the dataset freely for academic and technological development. The project leaders believe the resource will help advance digital inclusion and ensure African languages are better represented in modern technologies.
“The WAXAL dataset represents a significant step toward addressing the resource scarcity that has hindered the development of speech technologies for Sub-Saharan African languages,” the researchers added.
Image credit: Google

