#

## Paper
Title: `Sunbird African Language Technology (SALT) dataset`

Paper Link: https://aclanthology.org/2023.emnlp-main.862/

## Abstract
>SALT is a multi-way parallel text and speech corpus of Engish and six languages widely spoken in Uganda and East Africa: Luganda, Lugbara, Acholi, Runyankole, Ateso and Swahili. The core of the dataset is a set of 25,000 sentences covering a range of topics of local relevance, such as agriculture, health and society. Each sentence is translated into all languages, to support machine translation, and speech recordings are made for approximately 5,000 of the sentences both by a variety of speakers in natural settings (suitable for ASR) and by professionals in a studio setting (suitable for text-to-speech).

HomePage: https://github.com/SunbirdAI/salt

### Publications

Multilingual Model and Data Resources for Text-To-Speech in Ugandan Languages. Isaac Owomugisha, Benjamin Akera, Ernest Tonny Mwebaze, John Quinn. 4th Workshop on African Natural Language Processing, 2023. [pdf](https://openreview.net/pdf?id=vaxG0WAPzL)

Machine Translation For African Languages: Community Creation Of Datasets And Models In Uganda. Benjamin Akera, Jonathan Mukiibi, Lydia Sanyu Naggayi, Claire Babirye, Isaac Owomugisha, Solomon Nsumba, Joyce Nakatumba-Nabende, Engineer Bainomugisha, Ernest Mwebaze, John Quinn. 3rd Workshop on African Natural Language Processing, 2022. [pdf](https://openreview.net/pdf?id=BK-z5qzEU-9)
