The Kenya Language Corpus (KenCorpus) was founded early in 2021 by researchers from Maseno University, the University of Nairobi and Africa Nazarene University. These universities have been jointly creating a language corpus for Machine Learning and Natural Language Processing.
The project was funded by the Lacuna Fund (2021-2022).
1 to 5 of 5 Results
Jul 16, 2024
Awino, Dorcas; Muchemi, Lawrence; Wanzare, Lilian D.A.; Ombui, Edward; Wanjawa, Barack; McOnyango, Owen; Indede, Florence, 2022, "KenSpeech: Swahili Speech Transcriptions", https://doi.org/10.7910/DVN/YHXJSU, Harvard Dataverse, V4
This speech dataset includes both read and spontaneous speech, recorded in Kenya with native Swahili speakers. In total, the dataset contains 27 hours 31 minutes 50 seconds of speech data from 26 speakers (19 female and 7 male). The recordings are in the following audio format: .wav, 16-bit, 16 kHz, mono, little-endian. Of th...
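The stated audio format (16-bit, 16 kHz, mono .wav) can be checked on a downloaded recording before further processing. The following is a minimal sketch using only Python's standard `wave` module; the function names `matches_kenspeech_format` and `write_silent_wav` are hypothetical helpers introduced here for illustration and are not part of the dataset or of Dataverse. It assumes the files are uncompressed PCM WAV, which is what the `wave` module reads.

```python
import os
import tempfile
import wave

# Expected format, taken from the dataset description above.
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit samples
EXPECTED_SAMPLE_RATE_HZ = 16_000  # 16 kHz
EXPECTED_CHANNELS = 1             # mono


def matches_kenspeech_format(path):
    """Return True if the WAV file at `path` is 16-bit, 16 kHz, mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def write_silent_wav(path, seconds=1, rate=EXPECTED_SAMPLE_RATE_HZ,
                     channels=EXPECTED_CHANNELS):
    """Write a silent 16-bit WAV file (hypothetical helper, demo only)."""
    n_bytes = EXPECTED_SAMPLE_WIDTH_BYTES * channels * rate * seconds
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * n_bytes)


if __name__ == "__main__":
    # Demonstrate the check on a synthetic file, not on real dataset audio.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.wav")
        write_silent_wav(sample)
        print(matches_kenspeech_format(sample))
```

To apply the check to real recordings, call `matches_kenspeech_format` with the path of each downloaded .wav file; files that fail may need resampling or channel conversion before use.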
Jan 24, 2024
Wanjawa, Barack; Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Ombui, Edward; Muchemi, Lawrence, 2022, "Kencorpus: Kenyan Languages Corpus", https://doi.org/10.7910/DVN/6N5V1K, Harvard Dataverse, V9, UNF:6:GNe2C+9LlgEU6vMMqRldag== [fileUNF]
This project collected text and speech corpora for languages in Kenya. In the KenCorpus project, three languages were strategically selected: Kiswahili, Luhya, and Dholuo. The Luhya language has several dialects. In the project, three dialects were chosen as a start: Lumarachi, Logooli and Lubukusi. Primary data was collected from the respective langua...
Nov 21, 2023
Wanjawa, Barack; Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Muchemi, Lawrence; Ombui, Edward, 2022, "KenSwQuAD – A Question Answering Dataset for Swahili Low Resource Language", https://doi.org/10.7910/DVN/OTL0LM, Harvard Dataverse, V2, UNF:6:ozIF07UtsZNF2B+IVjGluw== [fileUNF]
This research developed a Kencorpus Swahili Question Answering Dataset (KenSwQuAD) from raw data of the Swahili language, a low-resource language predominantly spoken in Eastern Africa that also has speakers in other parts of the world. Question Answering datasets are important for machine comprehension of natural language processing tasks such...
May 29, 2022
Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Ombui, Edward; Wanjawa, Barack; Muchemi, Lawrence, 2022, "KenTrans: A Parallel Corpora for Swahili and local Kenyan Languages", https://doi.org/10.7910/DVN/NOAT0W, Harvard Dataverse, V2
This project produced a parallel corpus between Swahili and two other Kenyan languages: Dholuo and Luhya. The Luhya language has several dialects. In the project, three dialects were chosen as a start: Lumarachi, Logooli and Lubukusi. A total of 12,400 sentences were translated to Kiswahili from a sample of Dholuo and Luhya texts (1500 Dholuo-Kiswahili sente...
May 4, 2022
Indede, Florence; McOnyango, Owen; Wanzare, Lilian D.A.; Wanjawa, Barack; Ombui, Edward; Muchemi, Lawrence, 2022, "KenPos: Kenyan Languages Part of Speech Tagged dataset", https://doi.org/10.7910/DVN/KLCKL5, Harvard Dataverse, V1, UNF:6:NnoFeX9rZpTE0RC91NrKuQ== [fileUNF]
This project developed a Part of Speech (POS) tagged dataset for two languages in Kenya: Dholuo and Luhya. The Luhya language has several dialects. In the project, three dialects were chosen as a start: Lumarachi, Logooli and Lubukusi. The project tagged approximately 143,000 words, which include 50,000 words for Dholuo, 27,900 words for Lumarachi, 34,300 wo...