Jul 16, 2024
Awino, Dorcas; Muchemi, Lawrence; Wanzare, Lilian D.A.; Ombui, Edward; Wanjawa, Barack; McOnyango, Owen; Indede, Florence, 2022, "KenSpeech: Swahili Speech Transcriptions", https://doi.org/10.7910/DVN/YHXJSU, Harvard Dataverse, V4
This speech dataset includes both read and spontaneous speech recordings, made in Kenya with native Swahili speakers. In total, the dataset includes 27 hours 31 minutes 50 seconds of speech data from 26 speakers (19 female and 7 male). The recordings are in the following audio format: .wav, 16-bit, 16 kHz, mono, little-endian. Of th...
Jan 24, 2024
Wanjawa, Barack; Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Ombui, Edward; Muchemi, Lawrence, 2022, "Kencorpus: Kenyan Languages Corpus", https://doi.org/10.7910/DVN/6N5V1K, Harvard Dataverse, V9, UNF:6:GNe2C+9LlgEU6vMMqRldag== [fileUNF]
This project collected text and speech corpora for languages in Kenya. In the Kencorpus project, three languages were strategically selected: Kiswahili, Luhya, and Dholuo. The Luhya language has several dialects. In the project, 3 dialects were chosen as a start: Lumarachi, Logooli and Lubukusi. Primary data was collected from the respective langua...
Nov 21, 2023
Wanjawa, Barack; Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Muchemi, Lawrence; Ombui, Edward, 2022, "KenSwQuAD – A Question Answering Dataset for Swahili Low Resource Language", https://doi.org/10.7910/DVN/OTL0LM, Harvard Dataverse, V2, UNF:6:ozIF07UtsZNF2B+IVjGluw== [fileUNF]
This research developed a Kencorpus Swahili Question Answering Dataset, KenSwQuAD, from raw data of the Swahili language, a low-resource language predominantly spoken in Eastern Africa that also has speakers in other parts of the world. Question answering datasets are important for machine comprehension of natural language processing tasks such...
May 29, 2022
Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Ombui, Edward; Wanjawa, Barack; Muchemi, Lawrence, 2022, "KenTrans: A Parallel Corpora for Swahili and local Kenyan Languages", https://doi.org/10.7910/DVN/NOAT0W, Harvard Dataverse, V2
This project produced a parallel corpus between Swahili and 2 other Kenyan languages: Dholuo and Luhya. The Luhya language has several dialects. In the project, 3 dialects were chosen as a start: Lumarachi, Logooli and Lubukusi. A total of 12,400 sentences were translated to Kiswahili from a sample of Dholuo and Luhya texts (1500 Dholuo-Kiswahili sente...
May 4, 2022
Indede, Florence; McOnyango, Owen; Wanzare, Lilian D.A.; Wanjawa, Barack; Ombui, Edward; Muchemi, Lawrence, 2022, "KenPos: Kenyan Languages Part of Speech Tagged dataset", https://doi.org/10.7910/DVN/KLCKL5, Harvard Dataverse, V1, UNF:6:NnoFeX9rZpTE0RC91NrKuQ== [fileUNF]
This project developed a Part of Speech (POS) tagged dataset for 2 languages in Kenya, Dholuo and Luhya. The Luhya language has several dialects. In the project, 3 dialects were chosen as a start: Lumarachi, Logooli and Lubukusi. The project tagged approximately 143,000 words, which include 50,000 words for Dholuo, 27,900 words for Lumarachi, 34,300 wo...