Description
|
This project produced a parallel corpus between Swahili and two other Kenyan languages: Dholuo and Luhya. The Luhya language has several dialects, and three were chosen as a start: Lumarachi, Logooli and Lubukusi. A total of 12,400 sentences were translated into Kiswahili from a sample of Dholuo and Luhya texts (1,500 Dholuo-Kiswahili sentence pairs and 10,900 Luhya-Kiswahili sentence pairs). Each document contains sentence pairs: the sentence in the original language starts with the letter “O” followed by a colon (“O:”), while the translated Kiswahili sentence below it starts with the letter “T” followed by a colon (“T:”).
Acknowledgement of translators:
Luo - Swahili: Mercy Lavinca Oduoll (Coordinator), Bildad Okebe, Immaculate Ochieng, Mary Muma
Luhya (Logooli) - Swahili: Phillip Lumwamu (Coordinator), Kints Mugoha Musungu, Vivian Alivitsa, Joseph Ambwere, Joyline Ingasiani
Luhya (Bukusu) - Swahili: Martin Barasa Mulwale (Coordinator), Samwel Ralph Nyongesa, Tobias Shikuku, Phelisters N Simiyu
Luhya (Marachi) - Swahili: Judith Awinja (Coordinator), Evans Owino, Belinda Oduor, Frankline Mwaro
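The “O:” / “T:” layout described above can be read into sentence pairs with a few lines of Python. This is only a minimal sketch: it assumes each sentence sits on a single line and that a “T:” line always follows its “O:” line, which the description implies but does not guarantee. The function name `parse_pairs` and the placeholder sample lines are hypothetical and are not part of the dataset.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect (original, Kiswahili translation) pairs from O:/T: lines.

    Assumption: one sentence per line; a "T:" line without a preceding
    "O:" line is skipped rather than treated as an error.
    """
    pairs: List[Tuple[str, str]] = []
    original = None  # most recent "O:" sentence still waiting for its "T:"
    for raw in lines:
        line = raw.strip()
        if line.startswith("O:"):
            original = line[2:].strip()
        elif line.startswith("T:") and original is not None:
            pairs.append((original, line[2:].strip()))
            original = None
    return pairs


# Placeholder lines standing in for real corpus content.
sample = [
    "O: <source sentence 1>",
    "T: <Kiswahili translation 1>",
    "",
    "O: <source sentence 2>",
    "T: <Kiswahili translation 2>",
]
pairs = parse_pairs(sample)
```

To run this on a real document, pass an open text file (opened with `encoding="utf-8"`) in place of `sample`; the file handle iterates line by line.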
|
Notes
| @misc{https://doi.org/10.48550/arxiv.2208.12081, doi = {10.48550/ARXIV.2208.12081}, url = {https://arxiv.org/abs/2208.12081}, author = {Wanjawa, Barack and Wanzare, Lilian and Indede, Florence and McOnyango, Owen and Ombui, Edward and Muchemi, Lawrence}, keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences}, title = {Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks}, publisher = {arXiv}, year = {2022}, copyright = {Creative Commons Attribution 4.0 International} } |