Dec 5, 2023
Babirye, Claire; Tusubira, Francis Jeremy; Nakatumba-Nabende, Joyce; Katumba, Andrew; Ssentanda, Medadi; Nabende, Peter; Mukiibi, Jonathan; Wairagala, Eric Peter; Bateesa, Tobius, 2023, "Multilingual Parallel Text Corpora for East African Languages", https://doi.org/10.7910/DVN/BEROE0, Harvard Dataverse, V1
This is a partial multilingual parallel corpus covering five East African languages. The dataset contains an English text corpus that has been translated into Acholi, Runyankore, Luganda, Lumasaba, and Swahili.
Mar 24, 2023
Babirye, Claire; Tusubira, Jeremy; Mukiibi, Jonathan; Nakatumba-Nabende, Joyce; Katumba, Andrew, 2023, "Sentiment Tagged Parallel Corpus for Luganda and Swahili", https://doi.org/10.7910/DVN/XSGIKR, Harvard Dataverse, V1, UNF:6:90lmKYLNv/Au3PeL8ZI2NQ== [fileUNF]
This dataset contains 10,000 parallel sentiment-tagged sentences. English sentences were translated to both Luganda and Swahili. The translations were done by language experts and professional translators in collaboration with researchers at Makerere University. All sentences were tagged with a sentiment code. The sentiment tags were applied with r...
Mar 22, 2023
Nabende, Peter; Muzaki, Naomi; Babirye, Claire; Mukiibi, Jonathan; Tusubira, Jeremy; Nakatumba-Nabende, Joyce; Katumba, Andrew, 2023, "Lumasaba Monolingual Corpus", https://doi.org/10.7910/DVN/HW3IKL, Harvard Dataverse, V1, UNF:6:GFbMLraO3tec8o7ba8RiSA== [fileUNF]
Lumasaba, sometimes known as Lugisu, is a Bantu language spoken in the eastern part of Uganda. This dataset contains a total of 39,999 sentences, split into two separate files: one contains 20,764 sentences from the Northern dialect and the other contains 19,235 sentences from the Southern dialect. This dataset was compiled b...
Mar 22, 2023
Tusubira, Jeremy; Davis, David; Wanzare, Lilian; Babirye, Claire; Mukiibi, Jonathan; Nakatumba-Nabende, Joyce; Katumba, Andrew, 2023, "Kiswahili Monolingual Corpus", https://doi.org/10.7910/DVN/NHMTIC, Harvard Dataverse, V1
This dataset contains 100,000 Kiswahili sentences. We want to thank the team at the Makerere AI and Marconi Labs at Makerere University, TAVODET Youth Development (TYD) Innovation Incubator, Ai Kenya, Maseno University, United States International University-Africa (USIU-Africa), and Kabarak University who have worked tirelessly and collaboratively...
Mar 22, 2023
Mukiibi, Jonathan; Babirye, Claire; Tusubira, Jeremy; Bateesa, Tobias; Wairagala, Eric Peter; Mutebi, Chodrine; Nakatumba-Nabende, Joyce; Katumba, Andrew; Ssenkungu, Ivan; Sentanda, Medadi, 2023, "Luganda Monolingual Corpus", https://doi.org/10.7910/DVN/EQOWTW, Harvard Dataverse, V1, UNF:6:mjNMyxRYQ2QxhXHY2apoFw== [fileUNF]
This dataset contains 100,000 Luganda sentences. Luganda is a Bantu language and is one of the major languages spoken in Uganda. This dataset was compiled by researchers at the Makerere AI and Data Science Research Lab and Marconi Research and Innovation Lab. We want to thank the Department of African Languages, Makerere University and the Ekibiina...
Mar 22, 2023
Ayugi, Carolyne; Okidi, George; Babirye, Claire; Mukiibi, Jonathan; Tusubira, Jeremy; Wairagala, Eric Peter; Bateesa, Tobias; Nakatumba-Nabende, Joyce; Katumba, Andrew, 2023, "Acoli Monolingual Corpus", https://doi.org/10.7910/DVN/DCKCQA, Harvard Dataverse, V1, UNF:6:DiSxoXZhr4F8u42TjDStog== [fileUNF]
Acoli is a very low-resourced language spoken in parts of Northern Uganda. This dataset contains 40,037 Acoli sentences. The sentences were collected and evaluated by Acoli linguists in collaboration with teams at the Marconi Research and Innovation Lab and the Makerere AI Lab at Makerere University. For more information on how the dataset was create...