Makerere University NLP Datasets

Multilingual Parallel Text Corpora for East African Languages

Dec 5, 2023

Babirye, Claire; Tusubira, Francis Jeremy; Nakatumba-Nabende, Joyce; Katumba, Andrew; Ssentanda, Medadi; Nabende, Peter; Mukiibi, Jonathan; Wairagala, Eric Peter; Bateesa, Tobius, 2023, "Multilingual Parallel Text Corpora for East African Languages", https://doi.org/10.7910/DVN/BEROE0, Harvard Dataverse, V1

This is a partial multilingual parallel corpus covering five East African languages. The dataset contains an English text corpus that has been translated into Acholi, Runyankore, Luganda, Lumasaba, and Swahili.

Sentiment Tagged Parallel Corpus for Luganda and Swahili

Mar 24, 2023

Babirye, Claire; Tusubira, Jeremy; Mukiibi, Jonathan; Nakatumba-Nabende, Joyce; Katumba, Andrew, 2023, "Sentiment Tagged Parallel Corpus for Luganda and Swahili", https://doi.org/10.7910/DVN/XSGIKR, Harvard Dataverse, V1, UNF:6:90lmKYLNv/Au3PeL8ZI2NQ== [fileUNF]

This dataset contains 10,000 parallel sentiment-tagged sentences. English sentences were translated into both Luganda and Swahili by language experts and professional translators in collaboration with researchers at Makerere University. All sentences were tagged with a sentiment code. The sentiment tags were applied with r...

Lumasaba Monolingual Corpus

Mar 22, 2023

Nabende, Peter; Muzaki, Naomi; Babirye, Claire; Mukiibi, Jonathan; Tusubira, Jeremy; Nakatumba-Nabende, Joyce; Katumba, Andrew, 2023, "Lumasaba Monolingual Corpus", https://doi.org/10.7910/DVN/HW3IKL, Harvard Dataverse, V1, UNF:6:GFbMLraO3tec8o7ba8RiSA== [fileUNF]

Lumasaba, sometimes known as Lugisu, is a Bantu language spoken in the Eastern part of Uganda. This dataset contains a total of 39,999 sentences, split into two separate files: one contains 20,764 sentences from the Northern dialect and the other contains 19,235 sentences from the Southern dialect. This dataset was compiled b...

Kiswahili Monolingual Corpus

Mar 22, 2023

Tusubira, Jeremy; Davis, David; Wanzare, Lilian; Babirye, Claire; Mukiibi, Jonathan; Nakatumba-Nabende, Joyce; Katumba, Andrew, 2023, "Kiswahili Monolingual Corpus", https://doi.org/10.7910/DVN/NHMTIC, Harvard Dataverse, V1

This dataset contains 100,000 Kiswahili sentences. We want to thank the team at the Makerere AI and Marconi Labs at Makerere University, TAVODET Youth Development (TYD) Innovation Incubator, Ai Kenya, Maseno University, United States International University-Africa (USIU-Africa), and Kabarak University who have worked tirelessly and collaboratively...

Luganda Monolingual Corpus

Mar 22, 2023

Mukiibi, Jonathan; Babirye, Claire; Tusubira, Jeremy; Bateesa, Tobias; Wairagala, Eric Peter; Mutebi, Chodrine; Nakatumba-Nabende, Joyce; Katumba, Andrew; Ssenkungu, Ivan; Sentanda, Medadi, 2023, "Luganda Monolingual Corpus", https://doi.org/10.7910/DVN/EQOWTW, Harvard Dataverse, V1, UNF:6:mjNMyxRYQ2QxhXHY2apoFw== [fileUNF]

This dataset contains 100,000 Luganda sentences. Luganda is a Bantu language and is one of the major languages spoken in Uganda. This dataset was compiled by researchers at the Makerere AI and Data Science Research Lab and Marconi Research and Innovation Lab. We want to thank the Department of African Languages, Makerere University and the Ekibiina...

Acoli Monolingual Corpus

Mar 22, 2023

Ayugi, Carolyne; Okidi, George; Babirye, Claire; Mukiibi, Jonathan; Tusubira, Jeremy; Wairagala, Eric Peter; Bateesa, Tobias; Nakatumba-Nabende, Joyce; Katumba, Andrew, 2023, "Acoli Monolingual Corpus", https://doi.org/10.7910/DVN/DCKCQA, Harvard Dataverse, V1, UNF:6:DiSxoXZhr4F8u42TjDStog== [fileUNF]

Acoli is a very low-resourced language spoken in parts of Northern Uganda. This dataset contains 40,037 Acoli sentences. The sentences were collected and evaluated by Acoli linguists with the collaboration of teams at Marconi Research and Innovation Lab and Makerere AI Lab from Makerere University. For more information on how the dataset was create...
