Artificial Intelligence

Building beneficial AI systems in Africa

Oct 31, 2023

Kavengi Kitonga

Artificial Intelligence (AI) technologies are becoming increasingly integrated into all domains of life: economic, social and political. Their utility in solving problems in sectors as diverse as healthcare, education, agriculture and manufacturing fuels this integration.

Moreover, projections from the much-quoted PwC report indicate that AI is not only here but here to stay. According to this report, AI's contribution to the global economy is projected to reach $15.7 trillion by 2030, with $6.6 trillion coming from increased productivity and $9.1 trillion from consumption-side effects.

Africa has not been left behind. AI-powered solutions are slowly making their way into a variety of domains, such as agriculture and finance (Ade-Ibijola and Okonkwo, 2023). In Kenya, AI-powered solutions are assisting farmers with input allocation, disease identification and yield forecasting, while in Ghana, Mazzuma is a mobile payment ecosystem that combines mobile money, blockchain and artificial intelligence for payment transactions. Robust communities and conferences that bring together diverse stakeholders within the AI ecosystem are also making their mark, with Masakhane and Deep Learning Indaba featuring prominently. Masakhane is a community of African researchers spread across the continent, with a mission ‘…to strengthen and spur NLP research in African languages, for Africans, by Africans’. Similarly, the annual Deep Learning Indaba conference has emerged as a leading convener of the machine learning and AI community in Africa.

That said, widespread adoption of AI-powered technologies remains low in Africa. Herein lies an opportunity for Africa to accelerate AI use and, in the process, retell her story in this technological revolution. The fourth industrial revolution, encompassing AI and other technological advancements, affords the continent a redeeming opportunity: not only to accelerate the widespread use of AI technologies on the continent, but also to curate AI systems that encode sovereignty and a context-relevant development agenda.


Data: the new gold! 

Data is the new gold! With regard to AI, this expression is not hot air. AI requires data to train models; in fact, to put it plainly, it relies on massive amounts of data. Further, dataset availability refers not simply to the existence of data but also to its existence in a machine-readable format. In Africa, both of these dimensions are a challenge, with serious implications for the types of tools that can be developed, their level of customization and their inclusivity.

To illustrate, datasets for numerous African languages are non-existent. Where data exists, say for widely spoken languages like Swahili, it is often not in a machine-readable format (Shikali and Mohkosi, 2020). Consequently, there is limited data on which to train language models, a prerequisite for developing suitable AI technologies. The result is a dearth of language tools for African languages, such as spell checkers, dictionaries and thesauri. The absence of such tools largely excludes African languages from the digital space. Such exclusion is linked to the demise of languages and the loss of culture and sovereignty.

The lack of localized data limits the extent to which AI technologies can be customized to suit context. A case in point is the development of AI tools for education. AI technologies can be an important asset in education, especially in the creation of bespoke tools for students. This is particularly relevant in the African context, given that many public schools are understaffed. Developing customized technologies necessitates the collection of individual-level information, which is not readily available. The dearth of such data limits the extent to which AI tools can be an asset, which in turn slows progress on development goals such as SDG 4, which is concerned with quality education.

In other scenarios, the lack of local data can perpetuate bias and adversely affect individuals’ welfare. A case in point is facial recognition technology. These technologies have come under intense scrutiny, in some cases even being halted, because of identification inaccuracies, for example in the case of people of color, including those of African descent. Inaccuracies arise partly because the datasets used to train such systems do not include local data, i.e. faces of individuals in that context. In some instances, misidentification has led to wrongful arrests. As facial recognition technologies gain a footing in Africa, customizing such tools using local data becomes imperative. Otherwise, relying on technologies trained on non-local individuals’ facial features constitutes a point of exclusion, is an affront to the dignity of individuals and perpetuates inequalities that Africans have long had to contend with.


Data governance: working towards data availability

Understanding that data is gold is key to designing intentional policies that place African countries in the driving seat of their own AI advancement. How, then, can policy play a key role in advancing data availability?


Leveraging multi-stakeholder collaboration within the AI ecosystem

The AI ecosystem comprises diverse stakeholders; efforts to create datasets would therefore do well to capitalize on the strengths of diverse communities. Siminyu et al. (2023) highlight the range of stakeholders involved in the Natural Language Processing (NLP) ecosystem: content creators, language practitioners, language technologists, content curators and innovators. Collaboration, they note, is important given the interdependence of workflows. To illustrate, language technologists build models using datasets. Such datasets include the output of content creators, obtained either directly, by contracting them for a task, or indirectly, by scraping their content. Using such data for model building demands not only that it is available in a machine-readable format but also that copyright is respected. Respect for copyright is an important consideration for the dataset curator, who determines the composition of the final dataset for model training and is therefore reliant on content creators. The availability of machine-readable data for training relies on the skill of annotators, whose role is to label datasets. Oftentimes, these stakeholders work in isolation. Building partnerships ensures not only a greater volume of data but also its availability in the appropriate format.


Capacity building

African representation in the AI workforce is low, and the disparities widen when other dimensions of diversity, such as gender, ethnicity, geography and differences in ability, are considered. Dataset creation, as illustrated in the preceding section, relies on diverse expertise along the AI workflow. Limited human capital compromises both the availability and the quality of data. Therefore, initiatives geared towards creating AI courses within tertiary institutions, funding local AI research and supporting organizations such as Masakhane and Deep Learning Indaba can go a long way towards easing human capital constraints.


Data ethics

Ethical considerations are part and parcel of ensuring community buy-in for data collection. Developing local AI solutions, such as in education, would necessitate the collection of individualized data. Creating African language datasets relies on numerous forms of data, such as speech and text, and therefore on contributions from the community, i.e. language speakers. In all these instances, the consent of individuals must be sought, their privacy must be respected, and the data collected must dignify and appropriately represent community members in the way they desire (Siminyu et al. 2022). In this regard, participatory approaches have been suggested as one way that NLP practitioners can collaborate with communities. According to Pillai et al. (2023), participatory approaches during data collection would involve ‘…providing information to community members about the types of data that may be used, how the data will be used, how the data will be stored and protected, and potential benefits and harms in using the data for developing the system’. Therefore, policy direction on ethical considerations with respect to data collection will go a long way in sustaining such collaborations.


Conclusion

Unlike in the third industrial revolution, Africa has an opportunity to bask in the glory of the fourth. While numerous factors contribute to the acceleration of AI on the continent, an important starting point is data availability. Given that ‘data is the new gold’, data governance policy geared towards enhancing capacity and collaboration and fostering ethics will be instrumental in making greater volumes of data available, consistent with ethical guidelines.

© 2024, The Nuruba Media & Publishing Company Ltd. & Aberdeen Experience Labs
