Bid to include African datasets in LLM training gains traction 🌍 🧠

#ai
#technology
#machine learning
Researchers ratchet up efforts to embed African perspectives into LLM training datasets as AI ethics activists push for a more representative global AI ecosystem.

Seth Onyango, bird story agency 


Efforts to imbue Large Language Models (LLMs) with a deeper understanding of African perspectives and narratives have recently gained traction. 


This involves curating expansive datasets that cover African languages, literature, and historical texts, many of which are underrepresented in, or entirely absent from, current AI models.


This drive comes amidst a glaring lack of representation of Africa's 1,000+ languages and content on the internet, where they make up a mere 1% of all languages present.


Miguel Botero, Director of Social Impact at Biografika, asserts the disparity is palpable. He notes that the current imbalance in AI training data has led to a skewed representation of global knowledge and experiences. 


Roughly a third of global languages stem from Africa, where, by some estimates, as many as 2,000 languages are spoken.


Of these, 75 are spoken by populations exceeding one million, while many remain solely oral, lacking written or digitized formats. 


This situation underscores some of the problems in developing digital databases and creating LLMs, especially given the complexity and high cost of building such models.

 

Botero posits that this not only limits AI's effectiveness but also perpetuates a narrow, often inaccurate portrayal of diverse cultures and societies in the global south.


"African languages make up only about 0.1% of the languages represented on the internet. That means that anything you ask an LLM will produce and uphold a vision of humanity where Africa, in this case, only influences the reply marginally," he said.


"So, everything you ask an LLM will be driven by the views and the general image of humanity (that) has been created by data from the global north. That means Africa is not influencing the replies that are given by these LLMs."


He notes the same thing applies to Wikipedia, where only a small fraction of articles on the online encyclopaedia are produced by people from Africa or Latin America - communities that he terms the "global majority".


Moky Makura, Executive Director at Africa No Filter, is leading an effort to address this issue through her organisation's support of African content that works to change a narrow and mostly negative view of Africa. Working with a variety of platforms and creatives to populate the internet with authentic African content, she insists that African content creators need to flood the internet with better stories, told better, about the continent.


In 2022, the AI chatbot ChatGPT was hailed on its release as one of the year's most impressive technological innovations. In a bid to make it less toxic, OpenAI used outsourced Kenyan labourers earning less than $2 per hour, according to a TIME investigation.


Still, ChatGPT reproduces clichés about Africa. In December last year, Dr Ibraheem Dooba, Director of Research and Publication at APS, sought to know how ChatGPT sees a successful African. The results, although shocking, are not surprising.


A successful man turned out to be a European-looking young man in a suit, while the Africans who contribute to his success are themselves partially clothed.


Several experiments have shown that ChatGPT, the world's pre-eminent chatbot, tends to portray African nations and their rulers in a poor light, hinting at themes of poverty, illness, or graft.


Despite multiple prompts, ChatGPT won't get it right. 


"In sum, without some serious prompt engineering, ChatGPT 4 default responses can be useless," Dooba concluded.


Biografika is currently developing a visibility report focused on making the stories of changemakers from the global majority more accessible and impactful. 


Key to Biografika's strategy is the acknowledgement of the vast array of knowledge production methods that exist outside the Western, text-based paradigm. 


Many communities in Africa and other parts of the global majority rely on oral traditions, music, and other forms of expression that are often overlooked in digital archives and datasets. 


By advocating for the inclusion of these multimodal forms of knowledge, Biografika aims to challenge and expand how information is collected, archived, and utilised in AI development.


Germany, meanwhile, is mulling a collaboration with data collection institutes such as the NGO Afrobarometer, the Konrad Adenauer Foundation said.


"A glaring lack of data does not only exist in the area of languages, but statistically representative data on various socio-economic categories of social coexistence are also incomplete in many African countries and are often plagued by difficulties in the collection processes," the foundation wrote in a recent paper.


"However, such representative native data from specific African contexts are indispensable to develop accurate AI models."


bird story agency.
