Namibian researcher helps develop African-language AI model

A Namibian researcher is part of a team developing African-focused artificial intelligence systems aimed at reducing the dominance of English-language models and improving digital access for underserved African languages.

Namibian computer science researcher Anri Lombard was part of a research team at the University of Cape Town that developed MzansiLM, a new multilingual AI language model trained on all 11 official written languages of South Africa.

The project, which will be presented at the Language Resources and Evaluation Conference 2026 in Mallorca this month, comes as African countries increasingly seek to ensure that local languages are not excluded from rapidly advancing global AI systems.

The research team also developed MzansiText, a multilingual dataset used to train the model from scratch.

Researchers say most African languages remain significantly underrepresented in mainstream AI systems due to limited digital training data, resulting in poor performance by many popular AI tools when used in African languages.

“In language modelling, languages are considered low resource, primarily because there are much fewer and smaller textual datasets available in these languages for training language models,” said Jan Buys.

“Our dataset, MzansiText, is still small compared to data available for high-resource languages such as English and major European and Asian languages, but larger than previous datasets for South African languages.”

The researchers said they believe MzansiLM is the first publicly available decoder-only language model designed specifically to support all 11 official written languages of South Africa in a single system.

“There has been real progress in language modelling for African languages, including some South African ones like isiXhosa and isiZulu,” said Francois Meyer.

“But most existing models only cover a subset of languages. With MzansiLM, we wanted to build a single model focused specifically on South Africa that covers all 11 official written languages, including those that are often left out.”

Lombard said the project emerged from his research into AI systems for low-resource languages, an area that remains underdeveloped globally.

“I came into this work through my master’s research, which looks at how different language-model architectures perform for low-resource languages, since that is still a relatively underexplored area,” Lombard said.

“One thing that stood out to me is that publicly available models tended to cover only a subset of the South African languages we care about. MzansiLM was meant to provide a small decoder-only baseline that future work can compare against and build on.”

The development comes as countries across Africa, including Namibia, increasingly explore artificial intelligence adoption in sectors such as education, financial services, healthcare, public administration and digital services. However, concerns remain that many AI systems continue to exclude African languages and local contexts.

Although smaller than commercial AI systems such as ChatGPT, the 125-million-parameter model reportedly performed competitively on several African-language benchmarks, including isiXhosa text-generation tasks.

The team said the model is intended as a foundational system that developers can adapt for specialised applications rather than as a direct consumer chatbot.

“In practice, that means developers could build tools for specific use cases, for example summarising information or annotating raw data, in South African languages,” Meyer said.

“Adapting MzansiLM for a limited use case might be more effective and affordable than relying on proprietary large language models, if you want users to be able to interact with a system in their home language.”

The researchers said the project also helps explain why even advanced global AI systems continue to struggle with non-English languages.

“Our findings show that the model can work well when fine-tuned for specific tasks but is not yet able to work well for general-purpose user interaction or instruction following, due to the limited training data,” Buys said.

“This helps to explain why even larger language models don’t yet work as well when used in languages other than English.”

Lombard said continued collaboration and open research would be critical to improving AI systems for African languages.

“A lot of the progress we were able to make depends on earlier open research from the African Natural Language Processing research community, so continuing that openness is essential,” Lombard said.

“We still need better and broader data sources, stronger benchmarks, and the kind of shared datasets, models, code, and results that make it possible for others to reproduce and extend the work.”

The research team has made both MzansiText and MzansiLM publicly available through open-access platforms.