IsiXhosa Medical Machine Translation
Project Showcase

By: Elijah Sherman, Nick Matzopoulos, Malibongwe Makhonza

Supervised by: Francois Meyer


About

Abstract

Machine translation (MT) for low-resource languages such as isiXhosa remains a major challenge because few parallel datasets are available for training models. Current MT systems translate common words and sentences between isiXhosa and English well, but their performance drops on specialised medical terms. This project examines whether synthetic data, created through either back-translation or lexicon induction, can improve overall translation quality, help fill vocabulary gaps, and make models more robust when handling medical terms. To test this, we built four MT models, each following a different approach, and evaluated them using standard quality metrics (BLEU, chrF, and chrF++) as well as a health term error rate. We compare the models with one another and against the Blocker evaluation dataset. The results offer new insights into domain-specific adaptation, the feasibility of using MT for isiXhosa in the medical domain, and ways to improve MT for isiXhosa.
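The abstract names a health term error rate but does not define it here. As a rough illustration only, the sketch below assumes one plausible definition: the fraction of health terms found in the source sentences whose expected translation does not appear in the model output. The function name `term_error_rate`, the lexicon format, and the toy data are all hypothetical and are not taken from the project; the project's actual metric may differ.

```python
from typing import Dict, List, Sequence


def term_error_rate(
    sources: Sequence[str],
    hypotheses: Sequence[str],
    lexicon: Dict[str, List[str]],
) -> float:
    """Share of source-side health terms with no accepted translation in the output.

    Assumed definition (not the project's): a term counts as an error when it
    occurs in the source sentence and none of its accepted translations occurs
    as a substring of the corresponding hypothesis.
    """
    if len(sources) != len(hypotheses):
        raise ValueError("sources and hypotheses must have the same length")

    total = 0   # health terms seen in the sources
    missed = 0  # terms whose translation is absent from the hypothesis
    for src, hyp in zip(sources, hypotheses):
        src_lower, hyp_lower = src.lower(), hyp.lower()
        for term, translations in lexicon.items():
            if term.lower() in src_lower:
                total += 1
                if not any(t.lower() in hyp_lower for t in translations):
                    missed += 1
    # With no health terms in the sources there is nothing to get wrong.
    return missed / total if total else 0.0


# Toy English-to-isiXhosa lexicon; entries are illustrative, not the project's.
lexicon = {"blood": ["igazi"], "diabetes": ["iswekile"]}
sources = [
    "The patient has high blood pressure.",
    "She was diagnosed with diabetes.",
]
# Placeholder strings standing in for model output, not real translations.
hypotheses = ["xx igazi xx", "xx xx xx"]

print(term_error_rate(sources, hypotheses, lexicon))
```

Plain substring matching is a simplification. IsiXhosa is agglutinative, so a term can surface with different prefixes, and a practical version would likely need stemming or a list of accepted surface forms per term.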
