Evaluating Large Language Models for Multilevel Biomedical Text Simplification

By: Nomonde Khalo

Supervised by: Jan Buys


Abstract

Biomedical articles are often characterized by structural and technical complexity, making them inaccessible to non-expert readers. Large language models (LLMs) have shown promise for biomedical text simplification, but existing techniques fail to adapt to readers' varying literacy needs. We address this by tailoring medical text simplification to two distinct proficiency levels: English as Home Language (HL) and English as First Additional Language (FAL). We evaluate a range of open-source instruction-tuned models, including the Llama-3 family and Mistral-7B, across multiple prompting strategies, including zero-shot, few-shot, and in-context learning. We also investigate instruction fine-tuning on synthetic data generated with Self-Instruct, and examine the role of domain-specific knowledge by comparing biomedical LLMs (i.e., BioMistral) against prompt-based external knowledge injection. Our results reveal two key findings. First, we identify a clear trade-off: general-purpose models such as Mistral excel at preserving semantic content, while in-context prompting of models such as Llama-3.1-8B achieves the highest readability scores. Second, and most notably, we show that domain-specific models underperform at simplification because of their bias towards retaining complex terminology. Conversely, a small Self-Instruct-tuned model (Llama-3.2-3B) achieves comparable readability for FAL audiences, showing that targeted, efficient tuning can surpass both larger general-purpose and specialized models.
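As a rough illustration of the audience-tailored prompting described above, the sketch below builds separate zero-shot prompts for HL and FAL readers and runs them through an instruction-tuned model. The prompt wording, model checkpoint, and decoding settings are assumptions for illustration only, not the prompts or configuration used in this project.

```python
# Minimal sketch of audience-tailored zero-shot prompting for biomedical text
# simplification. The instructions, checkpoint, and decoding settings below are
# illustrative assumptions, not the project's actual setup.
from transformers import pipeline

AUDIENCE_INSTRUCTIONS = {
    # English as Home Language: plain English for a fluent adult non-expert.
    "HL": ("Rewrite the passage in plain English for a fluent adult reader with "
           "no medical training. Replace technical terms with everyday words and "
           "keep every key fact."),
    # English as First Additional Language: short sentences, basic vocabulary.
    "FAL": ("Rewrite the passage in very simple English for a reader whose first "
            "language is not English. Use short sentences and common words, and "
            "briefly explain any medical term you must keep."),
}

def build_prompt(passage: str, audience: str) -> str:
    """Combine the audience-specific instruction with the source passage."""
    return f"{AUDIENCE_INSTRUCTIONS[audience]}\n\nPassage:\n{passage}\n\nSimplified version:"

if __name__ == "__main__":
    # Assumed checkpoint; any instruction-tuned causal LM served by transformers works here.
    generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")
    passage = ("Myocardial infarction results from occlusion of a coronary artery, "
               "leading to ischaemic necrosis of cardiac tissue.")
    for level in ("HL", "FAL"):
        prompt = build_prompt(passage, level)
        output = generator(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"]
        print(f"{level}: {output[len(prompt):].strip()}\n")
```

A few-shot or in-context variant of this sketch would simply prepend worked HL or FAL example pairs to the prompt before the passage.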
