BabyLMs for Sample-Efficient Language Modelling of Ancient Greek

Investigating the suitability of BabyLM architectures for language modelling in the low-resource context of Ancient Greek.

By: Dimitri Dalakas, Stelio Dalakas, Kimon Christelis

Supervised by: Francois Meyer


About

Abstract

Pretrained Language Models (PLMs) have achieved state-of-the-art performance across a wide range of Natural Language Understanding tasks in recent years. This success has relied on vast amounts of training data, which are not readily available for low-resource languages. Ancient Greek, despite its historical and cultural significance, has limited publicly available data, posing a major obstacle to effective language modelling. The BabyLM Challenge is a shared task for the development of sample-efficient architectures, and these data-efficient models show promise for reducing the reliance of language models on extensive training data. In this study, we explore the suitability of BabyLMs for the low-resource context of Ancient Greek. We train an ELC-BERT model (winner of the first BabyLM Challenge), a GPT-BERT model (winner of the second BabyLM Challenge), and a baseline BERT model, and finetune all three on Part-of-Speech tagging and dependency parsing. Our results show that GPT-BERT comfortably outperforms both ELC-BERT and BERT across these tasks. Furthermore, GPT-BERT outperforms an existing Ancient Greek model trained on substantially more data, emphasising its potential as a data-efficient architecture for Ancient Greek language modelling.
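
To make the finetuning setup concrete, the sketch below shows how a pretrained BERT-style encoder could be finetuned for Ancient Greek Part-of-Speech tagging with Hugging Face Transformers. This is an illustration only, not the project's code: the checkpoint path is a placeholder, and the `universal_dependencies` / `grc_proiel` dataset identifier is an assumed example of an Ancient Greek Universal Dependencies treebank.

```python
# Illustrative sketch (not the authors' implementation): finetuning a
# BERT-style encoder for Ancient Greek POS tagging with Hugging Face.
# The checkpoint path is a placeholder; the treebank ID is an assumed example.
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "path/to/ancient-greek-babylm-checkpoint"  # hypothetical checkpoint

# An Ancient Greek UD treebank supplies (tokens, upos) pairs for each sentence.
dataset = load_dataset("universal_dependencies", "grc_proiel")
num_labels = dataset["train"].features["upos"].feature.num_classes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=num_labels
)

def tokenize_and_align(example):
    # Tokenise pre-split words into subwords; label only the first subword of
    # each word and mask the rest with -100 so the loss ignores them.
    enc = tokenizer(example["tokens"], is_split_into_words=True, truncation=True)
    labels, prev_word = [], None
    for word_id in enc.word_ids():
        if word_id is None or word_id == prev_word:
            labels.append(-100)
        else:
            labels.append(example["upos"][word_id])
        prev_word = word_id
    enc["labels"] = labels
    return enc

encoded = dataset.map(tokenize_and_align)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="grc-pos-tagger", num_train_epochs=3),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```

The same token-classification recipe would apply to any of the three encoders compared in the study; only the pretrained checkpoint loaded at the top would change.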
