Documents detailing African history often contain ambiguous names, either because several people share the same name or because one person is referred to by several different names. As a result, when searching for or extracting information about a particular person, the choice of name can significantly alter the results. This problem may be alleviated by a Named Entity Disambiguation (NED) system, which disambiguates names by linking them to entries in a knowledge base. In recent years, language models (LMs) such as BERT have led to improvements in NED systems. This project examines the effectiveness of a language model-based NED system for disambiguating people's names. However, such systems require extensive annotated domain-specific data for training, which is often not available for historical African documents. The model will therefore be augmented with hand-crafted rules and annotated data to achieve higher accuracy. The NED models will be evaluated using the F1 score under 10-fold cross-validation, and the LM-based NED systems will be compared against a simple probabilistic baseline to benchmark their performance. Finally, the effectiveness of the hand-crafted resources will be evaluated by the change in performance relative to the system without any hand-crafted resources.
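
To make the evaluation plan above more concrete, here is a minimal sketch of what a "simple probabilistic baseline" scored with F1 over k-fold cross-validation might look like. It is an assumption for illustration only, not the project's actual baseline: it links each name to the entity most often seen with that name in the training data. All function names and the (name, entity) pair format are hypothetical.

```python
from collections import Counter, defaultdict


def train_prior(pairs):
    """Count how often each surface name refers to each entity.

    `pairs` is a list of (name, entity_id) tuples (hypothetical format).
    """
    counts = defaultdict(Counter)
    for name, entity in pairs:
        counts[name][entity] += 1
    return counts


def predict(counts, name):
    """Return the most frequent entity for a name, or None if unseen."""
    if name not in counts:
        return None
    return counts[name].most_common(1)[0][0]


def micro_f1(counts, pairs):
    """Micro F1 where an unseen name yields no prediction (lowers recall)."""
    predictions = [predict(counts, name) for name, _ in pairs]
    made = sum(p is not None for p in predictions)
    correct = sum(p == gold for p, (_, gold) in zip(predictions, pairs))
    precision = correct / made if made else 0.0
    recall = correct / len(pairs) if pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cross_validate(pairs, k=10):
    """Average micro F1 over k folds; fold i holds out every k-th pair."""
    scores = []
    for i in range(k):
        test = pairs[i::k]
        train = [p for j, p in enumerate(pairs) if j % k != i]
        scores.append(micro_f1(train_prior(train), test))
    return sum(scores) / k
```

In a setup like this, an LM-based system would be scored with the same `cross_validate` folds so that its F1 can be compared directly against the baseline's.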

Images



Simple example of linking names with the people to whom they refer.

Overview of the Automatic NED model.