
Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

Bachelor Defense: Phuc Tran Truong

When: 27.11.2020, from 10:00 (Europe/Berlin, UTC+1)
Where: online (Zoom)

Speaker: Phuc Tran Truong

Topic: Investigating weak supervision for the extraction of mobility relations and events in German text

Occasion: Bachelor Defense

Zoom: (link requires an "Informatik-Account")

Abstract:
Supervised machine learning approaches to event extraction rely on large annotated datasets, which are difficult to obtain because manual annotation is expensive and time-consuming. In this thesis, we apply weak supervision to the task of extracting mobility-related events from German text by building a Snorkel-based system that generates weakly labeled training data. Instead of hand-labeling the data, we develop labeling functions based on heuristics, including keyword lists and event patterns, and let Snorkel combine their outputs into probabilistic labels. We then train a BiLSTM-based model on three configurations of training data: (1) an existing hand-labeled training set, (2) a training set twice as large with probabilistic labels, created using our labeling functions and Snorkel, and (3) a combination of both. In our experiments, the model trained purely on probabilistically labeled data performed worse than the model trained on gold-labeled data, while the model trained on the combined set of gold-labeled and probabilistically labeled data performed best overall. Our results suggest that the larger weakly labeled dataset complemented the smaller hand-labeled dataset.
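
To illustrate the idea of labeling functions described in the abstract, here is a minimal sketch in plain Python. It is not the system from the thesis and does not use Snorkel: the keyword lists, the regular expression, and all function names are hypothetical, and the normalized vote in `probabilistic_label` is only a simple stand-in for the way Snorkel's label model weighs labeling functions against each other.

```python
import re
from collections import Counter

# Label values; ABSTAIN means a labeling function has no opinion.
ABSTAIN = -1
NO_EVENT = 0
MOBILITY_EVENT = 1

# Hypothetical keyword lists, chosen for illustration only.
DISRUPTION_KEYWORDS = ("stau", "sperrung", "verspätung", "zugausfall")
OFF_TOPIC_KEYWORDS = ("rezept", "fußball")


def lf_disruption_keyword(sentence):
    """Vote MOBILITY_EVENT if a disruption keyword occurs in the sentence."""
    text = sentence.lower()
    if any(keyword in text for keyword in DISRUPTION_KEYWORDS):
        return MOBILITY_EVENT
    return ABSTAIN


def lf_motorway_pattern(sentence):
    """Vote MOBILITY_EVENT if the sentence names a motorway, e.g. 'auf der A10'."""
    if re.search(r"\bauf der A\s?\d+\b", sentence):
        return MOBILITY_EVENT
    return ABSTAIN


def lf_off_topic(sentence):
    """Vote NO_EVENT if an off-topic keyword occurs in the sentence."""
    text = sentence.lower()
    if any(keyword in text for keyword in OFF_TOPIC_KEYWORDS):
        return NO_EVENT
    return ABSTAIN


LABELING_FUNCTIONS = (lf_disruption_keyword, lf_motorway_pattern, lf_off_topic)


def probabilistic_label(sentence):
    """Return the share of non-abstaining votes for MOBILITY_EVENT.

    Returns None when every labeling function abstains. This equal-weight
    vote is an assumption of the sketch, not Snorkel's label model.
    """
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    counts = Counter(vote for vote in votes if vote != ABSTAIN)
    total = sum(counts.values())
    if total == 0:
        return None
    return counts[MOBILITY_EVENT] / total
```

For example, `probabilistic_label("Stau auf der A10 nach Unfall")` collects two votes for a mobility event and no votes against, while a sentence that triggers conflicting functions receives a value between 0 and 1. Sentences for which all functions abstain get no label and would be left out of the weakly labeled training set in this sketch.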