Citation

Janis Pagel. Rule-based and Learning-based Approaches for Automatic Bridging Detection and Resolution in German. Master's thesis, University of Stuttgart, 2018. (unpublished)

BibTeX

@mastersthesis{pagel2018b,
   title     = {{Rule-based and Learning-based Approaches for Automatic Bridging Detection and Resolution in German}},
   author    = {Janis Pagel},
   school    = {University of Stuttgart},
   note      = {unpublished},
   year      = {2018},
}

RIS

TY  - THES
TI  - Rule-based and Learning-based Approaches for Automatic Bridging Detection and Resolution in German
AU  - Janis Pagel
PY  - 2018
PB  - University of Stuttgart
ER  - 

Abstract

The phenomenon of bridging describes non-coreferential entities that stand in a prototypical or inferable relationship to a previously introduced discourse entity. Automatic bridging resolution aims to detect bridging anaphors and link them to their antecedents. Research on automatic bridging resolution is scarce, as are resources for training algorithms on the task. This thesis therefore introduces new data for bridging resolution in German, the GRAIN corpus, and evaluates the data with regard to annotation quality and the types of bridging that occur. To ensure the generalizability of the approach, the established corpus DIRNDL is additionally used. To determine the difficulty of the task on these data, an informed baseline is implemented and evaluated. Furthermore, a rule-based system based on Hou et al. (2014) is created to perform bridging resolution. To explore the potential of learning-based models for resolving bridging relations, a gradient boosting model is trained on the same data as the rule-based system. The rule-based system performs better than the baseline and achieves an F1 score of 5.3% for DIRNDL and 4.0% for GRAIN. An analysis with oracle lists for the rule-based system shows that many rules do not have access to the correct antecedent. The gradient boosting model outperforms the rule-based system on DIRNDL (F1 = 11.3%) but does not generalize to GRAIN. The differences can be explained by the different structures of the corpora and their topic distributions. Furthermore, the results of the gradient boosting model suggest that more training data would greatly improve learning-based approaches to bridging resolution.