Dependency parsing and domain adaptation with data-driven LR models and parser ensembles
by Sagae, Kenji and Tsujii, Jun
Abstract:
We present a data-driven variant of the LR algorithm for dependency parsing, and extend it with a best-first search for probabilistic generalized data-driven LR dependency parsing. Parser actions are determined by a machine learning component, based on features that represent the current state of the parser. We apply this parsing framework to both tracks of the CoNLL 2007 shared task on dependency parsing, in each case taking advantage of multiple models trained with different learners. In the multilingual track, we train three data-driven LR models for each of the ten languages, and combine the analyses obtained with each individual model using a maximum spanning tree voting scheme. In the domain adaptation track, we use two models to parse unlabeled data in the target domain to supplement the labeled training set in the source domain, in a scheme similar to one iteration of co-training.
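As a rough illustration of the idea that parser actions are chosen by a component that inspects the current parser state, the following is a minimal greedy shift-reduce sketch. It is not the authors' implementation: the action names, the `parse` and `make_oracle` functions, and the head-index representation are all assumptions made for this example, and the best-first search and learned classifier described in the abstract are omitted. A gold-tree oracle stands in for the machine learning component.

```python
from typing import Callable, List

# Hypothetical action names for this sketch (not taken from the paper).
SHIFT, LEFT, RIGHT = "SHIFT", "LEFT", "RIGHT"

Policy = Callable[[List[int], List[int]], str]


def parse(words: List[str], choose_action: Policy) -> List[int]:
    """Greedy shift-reduce dependency parse.

    Returns heads, where heads[i] is the index of the head of word i,
    or -1 if word i is the root. choose_action plays the role of the
    learned component: it maps the parser state (stack, queue) to an action.
    """
    stack: List[int] = []
    queue: List[int] = list(range(len(words)))
    heads: List[int] = [-1] * len(words)
    while queue or len(stack) > 1:
        action = choose_action(stack, queue)
        if action == SHIFT and queue:
            # Move the next input word onto the stack.
            stack.append(queue.pop(0))
        elif action == LEFT and len(stack) >= 2:
            # Second-from-top item becomes a dependent of the top item.
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif action == RIGHT and len(stack) >= 2:
            # Top item becomes a dependent of the item below it.
            dep = stack.pop()
            heads[dep] = stack[-1]
        else:
            raise ValueError(f"invalid action {action!r} in current state")
    return heads


def make_oracle(gold: List[int]) -> Policy:
    """Stand-in for a trained classifier: reads actions off a gold tree.

    Assumes the gold tree is projective and has a single root.
    """
    def choose(stack: List[int], queue: List[int]) -> str:
        if len(stack) >= 2:
            below, top = stack[-2], stack[-1]
            if gold[below] == top:
                return LEFT
            # Attach top only once none of its dependents remain in the queue.
            if gold[top] == below and all(gold[q] != top for q in queue):
                return RIGHT
        return SHIFT
    return choose


# Usage: "the" depends on "dog", "dog" depends on "barks", "barks" is the root.
print(parse(["the", "dog", "barks"], make_oracle([1, 2, -1])))
```

In a data-driven setting, an oracle like this would typically be used only to generate training examples (state features paired with the correct action), and a classifier trained on those examples would replace it at parsing time.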
Reference:
Dependency parsing and domain adaptation with data-driven LR models and parser ensembles (Sagae, Kenji and Tsujii, Jun), In Proceedings of the CoNLL 2007 Shared Task. Joint Conferences on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.
Bibtex Entry:
@inproceedings{sagae_dependency_2007,
	address = {Prague, Czech Republic},
	title = {Dependency parsing and domain adaptation with data-driven {LR} models and parser ensembles},
	url = {http://ict.usc.edu/pubs/Dependency%20Parsing%20and%20Domain%20Adaptation%20with%20LR%20Models%20and%20Parser%20Ensembles.pdf},
	abstract = {We present a data-driven variant of the LR algorithm for dependency parsing, and extend it with a best-first search for probabilistic generalized data-driven LR dependency parsing. Parser actions are determined by a machine learning component, based on features that represent the current state of the parser. We apply this parsing framework to both tracks of the CoNLL 2007 shared task on dependency parsing, in each case taking advantage of multiple models trained with different learners. In the multilingual track, we train three data-driven LR models for each of the ten languages, and combine the analyses obtained with each individual model using a maximum spanning tree voting scheme. In the domain adaptation track, we use two models to parse unlabeled data in the target domain to supplement the labeled training set in the source domain, in a scheme similar to one iteration of co-training.},
	booktitle = {Proceedings of the {CoNLL} 2007 {Shared} {Task}. {Joint} {Conferences} on {Empirical} {Methods} in {Natural} {Language} {Processing} and {Computational} {Natural} {Language} {Learning}},
	author = {Sagae, Kenji and Tsujii, Jun},
	month = jul,
	year = {2007}
}