UD Welsh-CCG (Corpws Cystrawennol y Gymraeg) is a treebank of Welsh, annotated according to the Universal Dependencies guidelines.
Most of the annotated sentences come from the Welsh Wikipedia. Others have been taken from the Corpus of the Welsh Assembly, from websites of Welsh-speaking organisations (Cymdeithas yr Iaith Gymraeg, University of Wales), from news sources (y Golwg, local Welsh-language newspapers, BBC Cymru) and from Welsh-language blogs. A few example sentences are taken from Welsh grammars (Gramadeg Cymraeg Cyfoes: Gareth King, Modern Welsh).
If you use this treebank in your work, please cite:
@inproceedings{heinecke2019,
author = {Heinecke, Johannes and Tyers, Francis M.},
title = {{Development of a Universal Dependencies treebank for Welsh}},
year = {2019},
booktitle = {{Proceedings of the Celtic Language Technology Workshop}},
publisher = {European Association for Machine Translation},
address = {Dublin},
pages = {21--31},
url = {https://www.aclweb.org/anthology/W19-6904},
}
2019-05-15 v2.4
- initial version
2019-05-30
- mutations corrected (or feature Mutation=)
2019-09-15
- lemma for conjunction "ac" normalised to "a", number + "o" + noun corrected
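The changelog mentions the `Mutation=` feature. As a minimal sketch of how that feature could be inspected, the function below counts the values of one FEATS key across the word lines of a CoNLL-U file. It assumes only the standard 10-column, tab-separated CoNLL-U layout; the function name `count_feature` and the two sample lines are hypothetical illustrations written for this example, not data taken from the treebank.

```python
from collections import Counter


def count_feature(conllu_lines, feature="Mutation"):
    """Count the values of one FEATS key over the word lines of CoNLL-U input."""
    counts = Counter()
    for line in conllu_lines:
        line = line.rstrip("\n")
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        feats = cols[5]
        if feats == "_":
            continue
        # FEATS is a "|"-separated list of Key=Value pairs.
        for pair in feats.split("|"):
            key, _, value = pair.partition("=")
            if key == feature:
                counts[value] += 1
    return counts


# Hand-written sample in CoNLL-U layout (illustrative only, not from the treebank).
sample = [
    "# text = ei gath",
    "1\tei\tei\tPRON\t_\tPoss=Yes\t2\tnmod:poss\t_\t_",
    "2\tgath\tcath\tNOUN\t_\tGender=Fem|Mutation=SM|Number=Sing\t0\troot\t_\t_",
    "",
]
print(count_feature(sample))  # Counter({'SM': 1})
```

To run it on a real file, pass an open file handle instead of the list, for example `count_feature(open(path, encoding="utf-8"))`, where `path` points at one of the treebank's `.conllu` files.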
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.4
License: CC BY-SA 4.0
Includes text: yes
Genre: grammar-examples wiki nonfiction fiction news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: manual native
Contributors: Heinecke, Johannes; Tyers, Francis;
Contributing: elsewhere
Contact: [email protected]
===============================================================================