Morphological analysis using a sequence decoder

Akyürek, Ekin; Dayanık, Erenay; Yuret, Deniz

doi:10.1162/tacl_a_00286

arXiv:1805.07946 (cs)

[Submitted on 21 May 2018 (v1), last revised 24 Sep 2019 (this version, v2)]

Abstract: We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed-size vector representation, and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and to outperform whole-tag models. In addition, generating morphological features as a sequence rather than, e.g., an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the-art results in nine languages of different morphological complexity under low-resource, high-resource, and transfer learning settings. We also introduce TrMor2018, a new high-accuracy Turkish morphology dataset. Our Morse implementation (in Julia/Knet) and the TrMor2018 dataset are available online to support future research.
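The output format described in the abstract (lemma characters first, then one symbol per morphological feature) can be illustrated with a minimal Python sketch. This is not the authors' Julia/Knet implementation: the feature names, the `</s>` end marker, the helper names `encode_target` and `decode_output`, and the example word are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical feature inventory; the abstract does not list the real tag set.
FEATURES = {"Noun", "Verb", "Past", "Sing", "Plur"}
EOS = "</s>"  # assumed end-of-sequence marker


def encode_target(lemma: str, features: List[str]) -> List[str]:
    """Build a decoder target: lemma characters, then feature symbols, then EOS."""
    return list(lemma) + list(features) + [EOS]


def decode_output(tokens: List[str]) -> Tuple[str, List[str]]:
    """Split a decoder output back into (lemma, features), stopping at EOS."""
    lemma_chars: List[str] = []
    feats: List[str] = []
    for tok in tokens:
        if tok == EOS:
            break
        if tok in FEATURES:
            feats.append(tok)        # multi-character symbols are features
        else:
            lemma_chars.append(tok)  # everything else is a lemma character
    return "".join(lemma_chars), feats


# Invented example: an analysis of "walked" as lemma "walk" plus two features.
target = encode_target("walk", ["Verb", "Past"])
lemma, feats = decode_output(target)

# Whole-tag vs. per-feature vocabularies on an invented training set.
seen_whole_tags = {"Verb+Past", "Noun+Sing", "Noun+Plur"}
seen_features = {f for tag in seen_whole_tags for f in tag.split("+")}
new_analysis = ["Verb", "Plur"]  # a combination never seen as a whole tag
whole_tag_known = "+".join(new_analysis) in seen_whole_tags
features_known = all(f in seen_features for f in new_analysis)
```

In this toy setting, `whole_tag_known` is false while `features_known` is true: a classifier over combined tags has no output for the new combination, whereas a decoder that emits features one at a time can still compose it. Because the target is a plain sequence, its length is not fixed, which is what allows an arbitrary number of features per word.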

Comments:	Final TACL version
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1805.07946 [cs.CL]
	(or arXiv:1805.07946v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1805.07946
Journal reference:	Transactions of the Association for Computational Linguistics, 7, 567-579 (2019)
Related DOI:	https://doi.org/10.1162/tacl_a_00286

Submission history

From: Deniz Yuret
[v1] Mon, 21 May 2018 08:49:32 UTC (1,491 KB)
[v2] Tue, 24 Sep 2019 14:30:35 UTC (1,693 KB)
