Knowledge-Rich Self-Supervision for Biomedical Entity Linking

Zhang, Sheng; Cheng, Hao; Vashishth, Shikhar; Wong, Cliff; Xiao, Jinfeng; Liu, Xiaodong; Naumann, Tristan; Gao, Jianfeng; Poon, Hoifung

arXiv:2112.07887 (cs)

[Submitted on 15 Dec 2021 (v1), last revised 23 May 2022 (this version, v2)]

Abstract: Entity linking faces significant challenges such as prolific variations and prevalent ambiguities, especially in high-value domains with myriad entities. Standard classification approaches suffer from the annotation bottleneck and cannot effectively handle unseen entities. Zero-shot entity linking has emerged as a promising direction for generalizing to new entities, but it still requires example gold entity mentions during training and canonical descriptions for all entities, both of which are rarely available outside of Wikipedia. In this paper, we explore Knowledge-RIch Self-Supervision ($\tt KRISS$) for biomedical entity linking, by leveraging readily available domain knowledge. In training, it generates self-supervised mention examples on unlabeled text using a domain ontology and trains a contextual encoder using contrastive learning. For inference, it samples self-supervised mentions as prototypes for each entity and conducts linking by mapping the test mention to the most similar prototype. Our approach can easily incorporate entity descriptions and gold mention labels if available. We conducted extensive experiments on seven standard datasets spanning biomedical literature and clinical notes. Without using any labeled information, our method produces $\tt KRISSBERT$, a universal entity linker for four million UMLS entities that attains a new state of the art, outperforming prior self-supervised methods by as much as 20 absolute points in accuracy.
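
The two steps named in the abstract, generating self-supervised mentions from an ontology and linking a test mention to its most similar prototype, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy ontology, the entity IDs, the function names (generate_mentions, cosine, link), and the hand-written 2-d vectors are all assumptions made for illustration. In particular, the contrastively trained contextual encoder is omitted, and plain vectors stand in for its output.

```python
import math
import re

# Hypothetical toy ontology: entity ID -> known surface names.
# The IDs and names are invented for illustration, not taken from UMLS.
ONTOLOGY = {
    "E1": ["heart attack", "myocardial infarction"],
    "E2": ["aspirin"],
}


def generate_mentions(text, ontology):
    """Match ontology names in unlabeled text (case-insensitive).

    Each hit becomes a self-supervised example (entity_id, start, end).
    """
    mentions = []
    lowered = text.lower()
    for entity_id, names in ontology.items():
        for name in names:
            pattern = r"\b" + re.escape(name.lower()) + r"\b"
            for match in re.finditer(pattern, lowered):
                mentions.append((entity_id, match.start(), match.end()))
    return sorted(mentions, key=lambda m: m[1])


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link(mention_vec, prototypes):
    """Return (entity_id, score) for the single most similar prototype."""
    best_entity, best_score = None, float("-inf")
    for entity_id, vectors in prototypes.items():
        for vec in vectors:
            score = cosine(mention_vec, vec)
            if score > best_score:
                best_entity, best_score = entity_id, score
    return best_entity, best_score


# Assumed stand-ins for encoder outputs of sampled prototype mentions.
PROTOTYPES = {
    "E1": [[1.0, 0.0], [0.9, 0.1]],
    "E2": [[0.0, 1.0]],
}

if __name__ == "__main__":
    print(generate_mentions("Aspirin is given after a heart attack.", ONTOLOGY))
    print(link([0.8, 0.2], PROTOTYPES))
```

Keeping several prototypes per entity, rather than one averaged vector, lets a test mention match whichever sampled context it most resembles, which is the behavior the abstract describes as mapping to the most similar prototype.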

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2112.07887 [cs.CL]
	(or arXiv:2112.07887v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2112.07887

Submission history

From: Sheng Zhang
[v1] Wed, 15 Dec 2021 05:05:12 UTC (3,332 KB)
[v2] Mon, 23 May 2022 16:10:15 UTC (8,225 KB)