Text Embedding Inversion Security for Multilingual Language Models

Yiyi Chen, Heather Lent, Johannes Bjerva


Abstract
Textual data is often represented as real-valued embeddings in NLP, particularly with the popularity of large language models (LLMs) and Embeddings as a Service (EaaS). However, storing sensitive information as embeddings can be susceptible to security breaches, as research shows that text can be reconstructed from embeddings, even without knowledge of the underlying model. While defense mechanisms have been explored, these are exclusively focused on English, leaving other languages potentially exposed to attacks. This work explores LLM security through multilingual embedding inversion. We define the problem of black-box multilingual and crosslingual inversion attacks, and explore their potential implications. Our findings suggest that multilingual LLMs may be more vulnerable to inversion attacks, in part because English-based defenses may be ineffective. To alleviate this, we propose a simple masking defense that is effective for both monolingual and multilingual models. This study is the first to investigate multilingual inversion attacks, shedding light on the differences in attacks and defenses across monolingual and multilingual settings.
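For readers unfamiliar with the setting, the sketch below illustrates one generic masking defense of the kind the abstract alludes to: suppressing a fraction of embedding dimensions before an EaaS provider releases a vector, so that an inversion model receives a degraded signal. This is a minimal illustration under our own assumptions; the function name mask_embedding, the mask_ratio parameter, and the zeroing scheme are hypothetical and are not the specific method proposed in the paper.

    import numpy as np

    def mask_embedding(embedding, mask_ratio=0.1, seed=None):
        """Zero out a random fraction of embedding dimensions before release.

        Illustrative only: the masking scheme and mask_ratio are assumptions,
        not the defense described in the paper.
        """
        rng = np.random.default_rng(seed)
        masked = np.array(embedding, dtype=np.float32, copy=True)
        dim = masked.shape[-1]
        # Pick the indices of the dimensions to suppress.
        drop = rng.choice(dim, size=int(mask_ratio * dim), replace=False)
        masked[..., drop] = 0.0
        return masked

    # Usage with a hypothetical 768-dimensional sentence embedding.
    vec = np.random.default_rng(0).standard_normal(768)
    protected = mask_embedding(vec, mask_ratio=0.1, seed=0)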
Anthology ID:
2024.acl-long.422
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7808–7827
URL:
https://aclanthology.org/2024.acl-long.422
DOI:
10.18653/v1/2024.acl-long.422
Cite (ACL):
Yiyi Chen, Heather Lent, and Johannes Bjerva. 2024. Text Embedding Inversion Security for Multilingual Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7808–7827, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Text Embedding Inversion Security for Multilingual Language Models (Chen et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.422.pdf