Word embedding is the collective name in natural language processing (NLP) for a set of language modeling and feature learning techniques in which words or phrases are mapped to vectors of real numbers. Mathematically, it is an embedding from a space with one dimension per word into a continuous vector space of much lower dimension.
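As a minimal sketch of this mapping (the four-word vocabulary, the two-dimensional target space, and the random matrix below are illustrative assumptions, not learned weights), a word's one-hot vector, which has one dimension per vocabulary word, is multiplied by an embedding matrix to produce a low-dimensional real-valued vector:

```python
import numpy as np

# Toy vocabulary: in a one-hot encoding, each word occupies its own
# dimension, so the space has as many dimensions as there are words.
vocab = ["king", "queen", "man", "woman"]
vocab_size = len(vocab)   # 4 dimensions in the one-hot space
embedding_dim = 2         # far fewer dimensions in the embedded space

# The embedding is a linear map: a (vocab_size x embedding_dim) matrix
# whose rows are the word vectors. Here it is random for illustration;
# in practice the entries are learned from data.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

def embed(word: str) -> np.ndarray:
    """Map a word to its real-valued vector: a table lookup, which is
    equivalent to multiplying its one-hot vector by the matrix."""
    one_hot = np.zeros(vocab_size)
    one_hot[vocab.index(word)] = 1.0
    return one_hot @ embedding_matrix

print(embed("queen"))  # a 2-dimensional real vector
```

In practice the lookup is implemented by row indexing rather than an explicit matrix product, but the two are mathematically identical.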
In linguistics, word embeddings were discussed in the research area of distributional semantics, which aims to quantify and categorize semantic similarities between linguistic items based on their distributional properties in large samples of language data. The underlying idea that "a word is characterized by the company it keeps" was popularized by Firth[10].
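To make the distributional idea concrete, the following sketch (the toy corpus and the ±1-word window are illustrative assumptions) represents each word by its co-occurrence counts and quantifies semantic similarity as the cosine between those count vectors:

```python
import numpy as np

# Toy corpus: words that keep similar company ("cat" and "dog" both
# appear next to "the" and "drinks") should get similar count vectors.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({word for sentence in tokens for word in sentence})
index = {word: i for i, word in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-1 word window.
counts = np.zeros((len(vocab), len(vocab)))
for sentence in tokens:
    for i, word in enumerate(sentence):
        for j in range(max(0, i - 1), min(len(sentence), i + 2)):
            if j != i:
                counts[index[word], index[sentence[j]]] += 1

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Words with shared contexts score high; unrelated words score low.
print(cosine(counts[index["cat"]], counts[index["dog"]]))   # ~0.91
print(cosine(counts[index["cat"]], counts[index["milk"]]))  # ~0.41
```

Count vectors like these are high-dimensional and sparse; dimensionality reduction of such count matrices (as in the Hellinger PCA approach cited below) is one route from distributional counts to dense word embeddings.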
^ Mikolov, Tomas; Sutskever, Ilya (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv:1310.4546 [cs.CL].
^ Lebret, Rémi; Collobert, Ronan (2013). "Word Embeddings through Hellinger PCA". Conference of the European Chapter of the Association for Computational Linguistics (EACL) 2014. arXiv:1312.5542. Bibcode: 2013arXiv1312.5542L.
^ Qureshi, M. Atif; Greene, Derek (2018-06-04). "EVE: explainable vector based embedding technique using Wikipedia". Journal of Intelligent Information Systems 53: 137–165. arXiv:1702.06891. doi:10.1007/s10844-018-0511-x. ISSN 0925-9902.
^ Firth, J.R. (1957). "A synopsis of linguistic theory 1930-1955". Studies in Linguistic Analysis: 1–32. Reprinted in F.R. Palmer, ed. (1968). Selected Papers of J.R. Firth 1952-1959. London: Longman.
^ Bengio, Yoshua; Schwenk, Holger; Senécal, Jean-Sébastien; Morin, Fréderic; Gauvain, Jean-Luc (2006). A Neural Probabilistic Language Model. Vol. 194. pp. 137–186. doi:10.1007/3-540-33486-6_6. ISBN 978-3-540-30609-2.
^ Lavelli, Alberto; Sebastiani, Fabrizio; Zanoli, Roberto (2004). Distributional term representations: an experimental comparison. 13th ACM International Conference on Information and Knowledge Management. pp. 615–624. doi:10.1145/1031171.1031284.
^ Huang, Eric (2012). Improving word representations via global context and multiple word prototypes. OCLC 857900050.
^ Camacho-Collados, Jose; Pilehvar, Mohammad Taher (2018). From Word to Sense Embeddings: A Survey on Vector Representations of Meaning. Bibcode: 2018arXiv180504032C.
^ Neelakantan, Arvind; Shankar, Jeevan; Passos, Alexandre; McCallum, Andrew (2014). "Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space". Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics: 1059–1069. arXiv:1504.06654. doi:10.3115/v1/d14-1113.
^ Ruas, Terry; Grosky, William; Aizawa, Akiko (2019-12-01). "Multi-sense embeddings through a word sense disambiguation process". Expert Systems with Applications 136: 288–303. doi:10.1016/j.eswa.2019.06.026. ISSN 0957-4174.
^ Li, Jiwei; Jurafsky, Dan (2015). "Do Multi-Sense Embeddings Improve Natural Language Understanding?". Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics: 1722–1732. arXiv:1506.01070. doi:10.18653/v1/d15-1200.
^"contrastive methods build representations by reducing the distance between ... positive pairs ... and increasing the distance between inputs not known to be related (negative pairs)" Kiani. (2022). Joint Embedding Self-Supervised Learning in the Kernel Regime.
^"contrastive learning ... pull together an anchor and a “positive” sample in embedding space, and push apart the anchor from many “negative” samples." Khosla. (2020). Supervised Contrastive Learning.
^"adapting contrastive learning to the fully supervised setting ... These positives are drawn from samples of the same class as the anchor" Khosla. (2020). Supervised Contrastive Learning.