
Python: Clustering Similar Words Based On Word2vec

This might be a naive question. I have a tokenized corpus on which I have trained Gensim's Word2vec model. The code is as below: site = Article('http://www

Solution 1:

  1. No, not really. For reference, common word2vec models trained on Wikipedia (in English) consist of around 3 billion words.
  2. You can use k-NN (or something similar). Gensim has the most_similar function to get the closest words. Using dimensionality reduction (like PCA or t-SNE) you can get yourself a nice cluster. (I'm not sure whether gensim has a t-SNE module, but scikit-learn does, so you can use that.)

By the way, you're referring to an image, but it isn't available.

