
Python: Clustering Similar Words Based On Word2vec

This might be a naive question. I have a tokenized corpus on which I have trained Gensim's Word2vec model. The code is as below: site = Article('http://www

Solution 1:

  1. No, not really. For reference, common word2vec models trained on Wikipedia (in English) consist of around 3 billion words.
  2. You can use k-NN (or something similar). Gensim has the most_similar function to get the closest words. Using dimensionality reduction (like PCA or t-SNE) you can get yourself a nice cluster. (I'm not sure whether gensim has a t-SNE module, but scikit-learn does, so you can use that.)

By the way, you're referring to an image, but it isn't available.

