Python: Clustering Similar Words Based On Word2vec
This might be a naive question. I have a tokenized corpus on which I have trained Gensim's Word2vec model. The code is as below: site = Article('http://www
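The asker's code is cut off above. As a rough, hypothetical sketch of how a Word2vec model is typically trained on a tokenized corpus with Gensim (the toy corpus, file name, and parameter values below are placeholders, not the asker's actual code):

```python
from gensim.models import Word2Vec

# toy tokenized corpus: a list of token lists (stand-in for the asker's real data)
tokenized_corpus = [
    ["python", "clusters", "similar", "words"],
    ["word2vec", "learns", "dense", "word", "vectors"],
    ["gensim", "trains", "word2vec", "on", "tokenized", "sentences"],
]

# train the model; for gensim < 4.0 the parameter is `size` instead of `vector_size`
model = Word2Vec(tokenized_corpus, vector_size=100, window=5, min_count=1, workers=4)
model.save("word2vec.model")  # optional: persist the model for later similarity queries
```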
Solution 1:
- No, not really. For reference, common word2vec models trained on Wikipedia (in English) are built from around 3 billion words.
- You can use KNN (or something similar). Gensim has the most_similar function to get the closest words to a given word. Using dimensionality reduction (like PCA or t-SNE), you can then project the word vectors to 2D and get a nice view of the clusters; see the sketch below. (Gensim does not ship a t-SNE module, but scikit-learn does, so you can use that.)
By the way, you're referring to an image, but it's not available.