Recently, while learning about Word2Vec, I realized that the important details about this great idea are scattered across many sources. I had to read ten different blogs before the logic fully clicked. If you are like me and love to supplement paper reading with blog posts, you are in the right place.
What is Word2Vec?
If you are an NLP fan, or you have just started your journey, you must have come across this term. Let me first take you to the past and introduce a few other topics to build the intuition.
If you want to do crazy things with text, including sentiment analysis, question answering, text similarity, topic modeling, and whatnot, you are going to feed words/sentences/documents to a model.
But! The machine doesn't understand raw text, right? So you need to convert it into a machine-readable form. A word vector is exactly what you are looking for.
A word vector is a representation of a word in vector form. Sure, you could simply encode the words and match the encoded vectors, but that won't serve the purpose. Imagine having one million words in the vocabulary and encoding all of them to pass to the network. Why not instead learn vectors that encode similarity?
Word representations in vector space (2-D)
Word representations in vector space (3-D)
Crazy computation! Not possible (and even if it were, why would I kill my machine with heavy math and piles of 0's?)
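To put rough numbers on that, here's a quick back-of-the-envelope sketch; the vocabulary size, embedding size, and float type are assumptions just for illustration.

```python
# Back-of-the-envelope: memory for a one-hot vector vs. a dense embedding.
# All numbers below are assumed, purely for illustration.
vocab_size = 1_000_000        # 1M-word vocabulary
embedding_dim = 300           # a typical dense embedding size
bytes_per_float = 4           # float32

one_hot_bytes = vocab_size * bytes_per_float      # one word as a one-hot vector
dense_bytes = embedding_dim * bytes_per_float     # one word as a dense embedding

print(f"one-hot vector:  {one_hot_bytes / 1e6:.1f} MB per word (all but one entry is 0)")
print(f"dense embedding: {dense_bytes / 1e3:.1f} KB per word")
```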
What came before Word2Vec?
Before Word2Vec, N-gram models were used to predict a word from the n words that accompany it, but N-grams cannot really capture context.
The overall probability is P(current_word | n preceding words).
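As a tiny (and heavily simplified) illustration of that conditional probability, here's a bigram estimate built from raw counts; the toy corpus is made up.

```python
from collections import Counter

# A toy corpus; in practice this would be millions of sentences.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams (previous_word, current_word) and the unigram contexts they start from.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def bigram_prob(current_word, previous_word):
    """Estimate P(current_word | previous_word) from counts."""
    return bigrams[(previous_word, current_word)] / contexts[previous_word]

print(bigram_prob("cat", "the"))  # 2 "the cat" bigrams out of 4 "the" contexts -> 0.5
```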
To capture similarity better, we need embeddings. Don't just cluster words into groups; seek a representation that can capture the degree of similarity between them.
Context here means the words inside a fixed window around the target word in a text.
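To make that concrete, here's a minimal sketch of how a fixed window turns a sentence into (center word, context word) pairs; the sentence and window size are placeholders I picked for illustration.

```python
# Build (center, context) pairs from a fixed window around each word.
# Sentence and window size are arbitrary, for illustration only.
sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2

pairs = []
for i, center in enumerate(sentence):
    # Context = up to `window` words on each side of the center word.
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

print(pairs[:5])
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox')]
```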
What's the measurement of quality in Word2Vec?
It's how well the resulting word similarities hold up on a task.
Advantages:
- Computationally efficient
- Accurate
- Performs well at finding semantic and syntactic similarity
Okay, but how do these **WORD VECTORS** work?
Each word is encoded as a vector (a list of numbers in a multi-dimensional space) so that it can be matched with the vectors of words that appear in similar contexts. A dense vector is thus formed for each word, with each dimension acting as a learned feature.
Word vectors are sometimes called word embeddings or word representations. They are a distributed representation.
See how related words cluster around ‘expect’; that's because their word vectors are similar.
Word embeddings capture the meanings of words with the help of vectors. Subtracting the vectors of certain words even gives rise to meaningful relationships.
For example:
King − Man + Woman ≈ Queen
(V_King − V_Man + V_Woman ≈ V_Queen, where V denotes the word's vector)
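You can see this arithmetic in action with a toy example. The 3-dimensional vectors below are hand-made just for illustration; real embeddings are learned and have hundreds of dimensions.

```python
import numpy as np

# Toy vectors; the dimensions loosely stand for (royalty, masculinity, femininity).
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "apple": np.array([0.0, 0.1, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman should land closest to queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max((w for w in vectors if w not in ("king", "man", "woman")),
           key=lambda w: cosine(target, vectors[w]))
print(best)  # queen
```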
When we use word embeddings, we use a fully connected layer whose weights are called the embedding weights (their values are learned during training, just like the weights of other layers such as Dense or CNN layers).
And this embedding weight matrix turns out to be a lookup table.
Without an embedding layer, you'd one-hot encode the text and then multiply it with the hidden layer's weight matrix. For a vocabulary of 200 words, each input would be a 200-dimensional vector that is almost entirely 0's, so most of that multiplication contributes nothing. How inefficient is that?
Word embeddings come to the rescue. Since multiplying a one-hot encoded vector with the weight matrix just returns the corresponding row of that matrix, we can simply assign each word a unique integer index and take the corresponding row from the lookup table. There's no need to multiply at all.
Multiplying any one-hot encoded vector by the weight matrix gives just the corresponding row.
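Here's a quick sketch of that equivalence; the matrix is random and the sizes are arbitrary, purely for illustration.

```python
import numpy as np

vocab_size, embedding_dim = 10, 4
rng = np.random.default_rng(0)
embedding_weights = rng.normal(size=(vocab_size, embedding_dim))  # the lookup table

word_index = 3                       # integer id assigned to some word
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

# Multiplying the one-hot vector by the weight matrix...
via_matmul = one_hot @ embedding_weights
# ...is exactly the same as indexing the corresponding row.
via_lookup = embedding_weights[word_index]

print(np.allclose(via_matmul, via_lookup))  # True
```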
So an embedding layer is simply a layer whose weights, the embedding weights, are learned during training.
From Udacity's Deep Learning Nanodegree
Say there is a word "heart" whose index is 958. The 958th row of the embedding matrix becomes the output and is passed forward to the hidden layer.
The weights in this lookup table are just the vector representations of the words, and the columns of the matrix correspond to the embedding dimensions. Words with similar meanings end up with similar representations.
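In a framework like PyTorch, this lookup is exactly what the embedding layer does. Here's a rough sketch; the vocabulary size and embedding dimension are assumptions, and the weights stay random until the model is trained.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 1000, 16       # assumed sizes for illustration
embedding = nn.Embedding(vocab_size, embedding_dim)

word_index = torch.tensor([958])           # index of "heart" from the example above
heart_vector = embedding(word_index)       # fetches row 958 of the weight matrix

# The layer's output is literally that row of the (learned) weight matrix.
print(torch.equal(heart_vector[0], embedding.weight[958]))  # True
```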
Finally, let's discuss Word2Vec
The Word2Vec model builds on this idea of embeddings and lookups. Based on each word of interest and its context, it learns the weights that make up the embedding matrix. This trained matrix is the embedding, and it captures the similarity between words.
Words that appear in similar contexts get similar representations. Word2Vec finds these similarities and relationships during training and thereby produces one master matrix of vector representations: the embedding.
In this representation, similar words end up near each other and dissimilar words end up far apart.
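If you want to play with this yourself, a minimal sketch using the gensim library looks roughly like the following. The toy corpus is made up, and the parameter names assume gensim 4.x (where the vector size argument is called vector_size).

```python
from gensim.models import Word2Vec

# A toy corpus: a list of tokenized sentences (real training needs far more text).
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "the", "dog"],
    ["the", "woman", "walks", "the", "dog"],
]

# sg=1 selects the skip-gram architecture; window is the context size discussed above.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

print(model.wv["king"].shape)                 # (50,) — the learned vector for "king"
print(model.wv.most_similar("king", topn=2))  # nearest neighbours in the embedding space
```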
This is all about Word Vectors, Embeddings, and Word2Vec. Drop your questions in the comments.
External References:
- http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture01-wordvecs1.pdf
- http://jalammar.github.io/illustrated-word2vec/