Word embeddings need to have high dimensionality in order to capture sufficient variations in natural language, which makes them very hard to visualize. t-SNE, which stands for t-Distributed Stochastic Neighbor Embedding, is a dimensionality reduction technique that can map high-dimensional vectors to a lower-dimensional space. It's kind of like PCA, Principal Component Analysis, but with one amazing property. When performing the transformation, it tries to maintain relative distances between objects, so that similar ones stay closer together while dissimilar objects stay further apart. This makes t-SNE a great choice for visualizing word embeddings. It effectively preserves the local substructures and relationships that have been learned by the embedding model. If we look at the larger vector space, we can discover meaningful groups of related words. Sometimes it takes a while to realize why certain clusters are formed, but most of the groupings are very intuitive. t-SNE also works on other kinds of data, such as images. Here, we see pictures from the Caltech 101 dataset organized into clusters that roughly correspond to class labels, including airplanes with blue sky being the common theme, sailboats of different shapes and sizes, and human faces. This is a very useful tool for better understanding the representation that a network learns and for identifying any bugs or other issues.
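As a rough sketch of how this kind of visualization might be set up in practice, here is a minimal example using scikit-learn's `TSNE` to project vectors down to 2-D. The random vectors below are just a stand-in for real learned word embeddings (which would normally come from a trained model such as word2vec or GloVe), and the specific parameter values are illustrative, not prescribed by the source:

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy stand-in for learned word embeddings:
# 50 "words", each represented by a 100-dimensional vector.
rng = np.random.RandomState(0)
embeddings = rng.normal(size=(50, 100))

# Map the high-dimensional vectors down to 2-D so they can be plotted.
# perplexity (roughly, the effective number of neighbors considered per
# point) must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
points_2d = tsne.fit_transform(embeddings)

print(points_2d.shape)  # one 2-D point per input vector: (50, 2)
```

The resulting `points_2d` array can then be scattered with a plotting library such as matplotlib, labeling each point with its word to reveal the clusters described above.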