Visualizing training data is often essential for designing a good machine learning model. However, the number of feature dimensions is usually far greater than three, so to gain visual insight, dimensionality reduction techniques such as PCA [1] and t-SNE (t-Distributed Stochastic Neighbor Embedding) [2] are used. In this article, we will introduce t-SNE dimensionality reduction, visualize data using t-SNE, and use TensorBoard for t-SNE and PCA visualization.
People who will benefit most from this article are those who:
- Want to know how to use PCA and t-SNE in Scikit Learn [3]
- Want to understand the difference between PCA and t-SNE
- Want to understand t-SNE (t-distributed Stochastic Neighbor Embedding)
- Want to know the usage of t-SNE
- Want to understand the stochastic nature of t-SNE
- Want to visualize high-dimensional CNN features using TensorBoard’s t-SNE and PCA
We will cover the following topics:
- t-Distributed Stochastic Neighbor Embedding
- t-SNE vs. PCA
- t-SNE TensorBoard Feature Visualization Code Explanation
- t-SNE Visualization using TensorBoard
- Summary
t-Distributed Stochastic Neighbor Embedding
What is t-SNE used for?
t-distributed Stochastic Neighbor Embedding (t-SNE) is a technique for visualizing high-dimensional features in two- or three-dimensional space. It was first introduced by Laurens van der Maaten [4] and the Godfather of Deep Learning, Geoffrey Hinton [5], in 2008.
How Does t-Distributed Stochastic Neighbor Embedding Work?
Internally, it works as follows:
- In the high-dimensional space, a probability distribution over pairs of points is constructed such that similar points (by Euclidean distance, cosine similarity, etc.) are assigned a high probability and dissimilar points a low probability.
- In the low-dimensional map, t-SNE tries to reproduce a similar probability distribution over the mapped points.
- To achieve this, it minimizes the Kullback–Leibler divergence [6] (KL divergence) between the two distributions.
In short, t-SNE tries to preserve the relative positions of points in the low-dimensional mapping: points that are close in the original space should stay close, and distant points should stay distant. A minimal sketch of this objective follows.
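To make the objective concrete, here is a deliberately simplified, NumPy-only sketch. It is an illustration, not the real algorithm: actual t-SNE calibrates a per-point Gaussian bandwidth from the perplexity and optimizes the map with gradient descent, while this toy version only evaluates the KL divergence between two pairwise-similarity distributions for a random map.

```python
import numpy as np

def pairwise_sq_dists(Z):
    # Squared Euclidean distances between all pairs of rows of Z.
    sq = np.sum(Z ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)

def similarity_distribution(Z, kernel):
    S = kernel(pairwise_sq_dists(Z))
    np.fill_diagonal(S, 0.0)   # a point is not its own neighbor
    return S / S.sum()         # normalize into a probability distribution

X_high = np.random.rand(50, 10)  # toy "high-dimensional" points
Y_low = np.random.rand(50, 1)    # a candidate 1-D map

P = similarity_distribution(X_high, lambda d: np.exp(-d))      # Gaussian kernel
Q = similarity_distribution(Y_low, lambda d: 1.0 / (1.0 + d))  # Student-t kernel

# KL(P || Q) is the quantity t-SNE's optimizer drives down.
eps = 1e-12
kl = np.sum(P * np.log((P + eps) / (Q + eps)))
print(f"KL divergence of this random map: {kl:.4f}")
```

t-SNE would now move the points in Y_low to reduce this number; a good map is one whose pairwise similarities match those of the original space.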
Is t-SNE Stochastic?
Yes. t-SNE models neighborhoods probabilistically and usually starts from a random initialization of the low-dimensional map, so it is stochastic: different runs on the same data can produce different embeddings unless the random seed is fixed. The quick sketch below demonstrates this.
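A small sketch (assuming scikit-learn is installed) that runs t-SNE twice on the same toy data with different random_state values:

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy high-dimensional data.
X = np.random.RandomState(0).rand(100, 16)

emb_a = TSNE(n_components=2, perplexity=30, learning_rate='auto',
             init='random', random_state=0).fit_transform(X)
emb_b = TSNE(n_components=2, perplexity=30, learning_rate='auto',
             init='random', random_state=1).fit_transform(X)

# Different seeds generally give different maps; fixing random_state
# makes an individual run reproducible.
print(np.allclose(emb_a, emb_b))  # typically False
```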
t-SNE vs. PCA
To illustrate this, we will take two-dimensional data points (standing in for higher-dimensional data) and map them to one dimension using t-SNE and PCA. We will then see how t-SNE preserves the relative positions of the data points whereas PCA does not.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import requests
from zipfile import ZipFile
import os
import tensorflow as tf
from PIL import Image
from tensorboard.plugins import projector
Let’s generate two-dimensional data points using three normal distributions.
# Any fixed seed keeps the experiment reproducible; we pick 42.
seed = 42
np.random.seed(seed)
num_points_per_class = 50
# Class 1
mean1 = [0, 0]
cov = [[0.1, 0], [0, 0.1]]
X1 = np.random.multivariate_normal(mean1, cov, num_points_per_class)
# Class 2
mean2 = [10, 0]
X2 = np.random.multivariate_normal(mean2, cov, num_points_per_class)
# Class 3
mean3 = [5, 6]
X3 = np.random.multivariate_normal(mean3, cov, num_points_per_class)
These points are concatenated for t-SNE and PCA.
X = np.concatenate([X1, X2, X3], axis=0)
X.shape  # (150, 2): 3 classes x 50 points, 2 dimensions
Implement a function to rescale the data points in the 0-1 range.
def scale_to_01_range(x):
    # Compute the distribution range.
    value_range = np.max(x) - np.min(x)
    # Shift the distribution so that it starts from zero
    # by subtracting the minimum from all values.
    starts_from_zero = x - np.min(x)
    # Fit the distribution into [0, 1] by dividing by its range.
    return starts_from_zero / value_range
X[:, 0] = scale_to_01_range(X[:, 0])
X[:, 1] = scale_to_01_range(X[:, 1])
The following class, VisualizeScatter, is defined to visualize the data points.
class VisualizeScatter:
    def __init__(self, fig_size=(10, 8), xlabel='X', ylabel='Y', title=None,
                 size=10, num_classes=3):
        plt.figure(figsize=fig_size)
        plt.grid(True)
        plt.title(title)
        plt.xlabel(xlabel)
        plt.ylabel(ylabel)
        self.colors = ['red', 'green', 'blue']
        self.num_classes = num_classes
        self.size = size

    def add_scatters(self, X):
        x = X[:, 0]
        if X.shape[1] == 2:
            y = X[:, 1]
        else:
            # For 1-D embeddings, plot the points along a horizontal line.
            y = np.zeros(len(x))
        points_per_class = len(x) // self.num_classes
        st = 0
        end = points_per_class
        for i in range(self.num_classes):
            plt.scatter(x[st:end], y[st:end],
                        c=self.colors[i % len(self.colors)],
                        s=self.size)
            st = end
            end = end + points_per_class

    @staticmethod
    def show_plot():
        plt.show()
vis = VisualizeScatter(fig_size=(10, 8), title="Original Rescaled 2-D Points")
vis.add_scatters(X)
vis.show_plot()
We can observe the three clusters of data points.
Another way to visualize these clusters is via a 1-D projection. We use t-SNE to transform the two-dimensional data points into one dimension; this can be done with sklearn.
Here, we specify the perplexity hyperparameter. The chosen value works well for our dataset; we will discuss its significance later in the post.
perplexity = 25
X_embedded = TSNE(n_components=1,
                  perplexity=perplexity,
                  learning_rate='auto',
                  init='random',
                  random_state=seed).fit_transform(X)
tsne_vis = VisualizeScatter(fig_size=(10, 2),
                            title='t-SNE 1-D Projection (perplexity = {})'.format(perplexity))
tsne_vis.add_scatters(X_embedded)
tsne_vis.show_plot()
We can see that one-dimensional projection preserves the three clusters.
Now, let’s use PCA for one-dimensional projection.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
pca_vis = VisualizeScatter(fig_size=(10, 2), title='PCA 1-D Projection')
pca_vis.add_scatters(X_reduced)
pca_vis.show_plot()
We can see that two of the clusters overlap each other, so PCA fails to preserve the relative positions. PCA is designed to maximize variance rather than to maintain relative positions, as the quick check below confirms.
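A quick check of this variance-maximizing behavior, reusing the pca object fitted above (a small sketch; the printed values depend on the generated data):

```python
# Fraction of the total variance captured by the single retained component.
print(pca.explained_variance_ratio_)
# The direction in the original 2-D space that PCA projects onto.
print(pca.components_)
```

PCA picks the direction of maximum variance regardless of cluster membership, which is exactly why two clusters can land on top of each other in the 1-D projection.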
Effect of Smaller Perplexity
What if we use a smaller perplexity value? The following t-SNE visualization is performed with perplexity=2.
perplexity = 2
X_embedded = TSNE(n_components=1,
                  perplexity=perplexity,
                  learning_rate='auto',
                  init='random',
                  random_state=seed).fit_transform(X)
tsne_vis = VisualizeScatter(fig_size=(10, 2),
                            title='t-SNE 1-D Projection (perplexity = {})'.format(perplexity))
tsne_vis.add_scatters(X_embedded)
tsne_vis.show_plot()
You can observe that for a very low perplexity, points from one cluster mix with the others and are scattered all over.
Effect of Larger Perplexity
perplexity = 150
X_embedded = TSNE(n_components=1,
                  perplexity=perplexity,
                  learning_rate='auto',
                  init='random',
                  random_state=seed).fit_transform(X)
tsne_vis = VisualizeScatter(fig_size=(10, 2),
                            title='t-SNE 1-D Projection (perplexity = {})'.format(perplexity))
tsne_vis.add_scatters(X_embedded)
tsne_vis.show_plot()
Here, for a very high perplexity, the points from the three clusters merge into two clusters, and all points within each cluster sit very close to each other (note that the total number of points is 150).
The description of perplexity in the scikit-learn t-SNE API is the following:
The perplexity is related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can result in significantly different results. The perplexity must be less than the number of samples.
So when perplexity is very low, fewer points are considered nearest neighbors, and the remaining points end up scattered, even those belonging to the same cluster.
And when perplexity is very high, more points are considered nearest neighbors, so almost all points are clubbed together. The sketch below makes it easy to scan a range of perplexities.
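To explore this effect systematically, here is a small sketch that reuses X, seed, and the VisualizeScatter helper defined above and sweeps a few perplexity values:

```python
# Compare 1-D projections across a range of perplexity values.
# Note: perplexity must stay below the number of samples (150 here).
for perplexity in [2, 5, 25, 50, 149]:
    X_embedded = TSNE(n_components=1,
                      perplexity=perplexity,
                      learning_rate='auto',
                      init='random',
                      random_state=seed).fit_transform(X)
    vis = VisualizeScatter(fig_size=(10, 2),
                           title=f't-SNE 1-D Projection (perplexity = {perplexity})')
    vis.add_scatters(X_embedded)
    vis.show_plot()
```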
There is an excellent explanation of this in How to Use t-SNE Effectively [7].
Feature Visualization Using TensorBoard’s t-SNE
Next, let’s visualize high-dimensional CNN-extracted features in a two- or three-dimensional feature map.
We will use a model pre-trained on the ImageNet dataset, without the fully connected layers (only the CNN layers), as a feature extractor.
Pre-trained models are popularly used for fine-tuning and transfer learning. Here is a beautiful article that explains fine-tuning.
We will use TensorBoard’s Projector to map the higher-dimensional features to two or three dimensions. If you are interested in writing the visualization code in Python, look at the article t-SNE for Feature Visualization. A subset of the Animal-10 [8] dataset will be used as sample data points.
The prepared subset has twenty images from each of the ten classes.
Utility Functions to Download and Unzip the Dataset
def download_file(url, save_name):
    """Download and save the file.

    Arguments:
        url (str): URL of the file.
        save_name (str): File path to save the downloaded file.
    """
    file = requests.get(url)
    open(save_name, 'wb').write(file.content)
    print(f"Downloaded {save_name}...")
    return
def unzip(zip_file_path=None):
    """Unzip the file.

    Arguments:
        zip_file_path (str): The zipped file path.
    """
    try:
        with ZipFile(zip_file_path) as z:
            z.extractall("./")
            print(f"Extracted {zip_file_path}...\n")
    except Exception:
        print("Invalid file")
    return
if not os.path.exists('animal10'):
    download_file(
        'https://www.dropbox.com/sh/wyt8cvctpcvg10r/AAAuOf992j1vDf7S7oV1STW7a?dl=1',
        'animal10.zip')
    unzip('animal10.zip')
Downloaded animal10.zip...
Extracted animal10.zip...
Get Image Paths According to Class
The following function gets all the class-wise image paths for a given root directory.
def get_classwise_image_path(root_dir):
    image_paths = dict()
    classes = os.listdir(root_dir)
    for cls in classes:
        image_paths[cls] = []
        class_dir = os.path.join(root_dir, cls)
        images = os.listdir(class_dir)
        for image_name in images:
            img_path = os.path.join(class_dir, image_name)
            image_paths[cls].append(img_path)
    return image_paths
Use the above function to get all image paths.
IMG_ROOT_DIR = 'animal10'
image_paths_dict = get_classwise_image_path(IMG_ROOT_DIR)
Define image width and height for the pre-trained model inference.
IMG_WIDTH, IMG_HEIGHT = (224, 224)
Function to Resize Image
The following function reads and resizes an image. We need it to add images to TensorBoard.
def load_and_resize_image(img_path, width, height):
    img = Image.open(img_path).resize((width, height))
    return img
Function to Visualize the Dataset
Before running model inference and t-SNE on the extracted features, let's look at some sample images from the Animal-10 dataset. We define a function for this below.
def show_class_sample(image_path_dic, fig_size=(15, 6)):
    fig, axes = plt.subplots(
        nrows=2,
        ncols=5,
        figsize=fig_size
    )
    list_axes = list(axes.flat)
    classes = list(image_path_dic.keys())
    for i, ax in enumerate(list_axes):
        img = load_and_resize_image(image_path_dic[classes[i]][0],
                                    IMG_WIDTH,
                                    IMG_HEIGHT)
        ax.imshow(img)
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        ax.set_title(classes[i])
    fig.suptitle("Animal-10 Dataset Samples", fontsize=15)
    plt.show()
    return
show_class_sample(image_paths_dict)
Function to Load a Pre-Trained Model
The following function loads a feature extractor pre-trained on the ImageNet dataset. It takes the image input shape, the model family, and the model name. For example, for the model family resnet, the available models are ResNet50, ResNet101, ResNet152, etc.
The model name is required to load the model, and the model family is needed to get the pre-processing function for the input image. The function returns a pre-trained CNN feature extractor and the preprocess function.
def load_model_and_preprocess_func(input_shape, model_family, model_name):
    # Models will be loaded with pre-trained `imagenet` weights.
    model = getattr(tf.keras.applications, model_name)(input_shape=input_shape,
                                                       weights="imagenet",
                                                       include_top=False)
    preprocess = getattr(tf.keras.applications, model_family).preprocess_input
    return model, preprocess
The output shape of the CNN feature extractor is (batch_size, height, width, num_channels). However, to add these features to TensorBoard’s Projector, we need to convert each of them to a 1-D tensor. To achieve this, we can use GlobalAveragePooling2D to average across the height and width; each image then yields a 1-D tensor whose length equals the number of channels. The shape check below illustrates this.
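A minimal shape check (a sketch using a random tensor shaped like ResNet101's final feature map):

```python
# A random batch shaped like ResNet101's final feature map: (1, 7, 7, 2048).
feat_map = tf.random.normal((1, 7, 7, 2048))
pooled = tf.keras.layers.GlobalAveragePooling2D()(feat_map)
print(pooled.shape)  # (1, 2048): one 1-D feature vector per image
```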
The function defined below takes the CNN feature extractor, adds GlobalAveragePooling2D on top of it, and returns a Keras Model.
def get_feature_extractor(model):
    inputs = model.inputs
    x = model(inputs)
    outputs = tf.keras.layers.GlobalAveragePooling2D()(x)
    feat_ext = tf.keras.Model(inputs=inputs, outputs=outputs,
                              name="feature_extractor")
    return feat_ext
IMAGE_SHAPE = (IMG_HEIGHT, IMG_WIDTH, 3)
MODEL_FAMILY = "resnet"
MODEL_NAME = "ResNet101"
model, preprocess = load_model_and_preprocess_func(IMAGE_SHAPE,
                                                   MODEL_FAMILY,
                                                   MODEL_NAME)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet101_weights_tf_dim_ordering_tf_kernels_notop.h5
171446536/171446536 [==============================] - 4s 0us/step
Get the feature extractor.
feat_ext_model = get_feature_extractor(model)
print(feat_ext_model.summary())
Model: "feature_extractor"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0
resnet101 (Functional) (None, 7, 7, 2048) 42658176
global_average_pooling2d (G (None, 2048) 0
lobalAveragePooling2D)
=================================================================
Total params: 42,658,176
Trainable params: 42,552,832
Non-trainable params: 105,344
_________________________________________________________________
None
Function to Extract Features
Let us implement a function that takes the input image, feature extractor (model), and pre-process function and returns the extracted feature.
def extract_features(input_image, model, preprocess):
    # Pre-process the input image.
    x = preprocess(input_image)
    # Generate predictions (the pooled 1-D feature vector).
    preds = model.predict(x)
    return preds[0]
Function to Load and Reshape the Image
Define a function to load and reshape the image that can be used for inference.
def load_image_for_inference(image_path, img_shape):
    # Load the image.
    image = tf.io.read_file(image_path)
    # Convert the image from bytes to an image tensor.
    x = tf.image.decode_image(image, channels=img_shape[2])
    # Resize the image to the input shape required by the model.
    x = tf.image.resize(x, (img_shape[0], img_shape[1]))
    # Add a dimension for an image batch representation.
    x = tf.expand_dims(x, axis=0)
    return x
The following function takes all image paths, the feature extractor model, and the preprocess function and returns resized images, labels, and 1-D features.
def get_images_labels_features(image_paths_dict, feature_extractor, preprocess):
    images = []
    labels = []
    features = []
    for cls in image_paths_dict:
        image_paths = image_paths_dict[cls]
        for img_path in image_paths:
            labels.append(cls)
            img = load_and_resize_image(img_path, IMG_WIDTH, IMG_HEIGHT)
            images.append(img)
            img_for_infer = load_image_for_inference(img_path, IMAGE_SHAPE)
            feature = extract_features(img_for_infer,
                                       feature_extractor,
                                       preprocess)
            features.append(feature)
    return images, labels, features
images, labels, features = get_images_labels_features(image_paths_dict, feat_ext_model, preprocess)
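As a quick sanity check (a small sketch; the expected numbers assume the twenty-images-per-class subset and the ResNet101 extractor used above):

```python
# 10 classes x 20 images = 200 items; ResNet101 features have 2048 channels.
print(len(images), len(labels), len(features))  # 200 200 200
print(features[0].shape)                        # (2048,)
```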
Suppose we are interested in visualizing images instead of points in TensorBoard's 2-D or 3-D space. In that case, we need to add a sprite image (a collection of images tiled into a single image) to TensorBoard’s Projector.
Function to Create a Sprite Image
def create_sprite_image(pil_images, save_path):
    # Assuming all images have the same width and height.
    img_width, img_height = pil_images[0].size
    # Create a square master image large enough to hold every thumbnail.
    row_coln_count = int(np.ceil(np.sqrt(len(pil_images))))
    master_img_width = img_width * row_coln_count
    master_img_height = img_height * row_coln_count
    master_image = Image.new(
        mode='RGBA',
        size=(master_img_width, master_img_height),
        color=(0, 0, 0, 0)
    )
    # Paste each image into its grid cell, row by row.
    for i, img in enumerate(pil_images):
        div, mod = divmod(i, row_coln_count)
        w_loc = img_width * mod
        h_loc = img_height * div
        master_image.paste(img, (w_loc, h_loc))
    master_image.convert('RGB').save(save_path, transparency=0)
    return
At this point, we have all the feature vectors, their labels, and the corresponding images, so the sprite image can be created. We must add all of this to TensorBoard and update TensorBoard’s Projector config for visualization.
We will define a function that takes the log directory path, images, features, and labels, adds them to TensorBoard, and updates the Projector’s config.
It writes three files in the log directory:
- metadata.tsv: It has label information.
- features.tsv: It has feature vector information.
- sprite.jpg: It has image information.
Additionally, it writes a configuration file, projector_config.pbtxt. A sample configuration file is shown below.
embeddings {
  metadata_path: "metadata.tsv"
  sprite {
    image_path: "sprite.jpg"
    single_image_dim: 224
    single_image_dim: 224
  }
  tensor_path: "features.tsv"
}
Function to Write Embeddings
def write_embedding(log_dir, pil_images, features, labels):
    """Writes embedding data and projector configuration to the logdir."""
    metadata_filename = "metadata.tsv"
    tensor_filename = "features.tsv"
    sprite_image_filename = "sprite.jpg"
    os.makedirs(log_dir, exist_ok=True)
    with open(os.path.join(log_dir, metadata_filename), "w") as f:
        for label in labels:
            f.write("{}\n".format(label))
    with open(os.path.join(log_dir, tensor_filename), "w") as f:
        for tensor in features:
            f.write("{}\n".format("\t".join(str(x) for x in tensor)))
    sprite_image_path = os.path.join(log_dir, sprite_image_filename)

    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    # Label info.
    embedding.metadata_path = metadata_filename
    # Features info.
    embedding.tensor_path = tensor_filename
    # Image info.
    create_sprite_image(pil_images, sprite_image_path)
    embedding.sprite.image_path = sprite_image_filename
    # Specify the width and height of a single thumbnail.
    img_width, img_height = pil_images[0].size
    embedding.sprite.single_image_dim.extend([img_width, img_height])
    # Create the configuration file.
    projector.visualize_embeddings(log_dir, config)
    return
Call the function to write embedding in the log directory.
LOG_DIR = os.path.join('logs', MODEL_NAME)
write_embedding(LOG_DIR, images, features, labels)
t-SNE Visualization using TensorBoard
%load_ext tensorboard
# %reload_ext tensorboard
%tensorboard --logdir {LOG_DIR}
Note that TensorBoard's default projection is PCA; you have to click on T-SNE to switch to the t-SNE visualization.
Play with different t-SNE hyperparameters, e.g., perplexity and learning rate.
Summary
In this article,
- We explored t-SNE and how to use it effectively.
- We saw that feature visualization using t-SNE is more informative than PCA.
- We verified, in a lower-dimensional map, that a pre-trained model places images of different classes into different clusters.
Must Read Articles
- t-SNE for Feature Visualization
- Fine Tuning YOLOv7 on Custom Dataset
- CenterNet: Objects as Points – Anchor-Free Object Detection Explained
References
- PCA Wikipedia.
- Visualizing Data using t-SNE. Journal of Machine Learning Research 9:2579–2605, 2008.
- Scikit Learn Library.
- Laurens van der Maaten.
- Godfather of Deep Learning, Geoffrey Hinton.
- Kullback–Leibler divergence.
- How to Use t-SNE Effectively.
- Animal-10.