Visualizing training data is often essential for designing a good machine learning model. However, the number of feature dimensions is usually far greater than three, so to gain visual insight, dimensionality reduction techniques such as PCA [1] and t-SNE (t-Distributed Stochastic Neighbor Embedding) [2] are used. In this article, we will introduce t-SNE dimensionality reduction, visualize data using t-SNE, and use TensorBoard for t-SNE and PCA visualization.
People who will benefit most from this article are those who:
- Want to know how to use PCA and t-SNE in Scikit Learn [3]
- Want to understand the difference between PCA and t-SNE
- Want to understand t-SNE (t-distributed Stochastic Neighbor Embedding)
- Want to know the usage of t-SNE
- Want to understand the stochastic nature of t-SNE
- Want to visualize high-dimensional CNN features using TensorBoard’s t-SNE and PCA
We will cover the following topics:
- t-Distributed Stochastic Neighbor Embedding
- t-SNE vs. PCA
- t-SNE TensorBoard Feature Visualization Code Explanation
- t-SNE Visualization using TensorBoard
- Summary
t-Distributed Stochastic Neighbor Embedding
What is t-SNE used for?
t-distributed Stochastic Neighbor Embedding (t-SNE) is a technique for visualizing high-dimensional features in two- or three-dimensional space. It was first introduced by Laurens van der Maaten [4] and the Godfather of Deep Learning, Geoffrey Hinton [5], in 2008.
How Does t-Distributed Stochastic Neighbor Embedding Work?
Internally, it works as follows:
- In the high-dimensional space, a probability distribution over pairs of points is constructed such that similar points (by Euclidean distance, cosine similarity, etc.) are assigned a high probability and dissimilar points a low probability.
- In the low-dimensional map, t-SNE tries to reproduce a similar probability distribution over the mapped points.
- To achieve this, it minimizes the Kullback–Leibler divergence [6] (KL divergence) between the two distributions.
In short, t-SNE tries to preserve the relative positions of points in the low-dimensional mapping: points that are close in the original space should stay close, and distant points should stay distant. A minimal sketch of this objective follows.
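To make the objective concrete, here is a deliberately simplified, NumPy-only sketch. It is an illustration, not the real algorithm: actual t-SNE calibrates a per-point Gaussian bandwidth from the perplexity and optimizes the map with gradient descent, while this toy version only evaluates the KL divergence between two pairwise-similarity distributions for a random map.

```python
import numpy as np

def pairwise_sq_dists(Z):
    # Squared Euclidean distances between all pairs of rows of Z.
    sq = np.sum(Z ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)

def similarity_distribution(Z, kernel):
    S = kernel(pairwise_sq_dists(Z))
    np.fill_diagonal(S, 0.0)   # a point is not its own neighbor
    return S / S.sum()         # normalize into a probability distribution

X_high = np.random.rand(50, 10)  # toy "high-dimensional" points
Y_low = np.random.rand(50, 1)    # a candidate 1-D map

P = similarity_distribution(X_high, lambda d: np.exp(-d))      # Gaussian kernel
Q = similarity_distribution(Y_low, lambda d: 1.0 / (1.0 + d))  # Student-t kernel

# KL(P || Q) is the quantity t-SNE's optimizer drives down.
eps = 1e-12
kl = np.sum(P * np.log((P + eps) / (Q + eps)))
print(f"KL divergence of this random map: {kl:.4f}")
```

t-SNE would now move the points in Y_low to reduce this number; a good map is one whose pairwise similarities match those of the original space.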
Is t-SNE Stochastic?
Yes. t-SNE models neighborhoods probabilistically and usually starts from a random initialization of the low-dimensional map, so it is stochastic: different runs on the same data can produce different embeddings unless the random seed is fixed. The quick sketch below demonstrates this.
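A small sketch (assuming scikit-learn is installed) that runs t-SNE twice on the same toy data with different random_state values:

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy high-dimensional data.
X = np.random.RandomState(0).rand(100, 16)

emb_a = TSNE(n_components=2, perplexity=30, learning_rate='auto',
             init='random', random_state=0).fit_transform(X)
emb_b = TSNE(n_components=2, perplexity=30, learning_rate='auto',
             init='random', random_state=1).fit_transform(X)

# Different seeds generally give different maps; fixing random_state
# makes an individual run reproducible.
print(np.allclose(emb_a, emb_b))  # typically False
```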
t-SNE vs. PCA
To illustrate this, we will take two-dimensional data points (standing in for higher-dimensional data) and map them to one dimension using t-SNE and PCA. We will then see how t-SNE preserves the relative positions of the data points whereas PCA does not.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import requests
from zipfile import ZipFile
import os
import tensorflow as tf
from PIL import Image
from tensorboard.plugins import projector
Let’s generate two-dimensional data points using three normal distributions.
# Any fixed seed keeps the experiment reproducible; we pick 42.
seed = 42
np.random.seed(seed)
num_points_per_class = 50
# Class 1
mean1 = [0, 0]
cov = [[0.1, 0], [0, 0.1]]
X1 = np.random.multivariate_normal(mean1, cov, num_points_per_class)
# Class 2
mean2 = [10, 0]
X2 = np.random.multivariate_normal(mean2, cov, num_points_per_class)
# Class 3
mean3 = [5, 6]
X3 = np.random.multivariate_normal(mean3, cov, num_points_per_class)
These points are concatenated for t-SNE and PCA.
X = np.concatenate([X1, X2, X3], axis=0)
X.shape  # (150, 2): 3 classes x 50 points, 2 dimensions
Implement a function to rescale the data points in the 0-1 range.
def scale_to_01_range(x):
    # Compute the distribution range.
    value_range = np.max(x) - np.min(x)
    # Shift the distribution so that it starts from zero
    # by subtracting the minimum from all values.
    starts_from_zero = x - np.min(x)
    # Fit the distribution into [0, 1] by dividing by its range.
    return starts_from_zero / value_range
X[:, 0] = scale_to_01_range(X[:, 0])
X[:, 1] = scale_to_01_range(X[:, 1])
The following class, VisualizeScatter, is defined to visualize the data points.
class VisualizeScatter:
    def __init__(self, fig_size=(10, 8), xlabel='X', ylabel='Y', title=None,
                 size=10, num_classes=3):
        plt.figure(figsize=fig_size)
        plt.grid(True)
        plt.title(title)
        plt.xlabel(xlabel)
        plt.ylabel(ylabel)
        self.colors = ['red', 'green', 'blue']
        self.num_classes = num_classes
        self.size = size

    def add_scatters(self, X):
        x = X[:, 0]
        if X.shape[1] == 2:
            y = X[:, 1]
        else:
            # For 1-D embeddings, plot the points along a horizontal line.
            y = np.zeros(len(x))
        points_per_class = len(x) // self.num_classes
        st = 0
        end = points_per_class
        for i in range(self.num_classes):
            plt.scatter(x[st:end], y[st:end],
                        c=self.colors[i % len(self.colors)],
                        s=self.size)
            st = end
            end = end + points_per_class

    @staticmethod
    def show_plot():
        plt.show()
vis = VisualizeScatter(fig_size=(10, 8), title="Original Rescaled 2-D Points")
vis.add_scatters(X)
vis.show_plot()
We can observe the three clusters of data points.
Another way to visualize these clusters is via a 1-D projection. We use t-SNE to transform the two-dimensional data points into one dimension; this can be done with sklearn.
Here, we specify the perplexity hyperparameter. The chosen value works well for our dataset; we will discuss its significance later in the post.
perplexity = 25
X_embedded = TSNE(n_components=1,
                  perplexity=perplexity,
                  learning_rate='auto',
                  init='random',
                  random_state=seed).fit_transform(X)
tsne_vis = VisualizeScatter(fig_size=(10, 2),
                            title='t-SNE 1-D Projection (perplexity = {})'.format(perplexity))
tsne_vis.add_scatters(X_embedded)
tsne_vis.show_plot()
We can see that one-dimensional projection preserves the three clusters.
Now, let’s use PCA for one-dimensional projection.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
pca_vis = VisualizeScatter(fig_size=(10, 2), title='PCA 1-D Projection')
pca_vis.add_scatters(X_reduced)
pca_vis.show_plot()
We can see that two of the clusters overlap each other, so PCA fails to preserve the relative positions. PCA is designed to maximize variance rather than to maintain relative positions, as the quick check below confirms.
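A quick check of this variance-maximizing behavior, reusing the pca object fitted above (a small sketch; the printed values depend on the generated data):

```python
# Fraction of the total variance captured by the single retained component.
print(pca.explained_variance_ratio_)
# The direction in the original 2-D space that PCA projects onto.
print(pca.components_)
```

PCA picks the direction of maximum variance regardless of cluster membership, which is exactly why two clusters can land on top of each other in the 1-D projection.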
Effect of Smaller Perplexity
What if we use a smaller perplexity value? The following t-SNE visualization is performed with perplexity=2.
perplexity = 2
X_embedded = TSNE(n_components=1,
                  perplexity=perplexity,
                  learning_rate='auto',
                  init='random',
                  random_state=seed).fit_transform(X)
tsne_vis = VisualizeScatter(fig_size=(10, 2),
                            title='t-SNE 1-D Projection (perplexity = {})'.format(perplexity))
tsne_vis.add_scatters(X_embedded)
tsne_vis.show_plot()
You can observe that for a very low perplexity, points from one cluster mix with the others and are scattered all over.
Effect of Larger Perplexity
perplexity = 150
X_embedded = TSNE(n_components=1,
                  perplexity=perplexity,
                  learning_rate='auto',
                  init='random',
                  random_state=seed).fit_transform(X)
tsne_vis = VisualizeScatter(fig_size=(10, 2),
                            title='t-SNE 1-D Projection (perplexity = {})'.format(perplexity))
tsne_vis.add_scatters(X_embedded)
tsne_vis.show_plot()
Here, for a very high perplexity, the points from the three clusters merge into two clusters, and all points within each cluster sit very close to each other (note that the total number of points is 150).
The description of perplexity in the scikit-learn t-SNE API is the following:
The perplexity is related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can result in significantly different results. The perplexity must be less than the number of samples.
So when perplexity is very low, fewer points are considered nearest neighbors, and the remaining points end up scattered, even those belonging to the same cluster.
And when perplexity is very high, more points are considered nearest neighbors, so almost all points are clubbed together. The sketch below makes it easy to scan a range of perplexities.
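To explore this effect systematically, here is a small sketch that reuses X, seed, and the VisualizeScatter helper defined above and sweeps a few perplexity values:

```python
# Compare 1-D projections across a range of perplexity values.
# Note: perplexity must stay below the number of samples (150 here).
for perplexity in [2, 5, 25, 50, 149]:
    X_embedded = TSNE(n_components=1,
                      perplexity=perplexity,
                      learning_rate='auto',
                      init='random',
                      random_state=seed).fit_transform(X)
    vis = VisualizeScatter(fig_size=(10, 2),
                           title=f't-SNE 1-D Projection (perplexity = {perplexity})')
    vis.add_scatters(X_embedded)
    vis.show_plot()
```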
There is an excellent explanation of this in How to Use t-SNE Effectively [7].
Feature Visualization Using TensorBoard’s t-SNE
Next, let’s visualize high-dimensional CNN-extracted features in a two- or three-dimensional feature map.
We will use a model pre-trained on the ImageNet dataset, without the fully connected layers (only the CNN layers), as a feature extractor.
Pre-trained models are popularly used for fine-tuning and transfer learning. Here is a beautiful article that explains fine-tuning.
We will use TensorBoard’s Projector to map the higher-dimensional features to two or three dimensions. If you are interested in writing the visualization code in Python, look at the article t-SNE for Feature Visualization. A subset of the Animal-10 [8] dataset will be used as sample data points.
The prepared subset has twenty images from each of the ten classes.
Utility Functions to Download and Unzip the Dataset
def download_file(url, save_name):
    """Download and save the file.

    Arguments:
        url (str): URL of the file.
        save_name (str): File path to save the downloaded file.
    """
    file = requests.get(url)
    open(save_name, 'wb').write(file.content)
    print(f"Downloaded {save_name}...")
    return
def unzip(zip_file_path=None):
    """Unzip the file.

    Arguments:
        zip_file_path (str): The zipped file path.
    """
    try:
        with ZipFile(zip_file_path) as z:
            z.extractall("./")
            print(f"Extracted {zip_file_path}...\n")
    except Exception:
        print("Invalid file")
    return
if not os.path.exists('animal10'):
    download_file(
        'https://www.dropbox.com/sh/wyt8cvctpcvg10r/AAAuOf992j1vDf7S7oV1STW7a?dl=1',
        'animal10.zip')
    unzip('animal10.zip')
Downloaded animal10.zip...
Extracted animal10.zip...
Get Image Paths According to Class
The following function gets all the class-wise image paths for a given root directory.
def get_classwise_image_path(root_dir):
    image_paths = dict()
    classes = os.listdir(root_dir)
    for cls in classes:
        image_paths[cls] = []
        class_dir = os.path.join(root_dir, cls)
        images = os.listdir(class_dir)
        for image_name in images:
            img_path = os.path.join(class_dir, image_name)
            image_paths[cls].append(img_path)
    return image_paths
Use the above function to get all image paths.
IMG_ROOT_DIR = 'animal10'
image_paths_dict = get_classwise_image_path(IMG_ROOT_DIR)
Define image width and height for the pre-trained model inference.
IMG_WIDTH, IMG_HEIGHT = (224, 224)
Function to Resize Image
The following function reads and resizes an image. We need it to add images to TensorBoard.
def load_and_resize_image(img_path, width, height):
    img = Image.open(img_path).resize((width, height))
    return img
Function to Visualize the Dataset
Before running model inference and t-SNE on the extracted features, let's look at some sample images from the Animal-10 dataset. We define a function for this below.
def show_class_sample(image_path_dic, fig_size=(15, 6)):
    fig, axes = plt.subplots(
        nrows=2,
        ncols=5,
        figsize=fig_size
    )
    list_axes = list(axes.flat)
    classes = list(image_path_dic.keys())
    for i, ax in enumerate(list_axes):
        img = load_and_resize_image(image_path_dic[classes[i]][0],
                                    IMG_WIDTH,
                                    IMG_HEIGHT)
        ax.imshow(img)
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        ax.set_title(classes[i])
    fig.suptitle("Animal-10 Dataset Samples", fontsize=15)
    plt.show()
    return
show_class_sample(image_paths_dict)
Function to Load a Pre-Trained Model
The following function loads a feature extractor pre-trained on the ImageNet dataset. It takes the image input shape, the model family, and the model name. For example, for the model family resnet, the available models are ResNet50, ResNet101, ResNet152, etc.
The model name is required to load the model, and the model family is needed to get the pre-processing function for the input image. The function returns a pre-trained CNN feature extractor and the preprocess function.
def load_model_and_preprocess_func(input_shape, model_family, model_name):
    # Models will be loaded with pre-trained `imagenet` weights.
    model = getattr(tf.keras.applications, model_name)(input_shape=input_shape,
                                                       weights="imagenet",
                                                       include_top=False)
    preprocess = getattr(tf.keras.applications, model_family).preprocess_input
    return model, preprocess
The output shape of the CNN feature extractor is (batch_size, height, width, num_channels). However, to add these features to TensorBoard’s Projector, we need to convert each of them to a 1-D tensor. To achieve this, we can use GlobalAveragePooling2D to average across the height and width; each image then yields a 1-D tensor whose length equals the number of channels. The shape check below illustrates this.
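A minimal shape check (a sketch using a random tensor shaped like ResNet101's final feature map):

```python
# A random batch shaped like ResNet101's final feature map: (1, 7, 7, 2048).
feat_map = tf.random.normal((1, 7, 7, 2048))
pooled = tf.keras.layers.GlobalAveragePooling2D()(feat_map)
print(pooled.shape)  # (1, 2048): one 1-D feature vector per image
```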
The function defined below takes the CNN feature extractor, adds GlobalAveragePooling2D on top of it, and returns a Keras Model.
def get_feature_extractor(model):
    inputs = model.inputs
    x = model(inputs)
    outputs = tf.keras.layers.GlobalAveragePooling2D()(x)
    feat_ext = tf.keras.Model(inputs=inputs, outputs=outputs,
                              name="feature_extractor")
    return feat_ext
IMAGE_SHAPE = (IMG_HEIGHT, IMG_WIDTH, 3)
MODEL_FAMILY = "resnet"
MODEL_NAME = "ResNet101"
model, preprocess = load_model_and_preprocess_func(IMAGE_SHAPE,
                                                   MODEL_FAMILY,
                                                   MODEL_NAME)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet101_weights_tf_dim_ordering_tf_kernels_notop.h5
171446536/171446536 [==============================] - 4s 0us/step
Get the feature extractor.
feat_ext_model = get_feature_extractor(model)
print(feat_ext_model.summary())
Model: "feature_extractor"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0
resnet101 (Functional) (None, 7, 7, 2048) 42658176
global_average_pooling2d (G (None, 2048) 0
lobalAveragePooling2D)
=================================================================
Total params: 42,658,176
Trainable params: 42,552,832
Non-trainable params: 105,344
_________________________________________________________________
None
Function to Extract Features
Let us implement a function that takes the input image, feature extractor (model), and pre-process function and returns the extracted feature.
def extract_features(input_image, model, preprocess):
    # Pre-process the input image.
    x = preprocess(input_image)
    # Generate predictions (the pooled 1-D feature vector).
    preds = model.predict(x)
    return preds[0]
Function to Load and Reshape the Image
Define a function to load and reshape the image that can be used for inference.
def load_image_for_inference(image_path, img_shape):
    # Load the image.
    image = tf.io.read_file(image_path)
    # Convert the image from bytes to an image tensor.
    x = tf.image.decode_image(image, channels=img_shape[2])
    # Resize the image to the input shape required by the model.
    x = tf.image.resize(x, (img_shape[0], img_shape[1]))
    # Add a dimension for an image batch representation.
    x = tf.expand_dims(x, axis=0)
    return x
The following function takes all image paths, the feature extractor model, and the preprocess function and returns resized images, labels, and 1-D features.
def get_images_labels_features(image_paths_dict, feature_extractor, preprocess):
    images = []
    labels = []
    features = []
    for cls in image_paths_dict:
        image_paths = image_paths_dict[cls]
        for img_path in image_paths:
            labels.append(cls)
            img = load_and_resize_image(img_path, IMG_WIDTH, IMG_HEIGHT)
            images.append(img)
            img_for_infer = load_image_for_inference(img_path, IMAGE_SHAPE)
            feature = extract_features(img_for_infer,
                                       feature_extractor,
                                       preprocess)
            features.append(feature)
    return images, labels, features
images, labels, features = get_images_labels_features(image_paths_dict, feat_ext_model, preprocess)
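As a quick sanity check (a small sketch; the expected numbers assume the twenty-images-per-class subset and the ResNet101 extractor used above):

```python
# 10 classes x 20 images = 200 items; ResNet101 features have 2048 channels.
print(len(images), len(labels), len(features))  # 200 200 200
print(features[0].shape)                        # (2048,)
```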
Suppose we are interested in visualizing images instead of points in TensorBoard's 2-D or 3-D space. In that case, we need to add a sprite image (a collection of images tiled into a single image) to TensorBoard’s Projector.
Function to Create a Sprite Image
def create_sprite_image(pil_images, save_path):
    # Assuming all images have the same width and height.
    img_width, img_height = pil_images[0].size
    # Create a square master image large enough to hold every thumbnail.
    row_coln_count = int(np.ceil(np.sqrt(len(pil_images))))
    master_img_width = img_width * row_coln_count
    master_img_height = img_height * row_coln_count
    master_image = Image.new(
        mode='RGBA',
        size=(master_img_width, master_img_height),
        color=(0, 0, 0, 0)
    )
    # Paste each image into its grid cell, row by row.
    for i, img in enumerate(pil_images):
        div, mod = divmod(i, row_coln_count)
        w_loc = img_width * mod
        h_loc = img_height * div
        master_image.paste(img, (w_loc, h_loc))
    master_image.convert('RGB').save(save_path, transparency=0)
    return
At this point, we have all the feature vectors, their labels, and the corresponding images, so the sprite image can be created. We must add all of this to TensorBoard and update TensorBoard’s Projector config for visualization.
We will define a function that takes the log directory path, images, features, and labels, adds them to TensorBoard, and updates the Projector’s config.
It writes three files in the log directory:
- metadata.tsv: It has label information.
- features.tsv: It has feature vector information.
- sprite.jpg: It has image information.
Additionally, it writes a configuration file, projector_config.pbtxt. A sample configuration file is shown below.
embeddings {
  metadata_path: "metadata.tsv"
  sprite {
    image_path: "sprite.jpg"
    single_image_dim: 224
    single_image_dim: 224
  }
  tensor_path: "features.tsv"
}
Function to Write Embeddings
def write_embedding(log_dir, pil_images, features, labels):
    """Writes embedding data and projector configuration to the logdir."""
    metadata_filename = "metadata.tsv"
    tensor_filename = "features.tsv"
    sprite_image_filename = "sprite.jpg"
    os.makedirs(log_dir, exist_ok=True)
    with open(os.path.join(log_dir, metadata_filename), "w") as f:
        for label in labels:
            f.write("{}\n".format(label))
    with open(os.path.join(log_dir, tensor_filename), "w") as f:
        for tensor in features:
            f.write("{}\n".format("\t".join(str(x) for x in tensor)))
    sprite_image_path = os.path.join(log_dir, sprite_image_filename)

    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    # Label info.
    embedding.metadata_path = metadata_filename
    # Features info.
    embedding.tensor_path = tensor_filename
    # Image info.
    create_sprite_image(pil_images, sprite_image_path)
    embedding.sprite.image_path = sprite_image_filename
    # Specify the width and height of a single thumbnail.
    img_width, img_height = pil_images[0].size
    embedding.sprite.single_image_dim.extend([img_width, img_height])
    # Create the configuration file.
    projector.visualize_embeddings(log_dir, config)
    return
Call the function to write embedding in the log directory.
LOG_DIR = os.path.join('logs', MODEL_NAME)
write_embedding(LOG_DIR, images, features, labels)
t-SNE Visualization using TensorBoard
%load_ext tensorboard
# %reload_ext tensorboard
%tensorboard --logdir {LOG_DIR}
Note that TensorBoard's default projection is PCA; you have to click on T-SNE to switch to the t-SNE visualization.
Play with different t-SNE hyperparameters, e.g., perplexity and learning rate.
Summary
In this article,
- We explored t-SNE and how to use it effectively.
- We saw that feature visualization using t-SNE is more informative than PCA.
- We verified, in a lower-dimensional map, that a pre-trained model places images of different classes into different clusters.
Must Read Articles
- t-SNE for Feature Visualization
- Fine Tuning YOLOv7 on Custom Dataset
- CenterNet: Objects as Points – Anchor-Free Object Detection Explained
References
- PCA Wikipedia.
- Visualizing Data using t-SNE. Journal of Machine Learning Research 9:2579–2605, 2008.
- Scikit Learn Library.
- Laurens van der Maaten.
- Godfather of Deep Learning, Geoffrey Hinton.
- Kullback–Leibler divergence.
- How to Use t-SNE Effectively.
- Animal-10.