Generally, we can visualize data in 1D, 2D, and 3D, but real-life datasets often contain hundreds or thousands of features, which makes them impossible to visualize directly. To address this problem, we use dimensionality reduction.
This article covers two fundamental techniques that summarize the information content of a dataset by transforming it onto a new feature subspace of lower dimensionality than the original one. All of the code is written in Python using Scikit-learn.
We will cover the following topics:
- Principal component analysis (PCA) for unsupervised data compression
- Linear Discriminant Analysis (LDA) as a supervised dimensionality reduction technique for maximizing class separability
Unsupervised Dimensionality Reduction via Principal Component Analysis
Feature extraction constructs new features from the original ones so that we can work with less data while keeping most of the important information, which makes downstream processing faster and simpler. PCA is one way to do feature extraction. It finds patterns in the data by looking at the correlations between features and projects the data onto a new subspace with fewer dimensions, whose axes are orthogonal (at right angles) to each other and point in the directions of maximum variance, as shown in the picture below. Here, x1 and x2 are the original feature axes, and PC1 and PC2 are the principal components:
Before looking at the PCA algorithm for dimensionality reduction in more detail, let's summarize the approach in a few simple steps:
- Standardize the d-dimensional dataset.
- Construct the covariance matrix.
- Decompose the covariance matrix into its eigenvectors and eigenvalues.
- Select the k eigenvectors that correspond to the k largest eigenvalues, where k is the dimensionality of the new feature subspace (k ≤ d).
- Construct a projection matrix W from the "top" k eigenvectors.
- Transform the d-dimensional input dataset X using the projection matrix W to obtain the new k-dimensional feature subspace (a minimal NumPy sketch of these steps follows this list).
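As a complement to the Scikit-learn example that follows, here is a minimal NumPy sketch of the six steps above, assuming a toy dataset and k = 2 (the data and variable names are only for illustration):
import numpy as np

# Toy data: 100 samples with d = 4 features (stand-in for any d-dimensional dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# 1. Standardize the d-dimensional dataset
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Construct the covariance matrix
cov_mat = np.cov(X_std.T)

# 3. Decompose the covariance matrix into its eigenvectors and eigenvalues
eigen_vals, eigen_vecs = np.linalg.eigh(cov_mat)

# 4. Select the k eigenvectors with the k largest eigenvalues (here k = 2)
k = 2
order = np.argsort(eigen_vals)[::-1]

# 5. Construct the projection matrix W from the "top" k eigenvectors
W = eigen_vecs[:, order[:k]]

# 6. Transform X onto the new k-dimensional feature subspace
X_pca = X_std @ W
print(X_pca.shape)  # (100, 2)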
Principal component analysis in Scikit-learn:
PCA is another one of Scikit-learn's transformer classes: we first fit the model on the training data and then transform both the training data and the test data using the same model parameters. A typical workflow is to apply PCA, classify the transformed samples with a classifier such as logistic regression, and visualize the decision regions. The example below applies PCA from Scikit-learn to the Iris dataset and visualizes the samples in the new two-component space; a sketch of the classification step follows the code.
Code for PCA:
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# Load dataset into Pandas DataFrame
df = pd.read_csv(url, names=["sepal length", "sepal width", "petal length", "petal width", "target"])

from sklearn.preprocessing import StandardScaler
features = ["sepal length", "sepal width", "petal length", "petal width"]
# Separate out the features
x = df.loc[:, features].values
# Separate out the target
y = df.loc[:, ["target"]].values
# Standardize the features
x = StandardScaler().fit_transform(x)

from sklearn.decomposition import PCA
# Create a PCA object with two components
pca = PCA(n_components=2)
# Fit and transform the features to get the principal components
principal_components = pca.fit_transform(x)
# Create a DataFrame with the principal components and the target
principal_df = pd.DataFrame(data=principal_components, columns=["principal component 1", "principal component 2"])
final_df = pd.concat([principal_df, df[["target"]]], axis=1)

# Plot the data using the principal components and the target
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel("Principal Component 1", fontsize=15)
ax.set_ylabel("Principal Component 2", fontsize=15)
ax.set_title("2 component PCA", fontsize=20)
targets = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
colors = ["r", "g", "b"]
for target, color in zip(targets, colors):
    indices_to_keep = final_df["target"] == target
    ax.scatter(final_df.loc[indices_to_keep, "principal component 1"],
               final_df.loc[indices_to_keep, "principal component 2"],
               c=color, s=50)
ax.legend(targets)
ax.grid()

# Print the explained variance ratio of the PCA
print(f"The explained variance ratio is: {pca.explained_variance_ratio_}")
Supervised data compression via linear discriminant analysis:
LDA is a technique for constructing new features that separate the classes as well as possible. It is similar to PCA, which constructs new features that capture the maximum variance, but LDA is supervised while PCA is not. LDA might therefore seem like the better choice for classification tasks, yet PCA sometimes works better, for example when there are only a few samples per class.
Code in sklearn:
import numpy as np
import pandas as pd
# loading the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv(url, names=names)
# data preprocessing
X = dataset.iloc[:, 0:4].values
y = dataset.iloc[:, 4].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# performing LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components=1)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)
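To see what the single LDA component gives us for classification, we can train any classifier on the transformed features. The following is just an illustrative sketch (logistic regression is an arbitrary choice, not part of the original code), reusing X_train, X_test, y_train, and y_test from above:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train a classifier on the single LDA component and evaluate it on the test split
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print(f"Accuracy on the 1-component LDA subspace: {accuracy_score(y_test, y_pred):.2f}")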
Final Thoughts:
In the case of uniformly distributed data, LDA almost always performs better than PCA. However, if the data is highly skewed (irregularly distributed), it is advisable to use PCA, since LDA can be biased towards the majority class.
Finally, PCA has the benefit that it can be applied to labelled as well as unlabelled data, since it does not rely on the output labels. LDA, on the other hand, requires the output classes to find the linear discriminants and hence needs labelled data.