How to Calculate Cosine Similarity in Python?

Introduction

Cosine similarity is a popular metric for measuring how similar two vectors in a multi-dimensional space are. It is widely used in fields such as natural language processing, information retrieval, and recommendation systems. Cosine similarity is the cosine of the angle between two vectors, so it depends only on their directions, not on their lengths. It ranges from -1 (the vectors point in opposite directions) through 0 (the vectors are orthogonal) to 1 (the vectors point in the same direction). A value close to 1 indicates high similarity between the vectors.
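In symbols, for two vectors A and B with n components, the definition can be written as the dot product divided by the product of the vector lengths (Euclidean norms). The second line works it through for the vectors [1, 2, 3] and [4, 5, 6] that the examples in this article use.

```latex
\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \, \lVert \mathbf{B} \rVert}
             = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}

\frac{1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6}{\sqrt{1^2 + 2^2 + 3^2} \, \sqrt{4^2 + 5^2 + 6^2}}
  = \frac{32}{\sqrt{14} \, \sqrt{77}} \approx 0.9746
```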

In this article, we will explore how to calculate cosine similarity in Python with three libraries: NumPy, scikit-learn, and SciPy. Each section walks through the steps for a pair of dense vectors stored as NumPy arrays.

1. Using NumPy

NumPy is a powerful library for numerical computations in Python. To calculate cosine similarity between two vectors using NumPy, follow these steps:

Step 1: Import the NumPy library

import numpy as np

Step 2: Define two vectors as NumPy arrays

vector1 = np.array([1, 2, 3])

vector2 = np.array([4, 5, 6])

Step 3: Compute the dot product of the two vectors

dot_product = np.dot(vector1, vector2)

Step 4: Calculate the magnitudes (norms) of each vector

norm_vector1 = np.linalg.norm(vector1)
norm_vector2 = np.linalg.norm(vector2)

Step 5: Compute the cosine similarity from the dot product and the vector norms

cosine_similarity = dot_product / (norm_vector1 * norm_vector2)

Step 6: Print the cosine similarity

print("Cosine Similarity:", cosine_similarity)

Output

Cosine Similarity: 0.9746318461970762

Complete Example

import numpy as np

# Define the two vectors
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

# Dot product and Euclidean norms (magnitudes)
dot_product = np.dot(vector1, vector2)
norm_vector1 = np.linalg.norm(vector1)
norm_vector2 = np.linalg.norm(vector2)

# Cosine similarity = dot product / product of the norms
cosine_similarity = dot_product / (norm_vector1 * norm_vector2)

print("Cosine Similarity:", cosine_similarity)

Output

Cosine Similarity: 0.9746318461970762
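If you need this calculation in more than one place, the steps above can be wrapped in a small function. The sketch below uses a hypothetical helper name, cosine_sim (it is not part of NumPy), and makes one design choice of its own: it raises an error for zero-length vectors, where the formula divides by zero. Plain NumPy would instead return nan and emit a RuntimeWarning.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity of two 1-D vectors (hypothetical helper, not a NumPy function)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    # The formula is undefined when either vector has zero length,
    # so fail loudly instead of returning nan with a RuntimeWarning.
    if norm_product == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return float(np.dot(a, b) / norm_product)


print(cosine_sim([1, 2, 3], [4, 5, 6]))  # close to 0.9746, as above
print(cosine_sim([1, 0], [0, 1]))        # orthogonal vectors: 0.0
print(cosine_sim([1, 2], [-1, -2]))      # opposite directions: approximately -1
```

Converting the inputs with np.asarray lets the function accept plain Python lists as well as NumPy arrays.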

2. Using scikit-learn

Scikit-learn is a popular machine learning library that provides efficient implementations for various similarity metrics, including cosine similarity. To calculate cosine similarity using scikit-learn, follow these steps:

Step 1: Import the necessary module from scikit-learn

from sklearn.metrics.pairwise import cosine_similarity

Step 2: Define two vectors as NumPy arrays (same as before)

vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

Step 3: Reshape the vectors into 2D arrays (required by scikit-learn)

vector1 = vector1.reshape(1, -1)
vector2 = vector2.reshape(1, -1)

Step 4: Calculate the cosine similarity using the 'cosine_similarity' function

cosine_similarity_score = cosine_similarity(vector1, vector2)

Step 5: Print the cosine similarity

print("Cosine Similarity:", cosine_similarity_score[0][0])

Output

Cosine Similarity: 0.9746318461970762
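The 2D requirement from Step 3 is also what makes cosine_similarity handy for more than one pair: each row is treated as one vector (sample), and the result is a matrix holding the similarity of every row of the first argument with every row of the second. A minimal sketch, using the two vectors from above plus a third, made-up vector:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Each row is one vector (sample); each column is one feature.
X = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [-1, -2, -3],  # made-up vector pointing opposite to the first row
])

# With a single argument, every row is compared with every row,
# giving a symmetric 3 x 3 matrix with values close to 1 on the diagonal.
sim_matrix = cosine_similarity(X)
print(sim_matrix.shape)  # (3, 3)
print(sim_matrix[0, 1])  # same pair as in the steps above

# Passing 1-D arrays instead of 2D ones is rejected.
try:
    cosine_similarity(np.array([1, 2, 3]), np.array([4, 5, 6]))
except ValueError as err:
    print("ValueError:", err)
```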

Complete Example

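import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Define the two vectors
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

# scikit-learn expects 2D arrays of shape (n_samples, n_features)
vector1 = vector1.reshape(1, -1)
vector2 = vector2.reshape(1, -1)

# The result is a 1 x 1 matrix; [0][0] extracts the single score
cosine_similarity_score = cosine_similarity(vector1, vector2)

print("Cosine Similarity:", cosine_similarity_score[0][0])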

Output

Cosine Similarity: 0.9746318461970762
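The same function also accepts SciPy sparse matrices, which is convenient when vectors have many dimensions but few non-zero entries (word counts in text, for example). The sketch below assumes SciPy is installed and stores the two example vectors as one compressed sparse row (CSR) matrix built with scipy.sparse.csr_matrix. With the default dense_output=True, the result still comes back as a regular NumPy array.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Store the two vectors as a single 2 x 3 sparse matrix, one row per vector.
# Real sparse data would be mostly zeros; here the point is only the API.
sparse_vectors = csr_matrix(np.array([[1, 2, 3],
                                      [4, 5, 6]]))

# Indexing a row of a csr_matrix keeps it sparse and 2D (shape 1 x 3).
score = cosine_similarity(sparse_vectors[0], sparse_vectors[1])

# dense_output=True (the default) returns a NumPy array even for sparse input.
print(type(score))
print("Cosine Similarity:", score[0, 0])
```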

3. Using SciPy

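SciPy is another powerful library for scientific and technical computing in Python. Its scipy.spatial.distance module provides a 'cosine' function, but note that it returns the cosine distance, defined as 1 minus the cosine similarity, rather than the similarity itself. To use SciPy for calculating cosine similarity, follow these steps: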

Step 1: Import the necessary function from SciPy

from scipy.spatial.distance import cosine

Step 2: Define two vectors as NumPy arrays (same as before)

vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

Step 3: Compute the cosine distance with the 'cosine' function and subtract it from 1 to get the cosine similarity

cosine_similarity_score = 1 - cosine(vector1, vector2)

Step 4: Print the cosine similarity

print("Cosine Similarity:", cosine_similarity_score)

Output

Cosine Similarity: 0.9746318461970761
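Because 'cosine' returns a distance, it helps to see how the two quantities line up. The check below compares SciPy's result with the manual NumPy formula for three made-up vector pairs; manual_similarity is a hypothetical helper name. It shows that the distance runs from 0 (same direction) to 2 (opposite directions), so 1 minus the distance recovers the similarity range of -1 to 1.

```python
import numpy as np
from scipy.spatial.distance import cosine


def manual_similarity(a, b):
    # Hypothetical helper: the NumPy formula, dot product over the product of norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


pairs = {
    "same direction": (np.array([1, 2, 3]), np.array([2, 4, 6])),
    "orthogonal": (np.array([1, 0]), np.array([0, 1])),
    "opposite directions": (np.array([1, 2, 3]), np.array([-1, -2, -3])),
}

for label, (a, b) in pairs.items():
    distance = cosine(a, b)
    similarity = 1 - distance
    # Both routes should agree up to floating-point rounding.
    assert np.isclose(similarity, manual_similarity(a, b))
    print(f"{label}: distance={distance:.4f}, similarity={similarity:.4f}")
```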

Complete Example

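import numpy as np
from scipy.spatial.distance import cosine

# Define the two vectors
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

# cosine() returns the cosine distance, so subtract it from 1
cosine_similarity_score = 1 - cosine(vector1, vector2)

print("Cosine Similarity:", cosine_similarity_score)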

Output

Cosine Similarity: 0.9746318461970761
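For many vectors at once, scipy.spatial.distance.cdist accepts metric="cosine" and returns the cosine distance between every row of one 2D array and every row of another. Subtracting that matrix from 1 gives a similarity matrix comparable to scikit-learn's output. The query and candidate vectors below are made up for illustration, apart from reusing vector1 and vector2.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Rows are vectors. The single query row is vector1 from the examples above.
queries = np.array([[1, 2, 3]])
candidates = np.array([
    [4, 5, 6],     # vector2 from the examples above
    [2, 4, 6],     # same direction as the query
    [-1, -2, -3],  # opposite direction
])

# cdist returns cosine distances with shape (len(queries), len(candidates)).
distances = cdist(queries, candidates, metric="cosine")
similarities = 1 - distances

print(similarities.shape)           # (1, 3)
# Index of the most similar candidate for each query row.
print(similarities.argmax(axis=1))  # [1]: the same-direction row
```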

Conclusion

In this article, we learned how to calculate cosine similarity in Python with NumPy, scikit-learn, and SciPy. NumPy lets you apply the formula directly, scikit-learn's cosine_similarity works on 2D arrays of samples, and SciPy's cosine returns a distance that you subtract from 1. Cosine similarity is a powerful tool for measuring the similarity between vectors and is widely applied in fields such as natural language processing and recommendation systems, and these libraries make it straightforward to compute and use in your projects.


Advertisements

Useful Resources:

Comments

Post a Comment