# If the cosine distance between two vectors is zero, which of the following is true?

### Mohammed

Guys, does anyone know the answer?


## machine learning


## Cosine similarity when one of vectors is all zeros


Asked 8 years, 1 month ago

Modified 5 years, 8 months ago

Viewed 13k times

15

How to express the cosine similarity ( http://en.wikipedia.org/wiki/Cosine_similarity ) when one of the vectors is all zeros?

v1 = [1, 1, 1, 1, 1]

v2 = [0, 0, 0, 0, 0]

When we calculate according to the classic formula we get division by zero:

Let d1 = 0 0 0 0 0 0

Let d2 = 1 1 1 1 1 1

Cosine Similarity (d1, d2) = dot(d1, d2) / (||d1|| ||d2||)

dot(d1, d2) = (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) = 0

||d1|| = sqrt((0)^2 + (0)^2 + (0)^2 + (0)^2 + (0)^2 + (0)^2) = 0

||d2|| = sqrt((1)^2 + (1)^2 + (1)^2 + (1)^2 + (1)^2 + (1)^2) = 2.44948974278

Cosine Similarity (d1, d2) = 0 / ((0) * (2.44948974278))

= 0 / 0
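The calculation above can be reproduced in a few lines. This is a minimal sketch, assuming NumPy is available; the function name `cosine_similarity` is chosen here for illustration and does no special-casing, so it returns NaN for the zero vector instead of a usable score.

```python
import numpy as np

def cosine_similarity(x, y):
    """Classic formula: dot(x, y) / (||x|| * ||y||), with no special-casing."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Suppress the floating-point warning so the NaN result is easy to see.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

d1 = [0, 0, 0, 0, 0, 0]
d2 = [1, 1, 1, 1, 1, 1]

print(np.linalg.norm(d2))          # sqrt(6), about 2.449
print(cosine_similarity(d1, d2))   # nan, because 0 / 0 is undefined
print(cosine_similarity(d2, d2))   # 1.0 up to rounding
```

Any value substituted for the NaN is a modelling decision, not something the formula provides.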

I want to use this similarity measure in a clustering application, where I will often need to compare such vectors, including [0, 0, 0, 0, 0] vs. [0, 0, 0, 0, 0].

Do you have any experience with this? Since this is a similarity (not a distance) measure, should I use special cases such as

d( [1, 1, 1, 1, 1]; [0, 0, 0, 0, 0] ) = 0

d([0, 0, 0, 0, 0]; [0, 0, 0, 0, 0] ) = 1

what about

d([1, 1, 1, 0, 0]; [0, 0, 0, 0, 0] ) = ? etc.

machine-learning cluster-analysis data-mining cosine-similarity

edited Mar 20, 2017 at 9:31 by Has QUIT--Anony-Mousse

asked Nov 2, 2014 at 13:13 by Sebastian Widz

## 2 Answers

19

**If you have 0 vectors, cosine is the wrong similarity function for your application**.

Cosine distance is essentially equivalent to squared Euclidean distance on L_2 normalized data. I.e. you normalize every vector to unit length 1, then compute squared Euclidean distance.
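The equivalence follows from expanding the squared distance between unit vectors: ||a − b||² = 2 − 2·cos(a, b). A small numerical check, assuming cosine distance is defined as 1 − cosine similarity (a common convention the answer does not spell out) and using arbitrary random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5)
b = rng.normal(size=5)

# L2-normalize both vectors to unit length.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# Assumed convention: cosine distance = 1 - cosine similarity.
cosine_distance = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
squared_euclidean = np.sum((a_unit - b_unit) ** 2)

# The two printed values agree up to floating-point rounding.
print(squared_euclidean, 2.0 * cosine_distance)
```

Because the two quantities differ only by a constant factor of 2, they rank neighbours identically on normalized data.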

The other benefit of Cosine is performance - computing it on very sparse, high-dimensional data is faster than Euclidean distance. It benefits from sparsity to the square, not just linear.

While you obviously can try to hack the similarity to be 0 when exactly one is zero, and maximal when they are identical, it won't really solve the underlying problems.

Don't choose the distance by what you can easily compute.

Instead, choose the distance such that the result has a meaning on your data. If the value is undefined, you don't have a meaning...

Sometimes, it may work to discard constant-0 data as meaningless data anyway (e.g. analyzing Twitter noise, and seeing a Tweet that is all numbers, no words). Sometimes it doesn't.

answered Nov 2, 2014 at 19:34 by Has QUIT--Anony-Mousse

What would a more appropriate similarity measure be in this case then? Hamming distance? – iyop45, Sep 23, 2019 at 8:22

There is no context given. Euclidean distance could also be "more appropriate". – Has QUIT--Anony-Mousse, Sep 24, 2019 at 5:52

3

It is undefined.

Suppose you have a nonzero vector C in place of your zero vector. Multiply it by epsilon > 0 and let epsilon go to zero. Scaling by a positive epsilon does not change the cosine, so the limit depends on the direction of C, and the function is not continuous when one of the vectors is zero.
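The limit argument can be checked numerically. In this sketch the vectors `v`, `c_parallel`, and `c_orthogonal` are hypothetical examples chosen for illustration: both scaled vectors shrink toward the zero vector, yet their cosine with `v` stays at two different values.

```python
import numpy as np

def cosine_similarity(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

v = np.array([1.0, 1.0])
c_parallel = np.array([2.0, 2.0])      # same direction as v
c_orthogonal = np.array([1.0, -1.0])   # at 90 degrees to v

# Shrink both vectors toward zero; the cosine does not change with eps.
for eps in (1.0, 1e-3, 1e-6, 1e-9):
    print(eps,
          cosine_similarity(eps * c_parallel, v),
          cosine_similarity(eps * c_orthogonal, v))
```

Since two sequences converging to the same point give different limits, no single value at the zero vector makes the function continuous there.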

edited Nov 2, 2014 at 17:31

answered Nov 2, 2014 at 13:27 by Gyro Gearloose


## Cosine Similarity


## Getting to Know Your Data

Jiawei Han, ... Jian Pei, in Data Mining (Third Edition), 2012

### 2.4.7 Cosine Similarity

Cosine similarity measures the similarity between two vectors of an inner product space. It is measured by the cosine of the angle between two vectors and determines whether two vectors are pointing in roughly the same direction. It is often used to measure document similarity in text analysis.

A document can be represented by thousands of attributes, each recording the frequency of a particular word (such as a keyword) or phrase in the document. Thus, each document is an object represented by what is called a term-frequency vector. For example, in Table 2.5, we see that Document1 contains five instances of the word team, while hockey occurs three times. The word coach is absent from the entire document, as indicated by a count value of 0. Such data can be highly asymmetric.

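Table 2.5. Document Vector or Term-Frequency Vector

| Document | team | coach | hockey | baseball | soccer | penalty | score | win | loss | season |
|---|---|---|---|---|---|---|---|---|---|---|
| Document1 | 5 | 0 | 3 | 0 | 2 | 0 | 0 | 2 | 0 | 0 |
| Document2 | 3 | 0 | 2 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| Document3 | 0 | 7 | 0 | 2 | 1 | 0 | 0 | 3 | 0 | 0 |
| Document4 | 0 | 1 | 0 | 0 | 1 | 2 | 2 | 0 | 3 | 0 |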

Term-frequency vectors are typically very long and **sparse** (i.e., they have many 0 values). Applications using such structures include information retrieval, text document clustering, biological taxonomy, and gene feature mapping. The traditional distance measures that we have studied in this chapter do not work well for such sparse numeric data. For example, two term-frequency vectors may have many 0 values in common, meaning that the corresponding documents do not share many words, but this does not make them similar. We need a measure that will focus on the words that the two documents do have in common, and the occurrence frequency of such words. In other words, we need a measure for numeric data that ignores zero-matches.

**Cosine similarity** is a measure of similarity that can be used to compare documents or, say, give a ranking of documents with respect to a given vector of query words. Let **x** and **y** be two vectors for comparison. Using the cosine measure as a similarity function, we have

$$\operatorname{sim}(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}, \tag{2.23}$$

where ||**x**|| is the Euclidean norm of vector $\mathbf{x} = (x_1, x_2, \ldots, x_p)$, defined as $\sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}$. Conceptually, it is the length of the vector. Similarly, ||**y**|| is the Euclidean norm of vector **y**. The measure computes the cosine of the angle between vectors **x** and **y**. A cosine value of 0 means that the two vectors are at 90 degrees to each other (orthogonal) and have no match. The closer the cosine value to 1, the smaller the angle and the greater the match between vectors. Note that because the cosine similarity measure does not obey all of the properties of Section 2.4.4 defining metric measures, it is referred to as a nonmetric measure.

Example 2.23

### Cosine similarity between two term-frequency vectors

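Suppose that **x** and **y** are the first two term-frequency vectors in Table 2.5. That is, $\mathbf{x} = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)$ and $\mathbf{y} = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)$. How similar are **x** and **y**? Using Eq. (2.23) to compute the cosine similarity between the two vectors, we get:

$$
\begin{aligned}
\mathbf{x}^t \cdot \mathbf{y} &= 5 \times 3 + 0 \times 0 + 3 \times 2 + 0 \times 0 + 2 \times 1 + 0 \times 1 + 0 \times 0 + 2 \times 1 + 0 \times 0 + 0 \times 1 = 25 \\
\|\mathbf{x}\| &= \sqrt{5^2 + 0^2 + 3^2 + 0^2 + 2^2 + 0^2 + 0^2 + 2^2 + 0^2 + 0^2} = 6.48 \\
\|\mathbf{y}\| &= \sqrt{3^2 + 0^2 + 2^2 + 0^2 + 1^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2} = 4.12 \\
\operatorname{sim}(\mathbf{x}, \mathbf{y}) &= 0.94
\end{aligned}
$$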

Therefore, if we were using the cosine similarity measure to compare these documents, they would be considered quite similar.
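The arithmetic of Example 2.23 can be verified with a short script. This is a sketch assuming NumPy; the variable names are chosen here for illustration and are not part of the textbook.

```python
import numpy as np

# Term-frequency vectors for Document1 and Document2 from Table 2.5.
x = np.array([5, 0, 3, 0, 2, 0, 0, 2, 0, 0], dtype=float)
y = np.array([3, 0, 2, 0, 1, 1, 0, 1, 0, 1], dtype=float)

dot_xy = np.dot(x, y)
norm_x = np.linalg.norm(x)   # Euclidean norm, sqrt(42)
norm_y = np.linalg.norm(y)   # Euclidean norm, sqrt(17)
sim = dot_xy / (norm_x * norm_y)

print(dot_xy)             # 25.0
print(round(norm_x, 2))   # 6.48
print(round(norm_y, 2))   # 4.12
print(round(sim, 2))      # 0.94
```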

When attributes are binary-valued, the cosine similarity function can be interpreted in terms of shared features or attributes. Suppose an object **x** possesses the ith attribute if $x_i = 1$. Then $\mathbf{x}^t \cdot \mathbf{y}$ is the number of attributes possessed (i.e., shared) by both **x** and **y**, and ||**x**|| ||**y**|| is the geometric mean of the number of attributes possessed by **x** and the number possessed by **y**. Thus, sim(**x**, **y**) is a measure of relative possession of common attributes.

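A simple variation of cosine similarity for the preceding scenario is

$$\operatorname{sim}(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\mathbf{x} \cdot \mathbf{x} + \mathbf{y} \cdot \mathbf{y} - \mathbf{x} \cdot \mathbf{y}}, \tag{2.24}$$

which is the ratio of the number of attributes shared by **x** and **y** to the number of attributes possessed by **x** or **y**. This function, known as the **Tanimoto coefficient** or **Tanimoto distance**, is frequently used in information retrieval and biology taxonomy.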
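For binary vectors, the ratio described above can be computed directly. This sketch assumes the standard form of the Tanimoto coefficient, x·y / (x·x + y·y − x·y); the function name `tanimoto` and the two example vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def tanimoto(x, y):
    """Tanimoto coefficient: x.y / (x.x + y.y - x.y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xy = np.dot(x, y)
    return xy / (np.dot(x, x) + np.dot(y, y) - xy)

# Hypothetical binary attribute vectors (1 = the object has the attribute).
x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]

# Count attributes held by both objects, and by at least one of them.
shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)

# Both expressions give the same ratio for binary data.
print(tanimoto(x, y), shared / either)
```

Note that this formula is also undefined (0 / 0) when both vectors are all zeros.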


## Deep similarity learning for disease prediction

Vagisha Gupta, ... Neha Dohare, in Trends in Deep Learning Methodologies, 2021

### 3.4.2 Similarity learning

Similarity learning [22] deals with measuring the similarity between a pair of images and objects, and has application in tasks related to classification and regression. The aim is to learn the similarity function that finds an optimal relation between two relatable or similar objects in a quantitative way. Some applications of finding similarity measures are handwritten text recognition, face identification, search engines, signature verification, etc. Typically, similarity learning involves giving a pair of images as input and discovering how similar they are to each other. The output can be a nonnegative similarity score between 0 and 1, 1 if the two images are completely similar to each other, otherwise 0. Fig. 8.4 shows the calculation of the similarity score between two images. The images are embedded into vector representation using a deep learning architecture for learning the representation of features of the images followed by passing it to the similarity metric learning function, which measures the similarity score between two images, usually a value between 0 and 1.

## Euclidean vs. Cosine Distance


March 25, 2017 | 10 minute read | Chris Emmery

This post was written as a reply to a question asked in the Data Mining course.

When to use the cosine similarity?

Let’s compare two different measures of distance in a vector space, and see why each has its use under different circumstances. Starting off with a straightforward example, we have our vector space X, which contains instances of animals. They are measured by their weight and length. They have also been labelled by their stage of aging (young = 0, mid = 1, adult = 2). Here’s some random data:

import numpy as np

X = np.array([[6.6, 6.2, 1],
              [9.7, 9.9, 2],
              [8.0, 8.3, 2],
              [6.3, 5.4, 1],
              [1.3, 2.7, 0],
              [2.3, 3.1, 0],
              [6.6, 6.0, 1],
              [6.5, 6.4, 1],
              [6.3, 5.8, 1],
              [9.5, 9.9, 2],
              [8.9, 8.9, 2],
              [8.7, 9.5, 2],
              [2.5, 3.8, 0],
              [2.0, 3.1, 0],
              [1.3, 1.3, 0]])

## Preparing the Data

We’ll first put our data in a DataFrame table format, and assign the correct labels per column:

import pandas as pd

df = pd.DataFrame(X, columns=['weight', 'length', 'label'])

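df

|    | weight | length | label |
|----|--------|--------|-------|
| 0  | 6.6 | 6.2 | 1.0 |
| 1  | 9.7 | 9.9 | 2.0 |
| 2  | 8.0 | 8.3 | 2.0 |
| 3  | 6.3 | 5.4 | 1.0 |
| 4  | 1.3 | 2.7 | 0.0 |
| 5  | 2.3 | 3.1 | 0.0 |
| 6  | 6.6 | 6.0 | 1.0 |
| 7  | 6.5 | 6.4 | 1.0 |
| 8  | 6.3 | 5.8 | 1.0 |
| 9  | 9.5 | 9.9 | 2.0 |
| 10 | 8.9 | 8.9 | 2.0 |
| 11 | 8.7 | 9.5 | 2.0 |
| 12 | 2.5 | 3.8 | 0.0 |
| 13 | 2.0 | 3.1 | 0.0 |
| 14 | 1.3 | 1.3 | 0.0 |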

Now the data can be plotted to visualize the three different groups. They are subsetted by their label, assigned a different colour and label, and by repeating this they form different layers in the scatter plot.

%matplotlib inline

ax = df[df['label'] == 0].plot.scatter(x='weight', y='length', c='blue', label='young')

ax = df[df['label'] == 1].plot.scatter(x='weight', y='length', c='orange', label='mid', ax=ax)

ax = df[df['label'] == 2].plot.scatter(x='weight', y='length', c='red', label='adult', ax=ax)

ax

Looking at the plot above, we can see that the three classes are pretty well distinguishable by these two features that we have. Say that we apply k-NN to our data, which will learn to classify new instances based on their distance to our known instances (and their labels). The algorithm needs a distance metric to determine which of the known instances are closest to the new one. Let’s try to choose between either euclidean or cosine for this example.

## Picking our Metric

Considering instance #0, #1, and #4 to be our known instances, we assume that we don’t know the label of #14. Plotting this will look as follows:

df2 = pd.DataFrame([df.iloc[0], df.iloc[1], df.iloc[4]], columns=['weight', 'length', 'label'])

df3 = pd.DataFrame([df.iloc[14]], columns=['weight', 'length', 'label'])

ax = df2[df2['label'] == 0].plot.scatter(x='weight', y='length', c='blue', label='young')

ax = df2[df2['label'] == 1].plot.scatter(x='weight', y='length', c='orange', label='mid', ax=ax)

ax = df2[df2['label'] == 2].plot.scatter(x='weight', y='length', c='red', label='adult', ax=ax)

ax = df3.plot.scatter(x='weight', y='length', c='gray', label='?', ax=ax)

ax

### Euclidean

Our euclidean distance function can be defined as follows:

$$\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where $x$ and $y$ are two vectors. Or:

def euclidean_distance(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

Let’s see this for all our vectors:

x0 = X[0][:-1]

x1 = X[1][:-1]

x4 = X[4][:-1]

x14 = X[14][:-1]

print(" x0:", x0, "\n x1:", x1, "\n x4:", x4, "\nx14:", x14)

x0: [ 6.6 6.2]

x1: [ 9.7 9.9]

x4: [ 1.3 2.7]

x14: [ 1.3 1.3]

Doing the calculations:

print(" x14 and x0:", euclidean_distance(x14, x0), "\n",

"x14 and x1:", euclidean_distance(x14, x1), "\n",

"x14 and x4:", euclidean_distance(x14, x4))

x14 and x0: 7.21803297305

x14 and x1: 12.0216471417

x14 and x4: 1.4

According to euclidean distance, instance #14 is closest to #4. Instance #4 had the label:

X[4]

array([ 1.3, 2.7, 0. ])

0 = young, which is what we would visually also deem the correct label for this instance.

Now let’s see what happens when we use Cosine similarity.

### Cosine

Our cosine similarity function can be defined as follows:

$$\frac{x \cdot y}{\sqrt{x \cdot x} \, \sqrt{y \cdot y}}$$

where $x$ and $y$ are two vectors. Or:

def cosine_similarity(x, y):
    return np.dot(x, y) / (np.sqrt(np.dot(x, x)) * np.sqrt(np.dot(y, y)))

Let’s see these calculations for all our vectors:

print(" x14 and x0:", cosine_similarity(x14, x0), "\n",

"x14 and x1:", cosine_similarity(x14, x1), "\n",

"x14 and x4:", cosine_similarity(x14, x4))
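The text here ends before the output of this call. The following self-contained sketch recomputes the comparison using the (weight, length) values of instances #0, #1, #4 and #14 from X. The helper `cosine_distance` and the `known` dictionary are additions for illustration, and the convention distance = 1 − similarity is an assumption, not something the post defines.

```python
import numpy as np

def cosine_similarity(x, y):
    return np.dot(x, y) / (np.sqrt(np.dot(x, x)) * np.sqrt(np.dot(y, y)))

def cosine_distance(x, y):
    # Assumed convention: distance = 1 - similarity.
    return 1.0 - cosine_similarity(x, y)

# (weight, length) pairs for instances #0, #1 and #4, with labels dropped.
known = {
    0: np.array([6.6, 6.2]),
    1: np.array([9.7, 9.9]),
    4: np.array([1.3, 2.7]),
}
x14 = np.array([1.3, 1.3])

for idx, vec in known.items():
    print(f"x14 and x{idx}: similarity={cosine_similarity(x14, vec):.5f}")

# The known instance whose direction is closest to that of x14.
nearest = max(known, key=lambda idx: cosine_similarity(x14, known[idx]))
print("most similar by cosine:", nearest)

# Scaling a vector keeps its direction, so the cosine distance is ~0.
print(cosine_distance(x14, 10 * x14))
```

With these numbers, cosine similarity ranks instance #1 (labelled adult) highest, whereas Euclidean distance picked instance #4 (young): cosine compares only direction, not magnitude. This also bears on the question in the title: a cosine distance of zero means the two vectors point in the same direction, not that they are equal.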
