# the unsupervised learning algorithm which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest centroid is

### Mohammed

Guys, does anyone know the answer?


## k-means clustering

From Wikipedia, the free encyclopedia

Not to be confused with K-nearest neighbors algorithm.

**k-means clustering** is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. For instance, better Euclidean solutions can be found using k-medians and k-medoids.
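The difference between the mean and the median can be checked on a small one-dimensional example, where the geometric median reduces to the ordinary median. The sketch below is only an illustration: the `points` list and the helper names `sum_squared` and `sum_absolute` are made up for this example.

```python
from statistics import mean, median

# Hypothetical 1-D cluster with one outlier.
points = [1.0, 2.0, 3.0, 4.0, 100.0]

def sum_squared(center):
    """Sum of squared distances from every point to `center`."""
    return sum((x - center) ** 2 for x in points)

def sum_absolute(center):
    """Sum of plain (unsquared) distances from every point to `center`."""
    return sum(abs(x - center) for x in points)

m, med = mean(points), median(points)
print(m, med)                                  # 22.0 3.0
print(sum_squared(m) < sum_squared(med))       # True: the mean wins on squared error
print(sum_absolute(med) < sum_absolute(m))     # True: the median wins on plain distance
```

The outlier pulls the mean far from most points, which is why median-based variants are less sensitive to it.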

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both k-means and Gaussian mixture modeling. They both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the Gaussian mixture model allows clusters to have different shapes.

The unsupervised k-means algorithm has a loose relationship to the k-nearest neighbor classifier, a popular supervised machine learning technique for classification that is often confused with k-means due to the name. Applying the 1-nearest neighbor classifier to the cluster centers obtained by k-means classifies new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
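A minimal sketch of the nearest centroid idea follows. The centroid coordinates are hypothetical stand-ins for the output of a k-means run, and `nearest_centroid` is a helper name chosen for this example.

```python
import math

# Hypothetical centroids, as if already produced by k-means.
centroids = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (1-NN on the centroids)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

new_points = [(0.5, -0.2), (4.0, 6.0), (1.0, 4.0)]
labels = [nearest_centroid(p, centroids) for p in new_points]
print(labels)  # [0, 1, 2]
```

No training labels are involved: the "classes" are simply the cluster indices found earlier.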


## Description

Given a set of observations (**x**1, **x**2, ..., **x**n), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets **S** = {S1, S2, ..., Sk} so as to minimize the within-cluster sum of squares (WCSS) (i.e. variance). Formally, the objective is to find:

$$\underset{\mathbf{S}}{\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \left\| \mathbf{x} - \boldsymbol{\mu}_i \right\|^2 = \underset{\mathbf{S}}{\operatorname{arg\,min}} \sum_{i=1}^{k} |S_i| \operatorname{Var} S_i$$

where **μ**i is the mean of points in Si. This is equivalent to minimizing the pairwise squared deviations of points in the same cluster:

$$\underset{\mathbf{S}}{\operatorname{arg\,min}} \sum_{i=1}^{k} \frac{1}{|S_i|} \sum_{\mathbf{x}, \mathbf{y} \in S_i} \left\| \mathbf{x} - \mathbf{y} \right\|^2$$

The equivalence can be deduced from the identity

$$|S_i| \sum_{\mathbf{x} \in S_i} \left\| \mathbf{x} - \boldsymbol{\mu}_i \right\|^2 = \sum_{\mathbf{x} \neq \mathbf{y} \in S_i} \left\| \mathbf{x} - \mathbf{y} \right\|^2,$$

where the sum on the right counts each unordered pair of points once. Since the total variance is constant, this is equivalent to maximizing the sum of squared deviations between points in different clusters (between-cluster sum of squares, BCSS).[1] This deterministic relationship is also related to the law of total variance in probability theory.
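The identity can be checked numerically on a small cluster. The sketch below uses four hypothetical 2-D points and reads the right-hand sum as running over unordered pairs, so each pair is counted once.

```python
from itertools import combinations
import math

# Hypothetical cluster of 2-D points.
cluster = [(1.0, 2.0), (2.0, 0.0), (4.0, 3.0), (5.0, 1.0)]
n = len(cluster)

# Centroid: coordinate-wise mean of the cluster.
mu = tuple(sum(coords) / n for coords in zip(*cluster))

# Left side: |S| times the sum of squared distances to the centroid.
lhs = n * sum(math.dist(x, mu) ** 2 for x in cluster)

# Right side: squared distances summed over unordered pairs of points.
rhs = sum(math.dist(x, y) ** 2 for x, y in combinations(cluster, 2))

print(lhs, rhs)
print(math.isclose(lhs, rhs))  # True
```

Both sides come out at about 60 for these points, so shrinking distances to the centroid and shrinking pairwise distances inside a cluster amount to the same thing.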

## History

The term "k-means" was first used by James MacQueen in 1967,[2] though the idea goes back to Hugo Steinhaus in 1956.[3] The standard algorithm was first proposed by Stuart Lloyd of Bell Labs in 1957 as a technique for pulse-code modulation, although it was not published as a journal article until 1982.[4] In 1965, Edward W. Forgy published essentially the same method, which is why it is sometimes referred to as the Lloyd–Forgy algorithm.[5]

## Algorithms

### Standard algorithm (naive k-means)

Figure: Convergence of k-means

The most common algorithm uses an iterative refinement technique. Due to its ubiquity, it is often called "the k-means algorithm"; it is also referred to as Lloyd's algorithm, particularly in the computer science community. It is sometimes also referred to as "naïve k-means", because there exist much faster alternatives.[6]
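The iterative refinement alternates two steps: assign every point to its nearest centroid, then move every centroid to the mean of its assigned points. The following is a minimal dependency-free sketch of that loop, not a reference implementation. The function name `lloyd_kmeans`, its parameters, the random choice of k observations as starting centroids, and the toy data are all assumptions made for illustration.

```python
import math
import random

def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Naive k-means: alternate assignment and update steps until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen observations
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: a local optimum was reached
        assignment = new_assignment
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

# Two well-separated hypothetical blobs.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, labels = lloyd_kmeans(data, k=2)
print(sorted(centroids))
print(labels)
```

On data this clean the loop settles on one centroid per blob. On real data the result depends on the starting centroids, which is why the algorithm is usually run several times.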


## K-Means Clustering for Image Classification

Yes! K-Means Clustering can be used for image classification of the MNIST dataset. Here’s how.

Image by Gerd Altmann from Pixabay

K-means clustering is an unsupervised learning algorithm which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest centroid. The algorithm aims to minimize the squared Euclidean distances between each observation and the centroid of the cluster to which it belongs.

K-Means clustering is not limited to consumer data and population studies. It can be used for image analysis as well. Here we use K-Means clustering to classify images from the MNIST dataset.

## Getting to know the data

The MNIST dataset is loaded from keras.

    # Importing the dataset from keras
    from keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()

The MNIST dataset is a benchmark dataset in the machine learning community which consists of 28 x 28 pixel images of digits from 0 to 9. Let us get to know more about the dataset.

    # Checking the 'type'
    print(type(x_train))
    print(type(x_test))
    print(type(y_train))
    print(type(y_test))

All of them are numpy arrays.

    # Checking the shape
    print(x_train.shape)
    print(x_test.shape)
    print(y_train.shape)
    print(y_test.shape)

The output is (60000, 28, 28), (10000, 28, 28), (60000,), (10000,). The ‘x_train’ and ‘x_test’ arrays consist of 60000 and 10000 monochrome images respectively. The size of each image is 28 x 28 pixels.

Every input image has a label, which is the digit displayed in the image. Thus ‘y_train’ and ‘y_test’ are one-dimensional arrays of length 60000 and 10000.

    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 9))  # Adjusting figure size
    plt.gray()  # B/W images

    # Displaying a grid of 3x3 images
    for i in range(9):
        plt.subplot(3, 3, i + 1)
        plt.imshow(x_train[i])
    plt.show()

Initially the images were of different pixel sizes. Through **Image Scaling** they have been reduced to a common size of 28 x 28 pixels. Detail is lost when the pixel size is reduced, which is why the images look blurred.

    # Printing examples in 'y_train'
    for i in range(5):
        print(y_train[i])

The output is 5,0,4,1,9. The ‘y_train’ and ‘y_test’ are digits from 0 to 9 which indicate the number displayed in image.

## Preprocessing the Data

    # Checking the minimum and maximum values of x_train
    print(x_train.min())
    print(x_train.max())

The minimum and maximum values are 0 and 255 respectively. MNIST images are grayscale, so each pixel is a single 8-bit intensity with an integer value from 0 to 255. For comparison, in the RGB color space the red, green and blue channels use 8 bits each, so the total number of possible colors is 256*256*256 = 16777216. Sounds astonishing?

Since the dataset contains values ranging from 0 to 255, it has to be normalized. **Data Normalization** is an important preprocessing step which ensures that each input parameter (pixel, in this case) has a similar data distribution. This speeds up convergence while training the model. Normalization also makes sure that no single parameter influences the output disproportionately.

Data normalization is commonly done by subtracting the mean from each pixel and then dividing the result by the standard deviation. Data transformed this way is centered at zero, so some values are negative. For image inputs we need the pixel values to be non-negative, so instead the image input is divided by 255, which puts the input values in the range [0,1].
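The two options can be compared on a handful of values. The sketch below uses a hypothetical list of 8-bit pixel intensities rather than the MNIST arrays, and the variable names are chosen for this example only.

```python
from statistics import mean, pstdev

# Hypothetical 8-bit pixel intensities.
pixels = [0, 51, 102, 204, 255]

# Scaling used for images: divide by the maximum possible value, 255.
scaled = [p / 255.0 for p in pixels]

# Standardization: subtract the mean, then divide by the standard deviation.
mu, sigma = mean(pixels), pstdev(pixels)
standardized = [(p - mu) / sigma for p in pixels]

print(scaled)        # every value lies in [0, 1]
print(standardized)  # centered on zero, so some values are negative
```

Dividing by 255 keeps the values non-negative and preserves the relative brightness of the pixels.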

    # Data Normalization
    # Conversion to float
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')

    # Normalization
    x_train = x_train / 255.0
    x_test = x_test / 255.0

Now we again check the minimum and maximum values of input.

    # Checking the minimum and maximum values of x_train
    print(x_train.min())
    print(x_train.max())

The minimum and maximum values are 0 and 1 respectively. The input data is in range of [0,1].

The input data has to be converted from a 3-dimensional array (images x rows x columns) to a 2-dimensional array (images x pixels) before it can be fed into the K-Means Clustering algorithm, so each 28 x 28 image is flattened into a single row.

    # Reshaping input data
    X_train = x_train.reshape(len(x_train), -1)
    X_test = x_test.reshape(len(x_test), -1)

Now let us check the shape of ‘X_train’ and ‘X_test’.

    # Checking the shape
    print(X_train.shape)
    print(X_test.shape)

The output is (60000,784) and (10000,784). (28 x 28 = 784)

Now that the data is preprocessed, we move on to building the model with Mini Batch K-Means.

## Building the model

**Mini Batch K-Means** works similarly to the K-Means algorithm. The difference is that in mini-batch k-means the most computationally costly step is conducted on only a random sample of observations as opposed to all observations. This approach can significantly reduce the time required for the algorithm to converge, with only a small cost in quality.
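The idea can be sketched without any library. The code below is a simplified, dependency-free illustration of a mini-batch update with a per-centroid step size that shrinks as the centroid absorbs more samples. It is not the implementation used by any particular library; `mini_batch_kmeans`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random

def mini_batch_kmeans(points, k, batch_size=4, iterations=50, seed=0):
    """Mini-batch k-means sketch: each iteration only looks at a random sample."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen observations
    counts = [0] * k                   # samples absorbed by each centroid so far
    for _ in range(iterations):
        batch = rng.sample(points, batch_size)
        # Costly step, done on the batch only: find the nearest centroid per point.
        nearest = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in batch
        ]
        # Nudge each centroid toward its sampled points with a shrinking step size.
        for p, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centroids[j] = tuple(
                (1 - eta) * c + eta * x for c, x in zip(centroids[j], p)
            )
    return centroids

# Two well-separated hypothetical blobs.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids = mini_batch_kmeans(data, k=2)
print(centroids)
```

Each iteration computes distances for `batch_size` points instead of the whole dataset, which is where the time saving comes from. The centroids are running averages of the sampled points, so they are noisier than the ones full k-means would produce.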


## Unsupervised Learning and Data Clustering

A task involving machine learning may not be linear, but it has a number of well-known steps:

1. Problem definition.
2. Preparation of data.
3. Learning an underlying model.
4. Improving the underlying model through quantitative and qualitative evaluations.
5. Presenting the model.

One good way to come to terms with a new problem is to work through identifying and defining the problem in the best possible way and learn a model that captures meaningful information from the data. While problems in Pattern Recognition and Machine Learning can be of various types, they can be broadly classified into three categories:

- **Supervised Learning:** The system is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs.
- **Unsupervised Learning:** No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
- **Reinforcement Learning:** A system interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle or playing a game against an opponent). The system is provided feedback in terms of rewards and punishments as it navigates its problem space.

Between supervised and unsupervised learning is semi-supervised learning, where the teacher gives an incomplete training signal: a training set with some (often many) of the target outputs missing. We will focus on unsupervised learning and data clustering in this blog post.

**Unsupervised Learning**

In some pattern recognition problems, the training data consists of a set of input vectors x without any corresponding target values. The goal in such unsupervised learning problems may be to discover groups of similar examples within the data, which is called clustering, or to determine how the data is distributed in the space, which is known as density estimation. In simpler terms, for n samples x1 to xn, the true class labels are not provided, which is why this is also known as learning without a teacher.

**Issues with Unsupervised Learning:**

- Unsupervised learning is harder than supervised learning.
- How do we know if the results are meaningful, since no answer labels are available?
  - Let an expert look at the results (external evaluation).
  - Define an objective function on the clustering (internal evaluation).

**Why is Unsupervised Learning needed despite these issues?**

- Annotating large datasets is very costly, so we can label only a few examples manually. Example: Speech Recognition
- There may be cases where we don’t know how many classes the data is divided into, or what they are. Example: Data Mining
- We may want to use clustering to gain some insight into the structure of the data before designing a classifier.

Unsupervised Learning can be further classified into two categories:

**Parametric Unsupervised Learning**

In this case, we assume a parametric distribution of data. It assumes that sample data comes from a population that follows a probability distribution based on a fixed set of parameters. Theoretically, in a normal family of distributions, all members have the same shape and are parameterized by mean and standard deviation. That means if you know the mean and standard deviation, and that the distribution is normal, you know the probability of any future observation. Parametric Unsupervised Learning involves constructing Gaussian Mixture Models and using the Expectation-Maximization algorithm to predict the class of the sample in question. This case is much harder than standard supervised learning because there are no answer labels available, and hence no direct measure of accuracy against which to check the result.

**Non-parametric Unsupervised Learning**

In the non-parametric version of unsupervised learning, the data is grouped into clusters, where each cluster (hopefully) says something about the categories and classes present in the data. This method is commonly used to model and analyze data with small sample sizes. Unlike parametric models, nonparametric models do not require the modeler to make any assumptions about the distribution of the population, and so are sometimes referred to as distribution-free methods.
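To make the parametric case concrete: once a normal distribution's mean and standard deviation are known, probabilities of future observations follow directly. The sketch below uses Python's `statistics.NormalDist` with hypothetical parameter values; in practice they would be estimated from data.

```python
from statistics import NormalDist

# Hypothetical parameters; in practice they would be estimated from data.
dist = NormalDist(mu=170.0, sigma=10.0)

# With mean and standard deviation known, probabilities follow directly.
p_below_180 = dist.cdf(180.0)                   # P(X <= 180)
p_between = dist.cdf(180.0) - dist.cdf(160.0)   # P(160 <= X <= 180)
print(round(p_below_180, 4), round(p_between, 4))  # 0.8413 0.6827
```

A non-parametric method makes no such assumption about the distribution.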

**What is Clustering?**

Clustering can be considered the most important unsupervised learning problem. Like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects which are “similar” to each other and “dissimilar” to the objects belonging to other clusters.

**Distance-based clustering.**

Given a set of points and a notion of distance between points, group the points into some number of clusters such that

- internal (within-cluster) distances are small, i.e. members of a cluster are close/similar to each other.
- external (inter-cluster) distances are large, i.e. members of different clusters are dissimilar.
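These two criteria can be measured directly. The sketch below compares the average distance within each cluster to the average distance between clusters, using two hypothetical groups of 2-D points; `mean_within` and `mean_between` are helper names chosen for this example.

```python
import math
from itertools import combinations, product

# Hypothetical clustering of 2-D points into two groups.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]

def mean_within(cluster):
    """Average distance over all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_between(a, b):
    """Average distance over all pairs with one point in each cluster."""
    pairs = list(product(a, b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

print(mean_within(cluster_a), mean_within(cluster_b))
print(mean_between(cluster_a, cluster_b))
```

A good distance-based clustering gives small within-cluster values and a large between-cluster value, as it does for these two groups.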
