AI Learning Guide Machine Learning - K-means Clustering Model Training and Prediction

2024-07-12

Artificial intelligence (AI) is one of the most widely discussed topics today, and it is changing the way we live and work. Machine learning, an important branch of AI, has shown great potential across many fields. Clustering is one of its core tasks, and K-means is one of the classic clustering algorithms. In this blog, we walk through the training and prediction process of the K-means clustering model and use examples to show how to apply it to cluster analysis.

K-means clustering model

K-means clustering is an unsupervised learning algorithm that divides the samples in a data set into K clusters so that samples in the same cluster are as close to each other as possible and samples in different clusters are as far apart as possible. Each cluster is represented by a centroid (the mean of its samples), and the algorithm iteratively reduces the sum of squared distances between each sample and the centroid of its cluster.
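The quantity being minimized is usually written as the within-cluster sum of squared distances. As a sketch, using symbols that do not appear in the original text ($C_k$ for the set of samples assigned to cluster $k$, $\mu_k$ for its centroid):

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

For a fitted scikit-learn model, the value of this objective is available as the inertia_ attribute.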

The process of K-means clustering can be roughly divided into the following steps:

1. Select K initial center points.
2. Assign each sample to the cluster with the closest center point.
3. Update the centroid of each cluster.
4. Repeat steps 2 and 3 until the cluster assignments do not change or the upper limit of the number of iterations is reached.
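The steps above can be sketched in plain NumPy. This is only an illustration, not scikit-learn's implementation: the function name kmeans_sketch and its parameters max_iter and seed are made up for this example, and choosing the initial centers at random from the samples is an assumption (scikit-learn uses k-means++ initialization by default).

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Illustrative K-means loop following the four steps listed above."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as the initial centers (assumed strategy)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each sample to its closest center
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 4: stop once the assignments no longer change
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned samples
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels

# Tiny demo: two obvious groups of two points each
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = kmeans_sketch(X_demo, k=2)
print(centers)
print(labels)
```

The cluster indices themselves are arbitrary; what matters is which samples end up sharing a label.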

The prediction process of the K-means clustering model is to assign new samples to the cluster with the nearest centroid.

Training process of K-means clustering model

In this section, we will introduce the training process of the K-means clustering model in detail. For the sake of convenience, we will use Python's scikit-learn library for demonstration.

First, we need to import the relevant libraries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

Next, we generate some simulated data for demonstration:

X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

We can then use the K-means clustering model to train the data:

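# Fit K-means with 4 clusters; n_init and random_state make the run repeatable
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(X)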
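After fitting, the model exposes what it learned as attributes. The following sketch repeats the setup so that it runs on its own, then inspects the fitted model and recomputes the sum of squared distances by hand to compare it with inertia_. The n_init and random_state arguments are assumptions added only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_.shape)  # one row per centroid, one column per feature
print(kmeans.labels_[:10])            # cluster index of the first 10 training samples
print(kmeans.n_iter_)                 # iterations used by the best run

# Recompute the within-cluster sum of squared distances by hand
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
manual_inertia = ((X - assigned_centers) ** 2).sum()
print(manual_inertia, kmeans.inertia_)
```

The two printed values should agree up to floating-point rounding.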

Finally, we can visualize the results of the training, as well as the centroids of the clusters:

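# Color each sample by the cluster it was assigned to
plt.scatter(X[:, 0], X[:, 1], s=50, c=kmeans.labels_, cmap="viridis", marker="o", edgecolor="black")
# Mark the learned centroids with red stars
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=250, marker="*", c="red", edgecolor="black")
plt.show()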

With these steps, we have trained the K-means clustering model and plotted the training data together with the learned centroids.

Prediction process of K-means clustering model

Next, let's look at the prediction process. For a K-means model, prediction means assigning each new sample to the cluster whose centroid is nearest.

First, we can use the trained K-means clustering model to predict new samples:

new_samples = np.array([[0, 0], [4, 4]])
predicted_labels = kmeans.predict(new_samples)
print(predicted_labels)

In the above code, we create two new samples, [0, 0] and [4, 4], and use the predict method to assign them to clusters. The result is the predicted cluster label of each new sample.
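To make the "nearest centroid" rule concrete, the sketch below computes the distance from each new sample to every centroid with NumPy and compares the result with predict. It repeats the training setup so that it runs on its own; n_init and random_state are assumptions added for repeatability.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

new_samples = np.array([[0, 0], [4, 4]])

# Euclidean distance from every new sample to every centroid: shape (2, 4)
diffs = new_samples[:, None, :] - kmeans.cluster_centers_[None, :, :]
distances = np.linalg.norm(diffs, axis=2)
nearest = distances.argmin(axis=1)

print(nearest)                      # index of the closest centroid per sample
print(kmeans.predict(new_samples))  # should match the line above
```

The label values are only cluster indices. They carry no meaning beyond identifying which centroid is closest, and they can change between runs with different random seeds.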

Example

In order to more intuitively understand the training and prediction process of the K-means clustering model, we will illustrate it with a specific example.

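Suppose we have a dataset X with three features that we want to divide into 3 clusters. (This X is a different dataset from the two-feature X generated earlier; a fitted model can only predict samples that have the same number of features as its training data.) First, we train a K-means clustering model on the data: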

kmeans = KMeans(n_clusters=3)
kmeans.fit(X)

Next, we apply the trained model to new samples:

new_samples = np.array([[1, 1, 1], [2, 2, 2]])
predicted_labels = kmeans.predict(new_samples)
print(predicted_labels)
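The example does not say where the three-feature dataset comes from. As a runnable sketch, the version below generates a hypothetical stand-in with make_blobs (the n_features=3 setting and the other generation parameters are assumptions, not part of the original example) and then follows the same two steps.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the three-feature dataset X: 3 blobs in 3 dimensions
X, _ = make_blobs(n_samples=300, centers=3, n_features=3,
                  cluster_std=0.60, random_state=0)

# Train the model (n_init and random_state added only for repeatability)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

# New samples must have the same number of features as the training data
new_samples = np.array([[1, 1, 1], [2, 2, 2]])
predicted_labels = kmeans.predict(new_samples)
print(predicted_labels)
```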

Through the above examples, we can clearly see the training and prediction process of the K-means clustering model.

Summary

In this blog, we covered the training and prediction process of the K-means clustering model and showed, through examples, how to run K-means clustering with Python's scikit-learn library. K-means is a simple and efficient clustering algorithm that is used in many fields, including data analysis and image processing. I hope this blog helps you learn machine learning and artificial intelligence!
