CLUSTERING
The goal of clustering is to divide a diverse dataset into homogeneous groups of data points. First, the similarity between points is measured with a metric such as Euclidean distance, cosine similarity, or Manhattan distance; the most similar points are then grouped together.
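A minimal sketch of the three metrics named above, written in plain Python for illustration (the function names are our own, not from any library). Note the direction of each measure: for Euclidean and Manhattan distance a smaller value means the points are more similar, while for cosine similarity a larger value (closer to 1) means more similar.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # "City block" distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the two vectors: 1 = same direction, 0 = orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))                    # 5.0
print(manhattan(p, q))                    # 7.0
print(round(cosine_similarity(p, q), 3))  # 0.992
```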
Clusters do not have to be circular in shape; they can take arbitrary shapes. Several algorithms, for example density-based methods such as DBSCAN, are designed to detect arbitrarily shaped clusters.
For example, in the graph below, the clusters formed are not circular.
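To make this concrete, here is a minimal DBSCAN sketch in plain Python run on two concentric rings, a shape that a single centroid per cluster cannot describe. DBSCAN is our choice of example (the post does not name a specific algorithm), and the `eps` and `min_pts` values are hypothetical, picked to suit this toy data. DBSCAN grows a cluster outward from "core" points that have at least `min_pts` neighbors within radius `eps`, so it follows the shape of the data instead of assuming a round blob.

```python
import math

def region_query(points, i, eps):
    # Indices of every point within eps of points[i], including i itself
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    # Returns one label per point: a cluster id (0, 1, ...) or -1 for noise
    NOISE = -1
    labels = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is a core point: start a new cluster and expand it
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return labels

# Toy data: an inner ring of radius 1 and an outer ring of radius 4
inner = [(math.cos(2 * math.pi * t / 20), math.sin(2 * math.pi * t / 20)) for t in range(20)]
outer = [(4 * math.cos(2 * math.pi * t / 40), 4 * math.sin(2 * math.pi * t / 40)) for t in range(40)]
labels = dbscan(inner + outer, eps=1.0, min_pts=3)
print(labels[:20] == [0] * 20, labels[20:] == [1] * 40)  # True True
```

Each ring comes out as its own cluster even though neither is a compact blob; a centroid-based method such as K-Means tends to struggle with data like this because both rings share roughly the same center.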
K-Means clustering is one of the most popular and straightforward clustering algorithms in unsupervised machine learning. It partitions a dataset into K distinct, non-overlapping subsets (clusters) by assigning each data point to the cluster with the nearest centroid, where K is a user-defined parameter.
The K-Means algorithm works as follows:
- Initialization: Select K initial centroids randomly from the dataset. These centroids are the starting points for each cluster.
- Assignment: Assign each data point to the nearest centroid based on a distance metric (usually Euclidean distance). This forms K clusters.
- Update: Recalculate the centroid of each cluster by taking the mean of all data points assigned to that cluster.
- Convergence: Repeat the assignment and update steps until the centroids no longer change significantly, or a maximum number of iterations is reached.
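The four steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation (libraries such as scikit-learn provide an optimized `KMeans`); the function names `kmeans` and `nearest`, the `tol` threshold, and the toy data are hypothetical choices for this example. The random initialization is seeded so that runs are repeatable.

```python
import math
import random

def nearest(p, centroids):
    # Index of the centroid closest to p (Euclidean distance)
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

def kmeans(points, k, max_iters=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    # Initialization: pick k data points at random as the starting centroids
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # Assignment: each point joins the cluster of its nearest centroid
        labels = [nearest(p, centroids) for p in points]
        # Update: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: leave its centroid in place
        # Convergence: stop once no centroid moves by more than tol
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    # Final assignment against the final centroids
    return centroids, [nearest(p, centroids) for p in points]

data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]
centroids, labels = kmeans(data, k=2)
print(labels)
print(centroids)
```

Because the result depends on which points are picked as initial centroids, K-Means is commonly run several times with different seeds, keeping the run with the lowest within-cluster variance.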
Applications of clustering in different fields:
- Marketing: It can be used to discover and characterize customer segments for marketing purposes.
- Biology: It can be used to group and classify different species of plants and animals.
- Libraries: It can be used to group books by topic and content.
- Insurance: It can be used to understand customers and their policies, and to identify fraud.
- City Planning: It can be used to group houses and study their values based on geographic location and other factors.
- Earthquake studies: By clustering earthquake-affected areas, dangerous zones can be identified.
- Image Processing: Clustering can be used to group similar images together, classify images based on content, and identify patterns in image data.