Unsupervised Learning: Discover Hidden Patterns in Data

Explore unsupervised learning in AI & Machine Learning. Learn how algorithms such as k-Means, PCA, and autoencoders discover hidden structures in unlabeled data and group similar points.

4. Unsupervised Learning

Unsupervised learning is a type of machine learning where algorithms learn patterns from data that has not been labeled, classified, or categorized. The goal is to find hidden structures or relationships within the data.

Key Concepts and Algorithms

Clustering Algorithms

Clustering is the task of dividing a dataset into groups (clusters) such that data points within the same cluster are more similar to each other than to those in other clusters.

  • k-Means Clustering:

    • Description: An iterative algorithm that partitions a dataset into $k$ distinct clusters. It works by minimizing the variance within each cluster.
    • Process:
      1. Initialize $k$ centroids randomly.
      2. Assign each data point to the nearest centroid.
      3. Recalculate the centroids based on the mean of the assigned data points.
      4. Repeat steps 2 and 3 until the centroids no longer change significantly or a maximum number of iterations is reached.
    • Use Cases: Customer segmentation, document clustering, image compression.
    • Example: Grouping customers based on their purchasing behavior (see the k-Means sketch after this list).
  • Hierarchical Clustering:

    • Description: Builds a hierarchy of clusters. It can be agglomerative (bottom-up) or divisive (top-down). Agglomerative starts with each data point as its own cluster and merges them iteratively.
    • Process (Agglomerative):
      1. Start with each data point as a single cluster.
      2. Merge the two closest clusters.
      3. Repeat step 2 until only one cluster remains.
      4. The result can be visualized as a dendrogram.
    • Use Cases: Biological classification (phylogenetic trees), social network analysis.
    • Example: Building a hierarchy of related topics from a collection of articles (see the agglomerative sketch after this list).
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise):

    • Description: Groups points that are closely packed (points with many nearby neighbors) and marks points that lie alone in low-density regions as outliers. It is robust to noise and can find arbitrarily shaped clusters.
    • Key Parameters:
      • eps (epsilon): The maximum distance between two samples for one to be considered as in the neighborhood of the other.
      • min_samples: The number of samples in a neighborhood for a point to be considered as a core point.
    • Use Cases: Anomaly detection, spatial data analysis, identifying clusters of varying shapes.
    • Example: Identifying clusters of stars in astronomical data or detecting fraudulent transactions based on spatial patterns (see the DBSCAN sketch after this list).
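
To make the clustering algorithms above concrete, the following short Python sketches walk through each one. First, a minimal k-Means implementation in NumPy that mirrors the four steps listed for it; the toy two-feature "customer" data and the choice of k = 3 are invented for the example, and in practice scikit-learn's KMeans provides a more robust implementation of the same idea.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Minimal k-Means: returns (centroids, labels) for data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # 1. Initialize k centroids by picking k random data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Stop once the centroids no longer change significantly.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy "customer" data: two spending features, three natural groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])
centroids, labels = k_means(X, k=3)
print(centroids)
```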
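
Next, an agglomerative sketch built on SciPy's hierarchical-clustering utilities; the toy 2-D data are again invented, and Ward linkage (merging the pair of clusters whose union increases within-cluster variance the least) is just one of several possible merge criteria.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy data: three loose groups of points in 2-D.
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(20, 2))
               for c in ([0, 0], [4, 4], [0, 4])])

# Agglomerative clustering: repeatedly merge the two closest clusters.
Z = linkage(X, method="ward")   # Z encodes the full sequence of merges

# Cut the hierarchy to obtain a flat labeling with 3 clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels[:10])

# scipy.cluster.hierarchy.dendrogram(Z) draws the merge tree (with matplotlib)
# if the dendrogram visualization is needed.
```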
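
Finally, a DBSCAN sketch using scikit-learn that shows the two key parameters in action; the blob-plus-noise data, eps = 0.5, and min_samples = 5 are assumptions chosen only for this toy example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus a handful of scattered "noise" points.
dense = np.vstack([rng.normal(loc=c, scale=0.2, size=(50, 2))
                   for c in ([0, 0], [3, 3])])
noise = rng.uniform(low=-2, high=5, size=(10, 2))
X = np.vstack([dense, noise])

# eps: neighborhood radius; min_samples: neighbors required for a core point.
db = DBSCAN(eps=0.5, min_samples=5).fit(X)

# Points labeled -1 are the outliers (noise) that no cluster claimed.
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("clusters found:", n_clusters)
print("noise points:", int((db.labels_ == -1).sum()))
```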

Dimensionality Reduction

Dimensionality reduction is the process of reducing the number of features (variables) in a dataset while retaining as much of the important information as possible. This is useful for simplifying models, speeding up training, and visualizing high-dimensional data.

  • Principal Component Analysis (PCA):

    • Description: A technique that transforms a dataset into a new set of uncorrelated variables called principal components. The first principal component captures the most variance, the second captures the second most variance, and so on.
    • Process:
      1. Standardize the data (mean = 0, variance = 1).
      2. Compute the covariance matrix of the data.
      3. Compute the eigenvalues and eigenvectors of the covariance matrix.
      4. Sort eigenvectors by their corresponding eigenvalues in descending order.
      5. Select the top $k$ eigenvectors to form a projection matrix.
      6. Transform the original data onto the new subspace defined by the selected eigenvectors.
    • Use Cases: Data compression, noise reduction, feature extraction.
    • Example: Compressing images by projecting their pixel values onto a few principal components, preserving the essential structure for faster processing (see the sketch after this list).
  • t-SNE (t-distributed Stochastic Neighbor Embedding):

    • Description: A non-linear dimensionality reduction technique used primarily for visualizing high-dimensional data, especially data with cluster structure. It converts high-dimensional Euclidean distances between data points into conditional probabilities representing similarities.
    • Process: It models the similarity of high-dimensional points with a Gaussian distribution and the similarity of low-dimensional points with a Student's t-distribution, then minimizes the Kullback-Leibler divergence between the two sets of similarities.
    • Use Cases: Visualizing complex datasets, exploring data structure, understanding relationships between data points.
    • Example: Visualizing clusters of handwritten digits from the MNIST dataset (see the sketch after this list).
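
The six PCA steps above map almost line-for-line onto NumPy. The sketch below uses a made-up random dataset purely for illustration; in practice sklearn.decomposition.PCA performs the same transformation with more numerical care (via singular value decomposition).

```python
import numpy as np

def pca(X, k):
    """Minimal PCA: project X (n samples x d features) onto its top-k components."""
    # 1. Standardize the data (zero mean, unit variance per feature).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features.
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvalues and eigenvectors (eigh: the covariance matrix is symmetric).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4./5. Sort by eigenvalue, descending, and keep the top-k eigenvectors.
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    # 6. Project the data onto the new k-dimensional subspace.
    return X_std @ W, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))       # made-up dataset with 5 features
X_2d, variances = pca(X, k=2)
print(X_2d.shape)                   # (200, 2)
```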
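
A short sketch of the digit-visualization example using scikit-learn's TSNE on its small built-in digits dataset (8x8 images, a stand-in for full MNIST); the perplexity of 30 is a common default-range choice rather than a tuned value.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 1,797 handwritten digits, each an 8x8 image flattened to 64 features.
digits = load_digits()
X, y = digits.data, digits.target

# Embed into 2-D for visualization; perplexity controls the effective
# neighborhood size used when converting distances to probabilities.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_2d.shape)   # (1797, 2) -- scatter-plot colored by y to see digit clusters
```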

Association Rule Mining

Association rule mining is used to discover interesting relationships between variables in large datasets. It is commonly used in market basket analysis to find items that are frequently purchased together.

  • Description: Identifies "if-then" rules between items in a transactional database.
  • Key Metrics:
    • Support: The fraction of transactions in the database that contain the itemset.
    • Confidence: The probability of the consequent occurring given the antecedent.
    • Lift: Measures how much more likely the consequent is given the antecedent, compared to its baseline probability.
  • Algorithms: Apriori, Eclat, FP-Growth.
  • Use Cases: Market basket analysis, recommendation systems, web usage mining.
  • Example: Discovering that customers who buy bread also tend to buy butter. Rule: {Bread} => {Butter} with support and confidence values.
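
A minimal sketch of the three metrics for the {Bread} => {Butter} rule above, computed over an invented toy transaction list; libraries such as mlxtend automate this search across all frequent itemsets, but the arithmetic is simple enough to do directly.

```python
# Toy market-basket data; the transactions are invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"bread"}, {"butter"}

supp = support(antecedent | consequent)   # P(bread and butter together)
conf = supp / support(antecedent)         # P(butter | bread)
lift = conf / support(consequent)         # confidence vs. butter's baseline rate

print(f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
# support=0.50 confidence=0.75 lift=1.12
# -> buying bread makes buying butter about 12% more likely than its baseline
```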

Autoencoders (Neural Networks)

Autoencoders are a type of artificial neural network used for unsupervised learning of efficient data codings. They are trained to reconstruct their input.

  • Description: An autoencoder consists of an encoder network that maps the input to a lower-dimensional latent space representation, and a decoder network that reconstructs the input from the latent space.
  • Architecture:
    • Encoder: Input layer -> Hidden layers (reducing dimensionality) -> Latent space (bottleneck).
    • Decoder: Latent space -> Hidden layers (increasing dimensionality) -> Output layer (reconstruction of input).
  • Use Cases: Dimensionality reduction, feature learning, anomaly detection, data denoising, generative modeling.
  • Example: Training an autoencoder on images to learn a compressed representation, to remove noise from corrupted inputs (a denoising autoencoder), or to generate new, similar images.
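
A minimal PyTorch sketch of the encoder/bottleneck/decoder architecture described above. The layer sizes, the 32-dimensional latent space, and the random stand-in batch are assumptions made for illustration; a denoising variant would feed a corrupted input through the model while comparing the reconstruction against the clean original.

```python
import torch
from torch import nn

# Fully connected autoencoder for flattened 28x28 images (e.g. MNIST).
# The 32-dimensional bottleneck is the learned compressed representation.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on a random stand-in batch of 64 "images".
x = torch.rand(64, 784)
reconstruction = model(x)
loss = loss_fn(reconstruction, x)   # reconstruction error: output vs. input
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```

After training, model.encoder(x) yields the low-dimensional representation, which can serve as learned features; thresholding the reconstruction error on new inputs is one simple way to use the same model for anomaly detection.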