Pandas Unique Values: Count & Retrieve with nunique()

Master counting and retrieving unique elements in Pandas. Learn how nunique() efficiently handles duplicate data for AI/ML preprocessing and real-time analysis.

Handling Unique Values in Pandas

When working with real-world data, identifying unique values and managing duplicate entries is a crucial preprocessing step. Duplicates can arise from data entry errors, repeated records, or the merging of different datasets. Pandas offers several efficient methods for counting and retrieving unique elements.

Pandas provides the following key methods for handling unique values:

  • nunique(): Counts the number of distinct values.
  • value_counts(): Returns the frequency of each unique value.
  • unique(): Retrieves the actual unique values.

1. Counting Unique Elements in a DataFrame with nunique()

The nunique() method is used to count the number of unique elements along a specified axis of a DataFrame or Series.

Syntax

DataFrame.nunique(axis=0, dropna=True)
Series.nunique(dropna=True)

Parameters

  • axis:
    • 0 (default): Counts the unique values in each column and returns one count per column.
    • 1: Counts the unique values in each row and returns one count per row.
  • dropna:
    • True (default): Excludes NaN (Not a Number) values from the count.
    • False: Counts NaN as one additional distinct value when it is present.
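
The effect of dropna is easiest to see on a small Series that contains missing values. The following is a minimal sketch; the Series s and its values are made up for illustration and are not part of the examples below.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two distinct numbers and two missing values
s = pd.Series([1.0, 2.0, 2.0, np.nan, np.nan])

# Default behaviour: NaN is ignored, so only 1.0 and 2.0 are counted
without_nan = s.nunique()

# dropna=False: all NaN entries together count as one extra distinct value
with_nan = s.nunique(dropna=False)

print(without_nan)  # 2
print(with_nan)     # 3
```

Note that repeated NaN entries add only one to the count, not one per occurrence.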

Example: Column-Wise Count

Let's create a sample DataFrame and see how nunique() works column-wise.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [4, 5, 6, 5, 4],
    'B': [4, 1, 1, 2, 1],
    'C': [7, 8, 9, 8, 7]
})

# Count unique values per column
print("Unique counts per column:")
print(df.nunique())

Output:

Unique counts per column:
A    3
B    3
C    3
dtype: int64

Explanation:

  • Column 'A' has 3 unique values (4, 5, 6).
  • Column 'B' has 3 unique values (4, 1, 2).
  • Column 'C' has 3 unique values (7, 8, 9).

Example: Row-Wise Count

To count unique values across each row, set axis=1.

# Count unique values per row
print("\nUnique counts per row:")
print(df.nunique(axis=1))

Output:

Unique counts per row:
0    2
1    3
2    3
3    3
4    3
dtype: int64

Explanation:

Row 0 contains the values 4, 4, and 7, so it has only 2 unique values. Every other row holds three different values and therefore returns 3. Whenever a row repeats a value across columns, its count is lower than the number of columns.
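
As a quick cross-check, the row-wise result can be reproduced by hand by building a Python set from each row and taking its length. This is only a sketch for verification; the names row_counts and manual are illustrative, and the DataFrame is recreated so the snippet runs on its own.

```python
import pandas as pd

# Same sample DataFrame as in the example above
df = pd.DataFrame({
    'A': [4, 5, 6, 5, 4],
    'B': [4, 1, 1, 2, 1],
    'C': [7, 8, 9, 8, 7]
})

# Row-wise distinct counts computed by pandas
row_counts = df.nunique(axis=1)

# The same counts computed manually: a set keeps one copy of each value
manual = [len(set(row)) for row in df.itertuples(index=False)]

print(row_counts.tolist())  # [2, 3, 3, 3, 3]
print(manual)               # [2, 3, 3, 3, 3]
```

The manual loop is slower than nunique() on large data and is shown here only to make the row-wise logic explicit.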


2. Counting Value Frequencies with value_counts()

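The value_counts() method is most often called on a Pandas Series, where it returns a new Series containing the count of each unique value. By default the result is sorted by count in descending order. A DataFrame also has a value_counts() method, which counts unique combinations of row values instead.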

Example

This method is excellent for understanding the distribution and frequency of occurrences for each unique item within a single column.

# Get the frequency of each unique value in column 'B'
print("\nValue counts for column 'B':")
print(df['B'].value_counts())

Output:

Value counts for column 'B':
1    3
2    1
4    1
Name: B, dtype: int64

Explanation:

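The output shows that the value 1 appears 3 times in column 'B', while 2 and 4 each appear once. The order of tied values and the final Name line can differ between Pandas versions: from Pandas 2.0 onward the resulting Series is named count and its index is labeled B.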

Parameters for value_counts()

  • normalize: False (default) returns counts, True returns relative frequencies.
  • sort: True (default) sorts by frequency, False does not sort.
  • ascending: False (default) sorts descending, True sorts ascending.
  • dropna: True (default) excludes NaN values, False includes them.
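
The parameters above can be combined as needed. The sketch below uses a small made-up Series (not the DataFrame from the earlier examples) to show normalize, ascending, and dropna; all variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical Series: 'a' three times, 'b' once, plus one missing value
s = pd.Series(['a', 'b', 'a', 'a', np.nan])

# Relative frequencies over the four non-missing entries
shares = s.value_counts(normalize=True)
print(shares['a'])  # 0.75
print(shares['b'])  # 0.25

# Least frequent value first
rarest_first = s.value_counts(ascending=True)
print(rarest_first.index[0])  # b

# Keep the missing value as its own category
with_nan = s.value_counts(dropna=False)
print(len(with_nan))   # 3 categories: 'a', 'b', and NaN
print(with_nan.sum())  # 5 entries in total
```

With dropna left at its default, the relative frequencies are computed over non-missing entries only, which is why they sum to 1 here.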

3. Retrieving Unique Values with unique()

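The unique() method extracts the unique values from a Pandas Series and returns them as a NumPy array (or as an extension array for extension dtypes such as categorical). It does not provide counts; instead, it returns the distinct elements themselves, including NaN if the Series contains missing values.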

Example

# Retrieve unique values from column 'A'
print("\nUnique values in column 'A':")
print(df['A'].unique())

Output:

Unique values in column 'A':
[4 5 6]

Explanation:

This output shows the actual unique values present in column 'A' in the order they first appear in the Series.
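
Because unique() keeps NaN while nunique() drops it by default, the two can disagree by one on data with missing values. The following minimal sketch illustrates this on a made-up Series; the names values, n_distinct, and non_missing are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with repeated values and one missing entry
s = pd.Series([3.0, 1.0, np.nan, 3.0, 1.0])

# unique() returns values in order of first appearance and keeps NaN
values = s.unique()
print(len(values))  # 3

# nunique() ignores NaN by default
n_distinct = s.nunique()
print(n_distinct)  # 2

# Drop missing values first to retrieve only the real values
non_missing = s.dropna().unique()
print(list(non_missing))  # [3.0, 1.0]
```

Calling dropna() before unique() is one simple way to make the retrieved values line up with the default nunique() count.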


Conclusion

Handling unique values well is fundamental to data quality and analysis. Pandas covers the common cases with three methods: nunique() for a quick count of distinct items, value_counts() for a frequency breakdown, and unique() for retrieving the distinct values themselves.