NumPy Statistical Functions for AI & Data Science

Master NumPy's statistical functions like mean, median, min/max for AI, machine learning, and data analysis. Efficiently calculate key metrics on arrays.

Statistical Functions in NumPy

NumPy provides a robust collection of statistical functions essential for data analysis and scientific computing. These functions allow users to calculate various statistical metrics such as mean, median, variance, standard deviation, and more directly on NumPy arrays.


Minimum and Maximum Values

numpy.amin(a, axis=None, out=None, keepdims=<no value>)

Returns the minimum value of the array, or the minima along a specified axis. numpy.amin and numpy.min refer to the same function.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the minimum. If None, the minimum of the flattened array is returned.
  • out: Alternative output array in which to place the result.
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Example:

import numpy as np
a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

print(np.amin(a, axis=1))  # Minimum along rows: [3 3 2]
print(np.amin(a, axis=0))  # Minimum along columns: [2 4 3]
print(np.amin(a))          # Minimum of the entire array: 2

numpy.amax(a, axis=None, out=None, keepdims=<no value>)

Returns the maximum value from the array or along a specified axis.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the maximum. If None, the maximum of the flattened array is returned.
  • out: Alternative output array in which to place the result.
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Example:

import numpy as np
a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

print(np.amax(a))          # Maximum of the entire array: 9
print(np.amax(a, axis=0))  # Maximum along columns: [8 7 9]
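
The keepdims parameter matters most when a reduced result has to be combined with the original array. The following is a minimal sketch, reusing the example array above, of per-row min-max scaling. The variable names (row_min_kd, row_max_kd, scaled) are illustrative and not part of NumPy.

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# Without keepdims the reduced axis disappears: shape (3,)
row_min = np.amin(a, axis=1)

# With keepdims=True the reduced axis is kept with size one: shape (3, 1)
row_min_kd = np.amin(a, axis=1, keepdims=True)
row_max_kd = np.amax(a, axis=1, keepdims=True)

# The (3, 1) results broadcast against the (3, 3) input,
# so every row is rescaled to the interval [0, 1].
scaled = (a - row_min_kd) / (row_max_kd - row_min_kd)

print(row_min.shape)     # (3,)
print(row_min_kd.shape)  # (3, 1)
print(scaled)
```

With the default keepdims, the shape (3,) result would broadcast along the wrong axis here, so the subtraction would silently mix values from different rows.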

Range of Values

numpy.ptp(a, axis=None, out=None)

Calculates the range of values in the array, which is the difference between the maximum and minimum values.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the range. If None, the range of the flattened array is returned.
  • out: Alternative output array in which to place the result.

Example:

import numpy as np
a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

print(np.ptp(a))          # Range of the entire array: 7 (9 - 2)
print(np.ptp(a, axis=1))  # Range along rows: [4 5 7]
print(np.ptp(a, axis=0))  # Range along columns: [6 3 6]
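
As a quick check of the definition, the short sketch below compares numpy.ptp with an explicit maximum-minus-minimum computation on the same array. The variable names are illustrative.

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# Range computed directly and by hand, column by column.
col_range = np.ptp(a, axis=0)
col_range_manual = np.amax(a, axis=0) - np.amin(a, axis=0)

print(col_range)                                    # [6 3 6]
print(np.array_equal(col_range, col_range_manual))  # True
```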

Percentiles and Quantiles

numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=<no value>)

Computes the specified percentile(s) of the data. The q-th percentile is the value below which q percent of the observations fall.

Parameters:

  • a: Input array.
  • q: The percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.
  • axis: The axis along which to compute the percentile. If None, the percentile of the flattened array is returned.
  • out: Alternative output array in which to place the result.
  • overwrite_input: If True, the input array may be modified in place.
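  • method: The estimation method to use when the desired percentile lies between two data points. The default, 'linear', interpolates linearly between them.
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.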

Example:

import numpy as np
a = np.array([[30, 40, 70], [80, 20, 10], [50, 90, 60]])

print(np.percentile(a, 50))         # 50th percentile (median): 50.0
print(np.percentile(a, 50, axis=1)) # 50th percentile along rows: [40. 20. 60.]
print(np.percentile(a, 50, axis=0)) # 50th percentile along columns: [50. 40. 60.]
print(np.percentile(a, [25, 75]))   # 25th and 75th percentiles: [30. 70.]
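
The effect of the method parameter is easiest to see on a small array where the requested percentile falls between two data points. The sketch below assumes a NumPy version that accepts the method keyword (older releases used a different keyword name), and the small data array is made up for illustration. It also computes the interquartile range of the example array above.

```python
import numpy as np

data = np.array([10, 20, 30, 40])

# The 40th percentile falls between 20 and 30, so the method matters.
linear = np.percentile(data, 40)                       # interpolates, close to 22.0
lower = np.percentile(data, 40, method='lower')        # 20
higher = np.percentile(data, 40, method='higher')      # 30
midpoint = np.percentile(data, 40, method='midpoint')  # 25.0
print(linear, lower, higher, midpoint)

# Interquartile range (75th minus 25th percentile) of the 3x3 example array.
a = np.array([[30, 40, 70], [80, 20, 10], [50, 90, 60]])
q1, q3 = np.percentile(a, [25, 75])
iqr = q3 - q1
print(iqr)  # 40.0
```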

numpy.quantile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=<no value>)

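Computes the specified quantile(s) of the data. Quantiles are percentiles expressed as fractions: q must lie between 0 and 1 inclusive, so numpy.quantile(a, 0.25) returns the same value as numpy.percentile(a, 25). The remaining parameters are the same as for numpy.percentile.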

Example:

import numpy as np
a = np.array([1, 2, 3, 4, 5])
print(np.quantile(a, 0.5)) # 0.5 quantile (median): 3.0
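
The relationship between the two functions can be checked directly. This is a small sketch with illustrative variable names.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

quartiles_q = np.quantile(a, [0.25, 0.5, 0.75])  # fractions in [0, 1]
quartiles_p = np.percentile(a, [25, 50, 75])     # percentages in [0, 100]

print(quartiles_q)                            # [2. 3. 4.]
print(np.allclose(quartiles_q, quartiles_p))  # True
```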

Sum and Product

numpy.sum(a, axis=None, dtype=None, out=None, keepdims=<no value>)

Computes the sum of array elements over a given axis.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the sum. If None, the sum of the flattened array is returned.
  • dtype: The type of the returned array and of the accumulator in which the elements are summed.
  • out: Alternative output array in which to place the result.
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Example:

import numpy as np
data = np.array([1, 2, 3, 4])
print(np.sum(data))  # Total sum: 10
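
The axis, keepdims, and dtype parameters are easier to follow on a 2-D array. The sketch below uses made-up data. The uint8 case is included only to show that a narrow accumulator can wrap around.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(a, axis=0))                 # Column sums: [5 7 9]
print(np.sum(a, axis=1))                 # Row sums: [ 6 15]
print(np.sum(a, axis=1, keepdims=True))  # Row sums with shape (2, 1)

# dtype sets the accumulator type. A narrow accumulator can wrap around.
small = np.array([200, 100], dtype=np.uint8)
default_total = np.sum(small)                  # 300, uses a wider default accumulator
wrapped_total = np.sum(small, dtype=np.uint8)  # 44, because 300 does not fit in 8 bits
print(default_total, wrapped_total)
```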

numpy.prod(a, axis=None, dtype=None, out=None, keepdims=<no value>)

Computes the product of array elements over a given axis.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the product. If None, the product of the flattened array is returned.
  • dtype: The type of the returned array and of the accumulator in which the elements are multiplied.
  • out: Alternative output array in which to place the result.
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Example:

import numpy as np
data = np.array([1, 2, 3, 4])
print(np.prod(data)) # Product of elements: 24
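
As with numpy.sum, the axis parameter selects the direction of the reduction. The sketch below uses a small made-up array and also shows that the product of an empty array is the neutral element 1.0.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

col_prod = np.prod(a, axis=0)
row_prod = np.prod(a, axis=1)
empty_prod = np.prod([])

print(col_prod)    # Product down each column: [3 8]
print(row_prod)    # Product across each row: [ 2 12]
print(empty_prod)  # 1.0
```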

Median

numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)

Computes the median of the array or along a specified axis. The median is the value separating the higher half from the lower half of a data sample. For an even number of elements, it is the mean of the two middle values.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the median. If None, the median of the flattened array is returned.
  • out: Alternative output array in which to place the result.
  • overwrite_input: If True, the input array may be modified in place.
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Example:

import numpy as np
a = np.array([[30, 65, 70], [80, 95, 10], [50, 90, 60]])

print(np.median(a))         # Median of the entire array: 65.0
print(np.median(a, axis=0)) # Median along columns: [50. 90. 60.]
print(np.median(a, axis=1)) # Median along rows: [65. 80. 60.]
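
Two properties of the median are worth seeing in code: how it handles an even number of elements, and how little a single outlier moves it compared with the mean. The data below is made up for illustration.

```python
import numpy as np

# Even number of elements: the two middle values (3 and 5) are averaged.
even_median = np.median(np.array([1, 3, 5, 7]))
print(even_median)  # 4.0

# One extreme value shifts the mean strongly but leaves the median unchanged.
with_outlier = np.array([1, 2, 3, 4, 100])
outlier_mean = np.mean(with_outlier)
outlier_median = np.median(with_outlier)
print(outlier_mean)    # 22.0
print(outlier_median)  # 3.0
```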

Mean (Average)

numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)

Calculates the arithmetic mean (average) of the array elements or along a specified axis.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the mean. If None, the mean of the flattened array is returned.
  • dtype: The type to use in computing the mean. For integer inputs the default is float64; for floating-point inputs it is the same as the input dtype.
  • out: Alternative output array in which to place the result.
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Example:

import numpy as np
a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

print(np.mean(a))         # Mean of the entire array: 3.666...
print(np.mean(a, axis=0)) # Mean along columns: [2.666... 3.666... 4.666...]
print(np.mean(a, axis=1)) # Mean along rows: [2. 4. 5.]
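
A common use of the per-axis mean in data preparation is centering each column (feature) on zero. The following is a minimal sketch on the example array above; the names col_mean and centered are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]], dtype=float)

# Shape (3,): one mean per column. It broadcasts across the rows of a.
col_mean = np.mean(a, axis=0)
centered = a - col_mean

# After centering, every column mean is zero up to floating-point rounding.
print(np.allclose(np.mean(centered, axis=0), 0.0))  # True

# Integer input still produces a floating-point mean.
int_mean = np.mean(np.array([1, 2, 3, 4]))
print(int_mean)  # 2.5
```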

numpy.average(a, axis=None, weights=None, returned=False, keepdims=<no value>)

Calculates the weighted average of array elements. The weights parameter allows you to assign different importance to each element.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the weighted average. If None, the weighted average of the flattened array is returned.
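  • weights: Array of weights associated with the values in a. It is either 1-D, with length equal to the size of a along the given axis, or the same shape as a. If None, all values are weighted equally.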
  • returned: If True, return a tuple (average, sum_of_weights).
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Example:

import numpy as np
a = np.array([1, 2, 3, 4])
wts = np.array([4, 3, 2, 1])

print(np.average(a))                    # Unweighted average: 2.5
print(np.average(a, weights=wts))       # Weighted average: 2.0
print(np.average(a, weights=wts, returned=True))  # (Weighted average, sum of weights): (2.0, 10.0)
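
For a 2-D array, a 1-D weights array is applied along the chosen axis. The sketch below uses made-up data and checks the result against the defining formula sum(w * x) / sum(w). The variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])  # 3 observations, 2 columns
w = np.array([1, 1, 2])                 # one weight per observation (row)

col_avg = np.average(a, axis=0, weights=w)
print(col_avg)  # [3.5 4.5]

# Same computation written out: weight each row, sum, divide by total weight.
manual = (a * w[:, None]).sum(axis=0) / w.sum()
print(np.allclose(col_avg, manual))  # True
```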

Standard Deviation and Variance

numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)

Calculates the standard deviation, which measures the spread of values around the mean.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the standard deviation. If None, the standard deviation of the flattened array is returned.
  • dtype: The type to use in computing the standard deviation. For integer inputs the default is float64; for floating-point inputs it is the same as the input dtype.
  • out: Alternative output array in which to place the result.
  • ddof: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is 0.
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Example:

import numpy as np
print(np.std([1, 2, 3, 4]))  # Standard deviation: 1.118...
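
The ddof parameter switches between the population standard deviation (divisor N, the default) and the sample standard deviation (divisor N - 1). A short sketch on the same data, with illustrative variable names:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

population_std = np.std(x)      # divisor N = 4, about 1.118
sample_std = np.std(x, ddof=1)  # divisor N - 1 = 3, about 1.291

print(population_std, sample_std)
```

The sample version is always the larger of the two, and the gap shrinks as the number of elements grows.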

numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)

Calculates the variance, the average of the squared deviations from the mean. It is the square of the standard deviation.

Parameters:

  • a: Input array.
  • axis: The axis along which to compute the variance. If None, the variance of the flattened array is returned.
  • dtype: The type to use in computing the variance. For integer inputs the default is float64; for floating-point inputs it is the same as the input dtype.
  • out: Alternative output array in which to place the result.
  • ddof: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is 0.
  • keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Example:

import numpy as np
print(np.var([1, 2, 3, 4]))  # Variance: 1.25
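
The definition can be verified by computing the variance by hand and comparing it with numpy.var and with the squared standard deviation. This is a small sketch with illustrative variable names.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Mean of the squared deviations from the mean (divisor N, i.e. ddof=0).
manual_var = np.mean((x - np.mean(x)) ** 2)

print(manual_var)                             # 1.25
print(np.isclose(manual_var, np.var(x)))      # True
print(np.isclose(np.var(x), np.std(x) ** 2))  # True
```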

Correlation Coefficient

numpy.corrcoef(x, y=None, rowvar=True, bias=<no value>, ddof=<no value>, dtype=None)

Returns the Pearson product-moment correlation coefficients. The correlation coefficient measures the linear relationship between two datasets. A value of 1 means a perfect positive linear correlation, -1 means a perfect negative linear correlation, and 0 means no linear correlation.

Parameters:

  • x: A 1-D or 2-D array of variables and observations. By default, each row of x represents a variable and each column an observation of that variable.
  • y: An additional set of variables and observations, with the same shape as x. If None, only the correlation matrix of x is returned.
  • rowvar: If True, then each row represents a variable, with observations in the columns. If False, then each column represents a variable, with observations in the rows.
  • bias, ddof: Deprecated. They have no effect on the result and should not be used.
  • dtype: Data type of the result. By default, the result has at least float64 precision.

Frequency and observation weights (fweights, aweights) are parameters of numpy.cov, not of numpy.corrcoef.

Example:

import numpy as np
data1 = np.array([1, 2, 3, 4, 5])
data2 = np.array([5, 4, 3, 2, 1])

correlation_matrix = np.corrcoef(data1, data2)
print(correlation_matrix)
# [[ 1. -1.]
#  [-1.  1.]]
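
Tabular data is often stored with one row per observation and one column per variable, which is the opposite of the default layout. The sketch below, on made-up data, passes rowvar=False for that case and reads individual coefficients out of the resulting matrix.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])  # rises with x
z = np.array([5, 4, 3, 2, 1])   # falls as x rises

# Shape (5, 3): one row per observation, one column per variable.
table = np.column_stack([x, y, z])

# rowvar=False treats columns as variables, giving a 3x3 matrix.
corr = np.corrcoef(table, rowvar=False)
print(corr.shape)  # (3, 3)

r_xy = corr[0, 1]  # close to 1: perfect positive linear relationship
r_xz = corr[0, 2]  # close to -1: perfect negative linear relationship
print(r_xy, r_xz)
```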

Summary Table of Common NumPy Statistical Functions

Sr. No.  Function               Description
1        numpy.amin()           Minimum value in array or along axis.
2        numpy.amax()           Maximum value in array or along axis.
3        numpy.nanmin()         Minimum value, ignoring NaN (Not a Number) values.
4        numpy.nanmax()         Maximum value, ignoring NaN values.
5        numpy.ptp()            Range of values (max - min).
6        numpy.percentile()     Computes the q-th percentile of the data.
7        numpy.nanpercentile()  Computes percentile, ignoring NaN values.
8        numpy.quantile()       Computes the q-th quantile of the data.
9        numpy.nanquantile()    Computes quantile, ignoring NaN values.
10       numpy.median()         Computes the median of the data.
11       numpy.average()        Computes the weighted average of the data.
12       numpy.mean()           Computes the arithmetic mean (average) of the data.
13       numpy.std()            Computes the standard deviation of the data.
14       numpy.var()            Computes the variance of the data.
15       numpy.nanmean()        Computes the mean, ignoring NaN values.
16       numpy.nanstd()         Computes the standard deviation, ignoring NaN values.
17       numpy.nanvar()         Computes the variance, ignoring NaN values.
18       numpy.corrcoef()       Computes the Pearson correlation coefficient matrix.
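
The NaN-aware variants in the table behave like their plain counterparts but skip missing values. The following is a brief sketch on made-up data; the variable names are illustrative.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, 5.0])

plain_mean = np.mean(data)   # nan, because one element is NaN
nan_mean = np.nanmean(data)  # 3.0, computed from 1.0, 3.0, 5.0
nan_min = np.nanmin(data)    # 1.0
nan_max = np.nanmax(data)    # 5.0
nan_std = np.nanstd(data)    # about 1.633

print(plain_mean, nan_mean, nan_min, nan_max, nan_std)
```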