NumPy Array Manipulation for ML & Data Science

Master NumPy array manipulation for machine learning & data science. Learn to reshape, index, slice, and modify ndarrays effectively with Python.

NumPy is a fundamental package for scientific computing in Python, offering powerful tools for working with arrays. This guide covers essential NumPy routines for manipulating elements within ndarray objects, categorized for clarity.

1. Changing Shape

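These routines change an array's shape or flatten it without changing its existing data. numpy.pad is the exception: it keeps the original values but adds new elements around them.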

  • numpy.reshape(a, newshape, order='C'): Gives a new shape to an array without changing its data. The total number of elements must remain the same.

    import numpy as np
    arr = np.array([1, 2, 3, 4, 5, 6])
    new_arr = arr.reshape((2, 3))
    print(new_arr)
    # Output:
    # [[1 2 3]
    #  [4 5 6]]
  • ndarray.flat: A 1-D iterator over the array. Allows iterating through array elements one by one, regardless of the original shape.

    import numpy as np
    arr = np.array([[1, 2], [3, 4]])
    for element in arr.flat:
        print(element, end=' ')
    # Output: 1 2 3 4
  • ndarray.flatten(order='C'): Returns a copy of the array collapsed into one dimension. This is a method of the ndarray object, not a function in the numpy namespace.

    import numpy as np
    arr = np.array([[1, 2], [3, 4]])
    flattened_arr = arr.flatten()
    print(flattened_arr)
    # Output: [1 2 3 4]
  • numpy.ravel(a, order='C'): Returns a contiguous flattened array. This function returns a view of the original array whenever possible, making it more memory-efficient than flatten().

    import numpy as np
    arr = np.array([[1, 2], [3, 4]])
    raveled_arr = np.ravel(arr)
    print(raveled_arr)
    # Output: [1 2 3 4]
  • numpy.pad(array, pad_width, mode='constant', **kwargs): Returns a padded array with its shape increased according to pad_width. Useful for adding borders or margins to arrays.

    import numpy as np
    arr = np.array([1, 2, 3])
    padded_arr = np.pad(arr, (1, 2), 'constant', constant_values=(0, 10))
    print(padded_arr)
    # Output: [ 0  1  2  3 10 10]
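
The view-versus-copy difference between ravel() and flatten() described above can be checked directly. The following is a minimal sketch that uses np.shares_memory to test whether two arrays overlap in memory; the variable names are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

flat_copy = arr.flatten()   # always a new array
flat_view = arr.ravel()     # a view here, because arr is C-contiguous

print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True

# Writing through the view changes the original array.
flat_view[0] = 99
print(arr[0, 0])      # 99
print(flat_copy[0])   # 1

# For a non-contiguous input such as a transpose, ravel has to copy.
print(np.shares_memory(arr.T, arr.T.ravel()))  # False
```

In practice this means ravel() is the cheaper choice when the result is only read, while flatten() is the safer choice when the result will be modified.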

2. Transpose Operations

Transpose operations swap rows and columns in 2D arrays or rearrange axes in higher-dimensional arrays.

  • numpy.transpose(a, axes=None): Permutes the dimensions of an array. By default, it reverses the order of axes.

    import numpy as np
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    transposed_arr = np.transpose(arr)
    print(transposed_arr)
    # Output:
    # [[1 4]
    #  [2 5]
    #  [3 6]]
  • ndarray.T: A shorthand for numpy.transpose(). It's an attribute that returns the transposed array.

    import numpy as np
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    print(arr.T)
    # Output:
    # [[1 4]
    #  [2 5]
    #  [3 6]]
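  • numpy.rollaxis(a, axis, start=0): Rolls the specified axis backward until it lies at position start. This repositions the axis itself; it does not shift elements along it (use numpy.roll for that). For new code, numpy.moveaxis is usually easier to reason about.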

    import numpy as np
    arr = np.arange(6).reshape((2,3))
    rolled_arr = np.rollaxis(arr, 1, 0)
    print(rolled_arr)
    # Output:
    # [[0 3]
    #  [1 4]
    #  [2 5]]
  • numpy.swapaxes(a, axis1, axis2): Interchanges two axes of an array.

    import numpy as np
    arr = np.arange(6).reshape((2,3))
    swapped_arr = np.swapaxes(arr, 0, 1)
    print(swapped_arr)
    # Output:
    # [[0 3]
    #  [1 4]
    #  [2 5]]
  • numpy.moveaxis(a, source, destination): Moves axes of an array to new positions. Allows for flexible rearrangement of dimensions.

    import numpy as np
    arr = np.random.rand(3, 4, 5)
    moved_arr = np.moveaxis(arr, 0, -1)
    print(moved_arr.shape)
    # Output: (4, 5, 3)
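
The axes argument of numpy.transpose matters most for arrays with more than two dimensions. As a rough sketch, assume a hypothetical batch of images stored as (batch, height, width, channels) that needs to become channels-first; the array name and sizes are made up for illustration.

```python
import numpy as np

# Hypothetical image batch laid out as (batch, height, width, channels).
images = np.zeros((8, 32, 24, 3))

# Explicit permutation: new axis i is old axis axes[i].
channels_first = np.transpose(images, axes=(0, 3, 1, 2))
print(channels_first.shape)  # (8, 3, 32, 24)

# The same rearrangement expressed as "move the last axis to position 1".
moved = np.moveaxis(images, -1, 1)
print(moved.shape)  # (8, 3, 32, 24)

# Both results are views on the same data; nothing is copied.
print(np.shares_memory(images, channels_first))  # True
```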

3. Changing Dimensions

These functions reshape or restructure arrays without altering data.

  • numpy.broadcast: Produces an object that mimics broadcasting. This is more for understanding the broadcasting mechanism.

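  • numpy.broadcast_to(array, shape, subok=False): Broadcasts an array to a new shape. This works when the shapes are compatible under NumPy's broadcasting rules: comparing dimensions from the last one backward, each dimension of the array must either equal the corresponding target dimension or be 1 (missing leading dimensions are treated as 1). The result is a read-only view.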

    import numpy as np
    arr = np.array([1, 2, 3])
    broadcasted_arr = np.broadcast_to(arr, (3, 3))
    print(broadcasted_arr)
    # Output:
    # [[1 2 3]
    #  [1 2 3]
    #  [1 2 3]]
  • numpy.expand_dims(a, axis): Expands the shape of an array by inserting a new axis at the specified position.

    import numpy as np
    arr = np.array([1, 2, 3])
    expanded_arr = np.expand_dims(arr, axis=0)
    print(expanded_arr)
    # Output: [[1 2 3]]
    print(expanded_arr.shape)
    # Output: (1, 3)
  • numpy.squeeze(a, axis=None): Removes single-dimensional entries from the shape of an array.

    import numpy as np
    arr = np.array([[1, 2, 3]])
    squeezed_arr = np.squeeze(arr)
    print(squeezed_arr)
    # Output: [1 2 3]
    print(squeezed_arr.shape)
    # Output: (3,)
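
numpy.broadcast has no example above, so here is a small sketch of how it can be used to inspect a broadcast before performing it, along with the read-only behavior of broadcast_to. The arrays col and row are illustrative only.

```python
import numpy as np

col = np.array([[1], [2], [3]])   # shape (3, 1)
row = np.array([10, 20])          # shape (2,)

# np.broadcast reports the shape the operands would broadcast to.
b = np.broadcast(col, row)
print(b.shape)  # (3, 2)

# Iterating yields the paired elements in broadcast order.
sums = np.array([x + y for x, y in b]).reshape(b.shape)
print(np.array_equal(sums, col + row))  # True

# broadcast_to returns a read-only view; copy it before writing.
view = np.broadcast_to(row, (3, 2))
print(view.flags.writeable)  # False
writable = view.copy()
writable[0, 0] = -1
```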

4. Joining Arrays

Joining combines multiple arrays along specified axes.

  • numpy.concatenate((a1, a2, ...), axis=0, out=None): Joins a sequence of arrays along an existing axis.

    import numpy as np
    arr1 = np.array([1, 2])
    arr2 = np.array([3, 4])
    concatenated_arr = np.concatenate((arr1, arr2))
    print(concatenated_arr)
    # Output: [1 2 3 4]
  • numpy.stack(arrays, axis=0, out=None): Joins arrays along a new axis. This increases the dimensionality of the resulting array.

    import numpy as np
    arr1 = np.array([1, 2])
    arr2 = np.array([3, 4])
    stacked_arr = np.stack((arr1, arr2))
    print(stacked_arr)
    # Output:
    # [[1 2]
    #  [3 4]]
    print(stacked_arr.shape)
    # Output: (2, 2)
  • numpy.hstack(tup): Stacks arrays horizontally (column-wise). Equivalent to concatenate along axis 1, except for 1-D arrays, which are concatenated along axis 0 as in the example below.

    import numpy as np
    arr1 = np.array([1, 2])
    arr2 = np.array([3, 4])
    hstacked_arr = np.hstack((arr1, arr2))
    print(hstacked_arr)
    # Output: [1 2 3 4]
  • numpy.vstack(tup): Stacks arrays vertically (row-wise). Equivalent to concatenate along axis 0 after 1-D arrays of shape (N,) have been reshaped to (1, N).

    import numpy as np
    arr1 = np.array([1, 2])
    arr2 = np.array([3, 4])
    vstacked_arr = np.vstack((arr1, arr2))
    print(vstacked_arr)
    # Output:
    # [[1 2]
    #  [3 4]]
  • numpy.dstack(tup): Stacks arrays depth-wise (along the third axis).

    import numpy as np
    arr1 = np.array([[1, 2], [3, 4]])
    arr2 = np.array([[5, 6], [7, 8]])
    dstacked_arr = np.dstack((arr1, arr2))
    print(dstacked_arr)
    # Output:
    # [[[1 5]
    #   [2 6]]
    #
    #  [[3 7]
    #   [4 8]]]
  • numpy.column_stack(tup): Stacks 1-D arrays as columns into a 2-D array. 2-D arrays are stacked as-is, just as with hstack.

    import numpy as np
    arr1 = np.array([1, 2])
    arr2 = np.array([3, 4])
    col_stacked_arr = np.column_stack((arr1, arr2))
    print(col_stacked_arr)
    # Output:
    # [[1 3]
    #  [2 4]]
  • numpy.row_stack(tup): An alias for numpy.vstack that stacks arrays as rows. It is deprecated as of NumPy 2.0, so prefer vstack in new code.
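
The relationships between the joining routines in this section are easier to see with 2-D inputs. The sketch below compares them on two small illustrative arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# For 2-D inputs, hstack/vstack match concatenate on axis 1/axis 0.
print(np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1)))  # True
print(np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0)))  # True

# concatenate keeps the number of dimensions; stack adds one.
print(np.concatenate((a, b), axis=0).shape)  # (4, 2)
print(np.stack((a, b), axis=0).shape)        # (2, 2, 2)

# Stacking along a new last axis matches dstack for 2-D inputs.
print(np.array_equal(np.stack((a, b), axis=-1), np.dstack((a, b))))  # True
```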

5. Splitting Arrays

Splitting divides arrays into smaller arrays along specified axes.

  • numpy.split(ary, indices_or_sections, axis=0): Splits an array into multiple sub-arrays. indices_or_sections can be an integer N, in which case the array is divided into N equal parts (an error is raised if that is not possible), or a list of indices at which the splits occur.

    import numpy as np
    arr = np.arange(10)
    sub_arrays = np.split(arr, 2)
    print(sub_arrays)
    # Output: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]
    # Passing a list of indices splits at those positions instead:
    sub_arrays_by_index = np.split(arr, [3, 7])
    print(sub_arrays_by_index)
    # Output: [array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])]
  • numpy.hsplit(ary, indices_or_sections): Splits an array horizontally (column-wise). Equivalent to split along axis 1.

    import numpy as np
    arr = np.arange(12).reshape((3, 4))
    h_split_arr = np.hsplit(arr, 2)
    print(h_split_arr[0])
    # Output:
    # [[0 1]
    #  [4 5]
    #  [8 9]]
  • numpy.vsplit(ary, indices_or_sections): Splits an array vertically (row-wise). Equivalent to split along axis 0.

    import numpy as np
    arr = np.arange(12).reshape((3, 4))
    v_split_arr = np.vsplit(arr, 3)
    print(v_split_arr[0])
    # Output:
    # [[0 1 2 3]]
  • numpy.dsplit(ary, indices_or_sections): Splits an array along the third axis (depth).

  • numpy.array_split(ary, indices_or_sections, axis=0): Splits an array into multiple sub-arrays, even if the split does not result in equal sized parts.
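
Neither dsplit nor array_split has an example above. The following sketch shows one way each might be used; the array contents are arbitrary.

```python
import numpy as np

arr = np.arange(10)

# np.split(arr, 3) would raise ValueError because 10 is not divisible by 3.
# array_split allows uneven parts; the earlier parts get the extra elements.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# dsplit cuts along the third axis, so the input needs at least 3 dimensions.
cube = np.arange(16).reshape((2, 2, 4))
front, back = np.dsplit(cube, 2)
print(front.shape, back.shape)  # (2, 2, 2) (2, 2, 2)
print(front[0].tolist())        # [[0, 1], [4, 5]]
```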

6. Adding / Removing Elements

These functions return new arrays with elements added or removed; the input array is left unchanged.

  • numpy.resize(a, new_shape): Returns a new array with specified shape. If the new shape requires more elements than the original array, the original elements are repeated. If fewer elements are needed, the extra elements are discarded.

    import numpy as np
    arr = np.array([1, 2, 3])
    resized_arr = np.resize(arr, (2, 3))
    print(resized_arr)
    # Output:
    # [[1 2 3]
    #  [1 2 3]]
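  • numpy.append(arr, values, axis=None): Appends values to the end of an array and returns the result as a new array. If axis is None (the default), both inputs are flattened first. If axis is specified, the arrays are joined along that axis and must have matching shapes in every other dimension.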

    import numpy as np
    arr = np.array([1, 2, 3])
    appended_arr = np.append(arr, [4, 5])
    print(appended_arr)
    # Output: [1 2 3 4 5]
  • numpy.insert(arr, obj, values, axis=None): Inserts values before specified indices along an axis.

    import numpy as np
    arr = np.array([1, 2, 3])
    inserted_arr = np.insert(arr, 1, [9, 8])
    print(inserted_arr)
    # Output: [1 9 8 2 3]
  • numpy.delete(arr, obj, axis=None): Returns a new array with elements deleted along an axis.

    import numpy as np
    arr = np.array([1, 2, 3, 4, 5])
    deleted_arr = np.delete(arr, [1, 3])
    print(deleted_arr)
    # Output: [1 3 5]
  • numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None): Returns the sorted unique elements of an array and can also return their indices and counts.

    import numpy as np
    arr = np.array([1, 2, 2, 3, 1, 4])
    unique_elements = np.unique(arr)
    print(unique_elements)
    # Output: [1 2 3 4]
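
The optional return values of numpy.unique and the axis argument of append and delete are not shown above. Here is a brief sketch of both, using small illustrative arrays.

```python
import numpy as np

arr = np.array([1, 2, 2, 3, 1, 4])

# return_counts reports how often each unique value occurs.
values, counts = np.unique(arr, return_counts=True)
print(values)  # [1 2 3 4]
print(counts)  # [2 2 1 1]

# return_inverse gives indices that rebuild the input from the unique values.
values, inverse = np.unique(arr, return_inverse=True)
print(np.array_equal(values[inverse], arr))  # True

# With axis set, append and delete work on whole rows or columns.
grid = np.array([[1, 2], [3, 4]])
with_row = np.append(grid, [[5, 6]], axis=0)
print(with_row.shape)        # (3, 2)
without_col = np.delete(grid, 0, axis=1)
print(without_col.tolist())  # [[2], [4]]
```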

7. Repeating and Tiling Arrays

Techniques to create larger arrays by duplicating elements.

  • numpy.repeat(a, repeats, axis=None): Repeats each element of an array.

    import numpy as np
    arr = np.array([1, 2, 3])
    repeated_arr = np.repeat(arr, 2)
    print(repeated_arr)
    # Output: [1 1 2 2 3 3]
  • numpy.tile(a, reps): Constructs an array by repeating an array a specified number of times. reps can be an integer or a tuple specifying the number of repetitions along each dimension.

    import numpy as np
    arr = np.array([1, 2])
    tiled_arr = np.tile(arr, 3)
    print(tiled_arr)
    # Output: [1 2 1 2 1 2]
    
    tiled_arr_2d = np.tile(arr, (2, 1)) # Repeat 2 times along axis 0
    print(tiled_arr_2d)
    # Output:
    # [[1 2]
    #  [1 2]]
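
On 2-D input the difference between repeat and tile becomes clearer: repeat duplicates individual rows or columns, while tile duplicates the whole block. A short sketch with an illustrative 2x2 array:

```python
import numpy as np

grid = np.array([[1, 2], [3, 4]])

# repeat duplicates individual rows (axis=0) or columns (axis=1).
print(np.repeat(grid, 2, axis=0).tolist())
# [[1, 2], [1, 2], [3, 4], [3, 4]]

# A list of counts repeats each row a different number of times.
print(np.repeat(grid, [1, 3], axis=0).tolist())
# [[1, 2], [3, 4], [3, 4], [3, 4]]

# tile repeats the whole block instead.
print(np.tile(grid, (2, 1)).tolist())
# [[1, 2], [3, 4], [1, 2], [3, 4]]
```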

8. Rearranging Elements

Operations to reorder elements within an array.

  • numpy.flip(m, axis=None): Reverses the order of elements along a given axis or axes.

    import numpy as np
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    flipped_arr = np.flip(arr, axis=0)
    print(flipped_arr)
    # Output:
    # [[4 5 6]
    #  [1 2 3]]
  • numpy.fliplr(m): Reverses the order of elements along axis 1 (left/right).

    import numpy as np
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    fliplr_arr = np.fliplr(arr)
    print(fliplr_arr)
    # Output:
    # [[3 2 1]
    #  [6 5 4]]
  • numpy.flipud(m): Reverses the order of elements along axis 0 (up/down).

    import numpy as np
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    flipud_arr = np.flipud(arr)
    print(flipud_arr)
    # Output:
    # [[4 5 6]
    #  [1 2 3]]
  • numpy.roll(a, shift, axis=None): Rolls array elements along a specified axis. This shifts elements cyclically.

    import numpy as np
    arr = np.array([1, 2, 3, 4])
    rolled_arr = np.roll(arr, 2)
    print(rolled_arr)
    # Output: [3 4 1 2]
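
The axis argument changes what roll and flip do on multi-dimensional input. The sketch below walks through a few cases on a small illustrative array.

```python
import numpy as np

grid = np.arange(6).reshape((2, 3))
# [[0 1 2]
#  [3 4 5]]

# With axis=None the array is flattened, rolled, then reshaped back.
print(np.roll(grid, 1).tolist())           # [[5, 0, 1], [2, 3, 4]]

# With axis=1 each row is rotated independently.
print(np.roll(grid, 1, axis=1).tolist())   # [[2, 0, 1], [5, 3, 4]]

# A negative shift rotates in the opposite direction.
print(np.roll(grid, -1, axis=1).tolist())  # [[1, 2, 0], [4, 5, 3]]

# flip with axis=None reverses every axis, like flipud followed by fliplr.
print(np.array_equal(np.flip(grid), np.flipud(np.fliplr(grid))))  # True
```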

9. Sorting and Searching

Powerful tools for sorting arrays and searching within them.

  • numpy.sort(a, axis=-1, kind=None, order=None): Returns a sorted copy of the array.

    import numpy as np
    arr = np.array([3, 1, 4, 2])
    sorted_arr = np.sort(arr)
    print(sorted_arr)
    # Output: [1 2 3 4]
  • numpy.argsort(a, axis=-1, kind=None, order=None): Returns the indices that would sort the array.

    import numpy as np
    arr = np.array([3, 1, 4, 2])
    sorted_indices = np.argsort(arr)
    print(sorted_indices)
    # Output: [1 3 0 2]
  • numpy.lexsort(keys, axis=-1): Performs an indirect stable sort using a sequence of keys. The last key in keys is the primary sort key.

  • numpy.searchsorted(a, v, side='left', sorter=None): Finds indices where elements should be inserted into a sorted array to maintain order.

    import numpy as np
    arr = np.array([1, 3, 5, 7])
    indices = np.searchsorted(arr, [2, 6])
    print(indices)
    # Output: [1 3]
  • numpy.argmax(a, axis=None, out=None): Returns the indices of the maximum values along an axis.

    import numpy as np
    arr = np.array([[1, 5, 3], [4, 2, 6]])
    max_indices = np.argmax(arr, axis=1)
    print(max_indices)
    # Output: [1 2]
  • numpy.argmin(a, axis=None, out=None): Returns the indices of the minimum values along an axis.

    import numpy as np
    arr = np.array([[1, 5, 3], [4, 2, 6]])
    min_indices = np.argmin(arr, axis=1)
    print(min_indices)
    # Output: [0 1]
  • numpy.nonzero(a): Returns the indices of the non-zero elements in an array.

    import numpy as np
    arr = np.array([[0, 1, 0], [2, 0, 0]])
    non_zero_indices = np.nonzero(arr)
    print(non_zero_indices)
    # Output: (array([0, 1]), array([1, 0]))
  • numpy.where(condition[, x, y]): Returns elements chosen from x or y depending on the condition. If only condition is provided, it returns indices of True elements.

    import numpy as np
    arr = np.array([1, 2, 3, 4])
    where_result = np.where(arr > 2)
    print(where_result)
    # Output: (array([2, 3]),)
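
Two items above lack examples: lexsort and the three-argument form of where. The following sketch covers both; the name arrays are hypothetical records made up for illustration.

```python
import numpy as np

# Hypothetical records: sort by last name, breaking ties by first name.
first = np.array(["Zoe", "Adam", "Bea", "Carl"])
last = np.array(["Smith", "Jones", "Smith", "Jones"])

# The last key passed to lexsort is the primary key.
order = np.lexsort((first, last))
print(order)  # [1 3 2 0]
print([f"{last[i]}, {first[i]}" for i in order])
# ['Jones, Adam', 'Jones, Carl', 'Smith, Bea', 'Smith, Zoe']

# Three-argument where picks from x where the condition holds, else from y.
arr = np.array([1, 2, 3, 4])
print(np.where(arr > 2, arr, 0))  # [0 0 3 4]
```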

10. Set Operations

Perform mathematical set operations on arrays such as union, intersection, and difference. These functions operate on flattened arrays.

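  • numpy.in1d(ar1, ar2, assume_unique=False, invert=False): Tests whether each element of a 1-D array is also present in a second array, returning a boolean array the same length as ar1. As of NumPy 2.0, in1d is deprecated in favor of numpy.isin.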

    import numpy as np
    arr1 = np.array([1, 2, 3, 4])
    arr2 = np.array([3, 4, 5, 6])
    in_arr = np.in1d(arr1, arr2)
    print(in_arr)
    # Output: [False False  True  True]
  • numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False): Finds the intersection of two arrays (unique elements present in both).

    import numpy as np
    arr1 = np.array([1, 2, 3, 4])
    arr2 = np.array([3, 4, 5, 6])
    intersection = np.intersect1d(arr1, arr2)
    print(intersection)
    # Output: [3 4]
  • numpy.setdiff1d(ar1, ar2, assume_unique=False): Finds the set difference of two arrays (unique values in the first array that are not in the second).

    import numpy as np
    arr1 = np.array([1, 2, 3, 4])
    arr2 = np.array([3, 4, 5, 6])
    difference = np.setdiff1d(arr1, arr2)
    print(difference)
    # Output: [1 2]
  • numpy.setxor1d(ar1, ar2, assume_unique=False): Finds the set symmetric difference of two arrays (unique values present in either, but not both).

    import numpy as np
    arr1 = np.array([1, 2, 3, 4])
    arr2 = np.array([3, 4, 5, 6])
    symmetric_difference = np.setxor1d(arr1, arr2)
    print(symmetric_difference)
    # Output: [1 2 5 6]
  • numpy.union1d(ar1, ar2): Finds the sorted union of two arrays (unique elements from both).

    import numpy as np
    arr1 = np.array([1, 2, 3, 4])
    arr2 = np.array([3, 4, 5, 6])
    union = np.union1d(arr1, arr2)
    print(union)
    # Output: [1 2 3 4 5 6]
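
The examples above use inputs without duplicates. As a sketch of how the set routines treat unsorted input with repeats, and of numpy.isin as a shape-preserving membership test, consider the following; the arrays are illustrative only.

```python
import numpy as np

arr1 = np.array([4, 1, 2, 2, 3])
arr2 = np.array([3, 4, 4, 5])

# Set routines return sorted, de-duplicated results.
print(np.union1d(arr1, arr2))  # [1 2 3 4 5]

# return_indices also reports where each common value first occurs.
common, idx1, idx2 = np.intersect1d(arr1, arr2, return_indices=True)
print(common)  # [3 4]
print(idx1)    # [4 0]
print(idx2)    # [0 1]

# np.isin keeps the input's shape, so it works as a mask for N-D arrays.
grid = np.array([[1, 2], [3, 4]])
mask = np.isin(grid, arr2)
print(mask.tolist())  # [[False, False], [True, True]]
print(grid[mask])     # [3 4]
```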

11. Other Array Operations

Additional useful array operations.

  • numpy.clip(a, a_min, a_max, out=None): Limits (clips) values in an array to be within a specified range.

    import numpy as np
    arr = np.array([-1, 0, 5, 10])
    clipped_arr = np.clip(arr, 0, 5)
    print(clipped_arr)
    # Output: [0 0 5 5]
  • numpy.round(a, decimals=0, out=None): Rounds array values to the given number of decimal places.

    import numpy as np
    arr = np.array([1.234, 5.678, 9.012])
    rounded_arr = np.round(arr, decimals=1)
    print(rounded_arr)
    # Output: [1.2 5.7 9. ]
  • numpy.diagonal(a, offset=0, axis1=0, axis2=1): Returns specified diagonals of an array.

    import numpy as np
    arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    diagonals = np.diagonal(arr)
    print(diagonals)
    # Output: [1 5 9]
  • numpy.trace(a, offset=0, axis1=0, axis2=1, dtype=None, out=None): Returns the sum along the diagonals of an array.

    import numpy as np
    arr = np.array([[1, 2], [3, 4]])
    trace_sum = np.trace(arr)
    print(trace_sum)
    # Output: 5 (1 + 4)
  • numpy.take(a, indices, axis=None, out=None, mode='raise'): Takes elements from an array along an axis. This is an alternative to fancy indexing. With the default mode='raise', an out-of-range index raises an error.

    import numpy as np
    arr = np.array([0, 1, 2, 3, 4])
    taken_elements = np.take(arr, [1, 3, 4])
    print(taken_elements)
    # Output: [1 3 4]
  • numpy.put(a, ind, v, mode='raise'): Replaces specified elements of an array with given values, in place. The indices refer to the flattened array.

    import numpy as np
    arr = np.array([0, 1, 2, 3, 4])
    np.put(arr, [1, 3], [9, 8])
    print(arr)
    # Output: [0 9 2 8 4]
  • numpy.choose(a, choices, out=None, mode='raise'): Constructs an array from an index array and a list of arrays (choices).

    import numpy as np
    arr = np.array([0, 1, 2]) # Indices
    choices_arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
    chosen_elements = np.choose(arr, choices_arr)
    print(chosen_elements)
    # Output: [10 50 90]
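
The axis and mode arguments of numpy.take are not exercised above. A small sketch, using illustrative arrays, of how they behave:

```python
import numpy as np

grid = np.array([[10, 20, 30], [40, 50, 60]])

# take along axis=1 selects columns; equivalent to grid[:, [2, 0]].
cols = np.take(grid, [2, 0], axis=1)
print(cols.tolist())  # [[30, 10], [60, 40]]
print(np.array_equal(cols, grid[:, [2, 0]]))  # True

# Without axis, indices refer to the flattened array.
print(np.take(grid, [0, 4]))  # [10 50]

# mode='clip' maps out-of-range indices to the nearest valid index;
# mode='wrap' wraps them around the array length.
arr = np.array([0, 1, 2, 3, 4])
print(np.take(arr, [1, 7], mode='clip'))  # [1 4]
print(np.take(arr, [1, 7], mode='wrap'))  # [1 2]
```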