What is NumPy?

NumPy (Numerical Python) is the fundamental package for numerical computation in Python. It provides a high-performance, multidimensional array object called the ndarray, along with tools for working with these arrays. Libraries such as pandas, SciPy, and scikit-learn are built on top of NumPy.

The NumPy ndarray

The heart of NumPy is the ndarray (N-dimensional array). It's a grid of values, all of the same type, and is indexed by a tuple of non-negative integers. It's similar to a Python list, but with some key differences:

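  • Performance: A NumPy array is typically stored in a single contiguous block of memory (a Python list instead holds pointers to separate Python objects), which lets NumPy process it with tight compiled loops rather than per-element Python operations.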
  • Homogeneity: All elements in a NumPy array must be of the same data type (e.g., all int32 or all float64).
  • Functionality: NumPy provides a huge library of high-level mathematical functions that operate on these arrays.

```python
import numpy as np  # It's a strong convention to import numpy as np

# Creating a 1D array from a Python list
arr1d = np.array([1, 2, 3, 4, 5])
print(f"1D Array: {arr1d}")

# Creating a 2D array (a matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(f"2D Array:\n{arr2d}")

# Checking attributes
print(f"Shape of arr2d: {arr2d.shape}")  # (2, 3), i.e. (rows, columns)
print(f"Data type of arr2d: {arr2d.dtype}")  # e.g. int64; the default integer type is platform-dependent
```
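The differences listed above can be checked directly on an array. This short sketch uses standard ndarray attributes (`dtype`, `flags`, `itemsize`, `nbytes`) to show type coercion, the contiguous memory layout, and tuple indexing. The `float64` dtype is set explicitly so that the byte counts do not depend on the platform's default integer type; the variable names are illustrative only.

```python
import numpy as np

# Homogeneity: mixing ints and floats makes NumPy choose one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64: the integers were converted to floats

# Contiguous memory: a freshly created array is one C-ordered block
grid = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
print(grid.flags["C_CONTIGUOUS"])  # True
print(grid.itemsize)  # 8 bytes per float64 element
print(grid.nbytes)    # 6 elements * 8 bytes = 48

# "Typically" contiguous: a column slice is a strided view into the same block
column_view = grid[:, 1]
print(column_view.flags["C_CONTIGUOUS"])  # False

# Indexing with a tuple of integers: (row, column)
print(grid[1, 2])  # 6.0
```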

Vectorization: The "No Loops" Rule

Vectorization means applying an operation to an entire array at once instead of element by element in a Python loop. This is the key to NumPy's speed: the per-element work runs in optimized, pre-compiled C code, so on large arrays a vectorized operation is often an order of magnitude or more faster than the equivalent plain Python for loop.

```python
import numpy as np

# The slow way: a plain Python for loop
my_list = [1, 2, 3, 4, 5]
result_list = []
for item in my_list:
    result_list.append(item * 2)

# The fast, NumPy way with vectorization
my_array = np.array([1, 2, 3, 4, 5])
result_array = my_array * 2  # The * operation is applied to every element

print(f"List result: {result_list}")
print(f"Array result: {result_array}")
```
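The speed claim is easy to check on your own machine. The sketch below times both approaches with the standard-library `timeit` module; the array size (one million elements), the repeat count, and the helper function names are arbitrary choices for illustration. The measured speed-up varies with hardware, NumPy version, and array size, so no specific numbers are claimed here.

```python
import timeit

import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n)

def double_with_loop():
    # Element by element: one Python-level multiplication per item
    result = []
    for item in py_list:
        result.append(item * 2)
    return result

def double_vectorized():
    # A single call; the loop over elements runs in NumPy's compiled code
    return np_array * 2

# Both approaches produce the same values
assert double_with_loop() == double_vectorized().tolist()

loop_time = timeit.timeit(double_with_loop, number=5)
vec_time = timeit.timeit(double_vectorized, number=5)
print(f"Python loop: {loop_time:.3f} s for 5 runs")
print(f"Vectorized:  {vec_time:.3f} s for 5 runs")
print(f"Speed-up:    {loop_time / vec_time:.0f}x")
```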

Broadcasting: The Magic of Mismatched Shapes

Broadcasting is the mechanism that lets NumPy perform element-wise operations on arrays of different shapes. As long as the two shapes are compatible, the smaller array is "broadcast" (virtually repeated) across the larger one so that the two line up.

The simplest example is operating between an array and a scalar (a single number), as we did above (my_array * 2). The scalar 2 was broadcast across all elements of my_array.

A more complex example:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

vector = np.array([10, 20, 30])

# Broadcasting the vector across each row of the matrix
result = matrix + vector
print(result)
# Output:
# [[11 22 33]
#  [14 25 36]
#  [17 28 39]]
```

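Here, NumPy conceptually stretched the vector [10, 20, 30] into a (3, 3) matrix [[10, 20, 30], [10, 20, 30], [10, 20, 30]] and then performed element-wise addition. That larger matrix is never actually built in memory; NumPy reuses the same three values for every row.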
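The compatibility rule behind broadcasting is short: NumPy compares the two shapes starting from the trailing (rightmost) dimension, and two dimensions are compatible when they are equal or when one of them is 1; a missing leading dimension is treated as 1. The sketch below exercises that rule with `np.broadcast_shapes` (available in NumPy 1.20 and later), a column-vector variant of the example above, an incompatible pair of shapes, and `np.broadcast_to` to confirm that the "stretched" array is a zero-stride view rather than a copy. Variable names are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# (3, 3) and (3,): the vector is treated as (1, 3) and repeated down the rows
print(np.broadcast_shapes(matrix.shape, (3,)))  # (3, 3)

# To add a different value to each row, reshape the vector into a (3, 1) column
column = np.array([10, 20, 30])[:, np.newaxis]
print(matrix + column)
# [[11 12 13]
#  [24 25 26]
#  [37 38 39]]

# A (3, 1) column plus a (4,) row broadcast to a (3, 4) result
outer_sum = np.arange(3)[:, np.newaxis] + np.arange(4)
print(outer_sum.shape)  # (3, 4)

# Incompatible trailing dimensions (3 vs 2) raise a ValueError
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print(f"ValueError: {err}")

# broadcast_to returns a read-only view; a stride of 0 means rows share memory
stretched = np.broadcast_to(np.array([10, 20, 30]), (3, 3))
print(stretched.strides[0])  # 0
```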