13 NumPy
In computational mathematics and statistics, arrays are fundamental structures for storing and manipulating large datasets. The Python library NumPy (Numerical Python) offers an efficient and powerful way to handle multi-dimensional arrays and matrices, enabling fast numerical computations. This chapter will introduce the concept of arrays using NumPy and explore basic operations that can be performed on these arrays.
13.1 Introduction to NumPy
In computational tasks, particularly in statistics and mathematics, efficiency and speed are crucial when working with large datasets or performing complex numerical computations. The Python library NumPy (short for Numerical Python) addresses these needs by providing support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays.
13.1.1 Why Use NumPy?
While Python’s built-in data structures like lists are flexible and easy to use, they are not optimized for numerical computation. NumPy offers a significant performance boost because its core is implemented in C, which allows for more efficient memory use and faster execution of mathematical operations. This makes NumPy a cornerstone of scientific computing: it handles large-scale numerical data far more efficiently than native Python structures.
Here are several key reasons why NumPy is indispensable in computational mathematics and statistics:
Performance: Operations on NumPy arrays are typically much faster than equivalent loops over Python lists, especially for repeated or large-scale computations. An array stores its elements as raw values in a contiguous block of memory, and NumPy's operations run as compiled C loops over that block. This avoids the per-element overhead of Python's dynamic typing and interpreted execution.
Convenience: NumPy provides a wide range of functions that allow users to manipulate arrays in various ways, such as reshaping, indexing, slicing, and performing mathematical operations, with far less code than is required with Python’s built-in data structures.
Mathematical Functionality: NumPy includes built-in functions for linear algebra, random number generation, statistical operations, and much more, all optimized for efficiency. Many of them, such as np.sqrt() and np.exp(), operate element-wise on an entire array in a single call. This approach is known as vectorization. With Python lists, applying a mathematical operation to every element requires an explicit loop or comprehension, which is slower and more verbose.
Interoperability: NumPy is widely used in conjunction with other scientific libraries such as SciPy (for additional mathematical functions), Matplotlib (for plotting), and TensorFlow (for machine learning). Its use is ubiquitous in data science, making it a key foundation for many other libraries.
Memory Efficiency: Arrays in NumPy are homogeneous, meaning that all elements in an array are of the same type. This contrasts with Python lists, which can contain elements of different types. Homogeneity lets NumPy store each element as a fixed-size raw value in one contiguous buffer, whereas a list stores pointers to separate Python objects. As a result, arrays use less memory and can be processed much faster, particularly for large datasets.
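To see the performance and memory points in practice, the sketch below squares one million integers in two ways: with a list comprehension and with a single vectorized NumPy expression. It then reports the memory used by the array's data buffer. The array size, the helper names square_list and square_array, and the use of timeit are choices made for illustration. Exact timings depend on your machine, so no specific numbers are shown.

```python
import timeit

import numpy as np

n = 1_000_000
py_list = list(range(n))               # a Python list of separate int objects
np_arr = np.arange(n, dtype=np.int64)  # one contiguous buffer of 64-bit integers

def square_list(values):
    # Explicit Python-level loop: one interpreted step per element
    return [v * v for v in values]

def square_array(values):
    # Vectorized: the multiplication loop runs in compiled code
    return values * values

# Time 5 repetitions of each approach
t_list = timeit.timeit(lambda: square_list(py_list), number=5)
t_array = timeit.timeit(lambda: square_array(np_arr), number=5)
print(f"list comprehension: {t_list:.3f} s")
print(f"NumPy vectorized:   {t_array:.3f} s")

# Both approaches produce the same values
print(square_array(np_arr).tolist() == square_list(py_list))  # True

# The array's data occupies exactly n * 8 bytes
print(np_arr.nbytes)  # 8000000
```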
13.1.2 Installing NumPy
Before you can use NumPy, you need to install it. If you are using Google Colab, Jupyter notebooks, or a typical Python development environment, NumPy is often pre-installed. If not, you can install it using the following command in your terminal or command prompt:
pip install numpy
After installation, you can import the library in your Python script:
import numpy as np
Using the alias np is a widely adopted convention in the Python community and keeps your code concise when calling NumPy functions.
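To confirm that the installation worked, you can print the version string that NumPy exposes as np.__version__. The exact value depends on the release you installed.

```python
import numpy as np

# Print the installed NumPy version string (its value depends on your installation)
print(np.__version__)
```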
13.2 Arrays in NumPy
Arrays are the fundamental building blocks in NumPy, providing a way to organize and manipulate large amounts of data efficiently. NumPy arrays, or ndarrays (short for N-dimensional arrays), can store data in any number of dimensions, in a format that is optimized for both performance and memory usage. This section explores the creation, manipulation, and essential operations of NumPy arrays.
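Every ndarray carries a few attributes that describe its layout. The sketch below inspects them for a small two-dimensional array. The default integer dtype is platform-dependent (commonly int64), so its exact printed value is not asserted here.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

print(type(arr_2d))   # <class 'numpy.ndarray'>
print(arr_2d.ndim)    # 2       number of dimensions
print(arr_2d.shape)   # (2, 3)  size along each dimension
print(arr_2d.size)    # 6       total number of elements
print(arr_2d.dtype)   # element type, e.g. int64 on most platforms
```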
13.2.1 Creating Arrays
Creating arrays in NumPy can be done in several ways depending on the need. Arrays can be initialized from lists, created using predefined shapes like zeros or ones, or generated using sequences of values.
Creating Arrays from Python Lists
One of the simplest ways to create an array is by converting a Python list into a NumPy array using np.array().
import numpy as np
# Creating a 1D array from a list
arr = np.array([1, 2, 3, 4, 5])
print(arr)
[1 2 3 4 5]
This creates a one-dimensional array. Similarly, you can create multi-dimensional arrays by passing in lists of lists.
# Creating a 2D array (a matrix) from nested lists
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
[[1 2 3]
[4 5 6]]
In this case, arr_2d is a 2x3 matrix with two rows and three columns.
Creating Arrays with NumPy Functions
NumPy also provides a variety of built-in functions to generate arrays with specific patterns or values.
np.zeros(): Creates an array filled with zeros.
# 3x4 array filled with zeros
zeros_array = np.zeros((3, 4))
print(zeros_array)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
np.ones(): Creates an array filled with ones.
# 2x5 array filled with ones
ones_array = np.ones((2, 5))
print(ones_array)
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
np.full(): Creates an array filled with a specified value.
# 3x3 array filled with the value 7
full_array = np.full((3, 3), 7)
print(full_array)
[[7 7 7]
 [7 7 7]
 [7 7 7]]
np.arange(): Creates an array with a sequence of values. It works like Python’s range() function but returns an array.
# Array with values [0, 2, 4, 6, 8]
range_array = np.arange(0, 10, 2)
print(range_array)
[0 2 4 6 8]
np.linspace(): Creates an array with a specified number of evenly spaced values between a start and end point.
# 5 values between 0 and 1, inclusive
linspace_array = np.linspace(0, 1, 5)
print(linspace_array)
[0.   0.25 0.5  0.75 1.  ]
These functions are incredibly useful when you need to quickly generate arrays for mathematical computations, simulations, or testing algorithms.
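One point that often trips people up is the end point. np.arange() excludes its stop value, while np.linspace() includes its end point by default. NumPy's own documentation recommends np.linspace() for non-integer steps, because you specify the number of points directly instead of relying on rounded floating-point step arithmetic. The sketch also shows the dtype argument of np.zeros(), which otherwise defaults to float64.

```python
import numpy as np

# arange: the stop value (1) is excluded
print(np.arange(0, 1, 0.25))                  # [0.   0.25 0.5  0.75]

# linspace: the end point is included by default...
print(np.linspace(0, 1, 5))                   # [0.   0.25 0.5  0.75 1.  ]

# ...unless endpoint=False is passed
print(np.linspace(0, 1, 4, endpoint=False))   # [0.   0.25 0.5  0.75]

# zeros/ones default to float64; dtype selects another type
print(np.zeros(3, dtype=int))                 # [0 0 0]
```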
13.2.2 Array Indexing and Slicing
NumPy arrays allow you to access and manipulate specific elements, subarrays, or slices of the array efficiently using indexing and slicing techniques. This functionality mirrors Python’s list slicing but extends it to multiple dimensions.
Indexing in 1D Arrays
In a one-dimensional array, you can access elements by specifying the index of the element you want:
arr = np.array([10, 20, 30, 40, 50])
print(arr[0])
print(arr[-1])
10
50
Just like Python lists, NumPy arrays support negative indexing: -1 refers to the last element, -2 to the second-to-last, and so on.
Indexing in Multi-Dimensional Arrays
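In multi-dimensional arrays, you access an element by giving one index per dimension, separated by commas (for a 2D array, the row index comes first, then the column index). Indices start at 0, so arr_2d[1, 2] is the element in the second row and third column:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr_2d[1, 2])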
6
To access entire rows or columns, you can use slicing. For example:
# Access the second row
print(arr_2d[1, :])
# Access the third column
print(arr_2d[:, 2])
[4 5 6]
[3 6 9]
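A single index on a 2D array selects a whole row, so arr_2d[1] is shorthand for arr_2d[1, :]. Also note that selecting one column with an integer index returns a 1D array. Slicing the column with 2:3 instead keeps it as a 3x1 column.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr_2d[1])              # [4 5 6]  same as arr_2d[1, :]
print(arr_2d[:, 2].shape)     # (3,)     an integer index drops that dimension
print(arr_2d[:, 2:3].shape)   # (3, 1)   a slice keeps it
```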
Slicing Arrays
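Slicing allows you to access subarrays or parts of an array. The syntax is the same as for Python lists, start:stop:step, where:
- start is the starting index (inclusive),
- stop is the stopping index (exclusive),
- step is the interval between indices.
With a positive step, omitting start or stop means "from the beginning" or "to the end", and omitting step means a step of 1.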
arr = np.array([10, 20, 30, 40, 50, 60])
# Slice from index 1 to 4 (not including 4)
print(arr[1:4])
# Slice with a step of 2
print(arr[::2])
[20 30 40]
[10 30 50]
Slicing works similarly with multi-dimensional arrays, allowing you to select rows and columns:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Rows 0-1 and columns 1-2 (the stop indices 2 and 3 are excluded)
print(arr_2d[0:2, 1:3])
[[2 3]
[5 6]]
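Two more slicing details are worth knowing. A negative step walks through the array backwards. A basic slice also returns a view that shares memory with the original array, so writing into the slice changes the original. Call .copy() when you need an independent array.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Negative step: reverse the array
print(arr[::-1])           # [60 50 40 30 20 10]

# A slice is a view: modifying it modifies arr
view = arr[1:4]
view[0] = 99
print(arr)                 # [10 99 30 40 50 60]

# .copy() gives an independent array
independent = arr[1:4].copy()
independent[0] = -1
print(arr)                 # [10 99 30 40 50 60]  (unchanged)
```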
13.3 Array Operations
One of the most powerful features of NumPy arrays is the ability to perform element-wise operations without needing loops. This feature is called vectorization, and it makes mathematical operations on arrays much more efficient.
13.3.1 Element-Wise Operations
You can perform arithmetic operations on entire arrays, and the operation is applied element-wise. This means each element in one array is paired with the corresponding element in another array (if the shapes match) to compute the result.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Element-wise addition
print(arr1 + arr2)
# Element-wise multiplication
print(arr1 * arr2)
# Element-wise subtraction and division
print(arr2 - arr1)
print(arr2 / arr1)
[5 7 9]
[ 4 10 18]
[3 3 3]
[4. 2.5 2. ]
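Element-wise pairing requires compatible shapes. The sketch below first checks that the vectorized arr1 * arr2 matches an explicit Python loop over paired elements. It then shows that NumPy raises a ValueError when two arrays cannot be paired.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# The vectorized product equals pairing elements by position in a loop
loop_result = [a * b for a, b in zip(arr1, arr2)]
print(np.array_equal(arr1 * arr2, loop_result))   # True

# Arrays of length 3 and 2 cannot be paired element by element
raised = False
try:
    arr1 + np.array([10, 20])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```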
13.3.2 Scalar Operations
NumPy arrays also support operations with scalars, where the scalar is applied to every element in the array:
arr = np.array([10, 20, 30])
# Multiply each element by 2
print(arr * 2)
# Add 5 to each element
print(arr + 5)
[20 40 60]
[15 25 35]
13.3.3 Broadcasting
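When an operation involves arrays of different but compatible shapes, NumPy treats the smaller array as if it were repeated along the missing or length-1 dimensions to match the larger one, without actually copying the data. Two shapes are compatible if, comparing their dimensions from the last one backward, each pair of sizes is either equal or one of them is 1. This process is called broadcasting.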
For example:
arr = np.array([1, 2, 3])
scalar = 5
# Broadcasting allows the scalar to be added to each element of the array
print(arr + scalar)
[6 7 8]
Broadcasting is particularly useful when performing operations between arrays and scalars or between arrays with compatible shapes.
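Broadcasting also applies between two arrays. In the sketch below, a 1D row of shape (3,) is added to every row of a (2, 3) matrix. A (3, 1) column combined with a (3,) row produces a (3, 3) table. Shapes that violate the compatibility rule raise a ValueError.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)

# The row is applied to each row of the matrix
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

col = np.array([[1], [2], [3]])             # shape (3, 1)
# (3, 1) combined with (3,) broadcasts to (3, 3)
print(col * row)
# [[10 20 30]
#  [20 40 60]
#  [30 60 90]]

# (2, 3) with (2,): trailing sizes 3 and 2 differ and neither is 1
raised = False
try:
    matrix + np.array([1, 2])
except ValueError:
    raised = True
    print("shapes (2, 3) and (2,) are not compatible")
```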
13.3.4 Array Shape and Reshaping
The shape of a NumPy array is a tuple giving the number of elements along each dimension. You can check it with the .shape attribute:
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d.shape)
(2, 3)
You can change an array's dimensions with the reshape() method, as long as the total number of elements stays the same:
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape((2, 3))
print(reshaped_arr)
[[1 2 3]
[4 5 6]]
The above code reshapes a 1D array into a 2D array with two rows and three columns.
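Two reshape details are handy. Passing -1 for one dimension lets NumPy infer that size from the total number of elements. Requesting a shape whose sizes do not multiply to the element count raises a ValueError.

```python
import numpy as np

arr = np.arange(1, 7)              # [1 2 3 4 5 6]

# -1 lets NumPy infer that dimension: 6 elements / 3 columns = 2 rows
print(arr.reshape(-1, 3).shape)    # (2, 3)
print(arr.reshape(3, -1).shape)    # (3, 2)

# 6 elements cannot fill a 4x2 array (8 slots), so reshape raises an error
raised = False
try:
    arr.reshape(4, 2)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```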
13.4 Mathematical Functions in NumPy
NumPy comes with a wide array of built-in mathematical functions that operate efficiently on arrays. These include functions for statistical calculations, linear algebra, and more.
13.4.1 Common Mathematical Functions
Sum: Computes the sum of all elements in the array.
arr = np.array([1, 2, 3, 4, 5])
print(np.sum(arr))
15
Mean: Computes the mean (average) of the array elements.
print(np.mean(arr))
3.0
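Standard Deviation: Computes the standard deviation of the array elements. By default, np.std() computes the population standard deviation (dividing by n). Pass ddof=1 to get the sample standard deviation (dividing by n - 1).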
print(np.std(arr))
1.4142135623730951
Min/Max: Finds the minimum and maximum values in the array.
print(np.min(arr))
print(np.max(arr))
1
5
These functions are crucial when working with large datasets, as they allow for quick and efficient analysis of the data.
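For statistical work, it helps to know exactly what np.std() computes. The sketch below recomputes the standard deviation of [1, 2, 3, 4, 5] directly from its definition. It divides by n for the default setting and by n - 1 when ddof=1 is passed. It also shows the axis argument, which applies a function along one dimension of a 2D array rather than over all elements.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
n = arr.size
squared_dev = (arr - arr.mean()) ** 2              # squared deviations from the mean

pop_std = np.sqrt(squared_dev.sum() / n)           # divide by n
sample_std = np.sqrt(squared_dev.sum() / (n - 1))  # divide by n - 1

print(np.isclose(pop_std, np.std(arr)))             # True
print(np.isclose(sample_std, np.std(arr, ddof=1)))  # True
print(sample_std)                                   # 1.5811388300841898

# axis=0 works down the columns, axis=1 across the rows
m = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(m, axis=0))    # [5 7 9]
print(np.sum(m, axis=1))    # [ 6 15]
print(np.mean(m, axis=0))   # [2.5 3.5 4.5]
```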
13.4.2 Array Comparisons and Conditional Operations
NumPy arrays allow for efficient comparison operations, resulting in boolean arrays. This is particularly useful in applications such as filtering data or applying conditions.
arr = np.array([10, 20, 30, 40, 50])

# Compare each element to 30
comparison = arr > 30
print(comparison)
# Use the comparison to filter the array
filtered_arr = arr[comparison]
print(filtered_arr)
[False False False True True]
[40 50]
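Conditions can be combined element-wise with & (and), | (or), and ~ (not). Wrap each comparison in its own parentheses, because & and | bind more tightly than comparison operators like > and <. The sketch also uses np.count_nonzero() to count matches and np.where() to choose between two values for each element.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Elements strictly between 15 and 45
mask = (arr > 15) & (arr < 45)
print(arr[mask])                    # [20 30 40]

# Elements outside that range
print(arr[~mask])                   # [10 50]

# How many elements exceed 30?
print(np.count_nonzero(arr > 30))   # 2

# Keep values above 30, replace the rest with 0
print(np.where(arr > 30, arr, 0))   # [ 0  0  0 40 50]
```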
13.5 NumPy Data Types
One of the strengths of NumPy is its ability to handle a wide range of data types. Each NumPy type has a fixed size (for example, 8 bytes per element for np.float64). As a result, arrays of these types use far less memory than equivalent collections of Python objects, particularly for large datasets.
Some common NumPy data types include:
- int: Signed integers (e.g., np.int8, np.int16, np.int32, np.int64), where the number indicates the bit width.
- float: Floating-point numbers (e.g., np.float16, np.float32, np.float64).
- bool: Boolean type, where values can be True or False.
- complex: Complex numbers (e.g., np.complex64, np.complex128).
You can specify the data type of an array when creating it, or NumPy will infer the type based on the data you provide:
# Creating an array of integers
arr_int = np.array([1, 2, 3], dtype=np.int32)
# Creating an array of floats
arr_float = np.array([1.0, 2.0, 3.0], dtype=np.float64)
print(arr_int.dtype)
print(arr_float.dtype)
int32
float64
NumPy arrays are homogeneous; every element of the array has to be of the same data type. This is what allows NumPy to be so efficient: the array data is stored in contiguous blocks of memory, which makes it easier and faster to access and manipulate.
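The sketch below shows type inference and homogeneity working together. NumPy picks a dtype that can hold every element, so mixing integers and floats gives a float array. The sketch also compares itemsize (bytes per element) across a few types, and uses astype() to convert an array explicitly.

```python
import numpy as np

print(np.array([1, 2.5, 3]).dtype)     # float64: the integers are stored as floats
print(np.array([True, False]).dtype)   # bool
print(np.array([1 + 2j, 3]).dtype)     # complex128

# Bytes per element for a few types
for dt in (np.int8, np.int32, np.float32, np.float64):
    print(np.dtype(dt).name, np.dtype(dt).itemsize)
# int8 1
# int32 4
# float32 4
# float64 8

# Explicit conversion; float -> int truncates toward zero
arr_float = np.array([1.7, -2.7, 3.0])
print(arr_float.astype(np.int32))      # [ 1 -2  3]
```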
13.6 Exercises
Exercise 1: Creating Arrays
A. Create a 1D NumPy array containing the values 10, 20, 30, 40, and 50. Print the array.
B. Create a 2D NumPy array from the following list of lists: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]. Print the array.
C. Use np.arange() to create an array containing the numbers from 0 to 15, with a step size of 3. Print the array.
Exercise 2: Indexing and Slicing
A. Given the array arr = np.array([100, 200, 300, 400, 500]), print the element at index 2.
B. For the array arr_2d = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]]), slice and print the subarray containing the first two rows and the first two columns.
C. Given the array arr = np.array([5, 10, 15, 20, 25, 30, 35, 40]), slice and print every second element starting from the first.
Exercise 3: Element-Wise Operations
A. Create two 1D arrays arr1 = np.array([1, 2, 3]) and arr2 = np.array([4, 5, 6]). Perform element-wise addition, subtraction, and multiplication of these arrays and print the results.
B. Multiply the array arr = np.array([2, 4, 6, 8, 10]) by 3 and print the result.
C. Create an array with values [2, 4, 6] and square each element (i.e., multiply each element by itself). Print the resulting array.
Exercise 4: Array Shape and Reshaping
A. Create a 1D array with 12 elements using np.arange() and reshape it into a 3x4 array. Print the reshaped array.
B. Given the array arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), print the shape of the array.
C. Reshape the array arr = np.array([1, 2, 3, 4, 5, 6]) into a 2x3 matrix and print it.
Exercise 5: Mathematical Operations
A. Given the array arr = np.array([5, 10, 15, 20, 25]), compute and print the sum of all elements in the array.
B. Create an array of numbers from 1 to 5. Compute and print the mean and standard deviation of the array.
C. Given the array arr = np.array([3, 7, 2, 8, 1, 9, 6, 5, 4]), find and print the minimum and maximum values.
Exercise 6: Array Comparisons and Conditional Operations
A. Create an array arr = np.array([10, 20, 30, 40, 50]). Use a comparison operator to create a boolean array that checks which elements are greater than 25. Print the result.
B. Use the boolean array from part (A) to filter the original array and print only the elements greater than 25.
C. Given the array arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), filter and print only the even numbers.