Understanding NumPy: Datatypes, Memory Storage, and Structured Arrays

RMAG news

Datatypes and Memory Storage in NumPy Arrays

Every NumPy array has a dtype attribute, a numpy.dtype object that describes the type of the array's elements. Its itemsize attribute gives the size of one element in bytes, which makes it useful for understanding how much memory an array uses and how its data is represented.

import numpy as np

# Create an array 'b' with the float data type
b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
dtype_info_b = b.dtype  # Get the data type information

# Print the data type information and the size of one element
print("Data type of the array:", dtype_info_b)
print("Size of one element in bytes:", dtype_info_b.itemsize)

# Calculate the total memory consumed by the array 'b'
total_memory_size_b = len(b) * dtype_info_b.itemsize
print("Total memory size consumed by array b:", total_memory_size_b, "bytes")

# Alternatively, use the nbytes attribute to get the total memory size directly
total_memory_size_b_alt = b.nbytes
print("Total memory size consumed by array b (alternative method):", total_memory_size_b_alt, "bytes")

Data type of the array: float64
Size of one element in bytes: 8
Total memory size consumed by array b: 40 bytes
Total memory size consumed by array b (alternative method): 40 bytes

In this example, we create an array b containing floating-point numbers. We then retrieve the data type information using b.dtype and store it in dtype_info_b. We print both the data type and the size of one element in bytes using the itemsize attribute of dtype_info_b.

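After that, we calculate the total memory consumed by b by multiplying the number of elements (len(b)) by the size of one element (dtype_info_b.itemsize). Because len() returns only the length of the first axis, this works for a one-dimensional array like b; for arrays with more dimensions, use b.size, the total number of elements. Finally, we use the nbytes attribute of b to obtain the total number of bytes used by its elements directly.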
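To see why len() is not enough in general, here is a minimal sketch with a two-dimensional array. The 3 x 4 float64 array m is an arbitrary illustrative choice; the point is that size * itemsize matches nbytes while len() * itemsize undercounts:

```python
import numpy as np

# A hypothetical 3 x 4 array of float64 values (12 elements in total)
m = np.zeros((3, 4), dtype=np.float64)

print("len(m):", len(m))        # 3: length of the first axis only
print("m.size:", m.size)        # 12: total number of elements
print("itemsize:", m.itemsize)  # 8: bytes per element (also available on the array itself)

# size * itemsize matches nbytes; len(m) * itemsize would give only 24
print("size * itemsize:", m.size * m.itemsize)  # 96
print("nbytes:", m.nbytes)                      # 96
```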

Creating NumPy arrays with a defined datatype

import numpy as np

# Example 1: Creating an array with the platform's default integer type
# (int64 on most 64-bit Linux/macOS systems; int32 on Windows before NumPy 2.0)
arr1 = np.array([1, 2, 3, 4, 5], dtype=int)
print("Example #1:")
print("Array:", arr1)
print("Data type:", arr1.dtype)
print()

# Example 2: Creating an array with a defined storage size for integers (16-bit)
arr2 = np.array([1, 2, 3, 4, 5], dtype=np.int16)
print("Example #2:")
print("Array:", arr2)
print("Data type:", arr2.dtype)
print()

# Example 3: Using a character code plus a byte count ('i2' = 2-byte, i.e. 16-bit, integer)
arr3 = np.array([1, 2, 3, 4, 5], dtype="i2")
print("Example #3:")
print("Array:", arr3)
print("Data type:", arr3.dtype)

Example #1:
Array: [1 2 3 4 5]
Data type: int32

Example #2:
Array: [1 2 3 4 5]
Data type: int16

Example #3:
Array: [1 2 3 4 5]
Data type: int16

In these examples:

Example 1 creates an array with the default integer type. That default is platform-dependent: int32, as shown here, is the result on Windows with NumPy versions before 2.0, while most 64-bit Linux and macOS systems (and Windows with NumPy 2.0 or later) give int64.
Example 2 specifies the storage size explicitly with np.int16, producing a 16-bit integer array.
Example 3 achieves the same result as Example 2 with the character code 'i2': 'i' stands for signed integer and 2 is the size in bytes (2 bytes = 16 bits).
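Character codes and NumPy scalar types describe the same dtype objects, so they can be compared directly. A short sketch checking this; the codes 'u1' (unsigned 1-byte integer), 'f4' and 'f8' (4- and 8-byte floats) are extra examples beyond the ones above:

```python
import numpy as np

# The character code 'i2' and np.int16 describe the same dtype
print(np.dtype("i2") == np.dtype(np.int16))  # True

# The number after the letter is the size in bytes, not bits
for code in ["i2", "u1", "f4", "f8"]:
    dt = np.dtype(code)
    print(code, "->", dt.name, "itemsize:", dt.itemsize)
# i2 -> int16 itemsize: 2
# u1 -> uint8 itemsize: 1
# f4 -> float32 itemsize: 4
# f8 -> float64 itemsize: 8
```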

You can also specify the boolean data type directly when creating an array. Here is an example:

import numpy as np

bool_arr = np.array([True, False, True, True], dtype=bool)
print("Array:", bool_arr)
print("Data type:", bool_arr.dtype)

Array: [ True False  True  True]
Data type: bool
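Boolean arrays also appear without an explicit dtype: comparing an array with a value yields a bool array, which NumPy stores using one byte per element. A brief sketch with a hypothetical values array, which also uses the result as a mask to select elements:

```python
import numpy as np

values = np.array([3, 8, 1, 9])  # hypothetical sample data
mask = values > 4                # element-wise comparison produces a bool array

print("Mask:", mask)
print("Data type:", mask.dtype)             # bool
print("Bytes per element:", mask.itemsize)  # 1
print("Selected:", values[mask])            # [8 9]
```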

The following examples create arrays of byte strings ('S') and Unicode strings ('U') by passing the dtype to np.array():

import numpy as np

# Example 1: Creating an array of byte strings ('S')
str_arr = np.array(["hello", "world", "numpy"], dtype="S")
print("Example 1:")
print("Array:", str_arr)
print("Data type:", str_arr.dtype)
print()

# Example 2: Creating an array of Unicode strings ('U')
str_arr = np.array(["hello", "world", "numpy"], dtype="U")
print("Example 2:")
print("Array:", str_arr)
print("Data type:", str_arr.dtype)

Example 1:
Array: [b'hello' b'world' b'numpy']
Data type: |S5

Example 2:
Array: ['hello' 'world' 'numpy']
Data type: <U5

When you create a NumPy array with 'S' (byte string) or 'U' (Unicode string) and no length, the length is determined automatically from the longest element in the array. If you give an explicit length, such as 'U3', or later assign a longer string into the array, the string is silently truncated to fit.
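A short sketch of both truncation cases; the sample words are arbitrary. Because an array's dtype is fixed once it is created, an assignment cannot widen it:

```python
import numpy as np

# Explicit length: each element holds at most 3 characters
short = np.array(["hello", "world"], dtype="U3")
print(short)           # ['hel' 'wor']

# Inferred length: the longest element ("numpy", 5 characters) gives <U5
words = np.array(["hi", "numpy"])
print(words.dtype)     # <U5 (on a little-endian machine)
print(words.itemsize)  # 20: 'U' stores 4 bytes per character

# Assigning a longer string truncates it to 5 characters
words[0] = "structured"
print(words)           # ['struc' 'numpy']
```

The same rule applies to 'S' arrays, which store one byte per character, so an |S5 element occupies 5 bytes while a <U5 element occupies 20.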

Structured data types (record types)

A structured data type lets each element of an array hold several named fields of different data types, much like a row in a table or a C struct, whereas every element of a typical NumPy array has the same single type. To create a structured data type, you can use the numpy.dtype() function. One approach is to pass it a list of (field_name, data_type) tuples.

import numpy as np

# Define a new structured data type
student_dtype = np.dtype([
    ("student_id", np.int32),
    ("course", "S20"),    # Byte string of up to 20 characters
    ("grade", np.float64) # Floating-point grade
])

# Create an array with student data
student_array = np.array([
    (101, "Math", 85.5),
    (102, "History", 78.2),
    (103, "Physics", 92.0)
], dtype=student_dtype)

# Print the array
print(student_array)

# Print the dtype
print(student_array.dtype)

[(101, b'Math', 85.5) (102, b'History', 78.2) (103, b'Physics', 92. )]
[('student_id', '<i4'), ('course', 'S20'), ('grade', '<f8')]
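Fields of a structured array can be read by name, and each element occupies the sum of its field sizes. A minimal sketch reusing the student data above (redefined here so the block runs on its own); the itemsize of 32 assumes NumPy's default packed layout, i.e. np.dtype() without align=True, so no padding is inserted between fields:

```python
import numpy as np

student_dtype = np.dtype([
    ("student_id", np.int32),
    ("course", "S20"),
    ("grade", np.float64),
])
student_array = np.array([
    (101, "Math", 85.5),
    (102, "History", 78.2),
    (103, "Physics", 92.0),
], dtype=student_dtype)

# Access a whole column by field name; the result is an ordinary float64 array
grades = student_array["grade"]
print("Average grade:", grades.mean())

# Access one record, then one of its fields
print(student_array[1]["course"])  # b'History'

# Bytes per record: 4 (int32) + 20 (S20) + 8 (float64) = 32
print(student_dtype.itemsize)      # 32
print(student_array.nbytes)        # 96 (3 records x 32 bytes)
```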

A field can also hold a fixed-shape sub-array: add a third element to its tuple, (field_name, data_type, shape), where shape gives the dimensions of that field. This allows fields with different dimensions within the same structured array.
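A sketch of a structured dtype whose field holds a fixed-shape sub-array, given as the third tuple element (field_name, data_type, shape). The field names and the (3,) shape for three test scores per student are hypothetical:

```python
import numpy as np

# 'scores' is a sub-array field: every record holds 3 float64 values
exam_dtype = np.dtype([
    ("student_id", np.int32),
    ("scores", np.float64, (3,)),
])

exams = np.array([
    (101, [85.0, 90.0, 78.0]),
    (102, [72.5, 88.0, 91.0]),
], dtype=exam_dtype)

print(exams["scores"].shape)         # (2, 3): 2 records x 3 scores each
print(exams["scores"].mean(axis=1))  # per-student average score
print(exam_dtype.itemsize)           # 28: 4 (int32) + 3 * 8 (float64)
```

Selecting the sub-array field returns an ordinary array whose shape is the record shape followed by the field shape, so regular NumPy operations such as mean(axis=1) work on it directly.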