The Quest for Performance Part II : Perl vs Python


Having run a toy performance example, we will now digress somewhat and contrast its performance against
a few Python implementations. First, let's set the stage for the calculations and provide command-line
capabilities to the Python script.

import argparse
import time
import math
import numpy as np
import os
from numba import njit
from joblib import Parallel, delayed

parser = argparse.ArgumentParser()
parser.add_argument("--workers", type=int, default=8)
parser.add_argument("--arraysize", type=int, default=100_000_000)
args = parser.parse_args()
# Set the number of threads to 1 for the different libraries
print("=" * 80)
print(
    f"\nStarting the benchmark for {args.arraysize} elements "
    f"using {args.workers} threads/workers\n"
)

# Generate the data structures for the benchmark
array0 = [np.random.rand() for _ in range(args.arraysize)]
array1 = array0.copy()
array2 = array0.copy()
array_in_np = np.array(array1)
array_in_np_copy = array_in_np.copy()
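The timings reported further down were gathered with a simple wall-clock harness. The post does not show its timing code, so here is a minimal sketch of how each contestant could be timed; the `bench` helper and the tiny `data` list are hypothetical, introduced only for illustration.

```python
import math
import time

def bench(label, fn):
    # Hypothetical helper (not in the original post): time one in-place
    # transformation and report it in the same format as the results below.
    t0 = time.time()
    fn()
    print(f"In place in ({label}): {time.time() - t0:.2f} seconds")

data = [0.25, 0.5, 0.75]  # tiny stand-in for the benchmark array

def base_python():
    # the same triple-function transformation used throughout the post
    for i in range(len(data)):
        data[i] = math.cos(math.sin(math.sqrt(data[i])))

bench("base Python", base_python)
```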

And here are our contestants:

Base Python

for i in range(len(array0)):
    array0[i] = math.cos(math.sin(math.sqrt(array0[i])))

Numpy (Single threaded)

np.sqrt(array_in_np, out=array_in_np)
np.sin(array_in_np, out=array_in_np)
np.cos(array_in_np, out=array_in_np)
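The `out=` argument is what makes the Numpy version genuinely in-place: each ufunc writes its result back into the array's existing buffer instead of allocating a temporary. A small illustrative check (not from the post; the buffer-address inspection via `__array_interface__` is just one way to observe this):

```python
import numpy as np

a = np.array([0.25, 1.0, 4.0])
buf_before = a.__array_interface__['data'][0]  # address of the data buffer

# same chain of operations as the benchmark, all writing into `a`
np.sqrt(a, out=a)
np.sin(a, out=a)
np.cos(a, out=a)

# the buffer address is unchanged: no temporaries were allocated for `a`
print(a.__array_interface__['data'][0] == buf_before)
```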

Joblib (note that this example is not truly in-place, as I have not been able to make it run using the out arguments)

def compute_inplace_with_joblib(chunk):
    return np.cos(np.sin(np.sqrt(chunk)))  # parallel function for joblib

chunks = np.array_split(array1, args.workers)  # Split the array into chunks
numresults = Parallel(n_jobs=args.workers)(
    delayed(compute_inplace_with_joblib)(chunk) for chunk in chunks
)  # Process each chunk in a separate worker
array1 = np.concatenate(numresults)  # Concatenate the results
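One possible way to get a truly in-place parallel version (a sketch, not the post's code): Numpy ufuncs release the GIL, so joblib's threading backend can let each worker write into its own view of the shared array via `out=`, with no chunk copies or final concatenation. The function and array names here are assumptions for illustration.

```python
import numpy as np
from joblib import Parallel, delayed

def compute_view_inplace(view):
    # each ufunc writes back into the view, i.e. into the shared array
    np.sqrt(view, out=view)
    np.sin(view, out=view)
    np.cos(view, out=view)

arr = np.random.rand(1_000_000)
expected = np.cos(np.sin(np.sqrt(arr)))  # reference copy for checking

views = np.array_split(arr, 8)  # 1-D slices share memory with arr
Parallel(n_jobs=8, backend="threading")(
    delayed(compute_view_inplace)(v) for v in views
)
print(np.allclose(arr, expected))
```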

Numba

@njit
def compute_inplace_with_numba(array):
    np.sqrt(array, array)
    np.sin(array, array)
    np.cos(array, array)

# njit will compile this function to machine code
compute_inplace_with_numba(array_in_np_copy)

And here are the timing results:

In place in ( base Python): 11.42 seconds
In place in (Python Joblib): 4.59 seconds
In place in ( Python Numba): 2.62 seconds
In place in ( Python Numpy): 0.92 seconds

Numba is surprisingly slower!? Could it be due to the overhead of compilation, as pointed out by mohawk2 in an IRC exchange about this issue?
To test this, we should call compute_inplace_with_numba once before we execute the benchmark. Doing so shows that Numba is now faster than Numpy.

In place in ( base Python): 11.89 seconds
In place in (Python Joblib): 4.42 seconds
In place in ( Python Numpy): 0.93 seconds
In place in ( Python Numba): 0.49 seconds

Finally, I decided to take base R for a ride with the same example:

n <- 50000000
x <- runif(n)
start_time <- Sys.time()
result <- cos(sin(sqrt(x)))
end_time <- Sys.time()

# Calculate the time taken
time_taken <- end_time - start_time

# Print the time taken
print(sprintf("Time in base R: %.2f seconds", time_taken))

which yielded the following timing result:

Time in base R: 1.30 seconds

Compared to the Perl results we note the following about this example:

In-place operations in base Python were ~3.5x slower than Perl
Single-threaded PDL and numpy gave nearly identical results, followed closely by base R
Failure to account for the compilation overhead of Numba creates the false impression that it is slower than Numpy; when the overhead is accounted for, Numba is ~2x faster than Numpy
Parallelization with Joblib did improve upon base Python, but was still inferior to the single-threaded Perl implementation
Multi-threaded PDL (and OpenMP) crushed (not crashed!) every other implementation in all languages
Hopefully this post provides some food for thought about the language to use for your next data/compute-intensive operation.
The next part in this series will look into the same example using arrays in C. This final installment will (hopefully) provide some insights about the impact of memory locality and the overhead incurred by using dynamically typed languages.
