Quick tip: Using Apache Spark with SingleStore Notebooks
Abstract

SingleStore has been providing a cloud portal and a DBaaS offering for some time. Additionally, it has offered a Spark Connector for a while, but Apache Spark had to be run externally. The recent addition of notebooks to the cloud portal has significantly improved Data Science capabilities, including the ability to use Apache Spark. Spark can now be installed in the notebook environment in a few minutes. This article will show how.

Create a SingleStore Cloud account

A previous article showed the steps to create a free SingleStore Cloud account. We’ll use the following settings:

Workspace Group Name: Spark Demo Group

Cloud Provider: AWS

Region: US East 1 (N. Virginia)

Workspace Name: spark-demo

Size: S-00

Create a new notebook

From the left navigation pane in the cloud portal, we’ll select Develop > Notebooks.

In the top right of the web page, we’ll select New Notebook > New Notebook, as shown in Figure 1.

Figure 1. New Notebook.

We’ll call the notebook spark_demo, select a Blank notebook template from the available options, and save it in the Personal location.

Fill out the notebook

Install Apache Spark

We can easily install Java and Spark, as follows:

!conda install -y --quiet -c conda-forge openjdk pyspark

Once the installation has been completed, we can check the version of Java, as follows:

!java -version

Example output:

openjdk version "1.8.0_382"
OpenJDK Runtime Environment (Zulu 8.72.0.17-CA-linux64) (build 1.8.0_382-b05)
OpenJDK 64-Bit Server VM (Zulu 8.72.0.17-CA-linux64) (build 25.382-b05, mixed mode)

Next, let’s check the version of PySpark:

import pyspark
print(pyspark.__version__)

Example output:

3.5.1

Finally, we’ll check the version of Python:

import sys
print(sys.version)

Example output:

3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]

There is a useful Spark Python Supportability Matrix that shows the compatibility of Python with various Spark releases.
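As a quick sanity check against that matrix, we can verify the running Python version programmatically. Spark 3.5 documents support for Python 3.8 and above; the `python_ok` helper below is just an illustration for this check, not part of PySpark:

```python
import sys

# Spark 3.5.x documents support for Python 3.8 and above
# (consult the Spark Python Supportability Matrix for other releases).
MIN_PYTHON = (3, 8)

def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) Python meets the minimum for Spark 3.5."""
    if version_info is None:
        version_info = sys.version_info
    return (version_info[0], version_info[1]) >= minimum

print("Python", sys.version.split()[0],
      "OK for Spark 3.5" if python_ok() else "too old for Spark 3.5")
```

The Python 3.11.6 environment shown above comfortably clears this bar.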

Test Apache Spark

Now, let’s test the Apache Spark installation.

First, let’s create a SparkSession:

from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("Spark Test").getOrCreate()

Next, let’s create a DataFrame:

# Create a DataFrame
data = [("Peter", 27), ("Paul", 28), ("Mary", 29)]
df = spark.createDataFrame(data, ["Name", "Age"])

Now we’ll show the DataFrame:

# Show the content of the DataFrame
df.show()

The output should be as follows:

+-----+---+
| Name|Age|
+-----+---+
|Peter| 27|
| Paul| 28|
| Mary| 29|
+-----+---+

Finally, we’ll stop the SparkSession:

# Stop the Spark session
spark.stop()

Summary

In this short article, we’ve seen how to install and use Apache Spark in the SingleStore notebook environment. In future articles, we’ll explore Spark’s capabilities more extensively and demonstrate how to integrate it with the SingleStore Data Platform for reading and writing data using a database.
