Abstract
SingleStore provides a Jupyter-based notebook environment with support for Python, SQL and Markdown. However, we can also install and use the R programming language. In this article, we’ll see how.
The notebook file used in this article is available on GitHub.
Create a SingleStore Cloud account
A previous article showed the steps to create a free SingleStore Cloud account. We’ll use the following settings:
Workspace Group Name: R Demo Group
Cloud Provider: AWS
Region: US East 1 (N. Virginia)
Workspace Name: r-demo
Size: S-00
Create a new notebook
From the left navigation pane in the cloud portal, we’ll select Develop > Notebooks.
In the top right of the web page, we’ll select New Notebook > New Notebook, as shown in Figure 1.
Figure 1. New Notebook.
We’ll call the notebook r_demo, select a Blank notebook template from the available options, and save it in the Personal location.
Create a database
In our SingleStore Cloud account, let’s use the SQL Editor to create a new database. Call this iris_db, as follows:
CREATE DATABASE IF NOT EXISTS iris_db;
Fill out the notebook
First, let’s install the R kernel and some other packages we need for this article:
Next, we need to change the kernel. Refreshing the page will help the notebook detect any changes, including the installation of a new kernel.
In the top right, we can see that Python is currently selected, as shown in Figure 2.
Figure 2. Python 3 (ipykernel).
Selecting Python 3 will present a box with a pull-down as shown in Figure 3.
Figure 3. Select Kernel.
Clicking the pull-down will show some options and R should be one of the options. We’ll choose R, as shown in Figure 4.
Figure 4. Select Kernel.
Next, we’ll click the Select button.
To connect to SingleStore, we’ll use JDBC, as follows:
This will also load DBI and rJava:
Loading required package: rJava
Next, we’ll download the SingleStore JDBC Client and save it in a jars directory:
driver_url <- “https://repo1.maven.org/maven2/com/singlestore/singlestore-jdbc-client/1.2.1/singlestore-jdbc-client-1.2.1.jar” # Set the JDBC driver class name
driver <- “com.singlestore.jdbc.Driver” # Local directory to save the driver file
local_dir <- “jars”
dir.create(local_dir, showWarnings = FALSE, recursive = TRUE) # Check if the driver file already exists
driver_file <- file.path(
local_dir,
“singlestore-jdbc-client-1.2.1.jar”
) if (!file.exists(driver_file)) {
# Download the JDBC driver file if it doesn’t exist
download.file(
driver_url,
destfile = driver_file,
mode = “wb”,
quiet = TRUE
)
} # Check if the driver file has been downloaded successfully
if (file.exists(driver_file)) {
print(“Driver file downloaded successfully”)
} else {
print(“Failed to download the driver file”)
}
Now we’ll create the connection details to SingleStore:
port <- 3306
database <- “iris_db”
user <- “admin”
password <- “<PASSWORD>” url <- paste0(“jdbc:singlestore://”, host, “:”, port, “/”, database)
Replace <HOST> and <PASSWORD> with the values for your environment. These values can be obtained from the workspace using Connect > SQL IDE.
Next, let’s load the iris dataset, make some small adjustments to the column names and show the first few rows:
data(iris) # Replace “.” with “_” in column names
colnames(iris) <- gsub(“\.”, “_”, colnames(iris)) # Print the first few rows of the dataset
head(iris)
Example output:
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
We’ll now prepare the connection to SingleStore, write the iris dataset to the database and read it back again.
conn <- dbConnect(
drv = JDBC(driver, driver_file),
url = url,
user = user,
password = password
) # Write the iris dataset to the database
dbWriteTable(conn, “iris”, iris, overwrite = TRUE) # Read the iris dataset from the database
iris_from_db <- dbReadTable(conn, “iris”) # Print the first few rows of the dataset read from the database
head(iris_from_db) # Close the JDBC connection
dbDisconnect(conn)
Example output:
1 5.7 2.9 4.2 1.3 versicolor
2 5.1 3.8 1.5 0.3 setosa
3 5.4 3.0 4.5 1.5 versicolor
4 4.3 3.0 1.1 0.1 setosa
5 5.5 2.5 4.0 1.3 versicolor
6 6.4 2.9 4.3 1.3 versicolor
Bonus: Create visualisations
We can easily create some plots using ggplot:
First, a scatter plot of Sepal Length vs Sepal Width, as shown in Figure 5.
ggplot(iris_from_db, aes(x = Sepal_Length, y = Sepal_Width, color = Species)) +
geom_point() +
labs(x = “Sepal Length”, y = “Sepal Width”, title = “Sepal Length vs Sepal Width”)
Figure 5. Sepal Length vs Sepal Width.
Next, a box plot of Petal Length by Species, as shown in Figure 6.
ggplot(iris_from_db, aes(x = Species, y = Petal_Length, fill = Species)) +
geom_boxplot() +
labs(x = “Species”, y = “Petal Length”, title = “Petal Length by Species”)
Figure 6. Petal Length by Species.
Finally, a histogram of Petal Width, as shown in Figure 7.
ggplot(iris_from_db, aes(x = Petal_Width)) +
geom_histogram(binwidth = 0.1, fill = “skyblue”, color = “black”) +
labs(x = “Petal Width”, y = “Frequency”, title = “Petal Width”)
Figure 7. Petal Width.
Summary
In this short article, we’ve seen how to install R, how to connect to SingleStore from R, and how to write and read data using R and SingleStore. We’ve also quickly and easily created several powerful visualisations.