A smaller YugabyteDB image for CI/CD

In a CI/CD pipeline, creating a new database and running the DDL (Data Definition Language) scripts that create the schema and the DML (Data Manipulation Language) scripts that populate the data can be time-consuming. To streamline this, it is advisable to build an image that already contains everything the process needs to run smoothly.
Here is an example where I install the well-known Sakila database:

# Start YugabyteDB
yugabyted start

# Create the "sakila" database once YugabyteDB is ready
until ysqlsh -h $(hostname) -c "create database sakila" ; do sleep 1 ; done | uniq

# get the DDL and DML scripts from the jOOQ repository, and run them in the sakila database
curl -Ls https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-schema.sql |
ysqlsh -e -h $(hostname) -d sakila
curl -Ls https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-insert-data.sql |
ysqlsh -e -h $(hostname) -d sakila

# stop YugabyteDB
yugabyted stop
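The until loop above retries forever, which can hang a CI job or a docker build if YugabyteDB never becomes ready. Here is a minimal sketch of a bounded retry in plain shell. wait_for is a hypothetical helper name, and the limit of 60 attempts in the usage example is an arbitrary choice, not a YugabyteDB recommendation:

```shell
#!/bin/sh
# wait_for: hypothetical helper (not part of YugabyteDB) that runs a command
# up to MAX_TRIES times, one second apart, instead of retrying forever.
# Usage: wait_for MAX_TRIES command [args...]
wait_for() {
  local max_tries="$1" try=1
  shift
  while [ "$try" -le "$max_tries" ]; do
    # stop as soon as the command succeeds
    if "$@"; then
      return 0
    fi
    # no need to sleep after the last attempt
    if [ "$try" -lt "$max_tries" ]; then
      echo "attempt $try/$max_tries failed, retrying in 1s" >&2
      sleep 1
    fi
    try=$((try + 1))
  done
  echo "giving up after $max_tries attempts: $*" >&2
  return 1
}
```

It would replace the loop like this: wait_for 60 ysqlsh -h $(hostname) -c "create database sakila". As with the original loop, a database that already exists makes every attempt fail, so this is meant for a fresh data directory.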

You can use a Dockerfile to create an image that starts quickly with the database schema and data pre-installed.

As long as the data volume remains small, the LSM tree does not flush its memtables to data files, so the inserted data is stored only in the write-ahead logs (WALs), and the disk usage stays small:

# du -hs /root/var/data/yb-data/*/{data,wals} | sort -h
3.1M /root/var/data/yb-data/master/wals
4.0M /root/var/data/yb-data/tserver/data
13M /root/var/data/yb-data/master/data
70M /root/var/data/yb-data/tserver/wals

If you build a Docker image this way, the resulting image is much too large:

# docker image ls yb-sakila
REPOSITORY   TAG       IMAGE ID       CREATED              SIZE
yb-sakila    latest    819b80485518   About a minute ago   3.87GB

The reason is that these are sparse files: on the filesystem, they use no space for their unallocated parts, but Docker stores the whole file in the image. The --apparent-size option of du shows their full size:

# du -hs --apparent-size /root/var/data/yb-data/*/{data,wals} | sort -h
1.8M /root/var/data/yb-data/tserver/data
13M /root/var/data/yb-data/master/data
26M /root/var/data/yb-data/master/wals
1.7G /root/var/data/yb-data/tserver/wals
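The sparse-file effect can be reproduced without YugabyteDB. This small sketch assumes GNU coreutils and tar: it creates a 23 MiB file that is only a hole, then compares its allocated size, its apparent size, and the size of a plain tar archive of it. Docker image layers are tar archives, so this only approximates what happens when the layer is built, and the file name merely mimics the YugabyteDB one:

```shell
#!/bin/sh
# Sketch: a sparse file takes (almost) no disk blocks but keeps its full
# apparent size, and a plain tar archive of it contains all the zeros.
# Assumes GNU coreutils (truncate, du --apparent-size) and tar.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# 23 MiB file created without writing any data: it is a single hole
truncate -s 23M "$workdir/index.000000000"

# blocks really allocated on disk vs. size reported by ls/stat
allocated_kib=$(du -k "$workdir/index.000000000" | cut -f1)
apparent_kib=$(du -k --apparent-size "$workdir/index.000000000" | cut -f1)

# archive without --sparse: the hole is read back and stored as zero bytes
tar -C "$workdir" -cf "$workdir/layer.tar" index.000000000
tar_kib=$(du -k --apparent-size "$workdir/layer.tar" | cut -f1)

echo "allocated on disk: ${allocated_kib} KiB"
echo "apparent size:     ${apparent_kib} KiB"
echo "tar archive size:  ${tar_kib} KiB"
```

On a typical Linux filesystem, the allocated size should be 0 KiB or close to it, while both the apparent size and the archive are about 23 MiB. GNU tar can record holes instead of zeros with its --sparse option; the image size above shows that the Docker layer does not benefit from anything similar here.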

Looking at the files, every tablet has an index.000000000 file with an apparent size of approximately 23 megabytes.

The Sakila schema consists of sixty tables and indexes, and these index files alone take over one gigabyte of apparent size. Extrapolated to a schema with a thousand tables, the image would be enormous: at about 23 MB per tablet and at least one tablet per table, that is already about 23 GB.
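To check this on your own data directory, here is a sketch that counts the index files and compares their apparent and allocated sizes. report_wal_index_files is a hypothetical helper, not a YugabyteDB tool; it uses the same path pattern as the rm command in this post and assumes GNU du:

```shell
#!/bin/sh
# Sketch: count the Raft WAL index files under a yugabyted data directory and
# compare their apparent size with the disk space actually allocated to them.
# report_wal_index_files is a hypothetical helper, not a YugabyteDB tool.
# Assumes GNU du (for --apparent-size); the default path is the one used in
# this post and may differ on your installation.
report_wal_index_files() {
  local data_dir="${1:-/root/var/data/yb-data}"
  # same pattern as the rm command in this post
  set -- "$data_dir"/*/wals/table-*/tablet-*/index.*
  # when nothing matches, the shell leaves the pattern unexpanded: treat as empty
  [ -e "$1" ] || set --
  echo "index files: $#"
  [ "$#" -gt 0 ] || return 0
  # du -c prints a grand total as its last line
  echo "apparent size: $(du -ck --apparent-size "$@" | tail -n 1 | cut -f1) KiB"
  echo "allocated: $(du -ck "$@" | tail -n 1 | cut -f1) KiB"
}
```

Running report_wal_index_files /root/var/data/yb-data prints the number of index files, then their total apparent and allocated sizes in KiB.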

However, there is some good news: this index file is not actually necessary. YugabyteDB's Raft replication code was taken from Apache Kudu, and this file is simply an index over the WAL, cached in memory. It is used when a follower that was disconnected from the leader comes back and must retrieve a range of write operations to fill the gap. The index does not need to be persisted because it is re-created at startup. This is described in Apache Kudu's LogIndex, which is implemented as a memory-mapped file that is never synced to disk.

Therefore, we can safely delete these files once YugabyteDB is stopped:

yugabyted stop
rm -f /root/var/data/yb-data/*/wals/table-*/tablet-*/index.*
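In a script, it may be worth refusing to delete these files while the servers are still running, since the live processes have them memory-mapped. A minimal sketch follows: drop_wal_index_files is a hypothetical helper, and detecting a running node by looking for yb-master or yb-tserver processes with pgrep is an assumption (the check is skipped when pgrep is not installed):

```shell
#!/bin/sh
# Sketch: delete the WAL index files only when no YugabyteDB server process is
# running, and report how many were removed. drop_wal_index_files is a
# hypothetical helper; the process names yb-master and yb-tserver and the
# default path are assumptions based on a yugabyted installation.
drop_wal_index_files() {
  local data_dir="${1:-/root/var/data/yb-data}"
  # running servers keep these files memory-mapped: refuse to touch them
  if command -v pgrep >/dev/null 2>&1 && pgrep -x 'yb-(master|tserver)' >/dev/null; then
    echo "yb-master or yb-tserver is still running: run 'yugabyted stop' first" >&2
    return 1
  fi
  set -- "$data_dir"/*/wals/table-*/tablet-*/index.*
  # when nothing matches, the shell leaves the pattern unexpanded: treat as empty
  [ -e "$1" ] || set --
  # only the index.* files are removed; the WAL segments next to them are kept
  [ "$#" -eq 0 ] || rm -f -- "$@"
  echo "removed $# index file(s)"
}
```

After yugabyted stop, calling drop_wal_index_files /root/var/data/yb-data removes the same files as the rm -f command above.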

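In a Dockerfile, all these steps must run in the same RUN instruction, and therefore the same layer: if the index files were removed in a later layer, they would still be stored in the earlier one and the image would not shrink. Here is an example: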

FROM yugabytedb/yugabyte:latest

# get Sakila DDL and DML scripts
ADD https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-schema.sql .
ADD https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-insert-data.sql .

# Start YugabyteDB to run the scripts
RUN yugabyted start \
 && until ysqlsh -h $(hostname) -c "create database sakila" ; do sleep 1 ; done | uniq \
 && ysqlsh -h $(hostname) -d sakila -f yugabytedb-sakila-schema.sql \
 && ysqlsh -h $(hostname) -d sakila -f yugabytedb-sakila-insert-data.sql \
 && yugabyted stop \
 && rm -f /root/var/data/yb-data/*/wals/table-*/tablet-*/index.*

# when a container starts, run YugabyteDB in the foreground
ENTRYPOINT ["yugabyted", "start", "--background=false"]

I can now build the image and check its size:

docker build -t yb-sakila .
docker image ls yb-sakila

The image is now back to its expected size, only about 150MB larger than the base image:

# docker image ls yb-sakila
REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
yb-sakila    latest    b462eb5aaacb   17 seconds ago   2.19GB
