Simulate Clock Skew in Docker Container

In real deployments without atomic clocks, NTP-synchronized time can drift, and servers in a distributed system can show a clock skew of hundreds of milliseconds. A simple way to test this in a Docker lab is to fake the clock_gettime function. Here is an example with a 2-node RF1 YugabyteDB cluster (YugabyteDB is a PostgreSQL-compatible distributed SQL database).

I create a yb network and start the first node, yb1, in the background:

docker network create yb
docker run -d --rm --network yb --hostname yb1 -p 7000:7000 yugabytedb/yugabyte yugabyted start --background=false --tserver_flags="TEST_docdb_log_write_batches=true"

I start a shell in a second node:

docker run -it --rm --network yb --hostname yb2 yugabytedb/yugabyte bash

In this container, I wait until yb1 is up, then start yb2, which joins yb1:

until postgres/bin/pg_isready -h yb1.yb ; do sleep 1 ; done
yugabyted start --join yb1.yb --tserver_flags="TEST_docdb_log_write_batches=true"

Here, running on the same host, both containers show the same Physical Time in http://localhost:7000/tablet-server-clocks

I install gcc and compile a fake_clock_gettime.so that overrides clock_gettime, calls the original one, and subtracts 499 milliseconds from its result:

cat > fake_clock_gettime.c <<'C'
#define _GNU_SOURCE
#include <stdlib.h>
#include <dlfcn.h>
#include <time.h>
int clock_gettime(clockid_t clk_id, struct timespec *tp)
{
 static int skew_millisecond = 499;
 static int (*origin_clock_gettime)(clockid_t, struct timespec *);
 int ret;
 // look up the real clock_gettime once, then call it
 if(!origin_clock_gettime) {
  origin_clock_gettime = (int (*)(clockid_t, struct timespec *)) dlsym(RTLD_NEXT, "clock_gettime");
 }
 ret=origin_clock_gettime(clk_id,tp);
 // leave the result untouched if the real call failed
 if (ret != 0) return(ret);
 // apply the clock skew (move the time back) and return
 if (tp->tv_nsec >= skew_millisecond * 1000000 ) {
  tp->tv_nsec -= skew_millisecond * 1000000 ;
 } else {
  // borrow one second when the nanosecond part is smaller than the skew
  tp->tv_sec -= 1;
  tp->tv_nsec += 1000000000 - skew_millisecond * 1000000 ;
 }
 return(ret);
}
C

dnf install -y gcc

gcc -o fake_clock_gettime.so -fPIC -shared fake_clock_gettime.c -ldl

This library can be loaded with LD_PRELOAD, and I test it by calling date:

[root@yb2 yugabyte]# date +"%T:%N" ; LD_PRELOAD=$PWD/fake_clock_gettime.so date +"%T:%N" ; date +"%T:%N"
21:31:44:015385334
21:31:43:518894559
21:31:44:020271039
[root@yb2 yugabyte]# date +"%T:%N" ; LD_PRELOAD=$PWD/fake_clock_gettime.so date +"%T:%N" ; date +"%T:%N"
21:31:45:955772746
21:31:45:459189786
21:31:45:960576587

In each run, the date called with the library preloaded shows a time about 499 milliseconds behind the two unmodified calls around it.
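The subtraction in the shim borrows one second when tv_nsec is smaller than the skew, and it assumes a skew below one second. As a sketch only, the same arithmetic can be pulled into a small helper that accepts any signed number of milliseconds. The name shift_timespec_ms is hypothetical and is not part of the library compiled above.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC  1000000000L
#define NSEC_PER_MSEC 1000000L

/* Hypothetical helper (not in the original shim): shift *tp by skew_ms
 * milliseconds. Negative values move the clock back, positive values
 * move it forward. On return, 0 <= tv_nsec < 1e9 still holds, provided
 * it held on entry. */
static void shift_timespec_ms(struct timespec *tp, long skew_ms)
{
    assert(tp != NULL);
    /* Split the skew into whole seconds and a sub-second remainder.
     * C division truncates toward zero, so both parts carry the sign. */
    tp->tv_sec  += skew_ms / 1000;
    tp->tv_nsec += (skew_ms % 1000) * NSEC_PER_MSEC;
    /* Normalize: borrow or carry at most one second. */
    if (tp->tv_nsec < 0) {
        tp->tv_sec  -= 1;
        tp->tv_nsec += NSEC_PER_SEC;
    } else if (tp->tv_nsec >= NSEC_PER_SEC) {
        tp->tv_sec  += 1;
        tp->tv_nsec -= NSEC_PER_SEC;
    }
}
```

With such a helper, the if/else in the shim could be reduced to a single call like shift_timespec_ms(tp, -499), and a positive value would simulate a node whose clock runs ahead.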

I restart YugabyteDB on yb2 with this hack:

yugabyted stop
LD_PRELOAD=$PWD/fake_clock_gettime.so yugabyted start
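The skew is hardcoded in the library, so testing another value means recompiling. One possible variation, shown here only as a sketch, is to read the skew from an environment variable. The variable name FAKE_CLOCK_SKEW_MS and both function names are made up for this example. They are not something yugabyted or the library above knows about.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a skew in milliseconds from text.
 * Returns fallback when text is NULL, empty, out of range, or
 * followed by anything that is not part of the number. */
static long parse_skew_ms(const char *text, long fallback)
{
    if (text == NULL || *text == '\0') {
        return fallback;
    }
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0') {
        return fallback;  /* overflow or trailing garbage */
    }
    return value;
}

/* Hypothetical helper: read the skew from the environment variable
 * called name (for example FAKE_CLOCK_SKEW_MS, a name invented here). */
static long skew_ms_from_env(const char *name, long fallback)
{
    assert(name != NULL);
    return parse_skew_ms(getenv(name), fallback);
}
```

In the shim, I would call skew_ms_from_env("FAKE_CLOCK_SKEW_MS", 499) once and keep the result in a static variable, because clock_gettime is called very frequently. The variable would then be set next to LD_PRELOAD when starting yugabyted, assuming the environment is passed down to the server processes the same way LD_PRELOAD is.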

The tablet-server-clocks page (http://localhost:7000/tablet-server-clocks) now shows the clock skew on the Physical Time and Hybrid Time of yb2.

I run a workload that involves tablets on both nodes to get some Lamport logical clock synchronization:

/home/yugabyte/postgres/bin/ysql_bench -i -h $(hostname) -s 10

With the messaging between the nodes, the Physical Time still shows a clock skew, but the Logical Time is synchronized.

If you are curious, here is more information about clock synchronisation in distributed databases: https://www.yugabyte.com/blog/evolving-clock-sync-for-distributed-databases/
