In real deployments, without atomic clocks, the time synchronized by NTP can drift, and servers in a distributed system can show a clock skew of hundreds of milliseconds. A simple way to test this in a Docker lab is to fake the clock_gettime function. Here is an example with a 2-node RF1 YugabyteDB cluster (PostgreSQL-compatible Distributed SQL database).
I create a yb network and start the first node, yb1, in the background:
docker run -d --rm --network yb --hostname yb1 -p 7000:7000 yugabytedb/yugabyte yugabyted start --background=false --tserver_flags="TEST_docdb_log_write_batches=true"
I start a shell in a second node:
In this container, I wait until yb1 is up and then start yb2, which joins yb1:
yugabyted start --join yb1.yb --tserver_flags="TEST_docdb_log_write_batches=true"
Because both containers run on the same host, the page http://localhost:7000/tablet-server-clocks shows the same Physical Time for both of them.
I install gcc and compile a fake_clock_gettime.so that overrides clock_gettime, calls the original one, and subtracts 499 milliseconds from its result:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <time.h>

// Interposed clock_gettime: returns the real time minus a fixed skew
int clock_gettime(clockid_t clk_id, struct timespec *tp)
{
    static int skew_millisecond = 499;
    static int (*origin_clock_gettime)(clockid_t, struct timespec *);
    int ret;
    // look up the real clock_gettime once, then call it
    if (!origin_clock_gettime) {
        origin_clock_gettime = (int (*)(clockid_t, struct timespec *)) dlsym(RTLD_NEXT, "clock_gettime");
    }
    ret = origin_clock_gettime(clk_id, tp);
    if (ret != 0) {
        return ret; // on error, leave tp untouched
    }
    // subtract the clock skew, borrowing one second when tv_nsec is too small
    if (tp->tv_nsec >= skew_millisecond * 1000000) {
        tp->tv_nsec -= skew_millisecond * 1000000;
    } else {
        tp->tv_sec -= 1;
        tp->tv_nsec += 1000000000 - skew_millisecond * 1000000;
    }
    return ret;
}
dnf install -y gcc
gcc -o fake_clock_gettime.so -fPIC -shared fake_clock_gettime.c -ldl
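The only subtle part of the override is the borrow: tv_nsec must stay between 0 and 999,999,999, so when the nanosecond field is smaller than the skew, one second is taken from tv_sec. Here is a minimal sketch of the same arithmetic as a standalone function. The name timespec_sub_ms is hypothetical and is not part of the library above. Unlike the override, it also accepts a skew of one second or more.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper, for illustration only: return ts minus skew_ms
 * milliseconds, keeping tv_nsec normalized to the range [0, 1e9). */
struct timespec timespec_sub_ms(struct timespec ts, long skew_ms)
{
    assert(skew_ms >= 0);                        /* only backward skew is handled */
    ts.tv_sec  -= skew_ms / 1000;                /* whole seconds of skew */
    ts.tv_nsec -= (skew_ms % 1000) * 1000000L;   /* remaining milliseconds */
    if (ts.tv_nsec < 0) {                        /* borrow one second */
        ts.tv_sec  -= 1;
        ts.tv_nsec += NSEC_PER_SEC;
    }
    return ts;
}
```

For example, subtracting 499 milliseconds from 100 s + 15,385,334 ns gives 99 s + 516,385,334 ns, which is the borrow case of the else branch.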
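This library can be loaded with LD_PRELOAD, and I test it by calling date without, with, and again without the preloaded library:
[root@yb2 yugabyte]# date +"%T:%N" ; LD_PRELOAD=$PWD/fake_clock_gettime.so date +"%T:%N" ; date +"%T:%N"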
21:31:44:015385334
21:31:43:518894559
21:31:44:020271039
[root@yb2 yugabyte]# date +"%T:%N" ; LD_PRELOAD=$PWD/fake_clock_gettime.so date +"%T:%N" ; date +"%T:%N"
21:31:45:955772746
21:31:45:459189786
21:31:45:960576587
The date call that runs with the preloaded library shows an earlier time, about half a second behind the two calls around it.
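To check that the offset really is the 499 milliseconds compiled into the library, I can add the skew back to the middle reading and verify that it falls between the two unskewed readings. This is a small sketch of that arithmetic. The function names ns_of_day and skew_is_consistent are mine, and the parser assumes the HH:MM:SS:NNNNNNNNN format printed by date +"%T:%N".

```c
#include <assert.h>
#include <stdio.h>

/* Convert an HH:MM:SS:NNNNNNNNN string to nanoseconds since midnight.
 * Returns -1 if the string does not have four numeric fields. */
long long ns_of_day(const char *s)
{
    int h, m, sec;
    long ns;
    if (sscanf(s, "%d:%d:%d:%ld", &h, &m, &sec, &ns) != 4)
        return -1;
    return ((h * 60LL + m) * 60LL + sec) * 1000000000LL + ns;
}

/* Return 1 if the skewed reading, with skew_ms added back, lies between
 * the unskewed readings taken just before and just after it. */
int skew_is_consistent(const char *before, const char *skewed,
                       const char *after, long skew_ms)
{
    assert(skew_ms >= 0);
    long long lo = ns_of_day(before);
    long long mid = ns_of_day(skewed);
    long long hi = ns_of_day(after);
    if (lo < 0 || mid < 0 || hi < 0)
        return 0;                               /* unparsable input */
    long long corrected = mid + skew_ms * 1000000LL;
    return lo <= corrected && corrected <= hi;
}
```

With the first run above, 21:31:43:518894559 plus 499 milliseconds is 21:31:44:017894559, which sits between 21:31:44:015385334 and 21:31:44:020271039, so skew_is_consistent returns 1 for a skew of 499 and 0 for a skew of 0.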
I restart YugabyteDB on yb2 with this hack:
LD_PRELOAD=$PWD/fake_clock_gettime.so yugabyted start
I can see the clock skew on the Physical Time and Hybrid Time:
I run some workload that involves tablets in both nodes to get some Lamport logical clock synchronization:
With the messaging between the nodes, the Physical Time still shows a clock skew, but the Logical Time is synchronized:
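To illustrate why messaging brings the lagging node forward, here is a simplified hybrid logical clock in C. It is a sketch of the general technique under my own assumptions, not YugabyteDB's implementation: the hybrid_time struct, the microsecond unit, and the function names hlc_now and hlc_update are all hypothetical. A node never lets its hybrid time go backward, and when it receives a timestamp that is ahead of its own, it adopts it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified hybrid timestamp (illustrative layout, not YugabyteDB's). */
typedef struct {
    uint64_t physical_us;   /* highest physical time seen so far */
    uint32_t logical;       /* counter that orders events sharing physical_us */
} hybrid_time;

/* Lexicographic comparison: physical component first, then logical. */
static int ht_less(hybrid_time a, hybrid_time b)
{
    return a.physical_us < b.physical_us ||
           (a.physical_us == b.physical_us && a.logical < b.logical);
}

/* Timestamp a local event, given the node's (possibly skewed) wall clock. */
hybrid_time hlc_now(hybrid_time *clock, uint64_t wall_us)
{
    assert(clock != NULL);
    if (wall_us > clock->physical_us) {
        clock->physical_us = wall_us;   /* wall clock moved ahead: follow it */
        clock->logical = 0;
    } else {
        clock->logical += 1;            /* wall clock is behind: count logically */
    }
    return *clock;
}

/* Merge the timestamp carried by a received message, then timestamp the receipt. */
hybrid_time hlc_update(hybrid_time *clock, uint64_t wall_us, hybrid_time remote)
{
    assert(clock != NULL);
    if (ht_less(*clock, remote))
        *clock = remote;                /* the sender is ahead: adopt its time */
    return hlc_now(clock, wall_us);
}
```

In this model, a node whose wall clock lags by 499 milliseconds keeps reporting the lagging wall clock as its physical time, but after one call to hlc_update with a message from the other node, its hybrid time is at least the sender's, and later local events only increment the logical counter until its own wall clock catches up.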
If you are curious, here is more information about clock synchronisation in distributed databases: https://www.yugabyte.com/blog/evolving-clock-sync-for-distributed-databases/