Making a Totally Free Uptime Monitor using a Worker Runtime and OpenTelemetry


Table of Contents

What is an Uptime Monitor and When to Use One?
Traditional Options

Using a Worker Runtime and OpenTelemetry

The High-Level Solution
The High-Level Setup Steps
Comparison to the Other Options

Takeaway

What is an Uptime Monitor and When to Use One?

An uptime monitor is a tool that periodically (e.g. every minute) checks your application or API to gauge whether it’s up and healthy.

If you have true observability and are using SLOs effectively, you probably don’t need one. But if you’re not at that level yet, an uptime monitor can be a valuable source of information about the reliability of your application or API.

Traditional Options

There are a number of ways to run an uptime monitor. For example:

Running a cron job on a server/VM and using bash, curl, and webhooks
Setting up an EventBridge cron with Container/Lambda targets and webhooks
Paying for a 3rd party service (e.g. Pingdom)

Each of these comes with its own downsides, though:

Maintenance (e.g. security patching, keeping away from end-of-life states)
Complexity (e.g. setting up IaC, CI/CD)
Cost

Is there an option that avoids these downsides?

Using a Worker Runtime and OpenTelemetry

I contend there is: using a worker runtime and OpenTelemetry.

The High-Level Solution

At a high level, the solution works as follows:

Use a cron from a worker runtime
Have the worker hit the application or API endpoint
Gather instrumentation about the network call with OpenTelemetry
Send that OpenTelemetry instrumentation to an observability backend
Use the observability backend to alert on unhealthy traffic
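
To make the middle steps concrete, here is a minimal sketch of what a single check boils down to: call the endpoint and record the response status and how long the call took. In the actual setup the OpenTelemetry library gathers this automatically, so this is only an illustration. The helper name `checkEndpoint`, the `durationMs` field, and the injectable `fetchImpl` parameter are assumptions made for this example, not part of any library.

```javascript
// Hypothetical helper (not from any library): performs one uptime check and
// returns a span-like record. `fetchImpl` is injectable so the sketch can be
// exercised without real network access.
async function checkEndpoint(url, fetchImpl = fetch) {
  const startedAt = Date.now();
  const response = await fetchImpl(url);
  return {
    url,
    // Same attribute name the alert query filters on later in this article
    'http.response.status_code': response.status,
    // Illustrative field name; real instrumentation records span timing itself
    durationMs: Date.now() - startedAt,
  };
}
```

A record with a status code of 400 or above is what the backend would later treat as unhealthy traffic.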

The High-Level Setup Steps

These steps will use Cloudflare Workers for the worker runtime, but something similar can be done with Deno Deploy as well.

Create a free Cloudflare account

Create a worker with the following code and the Node.js compatibility flag (nodejs_compat) enabled:

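import { instrument } from '@microlabs/otel-cf-workers'

const handler = {
  // Runs on every cron tick and calls the monitored endpoint
  async scheduled(event, env, ctx) {
    await fetch(env.ENDPOINT_TO_MONITOR)
  },
}

// Tells the library where to export traces and how to name the service
const config = (env, _trigger) => {
  return {
    exporter: {
      url: 'https://api.honeycomb.io/v1/traces',
      headers: { 'x-honeycomb-team': env.HONEYCOMB_API_KEY },
    },
    service: { name: env.ENDPOINT_NAME },
  }
}

// Wrap the handler so its network calls are instrumented and exported
export default instrument(handler, config)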

Add an environment variable named “ENDPOINT_TO_MONITOR” containing the endpoint to check, and another named “ENDPOINT_NAME” containing a friendly name for the endpoint

Create a free Honeycomb account

Create an environment named “Uptime Monitors” and create an ingest key

Back in Cloudflare, take that ingest key and copy-paste it into a Cloudflare Workers secret named “HONEYCOMB_API_KEY”

Add a cron of “* * * * *” to the worker

(Confirm that traces are appearing every minute in Honeycomb)

In Honeycomb, create a trigger (alert) based on the following query:

COUNT > 0 where http.response.status_code >= 400

Route the trigger’s notifications as needed (e.g. to Slack)

You should now have a functioning uptime monitor for your endpoint.
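
To illustrate what the trigger condition means, the sketch below applies the same rule to a handful of span-like objects: count the ones whose `http.response.status_code` is 400 or higher, and alert if that count is above zero. This is not how Honeycomb evaluates triggers internally. The function name `shouldAlert` and the sample data are hypothetical.

```javascript
// Hypothetical illustration of "COUNT > 0 where http.response.status_code >= 400".
// `spans` is an array of plain objects keyed by attribute name.
function shouldAlert(spans) {
  const failing = spans.filter(
    // Spans without a status code become NaN, which never satisfies >= 400
    (span) => Number(span['http.response.status_code']) >= 400
  );
  return failing.length > 0;
}

// Made-up sample data: one healthy check and one failing check
const sampleSpans = [
  { 'http.response.status_code': 200 },
  { 'http.response.status_code': 503 },
];
console.log(shouldAlert(sampleSpans));
```

Note that the rule treats any 4xx or 5xx response as unhealthy, so an endpoint that legitimately returns a 4xx would need a narrower filter.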

Comparison to the Other Options

Compared to the other options outlined before, this solution offers

Minimal maintenance (just a single npm package and its dependencies to monitor for security vulnerabilities)
Minimal complexity (just the steps outlined above)
Zero cost (the usage is well within the Cloudflare Workers free tier and the Honeycomb free tier)
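
As a rough sanity check on the free-tier claim, the arithmetic can be sketched as follows. Everything here is an assumption to verify against the current pricing pages: the limits (100,000 Workers requests per day and 20 million Honeycomb events per month), the idea that each scheduled run counts as one request, and the guess of two spans per check. The function name `estimateUsage` is hypothetical.

```javascript
// ASSUMED free-tier limits; check the providers' pricing pages before relying on them.
const ASSUMED_WORKERS_FREE_REQUESTS_PER_DAY = 100000;
const ASSUMED_HONEYCOMB_FREE_EVENTS_PER_MONTH = 20000000;

// Estimates usage for `monitors` workers, each on a "* * * * *" (every-minute) cron.
// `spansPerCheck` is a guess: one span for the scheduled run, one for the fetch.
function estimateUsage(monitors, spansPerCheck = 2) {
  const checksPerDay = monitors * 60 * 24; // one run per minute
  const eventsPerMonth = checksPerDay * 31 * spansPerCheck; // longest month
  return {
    checksPerDay,
    eventsPerMonth,
    withinWorkersLimit: checksPerDay <= ASSUMED_WORKERS_FREE_REQUESTS_PER_DAY,
    withinHoneycombLimit: eventsPerMonth <= ASSUMED_HONEYCOMB_FREE_EVENTS_PER_MONTH,
  };
}

console.log(estimateUsage(1));
```

Under these assumptions a single monitor runs 1,440 checks a day and sends roughly 89,280 events in a 31-day month, leaving room for several more monitored endpoints.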

Takeaway

Paying for an uptime monitor service is probably preferable to this (if you’re able to).

The real takeaway is that worker runtimes are a newer form of compute whose cost model can be taken advantage of in situations like this one.