C Self-Reflection OR When the Good Old DWARF Makes Your Elves Face Their Unconscious Truth


This article explores how major compilers such as gcc and clang can serve as a source of reflection information for C applications, which makes a C reflection implementation like Metac possible. The approach works for the ELF, Mach-O and PE formats on their respective platforms: Linux, macOS and Windows.

Traditionally, C hasn’t offered reflection capabilities the way some other programming languages do, because C prioritizes efficiency and control. However, the lack of reflection doesn’t necessarily mean a lack of introspection capabilities. Debuggers, for instance, rely heavily on debug information embedded within executable files. This information goes well beyond what reflection needs: it covers the types defined in the code, line numbers, source code references, symbol information and even the location of variables and function arguments on the stack. One of the most common debug information formats is DWARF, the native way for the ELF format to expose the data debuggers need. Even better, DWARF also works with Mach-O (the macOS executable format) and PE (the Windows executable format).

This raises the question: can’t applications use this data directly to gain self-awareness? Here’s why it’s not as straightforward as it seems:

DWARF has too much information – reflection requires just a subset.
Shipping executables with full debug information can be undesirable due to increased size and potential security concerns.
Utilities like strip can remove debug information, which would break an application that relies on it at runtime.
It may be worth separating reflection information, which may be needed for only part of the application, from debug information.

This implies a need for a specialized tool that can efficiently read, filter, and convert DWARF data (or its equivalent in other formats) into a format usable by the application. Enter Metac.
Metac leverages this existing DWARF (or equivalent) information within the executable format to provide a targeted form of reflection for C code. This allows applications to access a relevant subset of the data, promoting introspection without the drawbacks of full debug information. Metac bridges the gap and allows C programs to query that data at runtime, enabling them to extract information about their own types, variables, and functions. This newfound self-awareness empowers C programs for more efficient debugging, dynamic behavior, and potential future functionalities.

While DWARF data is available in object files on other platforms, macOS presents a slight hurdle. On macOS, DWARF information is typically not produced by default; it has to be created explicitly with the dsymutil tool, which works ONLY on a linked executable binary.

To ensure consistent behavior across platforms, Metac takes a two-step approach that works on macOS and is compatible with the other platforms:

Build with DWARF Generation: The application is first built with special flags (-g3 -D_METAC_OFF_) that enable DWARF generation but disable Metac functionalities during this stage.

Extract and Integrate DWARF Data: After the initial build, the DWARF information is extracted from the executable. Then, an additional C file containing reflection information is generated based on this data. Finally, the application is rebuilt with this additional file to include the necessary reflection capabilities and with Metac functionalities enabled.

This multi-step process might seem complex, but it’s automated within a Makefile, simplifying the workflow for developers. It’s important to remember that Metac reads DWARF from the completely built and linked application, although this process can be changed for other platforms.

Let’s delve into a practical example. Imagine a C program that manages a complex data structure, like a linked list. Traditionally, debugging any issues within the structure requires manual code inspection. However, with Metac, the program can introspect its own linked list, examining elements like pointers and values. This allows for targeted debugging and manipulation of the list at runtime.

Here’s a simplified code example demonstrating how Metac could be used to examine a variable of type struct test:

// main.c
#include <stdio.h>  // printf
#include <stdlib.h> // free
#include <math.h>   // M_PI, M_E
#include "metac/reflect.h"

struct test {
    int y;
    char c;
    double pi;
    double e;
    short _uninitialized_field;
};

int main(){
    // we need to use this construction to wrap variable declaration
    // to get its type information
    WITH_METAC_DECLLOC(decl_location,
        struct test t = {
            .y = -10,
            .c = 'a',
            .pi = M_PI,
            .e = M_E,
        };
    )
    metac_value_t *p_val = METAC_VALUE_FROM_DECLLOC(decl_location, t);

    char * s;
    s = metac_entry_cdecl(metac_value_entry(p_val));
    // next will output "struct test t = "
    printf("%s = ", s);
    free(s);

    s = metac_value_string(p_val);
    // next will output "{.y = -10, .c = 'a', .pi = 3.141593, .e = 2.718282, ._uninitialized_field = 0,};\n"
    printf("%s;\n", s);
    free(s);

    metac_value_delete(p_val);

    return 0;
}

Explanation:

We include the metac/reflect.h header for using Metac functions.
We define a structure called test with various member variables.
In main, we create a variable t of type struct test and initialize its members. Note: the WITH_METAC_DECLLOC construction just makes sure that the arbitrary C code passed as the second argument is located on the same line as the declaration location variable decl_location.
We use METAC_VALUE_FROM_DECLLOC(decl_location, t) to get a metac_value_t representing the value of t.

metac_entry_cdecl(metac_value_entry(p_val)) retrieves a C-style string representing the declaration of t (e.g., struct test t).

metac_value_string(p_val) retrieves a string representing the actual value of t with its member values (e.g., {y=-10, c=’a’, pi=3.141593, e=2.718282, _uninitialized_field=0}).
We free the allocated memory for both strings using free.
Finally, we call metac_value_delete(p_val) to clean up resources used by Metac.

This example demonstrates how Metac can be used to:

Extract type information about a variable at runtime.
Retrieve the actual value of the variable and its members.

This is just a basic example, but it showcases the power of Metac for C code introspection.

Building the example requires Metac to be available on the host where the build runs.

Here is a KBUILD-like Makefile to build the example:

ifeq ($(M),)
METAC_ROOT=../..

all: test target

target:
	$(MAKE) -C $(METAC_ROOT) M=$(PWD) target

clean:
	$(MAKE) -C $(METAC_ROOT) M=$(PWD) clean

test:
	$(MAKE) -C $(METAC_ROOT) M=$(PWD) test

.PHONY: all clean test
endif

rules+= \
	target \
	_meta_c_app \
	c_app.reflect.c \
	c_app

LDFLAGS-c_app=-Lsrc -lmetac
LDFLAGS-_meta_c_app=-Lsrc -lmetac

in_c_app+=main.o

TPL-_meta_c_app:=bin_target
IN-_meta_c_app=$(in_c_app:.o=.meta.o)
POST-_meta_c_app=$(METAC_POST_META)

TPL-c_app:=bin_target
IN-c_app=$(in_c_app) c_app.reflect.o
DEPS-c_app=src/libmetac.a

TPL-c_app.reflect.c:=metac_target
METACFLAGS-c_app.reflect.c+=run metac-reflect-gen $(METAC_OVERRIDE_IN_TYPE)
IN-c_app.reflect.c=_meta_c_app

TPL-target:=phony_target
IN-target:=c_app

Explanation:

The part from ifeq to endif works exactly as in KBUILD. It’s possible to run make all METAC_ROOT=<path to the Metac root> in order to build this example.
The rest of the file defines the rules that are generated to build the example using the multi-step process described at the beginning. The variable rules lists them:

Rule target:

TPL-target:=phony_target
IN-target:=c_app

It is generated as a .PHONY target that requires the final target c_app to be built using the corresponding rules.

Rule c_app:

TPL-c_app:=bin_target
IN-c_app=$(in_c_app) c_app.reflect.o
DEPS-c_app=src/libmetac.a

This rule builds an executable binary from main.o, c_app.reflect.o and libmetac.a. Make knows how to build main.o automatically from main.c. Where do we get c_app.reflect.o from? From c_app.reflect.c:

Rule c_app.reflect.c:

TPL-c_app.reflect.c:=metac_target
METACFLAGS-c_app.reflect.c+=run metac-reflect-gen $(METAC_OVERRIDE_IN_TYPE)
IN-c_app.reflect.c=_meta_c_app

This rule runs the metac tool with the arguments run metac-reflect-gen $(METAC_OVERRIDE_IN_TYPE). The input file is _meta_c_app, so this combination instructs metac to read DWARF data from _meta_c_app. The parameter METAC_OVERRIDE_IN_TYPE specifies whether metac must expect elf, macho or pe as input. metac-reflect-gen is the name of the Go template module that generates c_app.reflect.c.

Rule _meta_c_app:

TPL-_meta_c_app:=bin_target
IN-_meta_c_app=$(in_c_app:.o=.meta.o)
POST-_meta_c_app=$(METAC_POST_META)

This rule is similar to c_app, but it uses main.meta.o as its source. The only difference between main.meta.o and main.o is that the former was built with the flags -g3 -D_METAC_OFF_.

If we run make on macOS we should see:

% make
/Library/Developer/CommandLineTools/usr/bin/make -C ../.. M=/Users/user/Workspace/metac/examples/c_app_simplest test
/Library/Developer/CommandLineTools/usr/bin/make -C ../.. M=/Users/user/Workspace/metac/examples/c_app_simplest target

cc -I./include -c -MMD -MF /Users/user/Workspace/metac/examples/c_app_simplest/main.d -MP -MT '/Users/user/Workspace/metac/examples/c_app_simplest/main.o /Users/user/Workspace/metac/examples/c_app_simplest/main.d' -o /Users/user/Workspace/metac/examples/c_app_simplest/main.o /Users/user/Workspace/metac/examples/c_app_simplest/main.c

cc -I./include -g3 -D_METAC_OFF_ -c -MMD -MF /Users/user/Workspace/metac/examples/c_app_simplest/main.meta.d -MP -MT '/Users/user/Workspace/metac/examples/c_app_simplest/main.meta.o /Users/user/Workspace/metac/examples/c_app_simplest/main.meta.d' -o /Users/user/Workspace/metac/examples/c_app_simplest/main.meta.o /Users/user/Workspace/metac/examples/c_app_simplest/main.c

cc /Users/user/Workspace/metac/examples/c_app_simplest/main.meta.o -Lsrc -lmetac -o /Users/user/Workspace/metac/examples/c_app_simplest/_meta_c_app

(which dsymutil) && dsymutil /Users/user/Workspace/metac/examples/c_app_simplest/_meta_c_app || echo "Couldn't find dsymutil"
/usr/bin/dsymutil

./metac run metac-reflect-gen -s 'path_type: "macho"' -s 'path: "/Users/user/Workspace/metac/examples/c_app_simplest/_meta_c_app"' > /Users/user/Workspace/metac/examples/c_app_simplest/c_app.reflect.c

cc -I./include -c -MMD -MF /Users/user/Workspace/metac/examples/c_app_simplest/c_app.reflect.d -MP -MT '/Users/user/Workspace/metac/examples/c_app_simplest/c_app.reflect.o /Users/user/Workspace/metac/examples/c_app_simplest/c_app.reflect.d' -o /Users/user/Workspace/metac/examples/c_app_simplest/c_app.reflect.o /Users/user/Workspace/metac/examples/c_app_simplest/c_app.reflect.c

cc /Users/user/Workspace/metac/examples/c_app_simplest/main.o /Users/user/Workspace/metac/examples/c_app_simplest/c_app.reflect.o -Lsrc -lmetac -o /Users/user/Workspace/metac/examples/c_app_simplest/c_app

Now if we run the application we’ll see:

% ./c_app
struct test t = {.y = -10, .c = 'a', .pi = 3.141593, .e = 2.718282, ._uninitialized_field = 0,};

The example can be found here.
More information on how to use metac can be found here

Conclusion:
Metac isn’t just a tool; it’s a path to self-improvement for your C code. With DWARF’s insights and metac’s interpretation, your programs can shed light on their “unconscious” behaviors and unlock their full potential.