An Island in the C of Unsafety

A couple of days ago one of my mentoring professors was introducing me to the internals of a cryptographic library he started. He mentioned that there were some cool tricks, and showed me a function that copies bytes from one buffer to another. The function was called memTransfer and its signature was Dest dest -> Src src -> IO () (i.e., it takes two arguments, the destination and source buffers; ignore the return value). There were some additional constraints to make sure dest and src are buffers. But the interesting parts are the types Dest and Src. They make sure you don't accidentally swap the source and destination pointers, which is easy to do with, say, C's memcpy.

How do they make sure you don't swap the buffers? Unlike with memcpy, you cannot pass the pointers dest and src directly. Instead, you have to wrap them in Dest and Src. Yes, you could still deliberately wrap the source buffer in Dest and the destination buffer in Src, but the point is that if it were an accident, you would notice. And then there is the type system, which would definitely bark at you if you confused the order of the arguments.

My immediate thought was, how do I replicate this in C? Many C libraries alias primitive types using #define or typedef, which helps to upgrade/downgrade the underlying type transparently, but offers no type safety. You could do typedef void * DestPtr and typedef void * SrcPtr, and then define your own safe_memcpy(DestPtr dest, SrcPtr src, size_t siz), but that would offer neither of the two safeguards provided by the Haskell code we discussed. First, there are no wrappers at the call site (you still write something like safe_memcpy(buf2, buf1, somesize), which isn't descriptive). Second, both DestPtr and SrcPtr are just aliases for void *, which means the compiler won't complain if you swap them.
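To make the second point concrete, here is a minimal sketch of the alias-based approach. The names alias_memcpy and swapped_alias_call are made up for this illustration (the wrapper is named alias_memcpy to keep it apart from the struct-based safe_memcpy); the point is only that the swapped call compiles and silently copies in the wrong direction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Aliases only: to the compiler both names still mean plain "void *".
typedef void *DestPtr;
typedef void *SrcPtr;

// Hypothetical alias-based wrapper, as described in the paragraph above.
static void *alias_memcpy(DestPtr dest, SrcPtr src, size_t siz)
{
	return memcpy(dest, src, siz);
}

// Calls the wrapper with its arguments swapped and returns what is left in
// the first element of the buffer that was supposed to be the source.
static int swapped_alias_call(void)
{
	int fresh[4] = { 1, 2, 3, 4 };  // intended source
	int stale[4] = { 0, 0, 0, 0 };  // intended destination

	// Wrong order, yet nothing here is a type error.
	alias_memcpy(fresh, stale, sizeof(fresh));

	assert(stale[0] == 0);  // the destination never received the data
	return fresh[0];        // the source was overwritten with zeros
}
```

Calling swapped_alias_call() returns 0: the data that was meant to be copied is gone, and the compiler had nothing to object to.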

The simplest solution? Just make the function name reflect the argument order (like memcpyDestSrc). That won't be so nice in the long run.

The question is, can we exploit the limited type safety offered by C to accomplish something higher-level, like in the Haskell example? Easy. Here's the solution that came to my mind during the discussion:

// Nandakumar Edamana
// 2024-02-23
//
// The code conforms to C99 or newer (tested using gcc --std=c99).
// Only minor tweaks are needed to make it C89-compatible (not elaborating
// since nobody needs it).

#include <string.h>

typedef struct Src {
	const void *ptr;
} Src;

typedef struct Dest {
	void *ptr;
} Dest;

// Forces the caller to cross-check source and destination, but still offers
// no protection against an invalid size. Maybe create a wrapper DestSiz so
// that at least the intention is clear.
static inline void * safe_memcpy(Dest dest, Src src, size_t siz)
{
	return memcpy(dest.ptr, src.ptr, siz);
}

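// Makes sure direct use of memcpy() is not allowed from here on. Strictly
// speaking, the C standard reserves library names, so redefining one as a
// macro is not guaranteed to be portable.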
#define memcpy NOT_ALLOWED

int main(void)
{
	// arr1 is initialized so that the copy reads well-defined values.
	int arr1[16] = { 0 }, arr2[16];

	safe_memcpy((Dest){ arr2 }, (Src){ arr1 }, sizeof(arr2));

	// "in expansion of macro ‘memcpy’ ... undefined reference to `NOT_ALLOWED'"
	//memcpy(arr2, arr1, sizeof(arr2));

	return 0;
}

Okay, what does the above code do? Almost exactly what the Haskell example offers:

  • Makes the compiler stop you from accidentally reordering dest and src
  • Forces you to cross-check the source and destination because you have to explicitly wrap them
  • Should add no runtime overhead for this safety once the optimizer inlines the wrapper

Additionally, it prevents you from using memcpy directly, thanks to the preprocessor (which is, ironically, the most unsafe thing about C). The only thing missing is size safety, which wouldn't be as straightforward to implement.
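For what it's worth, here is one rough direction size safety could take. This is only a sketch under my own assumptions, not something from the Haskell library: the types SizedSrc and SizedDest, the macros SRC_OF and DEST_OF, and the function sized_memcpy are all hypothetical names. Each wrapper carries the buffer size along with the pointer, and the copy is refused at runtime if the source doesn't fit. Note the two limits: the check happens at runtime, not in the type system, and the macros only work on true arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical sized wrappers: each buffer travels with its size in bytes.
typedef struct SizedSrc {
	const void *ptr;
	size_t siz;
} SizedSrc;

typedef struct SizedDest {
	void *ptr;
	size_t siz;
} SizedDest;

// Only valid for true arrays. Given a pointer, sizeof would yield the size
// of the pointer itself, not of the buffer it points to.
#define SRC_OF(arr)  ((SizedSrc){ (arr), sizeof(arr) })
#define DEST_OF(arr) ((SizedDest){ (arr), sizeof(arr) })

// Copies the whole source, or nothing at all if it would not fit.
// Returns dest.ptr on success and NULL on refusal.
static inline void *sized_memcpy(SizedDest dest, SizedSrc src)
{
	if (src.siz > dest.siz)
		return NULL;
	return memcpy(dest.ptr, src.ptr, src.siz);
}

// Usage: a source larger than the destination is rejected untouched.
static int oversized_copy_is_refused(void)
{
	int big[4] = { 1, 2, 3, 4 };
	int small[2] = { 0, 0 };

	void *result = sized_memcpy(DEST_OF(small), SRC_OF(big));

	assert(small[0] == 0 && small[1] == 0);  // nothing was written
	return result == NULL;
}

// Usage: a source that fits is copied into the start of the destination.
static int fitting_copy_succeeds(void)
{
	int small[2] = { 5, 6 };
	int big[4] = { 0, 0, 0, 0 };

	void *result = sized_memcpy(DEST_OF(big), SRC_OF(small));

	assert(result == (void *)big);
	return big[0] == 5 && big[1] == 6 && big[2] == 0;
}
```

The size argument disappears from the call site entirely, so there is no number left to get wrong, but a caller who builds the structs by hand can still lie about the size.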

Does this mean C is as type-safe as Haskell? Of course not! Just a reminder that there could be hacks that increase the safety of things that are usually very unsafe. Also, I'm pretty sure somebody else is doing this. There could even be a name for it! Just wrote this trick down because I don't remember having read this anywhere.

NOTE: the const qualifier for the source buffer in memcpy may trick us into thinking that we already have the kind of safety discussed in this article. But it doesn't prevent you from accidentally swapping the pointers if both of your buffers are originally non-const (like in the C code above), which is usually the case.
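A small sketch of that situation, using plain memcpy and two ordinary int arrays (the function name swapped_plain_memcpy is just for illustration). Had src_buf been declared const int, the swapped call would have drawn a diagnostic, but with non-const buffers it goes through.

```c
#include <assert.h>
#include <string.h>

// Returns the value left in the buffer that was meant to be the source.
static int swapped_plain_memcpy(void)
{
	int src_buf[4] = { 7, 7, 7, 7 };   // meant to be read
	int dest_buf[4] = { 0, 0, 0, 0 };  // meant to be written

	// Swapped by mistake. src_buf is a plain int array, so it converts to
	// "void *" for the first parameter just as readily as dest_buf converts
	// to "const void *" for the second.
	memcpy(src_buf, dest_buf, sizeof(src_buf));

	assert(dest_buf[0] == 0);  // the intended destination is unchanged
	return src_buf[0];         // the intended source now holds zeros
}
```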


Tags: programming, haskell, c, type-safety

Read more from Nandakumar at nandakumar.org/blog/