An Island in the C of Unsafety
2024-02-24
A couple of days ago, one of my mentoring professors was introducing me to the internals of a cryptographic library he started. He mentioned that there were some cool tricks, and showed me a function that copies bytes from one buffer to another. The function was called memTransfer and its signature was Dest dest -> Src src -> IO () (i.e., it takes two arguments, namely the destination and source buffers; ignore the return value). There were some additional constraints to make sure dest and src are buffers. But the interesting parts are the types Dest and Src. They make sure you don't accidentally swap the source and destination pointers, which is possible when using, say, C's memcpy.
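To make the hazard concrete, here is a minimal C sketch. The names refresh_backup_buggy and demo_swapped_memcpy are hypothetical, made up for this illustration. The arguments to memcpy are swapped by mistake, and the compiler accepts it, because any object pointer converts implicitly to void * (and to const void *).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: meant to copy `fresh` into `backup`, but the
 * arguments to memcpy are swapped by mistake. This compiles cleanly,
 * since int * converts implicitly to both void * and const void *. */
void refresh_backup_buggy(int backup[4], int fresh[4])
{
    memcpy(fresh, backup, 4 * sizeof(int)); /* BUG: dest and src swapped */
}

/* Hypothetical demo: runs the buggy helper and asserts what really happens. */
int demo_swapped_memcpy(void)
{
    int backup[4] = {0, 0, 0, 0};
    int fresh[4] = {1, 2, 3, 4};

    refresh_backup_buggy(backup, fresh);

    assert(backup[0] == 0); /* the backup was never updated...    */
    assert(fresh[3] == 0);  /* ...and the fresh data got clobbered */
    return 0;
}
```

Nothing in the types flags the swapped call; the mistake only shows up at run time, as the asserts in demo_swapped_memcpy document.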
How do they make sure you don't swap the buffers? Unlike with memcpy, you cannot pass the pointers dest and src directly. Instead, you have to wrap them in Dest and Src. Yes, it is possible to deliberately wrap the source buffer in Dest and the destination buffer in Src, but the point is, if it were an accident, you would notice. And then there is the type system, which would definitely bark at you if you confused the order of the arguments.
My immediate thought was, how do I replicate this in C? Many C libraries alias primitive types using #define or typedef, which helps to upgrade/downgrade the underlying type transparently, but offers no type safety. You could do typedef void * DestPtr and typedef void * SrcPtr, and then define your own safe_memcpy(DestPtr dest, SrcPtr src, size_t siz), which would offer neither of the two safeguards provided by the Haskell code we discussed. For one thing, there are no wrappers at the call site (you still write a memcpy-style call like safe_memcpy(buf2, buf1, somesize), which isn't descriptive). Second, both DestPtr and SrcPtr are just aliases for void *, which means the compiler won't complain if you swap them.
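A minimal sketch of that typedef approach shows the swap going through unnoticed. The function names alias_memcpy and demo_alias_swap are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Plain aliases: both names mean exactly void *, so the compiler
 * treats them as interchangeable. */
typedef void *DestPtr;
typedef void *SrcPtr;

/* Hypothetical "safe" copy built on the aliases above. */
void *alias_memcpy(DestPtr dest, SrcPtr src, size_t siz)
{
    return memcpy(dest, src, siz);
}

/* Hypothetical demo: the intent is to copy a into b. */
int demo_alias_swap(void)
{
    int a[2] = {1, 2};
    int b[2] = {0, 0};

    /* Swapped by mistake, yet no diagnostic: int * converts
     * implicitly to either alias. */
    alias_memcpy(a, b, sizeof(a));

    assert(a[0] == 0 && a[1] == 0); /* the source got overwritten */
    return 0;
}
```

The parameter names document the intent, but the call site carries no such information, so the compiler has nothing to check.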
The simplest solution? Just make the function name reflect the argument order (like memcpyDestSrc). That won't be so nice in the long run, though.
The question is, can we exploit the limited type safety offered by C to accomplish something closer to the Haskell example? Easy. Here's the solution that came to my mind during the discussion:
```c
// Nandakumar Edamana
// 2024-02-23
//
// The code conforms to C99 or newer (tested using gcc --std=c99).
// Only minor tweaks are needed to make it C89-compatible (not elaborating
// since nobody needs it).

#include <string.h>

typedef struct Src {
    const void *ptr;
} Src;

typedef struct Dest {
    void *ptr;
} Dest;

// Forces to crosscheck source and destination, still no protection against
// invalid size. Maybe create a wrapper DestSiz so that at least the intention
// is clear.
static inline void * safe_memcpy(Dest dest, Src src, size_t siz)
{
    return memcpy(dest.ptr, src.ptr, siz);
}

// Makes sure direct use of memcpy() is not allowed.
#define memcpy NOT_ALLOWED

int main()
{
    int arr1[16], arr2[16];

    safe_memcpy((Dest){ arr2 }, (Src){ arr1 }, sizeof(arr2));

    // "in expansion of macro ‘memcpy’ ... undefined reference to `NOT_ALLOWED'"
    //memcpy(arr2, arr1, sizeof(arr2));

    return 0;
}
```
Okay, what does the above code do? Almost exactly what the Haskell example offers:
- Make the compiler prevent you from accidentally reordering dest and src
- Force you to cross-check the source and destination because you have to wrap them explicitly
- Have zero runtime overhead for this added safety
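The zero-overhead point rests on two things: each wrapper is a single-member struct, so on common ABIs it is exactly as large as the bare pointer, and safe_memcpy is static inline, so an optimizing compiler can reduce the call to a plain memcpy. A minimal sketch, mirroring the Src and Dest structs from the code above, checks the size part at compile time with the pre-C11 negative-array-size trick; the names dest_is_pointer_sized, src_is_pointer_sized and demo_wrapped_copy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Src { const void *ptr; } Src;
typedef struct Dest { void *ptr; } Dest;

/* Hypothetical compile-time checks (C99-friendly): the array size becomes
 * -1, a compile error, if a wrapper is larger than the pointer it holds.
 * This holds on mainstream ABIs but is not promised by the C standard. */
typedef char dest_is_pointer_sized[sizeof(Dest) == sizeof(void *) ? 1 : -1];
typedef char src_is_pointer_sized[sizeof(Src) == sizeof(const void *) ? 1 : -1];

static inline void *safe_memcpy(Dest dest, Src src, size_t siz)
{
    return memcpy(dest.ptr, src.ptr, siz);
}

/* Hypothetical demo: the wrapped call copies just like memcpy would. */
int demo_wrapped_copy(void)
{
    int arr1[4] = {1, 2, 3, 4};
    int arr2[4] = {0};

    safe_memcpy((Dest){ arr2 }, (Src){ arr1 }, sizeof(arr2));
    assert(memcmp(arr1, arr2, sizeof(arr2)) == 0);
    return 0;
}
```

Whether the call is actually inlined is up to the compiler and optimization level; the size check only shows that the wrappers add no storage.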
Additionally, it prevents you from using memcpy directly, thanks to the preprocessor (which is, ironically, the most unsafe thing about C). The only thing lacking is size safety, which wouldn't be as straightforward to implement as this.
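One possible reading of the DestSiz idea mentioned in the comment in the code above is a destination wrapper that carries its capacity along with the pointer. The sketch below assumes that reading; DestSiz's fields, sized_memcpy and demo_sized_memcpy are all hypothetical. It is a run-time guard, not a compile-time guarantee: an oversized copy is refused and NULL is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Src { const void *ptr; } Src;

/* Hypothetical wrapper: the destination travels with its capacity. */
typedef struct DestSiz { void *ptr; size_t cap; } DestSiz;

/* Hypothetical sized copy: copies siz bytes only if they fit in the
 * destination; otherwise copies nothing and returns NULL. */
static inline void *sized_memcpy(DestSiz dest, Src src, size_t siz)
{
    if (siz > dest.cap)
        return NULL;
    return memcpy(dest.ptr, src.ptr, siz);
}

/* Hypothetical demo of both outcomes. */
int demo_sized_memcpy(void)
{
    char big[8] = "abcdefg";
    char small[4] = {0};
    void *result;

    /* Too large for small: rejected, and small stays untouched. */
    result = sized_memcpy((DestSiz){ small, sizeof(small) }, (Src){ big }, sizeof(big));
    assert(result == NULL);
    assert(small[0] == '\0');

    /* Fits exactly: copied. */
    result = sized_memcpy((DestSiz){ small, sizeof(small) }, (Src){ big }, sizeof(small));
    assert(result == small);
    assert(memcmp(small, "abcd", 4) == 0);
    return 0;
}
```

Nothing stops a caller from passing a wrong cap, so this only moves the size question to the place where the buffer is wrapped, which at least makes the intention visible there.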
Does this mean C is as type-safe as Haskell? Of course not! It's just a reminder that there can be hacks that increase the safety of things that are usually very unsafe. Also, I'm pretty sure somebody else is already doing this. There might even be a name for it! I just wrote this trick down because I don't remember having read about it anywhere.
NOTE: the const qualifier on the source parameter of memcpy may trick us into thinking that we already have the kind of safety discussed in this article. But it doesn't prevent you from accidentally swapping the pointers if both of your buffers are originally non-const (as in the C code above), which is usually the case.
Nandakumar Edamana
Tags: programming, haskell, c, type-safety