
Paul J. Lucas


Avoid the Temptation of Header-Only Libraries

#c

Introduction

Occasionally, you come across some library that advertises itself as a header-only library, that is, all the code is contained entirely in one or more .h files, with no .c files. The claim is that such libraries are simpler to use because all you have to do is #include the headers. Header-only libraries generally fall into two types:

  1. Those that provide type-generic code, e.g., “containers.”
  2. Those that provide ordinary code.

My claim is that neither case justifies making a library header-only.

Type-Generic Container Header-Only Libraries

One thing that occurs in most programming languages is the desire to have type-generic code, that is code whose purpose or algorithms don’t depend on the type of data. Most commonly, this manifests as the desire to have generic “containers,” e.g., arrays, lists, or sets of some type T where T can not only be any type built into the language, but user-defined types as well, e.g., list of integers, sets of strings, etc.

Different languages that support generic containers do so differently, e.g., C++ uses “templates” where type parameters are instantiated at compile-time with specific types, e.g., a list<T> can be instantiated with T = int yielding list<int>.

C has only minimal support for generic code via _Generic and preprocessor macros. Some people try to use these things to implement generic containers by using very long and elaborate macros, e.g.:

#define LIST(T)                                              \
  struct list_node_##T {                                     \
    struct list_node_##T *next;                              \
    T data;                                                  \
  };                                                         \
                                                             \
  struct list_##T {                                          \
    struct list_node_##T *head, *tail;                       \
  };                                                         \
                                                             \
  static void list_init_##T( struct list_##T *list ) {       \
  // ...

That is the type T, a macro parameter, forms part of the names of the structures and functions, e.g., list_int.
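To make the idea concrete, here is a minimal sketch of what a complete (if tiny) version of such a macro might look like. The functions list_push_back_##T and list_free_##T, as well as list_demo_sum, are hypothetical names of my own choosing for illustration; a real library would offer far more.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down LIST(T): only init, push_back, and free. */
#define LIST(T)                                                      \
  struct list_node_##T {                                             \
    struct list_node_##T *next;                                      \
    T data;                                                          \
  };                                                                 \
                                                                     \
  struct list_##T {                                                  \
    struct list_node_##T *head, *tail;                               \
  };                                                                 \
                                                                     \
  static void list_init_##T( struct list_##T *list ) {               \
    list->head = list->tail = NULL;                                  \
  }                                                                  \
                                                                     \
  /* Returns 0 on success, -1 if malloc() fails. */                  \
  static int list_push_back_##T( struct list_##T *list, T data ) {   \
    struct list_node_##T *node = malloc( sizeof *node );             \
    if ( node == NULL )                                              \
      return -1;                                                     \
    node->next = NULL;                                               \
    node->data = data;                                               \
    if ( list->tail != NULL )                                        \
      list->tail->next = node;                                       \
    else                                                             \
      list->head = node;                                             \
    list->tail = node;                                               \
    return 0;                                                        \
  }                                                                  \
                                                                     \
  static void list_free_##T( struct list_##T *list ) {               \
    struct list_node_##T *node = list->head;                         \
    while ( node != NULL ) {                                         \
      struct list_node_##T *next = node->next;                       \
      free( node );                                                  \
      node = next;                                                   \
    }                                                                \
    list->head = list->tail = NULL;                                  \
  }

/* Defines struct list_int, list_init_int(), list_push_back_int(), etc. */
LIST(int)

/* Builds the list 1, 2, 3 and returns the sum of its elements. */
static int list_demo_sum( void ) {
  struct list_int list;
  list_init_int( &list );
  for ( int i = 1; i <= 3; ++i ) {
    int const rv = list_push_back_int( &list, i );
    assert( rv == 0 );
    (void)rv;
  }
  int sum = 0;
  for ( struct list_node_int *node = list.head; node != NULL;
        node = node->next ) {
    sum += node->data;
  }
  list_free_int( &list );
  return sum;
}
```

Because T is pasted into identifiers, it has to be a single identifier: LIST(int) works, but something like a pointer type would first need a typedef.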

The benefit of using macros like this is that the code is header only, that is all the code is in a single .h file. While this works, it has a number of problems.

In the earliest days of C++ (C with Classes), macros were used to implement generic containers. The problem with macros in general is that they obey neither scope nor type rules, and they don’t work well with tools. Hence the addition of templates to C++.

Problems

Using macros to define generic containers has a number of problems:

  1. All functions must be declared static, otherwise every .o file whose corresponding .c file includes the header will contain definitions for all functions. That would result in the linker complaining about duplicate symbols.

  2. While static declarations solve the duplicate symbols problem, every .o file still contains its own copy of every function, and all those copies end up in the final executable. This results in code bloat that increases the executable’s size, sometimes dramatically, both on disk and in memory.

  3. It’s hard to debug the code in the macros themselves since each macro expands to a single line of code.

For example, even if you use only one type for T, say, int, hence LIST(int), but include the header into, say, five .c files, then the final executable will have five copies of all the code.

Note that if and only if all functions are marked inline in addition to static and the functions are trivial enough to actually be inlined by the compiler, then the code bloat problem goes away. However, any useful containers library will invariably have non-trivial functions that can’t be inlined.

Mitigation Tactics

For generic code as described, there’s no standard way (meaning, there’s no compiler-independent way) to solve the code bloat problem.

If you’re using gcc or clang, you can use the weak attribute; if you’re using MSVC, you’re out of luck since no equivalent attribute exists.
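As a rough sketch of the gcc/clang approach, a header might define its functions as weak instead of static. WEAK and int_clamp are hypothetical names I made up for illustration; only __attribute__((weak)) itself is the real compiler extension.

```c
#include <assert.h>

/* WEAK is a hypothetical macro name; the attribute is gcc/clang-specific. */
#if defined(__GNUC__)               /* clang defines __GNUC__ as well */
# define WEAK __attribute__((weak))
#else
# error "no known weak-function attribute for this compiler"
#endif

/*
 * Deliberately not static.  If this definition appeared in a header
 * included by several .c files, each .o file would define the same weak
 * symbol and the linker would accept that instead of reporting duplicate
 * symbols.
 */
WEAK int int_clamp( int n, int lo, int hi ) {
  return n < lo ? lo : n > hi ? hi : n;
}

/* Exercises the weak function exactly as any ordinary function. */
static void weak_demo( void ) {
  assert( int_clamp(  5, 0, 10 ) ==  5 );
  assert( int_clamp( -3, 0, 10 ) ==  0 );
  assert( int_clamp( 15, 0, 10 ) == 10 );
}
```

Note that the linker resolving every call to one definition isn't necessarily the same thing as the unused copies being removed from the executable. As far as I know, that part depends on the toolchain and link options, e.g., gcc's -ffunction-sections together with the linker's --gc-sections, so treat this as a starting point to measure, not a guarantee.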

If you’re working on code only for a specific system or you’re the only user, then fine: you can use compiler-specific solutions; but if you want your code to be widely cross-platform, then you shouldn’t use compiler-specific solutions.

Ordinary Header-Only Libraries

Ordinary header-only libraries have the code that otherwise would have gone into .c files in the .h files instead, typically guarded by some macro like:

// ... type and function declarations ...

#ifdef FOO_LIBRARY_IMPLEMENTATION
// ... function definitions ...
#endif

Then in one .c file of the user’s choosing:

#define FOO_LIBRARY_IMPLEMENTATION
#include "foo_library.h"

The library author is forcing the user to compile the code into some .c file that the author could have simply provided in the first place. The user has to list the .h file as a dependency in their makefiles anyway; also listing the .c file as a dependency is no more of a burden than choosing an existing .c and defining a macro. Hence, it’s not at all clear to me why such a header-only library is allegedly simpler to use.

Problems

Such header-only libraries have their own problems:

  1. The header file(s) are much bigger, which means compile times are longer in large projects since, in every .c file but one, all the code for the implementation is read, scanned by the preprocessor, and simply discarded.
  2. For the author, header-only libraries are harder to write and maintain because they preclude using build systems such as either Autotools or CMake to “probe” the host system. After all, the user is only supposed to #include the header — right??

The problem with precluding using build systems is that if a library has any one of compiler-, operating-system-, or CPU-specific code (as any non-trivial library might), you have to handle all the cases manually via #ifdef. For example, I recently came across the following in a header-only library:

#if defined(__LITTLE_ENDIAN__) || defined(__ARMEL__) || \
    defined(__THUMBEL__) || defined(__AARCH64EL__) || \
    defined(_MIPSEL) || defined(__MIPSEL) || defined(__MIPSEL__) || \
    defined(__x86_64__) || defined(__i386__) || defined(__X86__)
    // ...
#else
    // ...
#endif

Do you really want to research and maintain something like that? Instead, if you use a build tool like Autotools, you can probe the platform directly by using the AC_CANONICAL_HOST macro and let it do the work. If using CMake, you get CMAKE_SYSTEM_PROCESSOR for free. In either case, many people have contributed many person-years of work to get those right so you don’t have to.
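For the endianness case specifically, here is a sketch of what the C side can shrink to once a build tool has done the probing. It assumes an Autoconf configure.ac that calls AC_C_BIGENDIAN (a macro not mentioned above, but part of Autoconf), which defines WORDS_BIGENDIAN in the generated config.h on big-endian hosts. The function names swap32, host_to_le32, and endian_demo are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * In a real Autotools project, you would #include "config.h" here.  It is
 * omitted so this sketch compiles on its own, in which case WORDS_BIGENDIAN
 * is simply undefined and a little-endian host is assumed.
 */

/* Reverses the byte order of a 32-bit value. */
static uint32_t swap32( uint32_t n ) {
  return  (n >> 24)
       | ((n >>  8) & 0x0000FF00u)
       | ((n <<  8) & 0x00FF0000u)
       |  (n << 24);
}

/* Converts a host-order value to little-endian order. */
static uint32_t host_to_le32( uint32_t n ) {
#ifdef WORDS_BIGENDIAN
  return swap32( n );
#else
  return n;
#endif
}

static void endian_demo( void ) {
  assert( swap32( 0x11223344u ) == 0x44332211u );
  /* Converting twice is the identity whichever branch was compiled. */
  assert( host_to_le32( host_to_le32( 0xDEADBEEFu ) ) == 0xDEADBEEFu );
}
```

The long list of platform macros collapses to a single #ifdef whose correctness is the build tool's job, not yours.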

Conclusion

The hard reality is that C doesn’t really support header-only libraries. For type-generic code, you should implement a library using the generic void* in a conventional .h and .c pair. Since you have to add the .h to your dependencies anyway, also adding the corresponding .c is trivial.
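As a minimal sketch of that conventional approach, the following shows a void*-based list with comments marking what would go in list.h and what would go in list.c. All names are hypothetical, and both parts are shown in one block only so the example is self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- What would go in list.h ---- */

struct list_node {
  struct list_node *next;
  void *data;                 /* points to caller-owned data of any type */
};

struct list {
  struct list_node *head, *tail;
};

void list_init( struct list *list );
int  list_push_back( struct list *list, void *data );
void list_free( struct list *list );

/* ---- What would go in list.c: compiled exactly once ---- */

void list_init( struct list *list ) {
  list->head = list->tail = NULL;
}

/* Returns 0 on success, -1 if malloc() fails. */
int list_push_back( struct list *list, void *data ) {
  struct list_node *node = malloc( sizeof *node );
  if ( node == NULL )
    return -1;
  node->next = NULL;
  node->data = data;
  if ( list->tail != NULL )
    list->tail->next = node;
  else
    list->head = node;
  list->tail = node;
  return 0;
}

/* Frees the nodes only; the data they point to belongs to the caller. */
void list_free( struct list *list ) {
  struct list_node *node = list->head;
  while ( node != NULL ) {
    struct list_node *next = node->next;
    free( node );
    node = next;
  }
  list->head = list->tail = NULL;
}

/* ---- What a user's .c file might do ---- */

/* Stores three strings and returns their total length. */
static size_t list_demo_total_len( void ) {
  struct list list;
  list_init( &list );
  char *const words[] = { "a", "bc", "def" };
  for ( size_t i = 0; i < sizeof words / sizeof words[0]; ++i ) {
    int const rv = list_push_back( &list, words[i] );
    assert( rv == 0 );
    (void)rv;
  }
  size_t total = 0;
  for ( struct list_node *node = list.head; node != NULL; node = node->next )
    total += strlen( node->data );
  list_free( &list );
  return total;
}
```

However many .c files use the list and however many element types they store, there is one copy of each function in the executable. The price is that the compiler no longer checks the element type for you.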

As an alternative to void*, you can use flexible array members to implement generic containers in C, but that’s a story for another time.

For ordinary header-only libraries, just use .c files and build tools.

Epilogue

You might be wondering, “Don’t C++ templates have the same problems?” Only partially. First, since templates are part of the language proper and not the preprocessor, a C++ compiler can mark all instantiated functions as weak in whatever manner a given platform needs — but it’s the compiler’s problem, not your problem.

Second, the library implementation can use factorization tricks. For example, all std::list<T*> can be implemented in terms of std::list<void*>, that is the single instantiation of the latter can be used for all pointers regardless of T. There are other tricks possible as well.

Note that with a lot more work, you could probably do some factorization tricks in C as well, e.g., have PTR_LIST(T) that’s implemented in terms of LIST(void*).
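A rough sketch of such a factorization follows. It assumes a non-generic list of void* (here a hypothetical, deliberately tiny ptr_list with only push and pop at the front) that is compiled once, plus a PTR_LIST(T) macro that generates only trivial static inline wrappers to put the type checking back.

```c
#include <assert.h>
#include <stdlib.h>

/* Shared implementation: would live in a .c file and be compiled once. */
struct ptr_list_node {
  struct ptr_list_node *next;
  void *data;
};

struct ptr_list {
  struct ptr_list_node *head;
};

/* Returns 0 on success, -1 if malloc() fails. */
int ptr_list_push_front( struct ptr_list *list, void *data ) {
  struct ptr_list_node *node = malloc( sizeof *node );
  if ( node == NULL )
    return -1;
  node->data = data;
  node->next = list->head;
  list->head = node;
  return 0;
}

/* Returns the front pointer, or NULL if the list is empty. */
void* ptr_list_pop_front( struct ptr_list *list ) {
  struct ptr_list_node *node = list->head;
  if ( node == NULL )
    return NULL;
  void *data = node->data;
  list->head = node->next;
  free( node );
  return data;
}

/* Typed wrappers: trivial enough that the compiler can inline them. */
#define PTR_LIST(T)                                                  \
  struct T##_ptr_list { struct ptr_list impl; };                     \
                                                                     \
  static inline int                                                  \
  T##_ptr_list_push_front( struct T##_ptr_list *list, T *p ) {       \
    return ptr_list_push_front( &list->impl, p );                    \
  }                                                                  \
                                                                     \
  static inline T*                                                   \
  T##_ptr_list_pop_front( struct T##_ptr_list *list ) {              \
    return ptr_list_pop_front( &list->impl );                        \
  }

PTR_LIST(int)
PTR_LIST(double)

/* Returns 1 if both typed lists hand back the pointers stored in them. */
static int ptr_list_demo( void ) {
  int i = 42;
  double d = 1.5;
  struct int_ptr_list ints = { { NULL } };
  struct double_ptr_list doubles = { { NULL } };

  int rv = int_ptr_list_push_front( &ints, &i );
  assert( rv == 0 );
  rv = double_ptr_list_push_front( &doubles, &d );
  assert( rv == 0 );
  (void)rv;

  int *pi = int_ptr_list_pop_front( &ints );
  double *pd = double_ptr_list_pop_front( &doubles );
  return pi == &i && pd == &d && int_ptr_list_pop_front( &ints ) == NULL;
}
```

Passing a double* to int_ptr_list_push_front() now draws a compiler diagnostic, yet the only code repeated per type is a pair of one-line wrappers. One limitation of this sketch: a NULL return from pop can't distinguish an empty list from a stored NULL pointer.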

References

  • The Design and Evolution of C++, Bjarne Stroustrup, AT&T Bell Laboratories, Addison-Wesley, Reading, Massachusetts, 1994, §15.1.
