🪤 Error Handling in libhal

libhal utilizes C++ exception handling for transmitting errors. C++ exceptions were chosen over other error handling mechanisms because they:

1. Improve code performance by separating error handling code from normal code, which reduces the cost of calling functions that could fail.
2. Make error handling easier by allowing the user to wrap multiple blocks of code within a single handler that is selected by error type/category.
3. Reduce the binary size of libraries and applications by:
   - Using a single algorithm to allocate, construct, and transport errors and direct the CPU to the appropriate error handling code.
   - Eliminating the need for functions to contain error return paths when participating in error propagation.
   - Providing an error path using unwind instructions, a compressed form of machine instructions that simulate the epilog of a function, but without the requirement to return objects on the stack.
   - Although handler code can be larger than plain code (if/else/switch), the number of error handling blocks (catch blocks) is typically small, so their total cost is usually much lower than that of a distributed error handling approach (result<T, E>, returning error codes, optional/nil/null).
4. Leave room for improvement: exception handling implementations could be made significantly better than their current performance.
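
To make the point about error return paths concrete, here is a minimal sketch that contrasts the two styles. Every name in it (`sensor_fault`, `read_raw`, `scale`, and so on) is hypothetical and exists only for illustration; none of it is libhal API.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical error type for this sketch; not part of libhal.
struct sensor_fault : std::runtime_error {
  sensor_fault() : std::runtime_error("sensor fault") {}
};

// Distributed style: every layer must check for and forward the failure.
std::optional<int> read_raw_checked(bool p_fail) {
  if (p_fail) {
    return std::nullopt;
  }
  return 21;
}

std::optional<int> scale_checked(bool p_fail) {
  auto raw = read_raw_checked(p_fail);
  if (!raw) {
    return std::nullopt;  // error return path in the middle layer
  }
  return *raw * 2;
}

// Exception style: the middle layer contains no error return path.
int read_raw(bool p_fail) {
  if (p_fail) {
    throw sensor_fault();
  }
  return 21;
}

int scale(bool p_fail) {
  return read_raw(p_fail) * 2;
}

// A single handler at the level that knows what to do about the error.
int scale_or_default(bool p_fail) {
  try {
    return scale(p_fail);
  } catch (const sensor_fault&) {
    return -1;
  }
}
```

In the exception version, `scale` only describes the normal path. The cost of the error path is concentrated in the one catch block in `scale_or_default`.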

With that out of the way, let's delve into how libhal manages errors.

How to use exceptions in C++

Let's start with signaling an error. This can be done by writing the following bit of code:

void check_if_device_is_valid(/* ... */) {
  constexpr hal::byte expected_id = 0xAD;

  // Get ID info from the device and store it in `retrieved_id` ...

  if (expected_id != retrieved_id) {
    throw hal::no_such_device(this, expected_id);
  }
}

To catch the thrown error, do the following:

void bar() {
  try {
    check_if_device_is_valid(/* ... */);
  } catch(const hal::no_such_device& p_error) {
    // do something using the error info.
  }
}

Note that this is a simplified example.

The throw keyword works as it does in other languages that let you throw or raise an error object. Throwing exits the function's scope without returning normally. The exception mechanism then unwinds the stack, restoring the CPU's state to that of the enclosing try scope, and moves the CPU's program counter to the catch block that matches the thrown type. In this case, since we threw hal::no_such_device, the catch block for that type is selected. If no enclosing scope in the call chain has a catch block that matches the thrown type, std::terminate() is called.

Once the catch block is entered, everything that lived within the scope of the try block is no longer valid memory. This matters because the exception unwinding mechanism can, and must, skip spending cycles on constructing and bubbling objects from a lower stack frame to a higher one. Since the thrown object is the only thing that escapes the scope, any information needed for error handling should be copied into the thrown object as it is being thrown.
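
As a rough illustration of copying information into the thrown object, consider the sketch below. The `id_mismatch` type and the hard-coded `retrieved_id` value are assumptions made for this example and are not part of libhal.

```cpp
#include <cstdint>

// Hypothetical exception type for this sketch; not part of libhal.
struct id_mismatch {
  std::uint8_t expected;
  std::uint8_t actual;
};

void check_id(std::uint8_t p_expected) {
  // Local to this frame. It no longer exists once the exception leaves the
  // function. The value is a placeholder for data read from a device.
  std::uint8_t retrieved_id = 0x42;

  if (p_expected != retrieved_id) {
    // Copy everything the handler will need into the thrown object.
    throw id_mismatch{ p_expected, retrieved_id };
  }
}

// Returns the ID the device reported on a mismatch, or -1 if the IDs matched.
int probe(std::uint8_t p_expected) {
  try {
    check_id(p_expected);
    return -1;
  } catch (const id_mismatch& p_error) {
    // `retrieved_id` is gone by now, but its copy travels with the error.
    return p_error.actual;
  }
}
```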

`hal::exception` hierarchy

libhal has a hierarchy of errors, which looks like the following:

hal::exception
├── hal::no_such_device
│   └── hal::stm32f1::i2c_core_dump_io_error
├── hal::io_error
│   └── hal::lpc40::i2c_core_dump_io_error
├── hal::timed_out
├── ...
└── hal::unknown

hal::exception is the base exception for all libhal exceptions and is typically not thrown directly. Its descendants are thrown instead, most having a 1-to-1 correspondence with the enumerated constants in std::errc. std::errc follows the POSIX error codes, providing a reasonable approximation of the types of errors hardware might encounter. An exception to this rule is hal::unknown, which represents an unknown error, used when the exact error is undetermined. Such cases should be rare in code.
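
The shape of such a hierarchy can be sketched as follows. The types below live in a hypothetical `sketch` namespace and are stand-ins written for this example; they are not the libhal definitions, and members such as `status` are assumptions.

```cpp
#include <system_error>

namespace sketch {
// Base type: carries the std::errc value its descendants correspond to.
class exception {
public:
  explicit exception(std::errc p_code) : m_code(p_code) {}
  std::errc error_code() const { return m_code; }

private:
  std::errc m_code;
};

struct timed_out : exception {
  timed_out() : exception(std::errc::timed_out) {}
};

struct io_error : exception {
  io_error() : exception(std::errc::io_error) {}
};

// A more derived type adds information but can still be caught as io_error.
struct i2c_io_error : io_error {
  explicit i2c_io_error(int p_status) : status(p_status) {}
  int status;
};
}  // namespace sketch

// Throws a different error depending on `p_which` and reports the std::errc
// of whatever was caught. Returns a zero-valued std::errc if nothing is thrown.
std::errc failing_operation_code(int p_which) {
  try {
    if (p_which == 0) {
      throw sketch::timed_out();
    }
    if (p_which == 1) {
      throw sketch::i2c_io_error(7);
    }
  } catch (const sketch::io_error& p_error) {
    // Selected for io_error and anything derived from it.
    return p_error.error_code();
  } catch (const sketch::exception& p_error) {
    // Selected for every other member of the hierarchy.
    return p_error.error_code();
  }
  return std::errc{};
}
```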

To see the full list of exception types available, refer to the error API docs. It is important to consult this documentation to understand which exceptions should be thrown and under what circumstances they can be recovered from.

Expectation from libhal libraries

libhal libraries and utilities are required to use only the direct descendants of hal::exception or a more derived exception with additional information.

Exceptions outside of the hal::exception hierarchy may still be thrown from a libhal library if they come from a call to a user-defined callback. The user is allowed to throw any type they wish, although care should be taken in choosing the types to be thrown. This is useful for application code that wants to bypass catch blocks provided by libhal libraries.
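
The bypass behavior can be sketched as follows. The `sketch` namespace, `run_with_callback`, and `app_abort_request` are hypothetical names invented for this example. They only model a library routine that catches its own hierarchy around a user callback.

```cpp
namespace sketch {
// Hypothetical stand-ins for a library's exception hierarchy.
struct exception {};
struct io_error : exception {};

// Hypothetical library routine that invokes a user callback and handles only
// errors from its own hierarchy. Returns false if it handled an error.
bool run_with_callback(void (*p_callback)()) {
  try {
    p_callback();
    return true;
  } catch (const exception&) {
    return false;
  }
}
}  // namespace sketch

// Application-defined type that is deliberately outside the hierarchy.
struct app_abort_request {
  int reason;
};

void no_failure() {}
void library_failure() { throw sketch::io_error{}; }
void application_abort() { throw app_abort_request{ 3 }; }

// The application type passes straight through the library's catch block and
// is handled here instead.
int run_application() {
  try {
    sketch::run_with_callback(application_abort);
  } catch (const app_abort_request& p_request) {
    return p_request.reason;
  }
  return 0;
}
```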

How Do You Know What Throws What?

C++ does not currently have a mechanism to inform the user at compile time if an uncaught exception will terminate your application. Therefore, to know what may be thrown from a function, you'll need to consult the API documentation for the function. All libhal interfaces have strict requirements for their implementations to throw very specific hal::exception derived types.

Knowing when to catch an error

First and foremost, accept that your application may encounter an exception that will terminate it. Plan with this possibility in mind. Use hal::set_exception to set the terminate handler function as needed for your application, for example to save state information and reset the device.
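
As a rough idea of what such a handler might look like, the sketch below uses the standard `std::set_terminate` rather than the libhal function named above, whose signature is not shown in this document. `save_state` and `reset_device` are hypothetical application hooks, and `std::abort()` is only a placeholder for a real device reset.

```cpp
#include <cstdlib>
#include <exception>

// Hypothetical application hooks; replace with real persistence and reset code.
bool state_saved = false;

void save_state() {
  state_saved = true;
}

[[noreturn]] void reset_device() {
  std::abort();  // placeholder for a hardware reset
}

// A terminate handler must not return.
[[noreturn]] void on_terminate() {
  save_state();
  reset_device();
}

// Call this early in application startup, before code that may throw.
void install_terminate_handler() {
  std::set_terminate(on_terminate);
}
```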

With this in mind, ONLY catch the errors you know how to handle. If you do not know how to handle an error, allow it to propagate to higher levels in the call chain. This gives higher-level code the opportunity to handle errors.

Do not wrap each function call in its own try/catch block. Doing so increases code size and degrades the performance of the unwind mechanism by giving it more scopes to search through.
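
A minimal sketch of this guideline follows. The exception types and functions are hypothetical and are not libhal API. Each level catches only the error it knows how to handle and lets everything else propagate.

```cpp
namespace sketch {
// Hypothetical stand-ins for a library's exception hierarchy.
struct exception {};
struct timed_out : exception {};
struct io_error : exception {};
}  // namespace sketch

// Lowest level: reports errors and handles none of them.
void step(int p_fault) {
  if (p_fault == 1) {
    throw sketch::timed_out{};
  }
  if (p_fault == 2) {
    throw sketch::io_error{};
  }
}

// Middle level: knows how to recover from a timeout (retry once), so it
// catches only that. An io_error passes through untouched.
int read_with_retry(int p_fault) {
  try {
    step(p_fault);
    return 0;
  } catch (const sketch::timed_out&) {
    step(0);  // retry without the fault
    return 1;
  }
}

// Top level: the only place that knows what to do about an I/O error.
int application(int p_fault) {
  try {
    return read_with_retry(p_fault);
  } catch (const sketch::io_error&) {
    return 2;
  }
}
```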

When to catch `hal::exception`

hal::exception should only be caught when code wants to swallow all possible exceptions from libhal OR when translating exceptions from C++ to a C API that needs an error code that roughly follows std::errc.

int c_callback() {
  try {
    foo();
    bar();
    baz();
  } catch (const hal::exception& p_error) {
    // Translate the exception into an error code for the C caller.
    return static_cast<int>(p_error.error_code());
  }
  // No error occurred.
  return 0;
}

Using `hal::exception::instance()`

try {
  read_timeout();
  bandwidth_timeout();
} catch (const hal::timed_out& p_exception) {
  if (&read_timeout == p_exception.instance()) {
    hal::print(console, "X");
    read_complete = true;
  }
  // TODO: Replace this exceptional bandwidth timeout with a variant that
  // simply returns if the timeout has occurred. This is not its intended
  // purpose but does demonstrate proper usage of `p_exception.instance()`.
  else if (&bandwidth_timeout == p_exception.instance()) {
    hal::print(console, "\n   +  |");
    bandwidth_timeout = hal::create_timeout(counter, graph_cutoff);
  } else {
    throw;
  }
}

In this case, read_timeout and bandwidth_timeout are callable objects that live in a scope above the try block, allowing them to be modified and updated in the error handling block. Because both of these objects can throw an exception, we may want to know which one threw the exception. We can use the instance() function to get the address of the object that threw an exception. If the instance does not match anything in scope, then it may have come from an object that was lower in the stack and is no longer valid.

Note the comment on bandwidth_timeout. bandwidth_timeout is a part of the normal control flow and should not be reporting errors to move along the normal control flow. read_timeout, on the other hand, does report an actual error in this context. This example is taken from libhal-esp8266/demos/applications/at_benchmark.cpp.

Caution

DO NOT USE const_cast and reinterpret_cast to FORCE an address from instance() into a pointer to some other type and then attempt to use it. This is undefined behavior. ONLY use the address returned from instance() as a means to compare it to other objects.
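
The compare-only rule can be shown with a self-contained sketch. The `sketch::timed_out` and `sketch::timeout` types below are hypothetical stand-ins written for this example, and the `const void*` return type of `instance()` is an assumption that may differ from the libhal definition.

```cpp
namespace sketch {
// Hypothetical exception that records the address of the object that threw it.
class timed_out {
public:
  explicit timed_out(const void* p_instance) : m_instance(p_instance) {}
  const void* instance() const { return m_instance; }

private:
  const void* m_instance;
};

// Hypothetical callable that throws when its deadline has passed.
struct timeout {
  bool expired;
  void operator()() const {
    if (expired) {
      throw timed_out(this);
    }
  }
};
}  // namespace sketch

// Returns 1 if `first` threw, 2 if `second` threw, and 0 if neither did.
int which_timed_out(bool p_first_expired, bool p_second_expired) {
  // Both objects live above the try block, so they are still valid inside
  // the catch block.
  sketch::timeout first{ p_first_expired };
  sketch::timeout second{ p_second_expired };

  try {
    first();
    second();
  } catch (const sketch::timed_out& p_exception) {
    // Only compare the address. Never cast it back and dereference it.
    if (&first == p_exception.instance()) {
      return 1;
    }
    if (&second == p_exception.instance()) {
      return 2;
    }
    throw;  // thrown by an object we do not know about
  }
  return 0;
}
```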

Why you shouldn't throw an `int` or other primitives

Application callbacks are allowed to throw whatever type they wish, although care should be taken to choose a good type to throw.

Throwing int is generally a bad choice because it gives little to no information about what kind of error occurred. If an int is thrown, it probably encodes an error code, meaning many sections of code would need to catch it, check whether it is their error code, and rethrow it if it is not. This results in a large number of catch blocks.
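
The difference can be sketched as follows. All names here (`uart_error`, `uart_fault`, the layer functions) are hypothetical and chosen only for illustration.

```cpp
constexpr int uart_error = 1;
constexpr int i2c_error = 2;

// int style: every handler catches every int, inspects it, and rethrows the
// ones that are not its own.
void fail_with_int(int p_code) {
  if (p_code != 0) {
    throw p_code;
  }
}

int uart_layer_int(int p_code) {
  try {
    fail_with_int(p_code);
  } catch (int p_caught) {
    if (p_caught != uart_error) {
      throw;  // not our error code: rethrow
    }
    return 10;
  }
  return 0;
}

int i2c_layer_int(int p_code) {
  try {
    return uart_layer_int(p_code);
  } catch (int p_caught) {
    if (p_caught != i2c_error) {
      throw;  // not our error code: rethrow
    }
    return 20;
  }
}

// Typed style: the catch clause itself selects the error, so no inspection
// or rethrow is needed.
struct uart_fault {};
struct i2c_fault {};

void fail_with_type(int p_code) {
  if (p_code == uart_error) {
    throw uart_fault{};
  }
  if (p_code == i2c_error) {
    throw i2c_fault{};
  }
}

int uart_layer_typed(int p_code) {
  try {
    fail_with_type(p_code);
  } catch (const uart_fault&) {
    return 10;
  }
  return 0;
}

int i2c_layer_typed(int p_code) {
  try {
    return uart_layer_typed(p_code);
  } catch (const i2c_fault&) {
    return 20;
  }
}
```

With the int version, an I2C error is caught and rethrown by the UART layer before it reaches the handler that wants it. With distinct types, it skips the UART handler entirely.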
