mocking.rst: add start of write-up on mocking + function redirection
Note: this isn't so much about mocking as it is about redirecting
function calls (potentially to a mock). But kunit.dev/mocking will
probably be more memorable and match more what people search for.
In [1], a doc was presented on stream but couldn't be made publicly
available.
This is the start of the process of making it available.
Unlike that doc, this page has a different audience and is intended to
also try and give a basic feel for what mocking is and why you
would/would not want to use it.
This first patch includes the description of fakes vs mocks and a
skeleton for the rest (the doc's overview of the different kinds of
approaches to adding indirection).
[1] https://events.linuxfoundation.org/mentorship-session-kunit-testing-strategies/
Change-Id: I54899388cc2ca07b1ba5d41dc7add4363cab2223
Signed-off-by: Daniel Latypov <dlatypov@google.com>
diff --git a/index.rst b/index.rst
index 12194d6..2936d33 100644
--- a/index.rst
+++ b/index.rst
@@ -15,6 +15,7 @@
development/index
third_party/kernel/index.rst
third_party/stable_kernel/index.rst
+ mocking
press
What is KUnit?
diff --git a/mocking.rst b/mocking.rst
new file mode 100644
index 0000000..b84bbe1
--- /dev/null
+++ b/mocking.rst
@@ -0,0 +1,243 @@
+====================================
+Fakes and Stubbing and Mocks, Oh My!
+====================================
+
+This page provides an overview of mocking and a related task: redirecting
+function calls to test-only code. Note: many people use the term "mocking" to
+refer to the latter (and that's fine!), but we'll try to keep the concepts
+separate in this doc.
+
+KUnit currently lacks specific support for either of these, in part because
+there are enough trade-offs that it's hard to come up with a generic solution.
+
+Why do we need this?
+====================
+
+First, let's consider what the goal is. We want unit tests to be as
+lightweight and hermetic as possible, and only test the code we care about.
+
+A canonical example in userspace testing to consider is a database.
+We'd want to verify that our code behaves properly (inserts the right rows to
+the database, etc.), but we don't want to bring up a test database every time
+we run our tests.
+
+Not only will this make the test take longer to run, it also adds more
+opportunities for the test to break in uninteresting ways, e.g. if writes to
+the database fail due to transient network issues.
+
+If we can construct a "fake" database that implements the same interface, which
+is simply an in-memory hashtable or array, then we can have much faster and
+more reliable tests. Unit tests simply don't need the scalability and features
+of a real database.
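+
+As a rough illustration, here is a minimal userspace sketch of such a fake.
+Every name in it (``db_ops``, ``fake_db``, ``record_login()``) is hypothetical
+and made up for this example, and plain ``assert()`` stands in for a test
+framework's expectations.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface shared by the real database and the fake. */
struct db_ops {
	int (*insert)(void *ctx, const char *key, int value);
	int (*lookup)(void *ctx, const char *key, int *value);
};

#define FAKE_DB_MAX_ROWS 16

struct fake_db_row {
	const char *key; /* not copied: the caller keeps the string alive */
	int value;
};

/* The fake: a small in-memory array, no server and no network. */
struct fake_db {
	struct fake_db_row rows[FAKE_DB_MAX_ROWS];
	size_t num_rows;
};

static struct fake_db_row *fake_db_find(struct fake_db *db, const char *key)
{
	size_t i;

	for (i = 0; i < db->num_rows; i++) {
		if (strcmp(db->rows[i].key, key) == 0)
			return &db->rows[i];
	}
	return NULL;
}

/* Insert a new row, or overwrite the value if the key already exists. */
static int fake_db_insert(void *ctx, const char *key, int value)
{
	struct fake_db *db = ctx;
	struct fake_db_row *row = fake_db_find(db, key);

	if (!row) {
		if (db->num_rows == FAKE_DB_MAX_ROWS)
			return -1;
		row = &db->rows[db->num_rows++];
		row->key = key;
	}
	row->value = value;
	return 0;
}

static int fake_db_lookup(void *ctx, const char *key, int *value)
{
	struct fake_db_row *row = fake_db_find(ctx, key);

	if (!row)
		return -1;
	*value = row->value;
	return 0;
}

static const struct db_ops fake_db_ops = {
	.insert = fake_db_insert,
	.lookup = fake_db_lookup,
};

/* Hypothetical code under test: it only ever sees the interface. */
static int record_login(const struct db_ops *ops, void *ctx, const char *user)
{
	int count = 0;

	ops->lookup(ctx, user, &count); /* a missing row leaves count at 0 */
	return ops->insert(ctx, user, count + 1);
}

static void test_record_login(void)
{
	struct fake_db db = { 0 };
	int count = 0;

	assert(record_login(&fake_db_ops, &db, "alice") == 0);
	assert(record_login(&fake_db_ops, &db, "alice") == 0);
	assert(fake_db_lookup(&db, "alice", &count) == 0);
	assert(count == 2);
}
```
+
+The test checks the resulting state of the fake (one row with a count of 2)
+rather than which calls were made to get there.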
+
+Fakes versus mocks
+==================
+
+We'll be using terminology roughly as defined in
+https://martinfowler.com/bliki/TestDouble.html, namely:
+
+- a "test double" is the more generic term for any kind of test-only replacement.
+- a "mock" is a test double that specifically can make assertions about how it's
+ called and can return different values based on its inputs.
+- a "fake" is a test double that mimics the semantics of the code it's replacing
+ but with less overhead and dependencies, e.g. a fake database might just use
+ a hash table, or a fake IO device which is just a ``char buffer[MAX_SIZE]``, or UML itself (in a sense).
+
+| Mocks generally are written with support from their testing framework, whereas fakes are typically written without them.
+| KUnit currently lacks any features to specifically facilitate mocks, so it's recommended to create and use fakes.
+
+Downsides of mocking
+--------------------
+
+Very briefly, using mocks in tests can make tests more fragile since they test
+"behavior" rather than "state."
+
+What do we mean by that? Let's imagine we're testing some userspace program
+with gMock-like syntax (a C++ mocking framework):
+
+.. code-block:: c
+
+ void send_data(struct data_sink *sink)
+ {
+ /* do some fancy calculation to figure out what to write */
+ sink->write("hello, ");
+ sink->write("world");
+ }
+
+ void test_send_data(struct test *test)
+ {
+ struct data_sink *sink = make_mock_datasink();
+
+ EXPECT_CALL(sink, write("hello, "))
+ .WillOnce(Return(7));
+ EXPECT_CALL(sink, write("world"))
+ .WillOnce(Return(5));
+ send_data(sink);
+ }
+
+And now let's say we've realized we can make our code twice as fast with more
+buffering, effectively changing it to:
+
+.. code-block:: c
+
+ void send_data(struct data_sink *sink)
+ {
+ sink->write("hello, world");
+ }
+
+
+| Oops, now our mock-based tests are failing since we've changed how many times we call ``write()``!
+| Contrast this to a state-based approach where ``write()`` might just append to some ``char buffer[MAX_SIZE]``. In that case, we can validate ``send_data()`` worked by just using ``KUNIT_EXPECT_STREQ(test, buffer, "hello, world")`` and it would work for either implementation.
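+
+A minimal userspace sketch of that state-based approach might look like the
+following. The names ``fake_write()``, ``send_data_unbuffered()`` and
+``send_data_buffered()`` are made up for this example, the return convention of
+``write()`` (bytes written, negative on failure) is an assumption, and
+``assert()`` with ``strcmp()`` stands in for ``KUNIT_EXPECT_STREQ()``.
+
```c
#include <assert.h>
#include <string.h>

#define MAX_SIZE 64

struct data_sink {
	int (*write)(const char *str);
};

/* All the state the fake needs: everything "written" so far. */
static char buffer[MAX_SIZE];

/* Fake write(): append to buffer and return the number of bytes written. */
static int fake_write(const char *str)
{
	size_t used = strlen(buffer);
	size_t len = strlen(str);

	if (used + len >= MAX_SIZE)
		return -1;
	memcpy(buffer + used, str, len + 1); /* + 1 copies the terminator */
	return (int)len;
}

/* The original implementation: two calls to write(). */
static void send_data_unbuffered(struct data_sink *sink)
{
	sink->write("hello, ");
	sink->write("world");
}

/* The "optimized" implementation: one call to write(). */
static void send_data_buffered(struct data_sink *sink)
{
	sink->write("hello, world");
}

static void test_send_data(void)
{
	struct data_sink sink = { .write = fake_write };

	buffer[0] = '\0';
	send_data_unbuffered(&sink);
	assert(strcmp(buffer, "hello, world") == 0);

	/* The exact same check passes for the other implementation. */
	buffer[0] = '\0';
	send_data_buffered(&sink);
	assert(strcmp(buffer, "hello, world") == 0);
}
```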
+
+A further downside is that the test author has to mimic the behavior
+themselves, i.e. the return values for each ``write()`` call. This means if
+the test author makes a mistake or tests just don't get updated after a
+refactor, the mock can behave in an unrealistic fashion.
+
+This can and *will* eventually lead to bugs.
+
+
+Upsides of mocking
+------------------
+
+| This is not to say that one should never test "behavior", i.e. use mocking.
+| E.g. imagine we *wanted* the example test to validate that we only call ``write()`` once since each call is super-expensive.
+| Or consider when there's no easy way to validate that the state has changed, e.g. if we want to validate that ``prefetchw()`` is called to pull a specific data structure into cache.
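+
+Without framework support, the call-count check can still be approximated by
+hand. The sketch below is one such hand-rolled test double in userspace C, not
+a KUnit feature; ``counting_write()`` and ``write_calls`` are names invented
+for this example.
+
```c
#include <assert.h>
#include <string.h>

struct data_sink {
	int (*write)(const char *str);
};

/* Behavior we want to assert on: how often write() was called. */
static int write_calls;

static int counting_write(const char *str)
{
	write_calls++;
	return (int)strlen(str); /* pretend everything was written */
}

static void send_data(struct data_sink *sink)
{
	sink->write("hello, world");
}

static void test_send_data_writes_once(void)
{
	struct data_sink sink = { .write = counting_write };

	write_calls = 0;
	send_data(&sink);
	assert(write_calls == 1);
}
```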
+
+
+| It's also easier to use a mock if we want to force a certain return value, e.g. if we want to make a specific ``write()`` call fail so we can test an error path.
+| With our ``data_sink`` example above, it's hard for an append into a ``char buffer[MAX_SIZE]`` to fail until we hit ``MAX_SIZE``, but for real code that might be writing to disk or sending data over the network, failure could happen for ~any call. And it's valuable to test that our code is robust against such failures.
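+
+A fake can be extended with a knob for this, at the cost of drifting toward
+being a hand-written mock. The userspace sketch below assumes a ``send_data()``
+that propagates errors, which the earlier example did not do, and the
+``calls_until_failure`` knob is made up for this example.
+
```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_SIZE 64

struct data_sink {
	int (*write)(const char *str);
};

static char buffer[MAX_SIZE];

/* How many calls succeed before write() starts failing; negative: never. */
static int calls_until_failure = -1;

static int fake_write(const char *str)
{
	size_t used = strlen(buffer);
	size_t len = strlen(str);

	if (calls_until_failure == 0)
		return -EIO; /* injected failure */
	if (calls_until_failure > 0)
		calls_until_failure--;
	if (used + len >= MAX_SIZE)
		return -ENOSPC;
	memcpy(buffer + used, str, len + 1);
	return (int)len;
}

/* Assumed variant of send_data() that reports write() failures. */
static int send_data(struct data_sink *sink)
{
	int ret;

	ret = sink->write("hello, ");
	if (ret < 0)
		return ret;
	ret = sink->write("world");
	return ret < 0 ? ret : 0;
}

static void test_send_data_second_write_fails(void)
{
	struct data_sink sink = { .write = fake_write };

	buffer[0] = '\0';
	calls_until_failure = 1; /* first write() succeeds, second fails */
	assert(send_data(&sink) == -EIO);
	assert(strcmp(buffer, "hello, ") == 0);
}
```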
+
+Function redirection
+====================
+
+| Regardless of what kind of test double you use, it's useless unless you can swap it in for the real code.
+| For lack of a better term, we'll refer to this as function redirection: how do I make calls to ``real_function()`` go to my ``fake_function()``?
+
+| In other test frameworks (Python's unittest, JUnit for Java, GoogleTest for C++, etc.), this is fairly easy. This is because they rely on techniques like dynamic dispatch, which has language support.
+| We can and do re-implement dynamic dispatch in the kernel in C, but this adds runtime overhead which may or may not be acceptable in all contexts.
+
+The problem boils down to `adding another layer of indirection
+<https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering>`_
+and we have various options to choose from, which we'll describe below.
+
+For each of these, let's consider the following code:
+
+.. code-block:: c
+
+ static void func_under_test(void)
+ {
+ /* unsafe to call this function directly in a test! */
+ send_data_to_hardware("hello, world\n");
+ }
+
+Run time (ops structs, "class mocking")
+---------------------------------------
+
+This is the most straightforward approach and fundamentally boils down to doing
+this:
+
+.. code-block:: c
+
+ static void func_under_test(void (*send_data_func)(const char *str))
+ {
+ send_data_func("hello, world\n");
+ }
+
+
+Being a bit more sophisticated, we can introduce a struct to hold the
+functions:
+
+.. code-block:: c
+
+ struct send_ops {
+ void (*send)(const char *str);
+ /* maybe more functions here in real code */
+ };
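+
+To make the idea concrete, here is a userspace sketch of how a test might pass
+a fake implementation through such a struct. Passing the ops as a parameter is
+just one option, and ``fake_send()``, ``fake_send_ops`` and ``sent`` are names
+invented for this example.
+
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct send_ops {
	void (*send)(const char *str);
};

/* The code under test calls through the ops instead of the real function. */
static void func_under_test(const struct send_ops *ops)
{
	ops->send("hello, world\n");
}

/* Test-only fake: record what would have been sent to hardware. */
static char sent[64];

static void fake_send(const char *str)
{
	snprintf(sent, sizeof(sent), "%s", str);
}

static const struct send_ops fake_send_ops = {
	.send = fake_send,
};

static void test_func_under_test(void)
{
	sent[0] = '\0';
	func_under_test(&fake_send_ops);
	assert(strcmp(sent, "hello, world\n") == 0);
}
```
+
+Production callers would pass a ``struct send_ops`` whose ``send`` member
+points at the real ``send_data_to_hardware()``.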
+
+TODO(dlatypov@google.com): write about "class mocking", `RFC here
+<https://lore.kernel.org/linux-kselftest/20201012222050.999431-1-dlatypov@google.com/>`_
+
+Pros:
+~~~~~
+
+- Simplest implementation: "it's just code."
+- This is the only approach here where we can limit the scope of the
+ redirection.
+
+ - The subsequent approaches **globally** redirect all calls to
+ ``send_data_to_hardware()``, potentially in code not-under-test we
+ don’t want to mess with.
+- There are plenty of such structs throughout the kernel.
+
+ - And users don't need any special support from KUnit.
+
+Cons:
+~~~~~
+
+- ~Everyone knows about this convention but still wants "mocking." It's not seen
+ as sufficient by itself.
+- Requires the most invasive code changes if the code isn't already using this
+ pattern.
+
+ - Introduces runtime overhead (an indirect call, another function
+ argument, etc.)
+- If ``func_under_test()`` is publicly exposed, but ``send_data_func()`` is not
+ (most likely the case), users need to work around this.
+- The `RFC for "class mocking"
+ <https://lore.kernel.org/linux-kselftest/20201012222050.999431-1-dlatypov@google.com/>`_
+ requires a lot of boilerplate, even after providing macros to take care of
+ most of it.
+
+ - This is fundamentally a limitation of C (as opposed to C++ where
+ classes have language support). It’s unlikely we can improve much
+ here.
+
+Compile time
+------------
+
+TODO(dlatypov@google.com): write me
+
+Pros:
+~~~~~
+
+- TODO
+
+Cons:
+~~~~~
+
+- TODO
+
+Link time (__weak symbols)
+--------------------------
+
+TODO(dlatypov@google.com): write me
+
+Pros:
+~~~~~
+
+- TODO
+
+Cons:
+~~~~~
+
+- TODO
+
+Binary-level (ftrace et al.)
+----------------------------
+
+TODO(dlatypov@google.com): write me
+
+Pros:
+~~~~~
+
+- TODO
+
+Cons:
+~~~~~
+
+- TODO
+
+TODO(dlatypov@google.com): include discussion on global functions/general statefulness.
+TODO(dlatypov@google.com): include section on worked example use cases.