Linear algebra, bounds checking and debugging#
Today’s assignment
Bounds checking#
A very common error is to access arrays beyond their actual bounds. In C, if an array is declared as
double arr[5];
the allowed indices to access it are 0…4, as there are 5 elements, and C/C++
start counting at 0. Accessing arr with an index less than 0 or greater than or
equal to 5 is what the C language spec calls “undefined behavior”, which means
it may sometimes work, other times crash the program, or, even more annoyingly,
lead to subtle errors in a totally different part of your program later on. So you
definitely want to avoid this. Fortran compilers, which are subject to the same
problem, may offer a -fbounds-check or similar option, but C generally does
not, since in C an array is in most places really just a pointer to its first
(0th, really) element, and there’s no way the compiler can know how many
elements follow. There are sophisticated tools that can help in tracking down
illegal accesses, but they’re not always available or easy to use (see the
references at the bottom).
If you use your own vector or matrix struct, which actually stores not just the raw data, but also the length of the vector / size of the matrix, you can make use of this to implement your own bounds checking. This is demonstrated here, where bounds checking code is added, though not turned on. In general, the additional checks will take time, so they’re useful during development, but they probably should be turned off in the finished application when you want it to run as fast as possible.
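As a rough illustration of the idea, here is a minimal sketch of a bounds-checked vector. The struct layout, the helper vector_at() and the function vector_construct() are assumptions made for this example and may differ from what the class library actually does; only the names VEC, BOUNDS_CHECK and vector_destruct() appear in these notes.

```c
#include <assert.h>
#include <stdlib.h>

#define BOUNDS_CHECK // comment this out to disable the checks

// Hypothetical vector type: the raw data plus the number of elements.
struct vector {
  double *data;
  int n;
};

#ifdef BOUNDS_CHECK
// Returns a pointer to element i after verifying that i is a valid index.
static inline double *vector_at(struct vector *v, int i)
{
  assert(i >= 0 && i < v->n);
  return &v->data[i];
}
#define VEC(v, i) (*vector_at((v), (i)))
#else
// No checking: plain array access, with no extra cost.
#define VEC(v, i) ((v)->data[(i)])
#endif

// Allocates a zero-initialized vector with n elements (assumed helper).
void vector_construct(struct vector *v, int n)
{
  v->data = calloc(n, sizeof(*v->data));
  assert(v->data);
  v->n = n;
}

// Frees the storage and resets the length.
void vector_destruct(struct vector *v)
{
  free(v->data);
  v->data = NULL;
  v->n = 0;
}
```

With the check enabled, VEC(&v, 3) on a 3-element vector aborts with an assertion failure instead of silently reading past the end. Since this sketch relies on assert(), defining NDEBUG would also disable the check.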
Your turn#
My work will be in the c/ subdirectory. However, as usual you (or VS Code)
should just make a top-level build/ directory and call cmake -S .. from
there. The executables will end up in build/c/test_....
Sign up for the assignment repo
Clone onto your local machine
You can follow what I show in class by going back in history. For example, let’s
go back to an older version of the code, main~7. What this means is, check out
the version of the code on the main branch, but not the current one, rather go
back 7 commits in history. Caveat: If you add additional commits in your repository, the numbering will change.
[To avoid this ambiguity, commits have a unique id (hash),
which gets you to a specific commit for sure. But if one wants to go back and
forth, the branch~<n> notation is more convenient.]
➜ build git:(main) git checkout main~7
Note: switching to 'main~7'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at e9d3401 class12: print out vector_add result
What the warning is telling you is that since you’re not on the tip of a branch anymore, bad things may happen if you make a commit there. So let’s just not do that ;)
If you’re using VS code, or another git GUI, there’s probably a way to check out
an old version of the code from that GUI. E.g., in VS code, you can look at the
history in the Version Control panel, and you can right-click on a given commit
and select Checkout (Detached) from the context menu.
What happened in that commit is that we were presumably debugging something with the vector_add() function and we just added some printing:
commit e4ae3156527b662556df1539f6d5f1853e8a0a7a (HEAD)
Author: Kai Germaschewski <kai.germaschewski@gmail.com>
Date: Tue Feb 24 14:38:53 2026 +0000
class11: print out vector_add result
Let's say we're debugging something...
diff --git a/c/test_vector_add.c b/c/test_vector_add.c
index 9b2b29e..b8ce889 100644
--- a/c/test_vector_add.c
+++ b/c/test_vector_add.c
@@ -2,6 +2,7 @@
#include "linalg.h"
#include <assert.h>
+#include <stdio.h>
#include <stdlib.h>
void test_vector_add(int n)
@@ -20,6 +21,12 @@ void test_vector_add(int n)
vector_add(&x, &y, &z);
+ printf("{");
+ for (int i = 0; i < 3; i++) {
+ printf(" %g", VEC(&z, i));
+ }
+ printf(" }\n");
+
assert(vector_equals(&z, &z_ref));
vector_destruct(&x);
Your turn#
Run test_vector_add. What output do you get? Is it what you want? (Remember, we’re trying to manually debug whether vector_add() gives the right result.)
That debug printing seems to work fine, except it just shows 3 elements. Now,
say, you want to actually print all 4 elements of the vector in the 2nd test.
main~6 makes the following
change:
commit 60026088cb5ace32118bba47dbed2b7610aea62d (HEAD)
Author: Kai Germaschewski <kai.germaschewski@gmail.com>
Date: Tue Feb 24 14:40:05 2026 +0000
class11: print 4 elements to debug 2nd test
Nothing's actually wrong in vector_add(), actually, but let's presume we're
debugging anyway...
diff --git a/c/test_vector_add.c b/c/test_vector_add.c
index b8ce889..1f7fa1a 100644
--- a/c/test_vector_add.c
+++ b/c/test_vector_add.c
@@ -22,7 +22,7 @@ void test_vector_add(int n)
vector_add(&x, &y, &z);
printf("{");
- for (int i = 0; i < 3; i++) {
+ for (int i = 0; i < 4; i++) {
printf(" %g", VEC(&z, i));
}
printf(" }\n");
It may not be so obvious, but we just introduced a bug – that test is called once for 3 element vectors, and then again for 4 element vectors. If we always print 4 elements, that’s not so good.
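One way to avoid this class of bug is to let the loop bound come from the vector itself rather than from a hard-coded number. The following is only a sketch: the struct layout and the function vector_print() are assumptions made for this example, not part of the class library as shown in these notes.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical vector type: the raw data plus the number of elements.
struct vector {
  double *data;
  int n;
};

// Prints all elements of v to the stream f and returns how many were printed.
int vector_print(FILE *f, const struct vector *v)
{
  assert(v->n >= 0);
  int count = 0;
  fprintf(f, "{");
  // The loop bound is the stored length, so this works for 3 or 4 elements.
  for (int i = 0; i < v->n; i++) {
    fprintf(f, " %g", v->data[i]);
    count++;
  }
  fprintf(f, " }\n");
  return count;
}
```

Because the length travels with the data, the same call works unchanged for both invocations of the test.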
Your turn#
Build the code and run it. Does it work or crash, or what?
Turning on bounds checking#
Alright, so it doesn’t crash (probably), doesn’t fail, but doesn’t quite look right, either. Actually, these are about the worst kinds of bugs, since they lead to random problems (or none at all, seemingly), making them really hard to track down most of the time. So let’s turn on bounds checking.
Your turn#
Uncomment the #define BOUNDS_CHECK in linear_algebra.h. Build and run the test.
What went wrong? Well, the bounds checking should point you fairly closely to where to look for the bug.
After you’re done, let’s go back to the main branch. If you have made local
changes, you likely need to undo them first, since git doesn’t want to switch
branches / commits otherwise, as you’d lose those changes, and git tries to
prevent you from unintentionally losing your work. However, today we are okay
with losing these temporary changes. In VS Code’s version management menu, it
lists the files which are changed, and it has a “revert” button next to the
“stage” button. If you’re using another git GUI, it probably also has an option
to revert local changes, or $ git checkout <source-dir> will do it on the
command line.
To go back to your (current) branch, do git checkout main (or
git switch main), or use your GUI / VS Code to do so.
More debugging#
Your branch will be the starting point for the main debugging task of this class.
Well, we first need some real bugs. And to make them not too easy to find, they’re in somewhat complex code, that is, the current version of the linear algebra library, which allows for variable dimensions of matrices and vectors.
For this class, I added a new feature: matrix-matrix multiplication. But we
first have to take a step back again. Check out the version where I added it,
which is 2 commits back in the main branch (for me, anyway):
vscode ➜ /workspaces/class-8/build (6002608) $ git checkout main~2
Previous HEAD position was 6002608 class11: print 4 elements to debug 2nd test
HEAD is now at 7567948 class11: add matrix-matrix multiply and test
Your turn#
Run the tests. They should check out fine.
Tests#
➜ build git:(4c11f17) cmake --build . && ctest
[...]
Test project /workspaces/class-8/build
Start 1: test_vector_dot
1/7 Test #1: test_vector_dot .................. Passed 0.00 sec
Start 2: test_vector_add
2/7 Test #2: test_vector_add .................. Passed 0.00 sec
Start 3: test_matrix_vector_mul
3/7 Test #3: test_matrix_vector_mul ........... Passed 0.00 sec
Start 4: test_matrix_matrix_mul
4/7 Test #4: test_matrix_matrix_mul ........... Passed 0.00 sec
Start 5: test_vector_dot_cxx
5/7 Test #5: test_vector_dot_cxx .............. Passed 0.00 sec
Start 6: test_vector_add_cxx
6/7 Test #6: test_vector_add_cxx .............. Passed 0.00 sec
Start 7: test_matrix_vector_mul_cxx
7/7 Test #7: test_matrix_vector_mul_cxx ....... Passed 0.00 sec
100% tests passed, 0 tests failed out of 7
Total Test time (real) = 0.02 sec
Excellent, everything looks good.
[Side note: Look at test_matrix_matrix_mul.c. What does it actually test? But
never mind.]
However, after the next change, it turns out that the code now crashes.
➜ build git:(4c11f17) git checkout main~1
Previous HEAD position was 7567948 class11: add matrix-matrix multiply and test
HEAD is now at 2afb480 class11: make the mat-mat-mul test bigger
vscode ➜ /workspaces/class-8/build (2afb480) $ cmake --build . && ctest
[...]
Test project /workspaces/class-8/build
Start 1: test_vector_dot
1/7 Test #1: test_vector_dot .................. Passed 0.00 sec
Start 2: test_vector_add
2/7 Test #2: test_vector_add .................. Passed 0.00 sec
Start 3: test_matrix_vector_mul
3/7 Test #3: test_matrix_vector_mul ........... Passed 0.00 sec
Start 4: test_matrix_matrix_mul
4/7 Test #4: test_matrix_matrix_mul ...........***Exception: SegFault 0.00 sec
Start 5: test_vector_dot_cxx
5/7 Test #5: test_vector_dot_cxx .............. Passed 0.00 sec
Start 6: test_vector_add_cxx
6/7 Test #6: test_vector_add_cxx .............. Passed 0.00 sec
Start 7: test_matrix_vector_mul_cxx
7/7 Test #7: test_matrix_vector_mul_cxx ....... Passed 0.00 sec
86% tests passed, 1 tests failed out of 7
Total Test time (real) = 0.02 sec
The following tests FAILED:
4 - test_matrix_matrix_mul (SEGFAULT)
Errors while running CTest
Output from these tests are in: /workspaces/class-8/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
[I’m hoping it’ll crash for you as well. I tried, and it does crash on my Mac, and it does crash on Linux. But I know that in the past, it wasn’t that easy for me to provide a buggy code that would fail for everyone. If it doesn’t crash for you at all, don’t worry too much – it’s still broken :smirk:]
Narrowing down versions where things break#
Ideally, one should run the tests every time before checking in a new version,
so this should never happen. In reality, however, it does happen quite a bit
that things get broken, and it may take a while to notice (many bugs are more
subtle and even running a bunch of tests is far from a guarantee that everything
is bug-free). In general, one good way to debug a newly found problem is to
figure out at which version it broke, that is, find the version n such that
things work fine in version n-1, but the problem occurs in version n.
Note: Git has a feature called git bisect that helps find the first bad commit
more quickly. It is worth looking into (in general, not for this class, since
we already figured out where it breaks).
Inspecting what has changed in between often helps a lot in narrowing down what caused the bug. However, I made it not so easy here, since the change is that we’re testing a non-square matrix, which previously wasn’t tested at all, so there’s not really an old version which worked.
So here’s the situation: when running test_matrix_matrix_mul, it “segfaults”.
Though google is often a successful strategy for how to fix a problem, googling
“segmentation fault” is likely of little help here (other than maybe some
consolation that one definitely isn’t alone with this kind of problem), since
segmentation faults can be caused by all kinds of bugs.
Your turn#
Make sure that you can reproduce the problem yourself, and that it starts at one specific commit.
What can we do about the segfault?#
Staring at the code#
It can be quite effective to just read the code and look for a bug, in particular when one knows what to stare at, e.g., the changes between a working and a non-working version. If one has no idea where the problem is, however, there’s probably a lot of code to stare at, and it’s not so likely that’ll lead to success anytime soon.
printf()#
One of the most basic, but still often effective ways is "printf()" debugging,
to figure out where exactly the problem happened. The idea is to add a bunch of
print statements, and well, the ones that actually print something when you run
the code clearly indicate that the code hasn’t yet crashed at that point. On the
other hand, if a later print statement does not actually print anything when
running the code, that’s a sign that the code didn’t get to it, since it crashes
beforehand.
The HERE macro introduced in the last commit makes this job a bit easier, as
it saves typing. It’s easy enough to figure out that the problem happens in
matrix_matrix_mul(), but we don’t easily get much further using this method.
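The actual definition of HERE in the repository is not shown in these notes, so the following is just one plausible sketch of such a macro. The function sum_first() is a made-up stand-in for whatever code is under suspicion.

```c
#include <assert.h>
#include <stdio.h>

// Prints the current file, line and function. Writing to stderr matters:
// stderr is not buffered by default, so the message is much less likely
// to be lost if the program crashes right afterwards.
#define HERE fprintf(stderr, "HERE: %s:%d (%s)\n", __FILE__, __LINE__, __func__)

// Hypothetical function we suspect of crashing: sums the first n elements.
double sum_first(const double *a, int n)
{
  assert(n >= 0);
  HERE; // if this prints, we made it into the function
  double sum = 0.;
  for (int i = 0; i < n; i++) {
    sum += a[i];
  }
  HERE; // if this does not print, the crash happened in the loop above
  return sum;
}
```

The last HERE line that shows up in the output brackets the crash from above, and the first one that is missing brackets it from below.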
assert()#
The assert macro (see man assert), which we’ve encountered before, is
another useful tool if used in the right places. For example, some bugs could
happen if we call matrix-matrix multiplication with matrix dimensions that do
not match. A check for this is actually missing in the code right now, and
something for you to add later. But it is not the culprit, unfortunately. The
code calculates C = AB, where A is an m x k matrix, B is a k x n matrix, and C
is an m x n matrix, so everything is fine.
In general, assert() has the following nice properties:
It checks the given condition at runtime, and will abort the code with a (somewhat) helpful message if the condition is violated.
It also serves as documentation for the reader of the code.
Finally, the checks can be easily removed by defining the macro NDEBUG, removing any performance impact (but also giving up the benefit of runtime checking). cmake’s default compiler flags define NDEBUG for the Release, RelWithDebInfo and MinSizeRel build types, but not for Debug builds.
As an example, in our factorial() function, we should have at least added a
comment:
// This function needs to be called with n >= 0
int
factorial(int n)
{
// ...
}
However, we can replace this comment by actually enforcing the condition, too:
#include <assert.h>
// ...
int
factorial(int n)
{
assert(n >= 0);
// ...
}
This makes it clear to someone reading the function that n has to be
non-negative. And if someone who didn’t bother to look calls the function with
a negative number anyway, the assert will catch the problem and terminate the
program with a meaningful message.
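For completeness, here is a version of the function that can actually be compiled. The loop body is an assumption, since the notes elide it; any correct way of computing the factorial would do.

```c
#include <assert.h>

// Computes n! for n >= 0. The assert documents and enforces the precondition.
int factorial(int n)
{
  assert(n >= 0);
  int result = 1;
  // For n = 0 and n = 1 the loop does not run and the result stays 1.
  for (int i = 2; i <= n; i++) {
    result *= i;
  }
  return result;
}
```

Calling factorial(-1) in a build without NDEBUG aborts the program and reports the failed condition along with the file and line number. Note that with a 32-bit int the result overflows for n > 12, which is another precondition one might want to assert.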
Core dumps (advanced)#
Core dumps save the state of the process to a file as it dies. This can be used to figure out what went wrong after the fact. This is also known as “post-mortem analysis”.
To enable them:
vscode ➜ /workspaces/class-8/build (2afb480) $ c/test_matrix_matrix_mul
Segmentation fault
vscode ➜ /workspaces/class-8/build (2afb480) $ ulimit -c unlimited
vscode ➜ /workspaces/class-8/build (2afb480) $ c/test_matrix_matrix_mul
Segmentation fault (core dumped)
Note that it now says “core dumped”. The core can be found as core or
core.<pid> in a system-specific location, e.g., it might be in /cores/ on macOS. Use
gdb to analyze it:
vscode ➜ /workspaces/class-8/build (2afb480) $ gdb c/test_matrix_matrix_mul core
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
[...]
[New LWP 24891]
Core was generated by `c/test_matrix_matrix_mul'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000aaaab7100d3c in ?? ()
(gdb) bt
#0 0x0000aaaab7100d3c in ?? ()
#1 0x0000aaaab7100818 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Well, that’s not helpful at all. If you’re luckier (compiling the code in Debug mode definitely helps), you might get something like this:
[kai@macbook linear_algebra]$ gdb test_matrix_matrix_mul /cores/core.144
[...]
(gdb) bt
#0 matrix_matrix_multiply (C=<value temporarily unavailable, due to optimizations>, A=0x1001000a0, B=0x1001000b0) at matrix.c:96
96 MAT(C, i, j) += MAT(A, i, k) * MAT(B, k, j);
#1 0x000000010000179f in WTime [inlined] () at /Users/kai/src/iam851/linear_algebra/linear_algebra.h:68
#2 0x000000010000179f in main (argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>) at test_matrix_matrix_multiply.c:69
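As an alternative to typing ulimit -c unlimited in the shell, a program can raise its own core file limit on POSIX systems using getrlimit() and setrlimit(). This is a sketch only; the function name enable_core_dumps() is made up for this example, and it is not something the class code does.

```c
#include <assert.h>
#include <sys/resource.h>

// Raises the soft limit on core file size to the hard limit.
// Returns 0 on success, -1 on failure.
int enable_core_dumps(void)
{
  struct rlimit rl;
  if (getrlimit(RLIMIT_CORE, &rl) != 0) {
    return -1;
  }
  // An unprivileged process may raise its soft limit up to the hard limit.
  rl.rlim_cur = rl.rlim_max;
  return setrlimit(RLIMIT_CORE, &rl);
}
```

One would call this early in main(), before the code that may crash. If the hard limit is itself 0, the call still succeeds but no core file will be written, and the system may also redirect or suppress core files by other means.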
The debugger: gdb (advanced)#
It’s even more convenient to use gdb directly, and it can do a lot more than
just analyze the stack trace after the code died. However, in many HPC
environments, in particular when using MPI, it’s difficult to interactively run
a debugger.
In general, if one wants to use a debugger, it’s a good idea to compile the code
with -g -O0 -Wall flags, where -g adds debugging information and -O0 turns off
optimization, which otherwise tends to confuse both the debugger and, in turn,
the person who’s trying to use the debugger.
In cmake, this can be done by running cmake -DCMAKE_BUILD_TYPE=Debug [...].
And, as I mentioned before, when one is done debugging and wants best
performance, one should actually use cmake -DCMAKE_BUILD_TYPE=Release [...].
Bounds checking#
Time to come full circle. You’ve learned how to use the bounds checking for my vectors and matrices at the beginning of the class. Let’s use it to find where the bug happens. It’ll still require some more thinking to find and fix the bug, but it’s a good start.
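To give an idea of what bounds checking for a matrix can look like, here is a minimal sketch. The struct layout, the row-by-row storage order and the helper matrix_at() are assumptions made for this example; the MAT macro in the class library may be defined differently.

```c
#include <assert.h>

#define BOUNDS_CHECK // comment this out to disable the checks

// Hypothetical matrix type: m rows and n columns, stored row by row.
struct matrix {
  double *data;
  int m, n;
};

#ifdef BOUNDS_CHECK
// Returns a pointer to element (i, j) after checking both indices.
static inline double *matrix_at(struct matrix *A, int i, int j)
{
  assert(i >= 0 && i < A->m); // row index in range
  assert(j >= 0 && j < A->n); // column index in range
  return &A->data[i * A->n + j];
}
#define MAT(A, i, j) (*matrix_at((A), (i), (j)))
#else
// No checking: direct access to the flat array.
#define MAT(A, i, j) ((A)->data[(i) * (A)->n + (j)])
#endif
```

Checking the row and column index separately is a deliberate choice here. An access like MAT(&A, 0, n) often still lands inside the allocated block, so a check on the flat offset alone would not notice it, even though the index is clearly wrong.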
Your turn#
Find and fix the bug!
More debugging#
floating point exceptions
Your turn / homework#
Follow the “Your turn” steps above. Other than actually fixing the bug, there isn’t a need to commit anything, but you should keep track of what you’re doing / what’s happening and put notes in the Feedback pull request.
Add assert() statements to the matrix_matrix_mul() function that make sure that the matrix dimensions match as required by the underlying math.
Implement a matrix_equals() function and use it to complete the matrix-matrix multiplication test.
Add bounds checking to the C++ version of vector and matrix, and verify that it works as intended.
Add matrix-matrix multiplication to the C++ version of the library, and add a test for it.