Title: On reducing the bus factor, school rant edition
Date: 2019-10-28 11:30

[anarcat](https://anarc.at/) recently published a blog post entitled
[Theory: average bus factor = 1](https://anarc.at/blog/2019-10-16-bus-factor/),
speculating that the bus factor for open source software is, on average,
one. The average might even be lower, with unmaintained zombie projects still
being both packaged and used everywhere.

Dramatically low bus factors are a well-known issue, resurfacing
every time a new major vulnerability in a piece of open-source software is made public:
[Heartbleed](https://en.wikipedia.org/wiki/Heartbleed) highlighted that OpenSSL
only had [two developers to take care of ½ million lines of
code](https://web.archive.org/web/20180426011940/https://blog.ssh.com/free-can-make-you-bleed);
when people had [fun with SKS
servers](https://code.firstlook.media/the-death-of-sks-pgp-keyservers-and-how-first-look-media-is-handling-it),
we discovered that the main implementation was a steaming heap of unmaintained
and unmaintainable OCaml; GnuPG [almost
died](https://arstechnica.com/information-technology/2015/02/once-starving-gnupg-crypto-project-gets-a-windfall-but-can-it-be-saved/)
in 2015 because Werner Koch was underfunded and [wanted to make a decent
salary](https://www.propublica.org/article/the-worlds-email-encryption-software-relies-on-one-guy-who-is-going-broke);
[CopperheadOS](https://copperhead.co/android/) died when its [lead and only
developer](https://twitter.com/DanielMicay) left; … the list goes on and on.

While this is scary and shit, I haven't seen a lot of material about what can
be done to fix it, besides the obvious, like trying to make your project a [nice
and welcoming
place](http://sage.thesharps.us/2015/10/06/what-makes-a-good-community/) so
that, maybe, a healthy community can form around it.

But I'm convinced that we could do more, as I'm going to suggest based on
personal *anecdata* from my school years.

## Don't force people to reinvent the wheel

I distinctly remember the C course semester project that I was given
almost 10 years ago at school: implement a `libmatrix`, to
multiply/sum/subtract/transpose/invert/trace/decompose/diagonalize/… matrices,
and compute determinant/eigenvectors/eigenvalues/… which boiled down to:

1. Reimplementing (as in copy-pasting code from) the
   [GNU Scientific Library](https://www.gnu.org/software/gsl/)/
   [ALGLIB](http://www.alglib.net/)/
   [Eigen](http://eigen.tuxfamily.org/index.php?title=Main_Page/)/…
2. Getting a good grade
3. Throwing the code away

It would have been great to have to fix some bugs in those libraries instead of
reinventing the wheel that I didn't care about.

I'm not saying that people shouldn't write their own version of existing
projects: it's great and empowering to do so, but if you're going to push
your students into writing some code, please direct them towards existing
projects. Heck, you could even send an email to the maintainers/lead devs of
projects that you like, to ask them if they would be OK with mentoring your
students and helping them get some bugs fixed or cool new features added: you'll
likely get enthusiastic replies.


## Teach how to contribute

During my bachelor's degree, there was a mandatory code project, to get done in
teams of a dozen students each, over a whole semester, about writing an
inventory system for car parts in PHP, so that a low-level manager at the local
car manufacturer, who was *kind enough* to perform a sketchy interview about
the requirements, could pick the least bad one for free.

Anyway, when I asked my fellow schoolmates what tools we should use for this
assignment, the consensus was to use Dropbox and USB keys as revision
control, a Word document as a bug tracker, and a single Apache2 deployment where
everybody could copy their code and check if it was working.

This was completely insane, and I was genuinely angry both at my schoolmates
and at the teachers who thought that teaching things like
[MERISE](https://en.wikipedia.org/wiki/Merise) and Scrum/Agile/… was a better
investment of everybody's time than explaining to their students how to use
version control software and bug trackers, how to properly communicate on a
mailing list, use static analysis tools, take advantage of continuous
integration, perform efficient and useful code reviews, …

I have the strong feeling that instead of wasting 4h a week working on such a
terrible project, we could have learned so much more by working together to
implement some cool feature in a large open-source project.

Moreover, no matter how great your code is, it won't get merged, or even
reviewed at all if you don't know how to send patches, communicate on a mailing
list, and handle reviews and nits.

Interestingly, in 2004, [Daniel J.
Bernstein](https://en.wikipedia.org/wiki/Daniel_J._Bernstein) ran a university
course called [MCS 494, UNIX Security Holes](http://cr.yp.to/2004-494.html),
in which the students were required to find real-world security vulnerabilities
in open source software and report them to the corresponding maintainers.

In 2014, [John Regehr](http://john.regehr.org/) at the University of Utah ran a
course called [Writing Solid Code](https://blog.regehr.org/archives/1080),
with a focus on practical code with modern development techniques: git,
coverage, tests, code reviews, …

## Reward contributions

During my master's degree, I was able to take a "coding project course": I had
to define a worthy project that I wanted to complete during the semester, and
find a teacher who was willing to supervise me. So I spent 3 months improving
the search capabilities of [radare2](http://rada.re): string constants,
patterns, ROP gadgets, … and wrapped it up with an overly enthusiastic presentation
about building architecture-agnostic ROP chains in efficient ways, in front of
dumbfounded teachers! I got the best possible grade, and was so proud of being able
to contribute to real-world software! I even convinced the head of the
department to postpone my exams so that I could go host a radare2 workshop at
the [hacklu]({filename}/talks/hack.lu2014.md)!

As a teacher, you should find ways to reward the nights some students are
spending writing code for open-source projects: it's nothing more than a
practical course after all.


## If everything else fails, at least put a label

GitHub is doing [interesting
things](https://help.github.com/en/github/managing-security-vulnerabilities/about-security-alerts-for-vulnerable-dependencies)
to warn developers about security issues, both in their code and in their
dependencies. I think that the next logical step, not only for GitHub
but also for [MVN
Repository](https://mvnrepository.com/), [PyPI](https://pypi.org/),
[npmjs](https://www.npmjs.com/), … is to automatically label dead
projects, and warn the projects depending on them accordingly, for example:

> WARNING: your project has a critical dependency on FooLib. This project has
> had only 2 contributors in the past 17 years and has had no commits in the
> last 16 years. You might want to consider the risk to your project of
> depending on this code…
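
To make the idea a bit more concrete, here is a rough sketch of what such a
labelling check could look like. Everything in it is an assumption for the
sake of illustration: the `ProjectStats` record, the `staleness_warning`
function and the two thresholds are made up, and no registry is known to
expose this data in this shape. It flags a dependency when it has been idle
for too long or has too few contributors.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProjectStats:
    """Hypothetical metadata that a package registry could hold about a project."""
    name: str
    contributors: int   # distinct contributors over the project's history
    first_commit: date
    last_commit: date


def staleness_warning(
    stats: ProjectStats,
    today: date,
    max_idle_years: int = 2,    # arbitrary threshold, chosen for illustration
    min_contributors: int = 3,  # arbitrary threshold, chosen for illustration
) -> Optional[str]:
    """Return a warning if the project looks dead or under-staffed, else None."""
    # Approximate whole years, good enough for a coarse label.
    idle_years = (today - stats.last_commit).days // 365
    age_years = (today - stats.first_commit).days // 365

    recently_active = idle_years < max_idle_years
    enough_people = stats.contributors >= min_contributors
    if recently_active and enough_people:
        return None

    return (
        f"WARNING: your project depends on {stats.name}. "
        f"This project has had only {stats.contributors} contributor(s) "
        f"in the past {age_years} years and has had no commits "
        f"in the last {idle_years} years."
    )
```

A registry could run something like this over a dependency tree at install
time and print the resulting warnings; picking sensible thresholds is the hard
part, and the values above are only placeholders.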

## Conclusion

If you know some CS students and teachers, tell them about the [Google Summer
of Code](https://summerofcode.withgoogle.com/),
[Outreachy](https://www.outreachy.org/) and [similar
initiatives](https://github.com/tapaswenipathak/Open-Source-Programs), which pay
students to work on amazing open-source projects during their holidays.
Tell them about those crazy people writing code in their free time for everyone
to use, and how happy they would be to mentor and help students, for free.

Some of the ideas in this blogpost stemmed from a friendly email from
Chad Dougherty, many thanks to him!
