Title: Playing with Weggli
Date: 2021-10-14 20:00

[Felix Wilhelm](https://twitter.com/_fel1x) from
[Google's Project Zero](https://googleprojectzero.blogspot.com/p/about-project-zero.html)
recently released [weggli](https://github.com/googleprojectzero/weggli/):

> weggli is a fast and robust semantic search tool for C and C++ codebases. It
> is designed to help security researchers identify interesting functionality in
> large codebases.

[Oblivion](https://pwning.systems/about/), an avid [CodeQL
user](https://pwning.systems/posts/sequoia-variant-analysis/), was of course
interested, so we spent an evening on [IRC](https://www.darkscience.net/)
drinking beer and trying to come up with interesting queries to run, mostly
against the [Linux kernel](https://www.kernel.org/category/about.html).

## Queries

To find `kmalloc` multiplication overflows:

```
$ weggli --unique -R 'a!=^[A-Z_]+$' 'kmalloc($a * _);' ~/linux
```

Since [this
commit](https://github.com/googleprojectzero/weggli/commit/289b47451f708b5cf36e0cdbf768521e426dcaa8),
binary expressions are commutative, meaning that the query will match as soon
as at least one operand of the multiplication isn't written in all caps, that
is, doesn't look like a constant.
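
For context, here is a minimal user-space sketch of the bug class this query is after: a size computed as `count * size` silently wraps around, so the allocation ends up smaller than the caller believes. `checked_mul` and `unchecked_mul` are hypothetical helpers written for illustration only; the kernel ships its own helpers for this, such as `kmalloc_array`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a * b in *out and returns 0,
 * or returns -1 if the product does not fit in a size_t. */
static int checked_mul(size_t a, size_t b, size_t *out)
{
	if (b != 0 && a > SIZE_MAX / b)
		return -1;
	*out = a * b;
	return 0;
}

/* What an unchecked `kmalloc(n * size)` would actually request:
 * unsigned arithmetic wraps modulo SIZE_MAX + 1. */
static size_t unchecked_mul(size_t a, size_t b)
{
	return a * b;
}
```

With a count of `SIZE_MAX / 2 + 1` and an element size of 2, the unchecked product wraps to 0, while the checked variant reports the overflow.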

The idea of this one is to find overflows that affect the allocation but not
the later use: the buffer is sized with `$a + _`, which can wrap around, while
`memcpy` still copies `$a` bytes:

```
$ weggli --unique 'kmalloc($a + _); memcpy(_, _, $a);' ~/linux
```

A classic mistake in C is to use `sizeof(ptr)` instead of `sizeof(type of the
pointed thing)`:

```
$ weggli -R 'func=^mem' --unique '$a * _; $func(_ , _, sizeof($a));' ~/linux
```

Unfortunately, there is currently no way to tell weggli that the first
argument of `$func` shouldn't be `&a`. It's possible to use something like
`-R 'b!=&'`, but it sucks.
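
To make the mistake concrete, here is a small sketch of the difference between the two `sizeof` forms. The `struct packet` type and the three helper functions are made up for this example; the buggy call would be `memset(p, 0, sizeof(p))`, which only clears as many bytes as a pointer occupies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure, only here to have something to point to. */
struct packet {
	char payload[64];
};

/* sizeof(p): the size of the pointer itself (typically 4 or 8). */
static size_t size_of_pointer(const struct packet *p)
{
	return sizeof(p);
}

/* sizeof(*p): the size of the pointed-to structure. */
static size_t size_of_pointee(const struct packet *p)
{
	return sizeof(*p);
}

/* Correct clearing: the length comes from the pointee, not the pointer. */
static void clear_packet(struct packet *p)
{
	memset(p, 0, sizeof(*p));
}
```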

Copy functions like `memcpy` and its friends should always copy up to the size
of the target, not the source. Unfortunately, it's not uncommon to see the
latter, as this query shows:

```
$ weggli --unique -R 'func=co?py' -R 'size=sizeof|strlen' '$func($dest, $src, $size($src));' ~/linux
```

Variants matching on structure members also produce interesting results:

```
$ weggli --unique -R 'func=co?py' '$func($dest, $src, $src->$len);' ~/linux
$ weggli --unique -R 'func=co?py' '$func($dest, $src->$buf, $src->$len);' ~/linux
```
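
The copy queries above all look for a length derived from the source. As a point of comparison, here is a minimal sketch of a copy bounded by the destination instead. `bounded_copy` is a hypothetical helper, not something taken from the kernel or from weggli.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst without ever writing more
 * than dst_size bytes, and always NUL-terminates when dst_size > 0.
 * Returns the number of bytes copied, terminator excluded. */
static size_t bounded_copy(char *dst, size_t dst_size, const char *src)
{
	size_t n;

	if (dst_size == 0)
		return 0;
	n = strlen(src);
	if (n >= dst_size)
		n = dst_size - 1;	/* truncate to what the target can hold */
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

The length passed to `memcpy` is clamped by `dst_size`, so an overlong source gets truncated rather than overflowing the target.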

We tried various approaches to find trivial double-frees, like:

```
$ weggli --unique '{
	kfree($a);
	NOT: goto _;
	NOT: break;
	NOT: continue;
	NOT: return;
	NOT: $a = _;
	kfree($a);
}' ~/linux
```

but didn't manage to make anything elegant, since there is no way to express
that we don't want any `break`, `goto _`, … between the two frees, or at least
that the second one is [reachable](https://github.com/googleprojectzero/weggli/issues/10)
from the first.
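
A sketch of why control flow matters here: in the made-up function below, the same pointer is freed in two places, yet there is no double-free because the first `free` is always followed by a `return`. Telling this apart from a real bug requires knowing whether the second call can be reached after the first. `process` is a hypothetical user-space example, using `free` instead of `kfree`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: returns 0 on success, -1 on failure.
 * Both free(buf) calls are fine, since the first one is always
 * followed by a return and the second is never reached after it. */
static int process(const char *input)
{
	char *buf = malloc(16);

	if (buf == NULL)
		return -1;
	if (strlen(input) >= 16) {
		free(buf);
		return -1;	/* early exit: the free below is not reached */
	}
	strcpy(buf, input);	/* safe: input is shorter than 16 bytes */
	free(buf);
	return 0;
}
```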

Variable-length arrays are risky and error-prone: if the length exceeds the
available stack space, the stack overflows, and the possibilities for error
checking are… suboptimal. So here's how to find them:

```
$ weggli --unique '_ $func(_ $len) {
NOT: _ = $buf[$len];
NOT: $buf[$len] = _;
_ $buf[$len];
}' ~/linux
```

Stupid things like free'ing stack-allocated variables:

```
$ weggli --unique '$a = alloca(_); free($a);' ~/target
```

Shady-looking side-effects:

```
$ weggli --unique -R '$op=\+\+|--' 'if ( _ && _ $op)' ~/linux
```

Unspecified argument evaluation order, with side effects in the mix:

```
$ weggli --unique '$f($a++, $b++)' ~/linux
$ weggli --unique '$f(++$a, ++$b)' ~/linux
$ weggli --unique '$f($a--, $b--)' ~/linux
$ weggli --unique '$f(--$a, --$b)' ~/linux
```
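
As an illustration of why such calls deserve a second look, here is a small sketch where the result depends on which argument gets evaluated first. All the names are hypothetical. Note that the sketch routes the side effect through a helper function on purpose: writing `sub(i++, i++)` directly on the same variable would be undefined behaviour in C, not merely unspecified.

```c
/* Hypothetical helper with a side effect: returns *counter, then bumps it. */
static int next(int *counter)
{
	return (*counter)++;
}

static int sub(int x, int y)
{
	return x - y;
}

/* Order-dependent: the compiler may evaluate either argument first,
 * so this returns -1 or 1 depending on the implementation. */
static int sub_unspecified(void)
{
	int i = 0;

	return sub(next(&i), next(&i));
}

/* Order-independent: each side effect gets its own statement. */
static int sub_sequenced(void)
{
	int i = 0;
	int first = next(&i);
	int second = next(&i);

	return sub(first, second);
}
```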

Division by zero:

```
$ weggli --unique '$a = 0; _ / $a' ~/linux
```

Same condition:

```
$ weggli --unique 'if ($a); else if ($a);' ~/linux
```

Sizeof void:

```
$ weggli --unique 'void * $a; sizeof(*$a)' ~/linux
```

Structures copied to userland without being fully initialized first might leak
uninitialized memory or kernel pointers:

```
$ weggli --unique '{
	NOT: $a = memdup_user(_);
	NOT: memset($a);
	NOT: memset($a->$b);
	copy_to_user(_, $a, sizeof(*$a));
}' ~/linux
```

To find KASLR bypasses like [this one](https://github.com/torvalds/linux/commit/d0d62baa7f505bd4c59cd169692ff07ec49dde37):

```
$ weggli -R 'a=addr' 'dev_info($a);' ~/dev/linux
```

Not accounting for the terminating NUL byte when allocating a string sized via `snprintf`:

```
$ weggli --unique '$a = snprintf(0, 0, _); malloc($a);' ~/target
```
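
For reference, a minimal sketch of the correct idiom: `snprintf(NULL, 0, ...)` returns the length of the formatted string without the terminator, so the allocation needs one extra byte. `format_id` and its `"id=%d"` format are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a freshly allocated "id=<n>" string,
 * or NULL on failure. The caller is responsible for freeing it. */
static char *format_id(int id)
{
	int len = snprintf(NULL, 0, "id=%d", id);	/* length without the NUL */
	char *out;

	if (len < 0)
		return NULL;
	out = malloc((size_t)len + 1);			/* + 1 for the terminator */
	if (out == NULL)
		return NULL;
	snprintf(out, (size_t)len + 1, "id=%d", id);
	return out;
}
```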

Not reading `snprintf`'s manpage:

```
$ weggli --unique '$pos = snprintf(_ + $pos);' ~/target
```
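
What the manpage says, in short: `snprintf` returns the number of characters that would have been written had the buffer been large enough, not the number actually written. Blindly adding that value to an offset can therefore push the offset past the end of the buffer. Below is a minimal sketch of an append helper that clamps the offset; `append_str` is a hypothetical name and just one way of handling it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: appends s at offset pos of buf (capacity size)
 * and returns the new offset, which never exceeds size - 1. */
static size_t append_str(char *buf, size_t size, size_t pos, const char *s)
{
	int n;

	if (size == 0)
		return 0;
	if (pos >= size - 1)
		return size - 1;	/* buffer already full */
	n = snprintf(buf + pos, size - pos, "%s", s);
	if (n < 0)
		return pos;		/* output error: keep the old offset */
	if ((size_t)n >= size - pos)
		return size - 1;	/* output was truncated */
	return pos + (size_t)n;
}
```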

Since weggli supports C++, here is a dumb one to find type-confusion frees:

```
$ weggli --cpp --unique '$a = new _; $b = (_) $a; delete $b;' ~/target_cpp
```

## Limitations

Overflows via format strings can't be found, since there is no way to express
constraints [between
variables](https://github.com/googleprojectzero/weggli/issues/9) or to
manipulate string literals. We'd have liked to write something like this, with
an imaginary `--constrain` option:

```
$ weggli --unique --constrain '$a>$b' '$buf[$b]; scanf("%$as", $buf);' ~/target
```

Trivial double-free detection, since there is no way to express that a
statement must be [reachable](https://github.com/googleprojectzero/weggli/issues/10)
from another one (the `--followup` option here is imaginary too):

```
$ weggli --unique --followup 'free($a); free($a);' ~/target
```

String literals again, plus a wildcard for the [number of arguments](https://github.com/googleprojectzero/weggli/issues/13):

```
$ weggli --unique -R 'a=addr' -R 'b=0x%' 'dev_info(_, $b, ..., $a);' ~/target
```

## Conclusion

We found a couple of bugs, but since the goal was to play around, we didn't
spend time triaging or reporting them. Weggli is pretty cool, kind of
in-between `grep` and `CodeQL`. It still comes with some shortcomings: some by
design like the absence of interprocedural semantics and control-flow notions,
others because it's still a young project, but Felix is (still?) enthusiastic
about adding missing features!
