Debugging Rust's new Custom Derive system

Brandon W Maister

2017-01-15 17:21

Why a custom derive macro

I've been working on a couple of crates to copy over the magic of AssertJ's field-by-field assertions to Rust. Partially as a way of learning Macros 1.1/custom derive, and partially because it's one of my favorite testing tools and nobody should have to live without it.

This post is about what I've been doing to debug my custom derive implementation, which can require a couple of extra steps to get to the same kind of good error messages that I'm used to from regular Rust code.

Debugging

There are a couple of tips in the syn crate's README, which was the first thing that I read when I started working on field-by-field. Most of this post is just me exploring the implications and going into a bit/lots more detail.

test files

I'm writing a testing tool, so of course the first thing that I want to do is test that my code does what I want it to do. Plugins (custom-derive included) cannot be tested using the standard in-module #[test] annotation that most Rust code uses, but you can still use integration tests via the tests/ directory that Cargo knows about.

These modules are linked against the macro crate that you are writing, so everything works basically the same except that you end up using a #[macro_use] extern crate my_crate; to make it work.

At that point the standard #[test] attribute does what you expect, and you can use the usual cargo test and cargo test --test <integration-test-file> <optional-test-filter> to run them:

$ cargo test
    Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs
     Running .../derive_enum_mixed-d370953e520402cf
...
test result: ok. 10 passed; 0 failed; 0 ignored; 0 measured

     Running .../derive_enum_struct-8851e9b38fffd49d
...
test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured

     Running .../derive_enum_tuple-4b98f23c906ffa8d
...
test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured

     Running .../derive_enum_unit-f5f0a72af200d8c4
...
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured

     Running .../derive_struct-116d0c78a0c6e62f
...
test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured

Or to be more specific:

$ cargo test --test derive-struct assert_catches
    Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs
     Running .../derive_struct-116d0c78a0c6e62f

running 1 test
test assert_catches_differences ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

Wheeee.
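
For reference, here is a rough sketch of the shape a file in tests/ might take. So that the snippet compiles on its own, the trait and its impl are written out by hand; in the real file they would come from the #[macro_use] extern crate line plus #[derive(FieldByField)]. The trait, method, and field names are the ones visible in the generated code later in this post, but their exact definitions here (String fields in particular), the Point struct, and the second test are assumptions made for illustration.

```rust
// Sketch of a file like tests/derive-struct.rs.
// Assumption: in the real crate, the trait lives in field_by_field and the
// impl below is produced by #[derive(FieldByField)]. Both are hand-written
// here so the file has no dependencies.

#[derive(Debug, PartialEq)]
pub struct UnequalField {
    pub field_name: String,
    pub actually: String, // assumed type; the real crate may differ
    pub expected: String, // assumed type; the real crate may differ
}

pub trait EqualFieldByField {
    fn fields_not_equal(&self, other: &Self) -> Vec<UnequalField>;
}

// Hypothetical struct under test.
#[derive(Debug)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// Stand-in for what the derive would generate for `Point`.
impl EqualFieldByField for Point {
    fn fields_not_equal(&self, other: &Self) -> Vec<UnequalField> {
        let mut list = Vec::new();
        if self.x != other.x {
            list.push(UnequalField {
                field_name: "x".to_string(),
                actually: format!("{:?}", self.x),
                expected: format!("{:?}", other.x),
            });
        }
        if self.y != other.y {
            list.push(UnequalField {
                field_name: "y".to_string(),
                actually: format!("{:?}", self.y),
                expected: format!("{:?}", other.y),
            });
        }
        list
    }
}

#[test]
fn assert_catches_differences() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 1, y: 3 };
    let diffs = a.fields_not_equal(&b);
    assert_eq!(diffs.len(), 1);
    assert_eq!(diffs[0].field_name, "y");
}

#[test]
fn equal_values_report_nothing() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 1, y: 2 };
    assert!(a.fields_not_equal(&b).is_empty());
}
```

Each file in tests/ is compiled as its own crate, which is why every one of them shows up as a separate "Running" line in the output above.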

Now we've got two kinds of problems

Syntactically invalid code

There are two -- maybe two and a half -- phases to compiling these integration tests:

1. We have to compile the custom-derive library
2. We have to use the compiled lib to actually generate some code
   * That compiled code might generate code that is syntactically unsound
   * Or it could generate code that is syntactically fine, but violates some of Rust's guarantees.

This means that there are three different kinds of errors that we can get:

* Regular Rust errors
* Rust errors because some generated code is invalid
* Rust errors because the generated code doesn't type-check.

Syntactic errors end up causing extremely long error lines:

error: expected one of `=>`, `if`, or `|`, found `}`
 --> <proc-macro source code>:1:367
  |
1 | impl :: field_by_field :: EqualFieldByField for SingleUnitEnum { fn fields_not_equal ( & self , other : & Self ) -> Vec < :: field_by_field :: UnequalField > { # [ allow ( unused_mut ) ] let mut list : Vec < :: field_by_field :: UnequalField > = Vec :: new ( ) ; match ( self , other ) { log_syntax ! ( ( & SingleUnitEnum :: One , & SingleUnitEnum :: One ) => { } ) } list } fn assert_equal_field_by_field ( & self , other : & SingleUnitEnum ) { let errs = self . fields_not_equal ( other ) ; if errs . len ( ) > 0 { let mut errmsg = String :: from ( "\n    Items are not equal:\n" ) ; for field_err in errs { errmsg . push_str ( & format ! ( "        {}: {:?} != {:?}\n" , field_err . field_name , field_err . actually , field_err . expected ) ) ; } let ac_exp = format ! ( "    actually: {:?}\n    \
  |                                                                                                                                                                                                                                                                                                                                                                               ^

If you scroll all the way to the right you'll see where I omitted a match arm entirely, causing Rust to just barf at me.

Perhaps the most interesting part of this is that these errors come from within my procedural macro, but for some reason they result in error messages that include the generated source, making them among the easiest proc-macro errors to debug.

Syntactically valid errors

An Error

Tests exist to make it possible to refactor and know that your code still works and, if you do TDD, to give you a nice incremental target to work towards. They're great at helping write features, and the Rust compiler is great at helping prevent stupid mistakes from creeping into prod.

But! When writing custom derive macros, these mistakes can not only prevent your code from compiling, they can prevent your code from even existing. Let's say I forget to unquote an ident somewhere. For example, I've got a line that looks like:

    if is_multivariant {
        // long and simple code
    } else {
        quote! {
            ( &#name::#var_name, &#name::#var_name ) => {}
        }
    }

What happens if I forget to include one of the #s? What will that look like to me as I'm writing it, and if I screw something up in a release, what will it look like to my downstream users? What can we do to figure out the fat-fingered mistake I've made?

Let's take a look:

The unexpanded error message (from above, where I forgot to unquote a token) will be:

$ cargo test --test derive-enum-unit
   Compiling field-by-field-macros v0.1.0 (file:///Users/bwm/projects/field-by-field/field-by-field-macros)
error: no associated item named `var_name` found for type `UnitEnum` in the current scope
 --> tests/derive-enum-unit.rs:9:10
  |
9 | #[derive(FieldByField, Debug)]
  |          ^^^^^^^^^^^^

Hm. Okay, I know I should look for var_name somewhere in my code. The rest of this post is about getting better error messages.

cargo expand

The cargo-expand Cargo subcommand (by the author of the syn and quote crates that are the bread and butter of custom-derive implementations) is supremely helpful for figuring out what is actually going on with your code.

Of particular note for the way that I've been hacking, cargo-expand works with integration test files. I've found myself in an "edit - expand - compile - test" loop to figure out what the heck I was trying to do. What that looks like in more detail for me is something like:

$ emacs src/lib.rs
$ # expanded code relies on some internal rust features:
$ echo '#![feature(box_syntax, test, fmt_internals)]' > tests/derive-enum-unit-expanded.rs
$ cargo expand --test derive-enum-unit >> tests/derive-enum-unit-expanded.rs
$ cargo test --test derive-enum-unit-expanded

This results in wildly different error messages.

$ cargo test --test derive-enum-unit-expanded
   Compiling field-by-field-macros v0.1.0 (file:///Users/bwm/projects/field-by-field/field-by-field-macros)
... 30 lines of errors about the format! macro
error: no associated item named `var_name` found for type `UnitEnum` in the current scope
  --> tests/derive-enum-unit-expanded.rs:41:15
   |
41 |             (&UnitEnum::var_name, &UnitEnum::One) => {}
   |               ^^^^^^^^^^^^^^^^^^

error: no associated item named `var_name` found for type `UnitEnum` in the current scope
  --> tests/derive-enum-unit-expanded.rs:72:15
   |
72 |             (&UnitEnum::var_name, &UnitEnum::Two) => {}
   |               ^^^^^^^^^^^^^^^^^^

... 30 lines of errors about the format! macro

error: aborting due to 6 previous errors

error: Could not compile `field-by-field-macros`.

In addition, this error will be in context, so it's more straightforward to see exactly where in my generating code it is probably coming from. In particular, for my code, this turns into:

impl ::field_by_field::EqualFieldByField for UnitEnum {
    fn fields_not_equal(&self, other: &Self) -> Vec<::field_by_field::UnequalField> {

        let mut list: Vec<::field_by_field::UnequalField> = Vec::new();
        match (self, other) {
            (&UnitEnum::var_name, &UnitEnum::One) => {}

Which makes it much more obvious to me where I should be looking.
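
For comparison, here is roughly what that expansion could look like once the stray var_name is fixed. This is a hand-written sketch, not real cargo-expand output: the `_` fallback arm, the "variant" label, and the simplified UnequalField definition are all assumptions. Only the trait, method, and enum names come from the output above.

```rust
#[derive(Debug, PartialEq)]
pub enum UnitEnum {
    One,
    Two,
}

// Simplified stand-in; the real crate's field types may differ.
pub struct UnequalField {
    pub field_name: String,
    pub actually: String,
    pub expected: String,
}

pub trait EqualFieldByField {
    fn fields_not_equal(&self, other: &Self) -> Vec<UnequalField>;
}

impl EqualFieldByField for UnitEnum {
    fn fields_not_equal(&self, other: &Self) -> Vec<UnequalField> {
        let mut list: Vec<UnequalField> = Vec::new();
        match (self, other) {
            // Every arm now names a real variant on both sides.
            (&UnitEnum::One, &UnitEnum::One) => {}
            (&UnitEnum::Two, &UnitEnum::Two) => {}
            // Assumed fallback: differing variants count as one mismatch.
            _ => list.push(UnequalField {
                field_name: "variant".to_string(),
                actually: format!("{:?}", self),
                expected: format!("{:?}", other),
            }),
        }
        list
    }
}
```

Because this is ordinary Rust, the compiler's spans point at real lines, which is exactly what the expanded test file buys you.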

It also means that I can edit the expanded code in place and often get rid of that specific error (although including format!/println!/etc. anywhere in your macro output seems to prevent the expanded file from ever building successfully). That is also a much faster edit loop, since emacs + flycheck will show me the error while I'm editing, and will show me when I've fixed it.

println!

cargo-expand is the big guns, the first thing I reach for when I don't even know what part of my code is causing the error. If I know approximately what I'm doing, though, it's possible to get less verbose output by just using println!. Tokens objects implement Display and Debug, so if you've combined several quote! macros, or you aren't sure what the values of variables are, this is a pretty good option. (Probably you could use a debugger here, but that idea hadn't even occurred to me until right now.)
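
One way to keep these prints from cluttering the generator is a tiny print-and-return helper. This is only a sketch: traced and impl_header are hypothetical names, and a String stands in for Tokens so that the snippet needs no dependencies. The only thing it relies on is that the value implements Display.

```rust
use std::fmt::Display;

/// Print a labelled value at the point where it is produced, then hand it
/// back unchanged so the surrounding expression keeps its shape.
/// `traced` is a hypothetical helper, not part of quote or syn.
pub fn traced<T: Display>(label: &str, value: T) -> T {
    println!("{}: {}", label, value);
    value
}

/// Stand-in for a function that would normally return quote! output.
/// Assumption: a String is used here instead of Tokens.
pub fn impl_header(name: &str) -> String {
    traced(
        "impl header",
        format!("impl :: field_by_field :: EqualFieldByField for {}", name),
    )
}
```

Wrapping the final expression of a function this way means the print can be added or removed without touching the rest of the body.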

Printed output just happens at compile time, so the same cargo test --test <file> command works to narrow the output to exactly the case you want to inspect.

So if I've got the same error as above, I'll modify my code like this:

let r = quote! {
    ( &#name::var_name, &#name::#var_name ) => {}
};
println!("in single variant: {}", r);
r

$ cargo test --test derive-enum-unit
   Compiling field-by-field-macros v0.1.0 (file:///Users/bwm/projects/field-by-field/field-by-field-macros)
in single variant: ( & SingleUnitEnum :: var_name , & SingleUnitEnum :: One ) => { }
error: no associated item named `var_name` found for type `SingleUnitEnum` in the current scope
  --> tests/derive-enum-unit.rs:15:10
   |
15 | #[derive(FieldByField, Debug)]
   |          ^^^^^^^^^^^^

error: aborting due to previous error

error: Could not compile `field-by-field-macros`.

Now look at what's actually there, after in single variant:

( & SingleUnitEnum :: var_name , & SingleUnitEnum :: One ) => { }

That looks just like a normal match arm, but that var_name is suspiciously similar to the variable in my code, not in the code I'm decorating.
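
The same mistake is easy to reproduce outside of quote!. As a loose analogy only (this uses plain format! strings rather than the quote crate, and all three function names are made up), a name that is interpolated shows up as its value in the output, while a name that is missing its marker is copied through as literal text:

```rust
/// Correct: both variant positions are interpolated.
pub fn arm_interpolated(name: &str, var_name: &str) -> String {
    format!(
        "( & {n} :: {v} , & {n} :: {v} ) => {{ }}",
        n = name,
        v = var_name
    )
}

/// Buggy: the first `var_name` is literal text in the template,
/// the string-template equivalent of forgetting a `#` in quote!.
pub fn arm_with_typo(name: &str, var_name: &str) -> String {
    format!(
        "( & {n} :: var_name , & {n} :: {v} ) => {{ }}",
        n = name,
        v = var_name
    )
}

/// Cheap sanity check: does the generated text still contain one of the
/// generator's own variable names as a standalone token?
pub fn leaks_identifier(generated: &str, ident: &str) -> bool {
    generated.split_whitespace().any(|tok| tok == ident)
}
```

Calling arm_with_typo("SingleUnitEnum", "One") reproduces the printed line above, and leaks_identifier flags it. A check like this only helps if the names in the generator never legitimately appear in the code being decorated.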

Aside: one of the things that caused me some headaches in my original implementation of this was that I was using similar variable names in my tests and in my code, and so these errors weren't occurring because I had the correct literal text in my quasi-quoted Rust. When I figured out what was going on it was hilarious, believe me.

Takeaways

This is actually pretty nice! Using quote! as a template language is pretty convenient, even though the multiple phases of compilation mean that debugging is a little rougher than writing "regular" code. I hope that we will eventually get span information for procedurally-generated code, making the debug cycle faster.

It would be nice if there were a way to pass rustc the --pretty expanded option but exclude some specific macros -- probably anything from inside std would be the right thing, since those things seem to rely on dark magic that I can't get to compile.

I would like to investigate actually using a debugger to hook into the code-generation phase. I assume that's well-trod ground, and that I'm just not familiar with it.

I'm very excited for the things that will be ~~possible~~ accessible now that custom derive is stable, and if procedural macros end up as nice as custom derive is then I have a combination of hope and trepidation surrounding the kinds of magic that Rust will provide.