a bit of respite by michaelwoerister

# Weekly Status Report #2

tl;dr ― Memory layout debug info finally done right :)

Another week has passed and another report is due. Last week I started thinking about how to handle data structure memory layout in a less fragile way than the compiler has done so far. Some of these thoughts are also in an email I posted to the mailing list on Monday. The short version is this:

Use the definitive source of the information you need. Do not try to reconstruct the information using an algorithm of your own which you think―probably, hopefully―behaves the same way as the original algorithm that produced it. If the results you are looking for are lying in front of you anyway, just look them up!

Put like this, it sounds pretty obvious. Nonetheless, until now the debuginfo module in rustc did exactly what this advice warns against: it tried to emulate LLVM’s memory layout rules, which can be quite complicated. The consequence was that only the most common case was properly supported: data with standard padding and alignment. That is fine most of the time, but LLVM also supports different data layouts and packed structs, which by definition contain no padding bytes. Also, rustc sometimes adds fields to data structures to support the runtime system, like the enum discriminant or the ‘destroyed’ flag of structs implementing the Drop trait. These fields have to be taken into account too when calculating datatype sizes and field offsets. All of this makes memory layout computation a rather complicated affair.
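To make the problem concrete, here is a small sketch in current Rust (not the rustc code discussed in this post). The `guessed_size` helper and the three example types are made up for illustration. The helper re-implements "standard" padding rules by hand, which is the kind of reconstruction that goes wrong as soon as a type is packed or carries a hidden field.

```rust
use std::mem::size_of;

/// Hypothetical "reconstructed" layout rule: align every field to its own
/// size, then round the total up to the largest alignment seen.
/// Assumes every field size is a non-zero power of two.
fn guessed_size(field_sizes: &[usize]) -> usize {
    let mut offset = 0;
    let mut max_align = 1;
    for &size in field_sizes {
        // Round the running offset up to this field's assumed alignment.
        offset = (offset + size - 1) / size * size;
        offset += size;
        max_align = max_align.max(size);
    }
    // Round the total up to the largest assumed alignment.
    (offset + max_align - 1) / max_align * max_align
}

// Same three fields, once with C-style padding and once packed.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
    c: u16,
}

#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
    c: u16,
}

// Both variants carry a full u32, so the compiler has to store a
// discriminant somewhere in addition to the payload.
#[allow(dead_code)]
enum Shape {
    Circle(u32),
    Square(u32),
}

fn main() {
    println!("guessed size:    {}", guessed_size(&[1, 4, 2]));
    println!("padded struct:   {}", size_of::<Padded>());
    println!("packed struct:   {}", size_of::<Packed>());
    println!("enum / payload:  {} / {}", size_of::<Shape>(), size_of::<u32>());
}
```

On common targets where `u32` is 4-byte aligned, the guess agrees with `Padded` but not with `Packed`, and the enum is larger than its payload because of the discriminant the guess knows nothing about.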

On the other hand, the LLVM types generated by previous compiler passes must already know about all of this stuff, since they are what the actual machine code is generated from. For this reason, I refactored the way memory layouts for composite types are determined for debug symbol generation. Type sizes and field offsets within structs, tuples, box headers, vec headers, vec slices, etc. are now all queried from their LLVM type. Not only does this now work correctly for corner cases, it also has the benefit of staying correct in the future, should something change somewhere else.
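As an analogy in plain Rust (again a sketch, not what rustc does internally, where the queries go to the LLVM type), sizes and offsets can be read off the real type instead of being computed. The `Header` struct and the `header_field_offsets` function are hypothetical names chosen for this example.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

// A packed struct: a case where hand-written padding rules would be wrong.
#[allow(dead_code)]
#[repr(C, packed)]
struct Header {
    tag: u8,
    len: u32,
    flags: u16,
}

/// Byte offsets of `tag`, `len` and `flags`, measured on a real value.
fn header_field_offsets() -> [usize; 3] {
    let h = Header { tag: 0, len: 0, flags: 0 };
    let base = addr_of!(h) as usize;
    // `addr_of!` takes a field's address without creating a reference,
    // which is what makes it usable on (possibly unaligned) packed fields.
    [
        addr_of!(h.tag) as usize - base,
        addr_of!(h.len) as usize - base,
        addr_of!(h.flags) as usize - base,
    ]
}

fn main() {
    println!("size    = {}", size_of::<Header>());
    println!("offsets = {:?}", header_field_offsets());
}
```

If the definition of `Header` changes, for example by dropping `packed` or reordering fields, the reported numbers change with it and nothing else has to be updated.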

However, this method still needs to know which field to expect at which index within a composite type. There are two cases where this can be complicated:

For structs, tuples and enums defined by the user, we have to know when and where the compiler generates additional fields, like the ‘destroyed’ flag mentioned above.
For completely internal structures, like box headers, we have to get the right layout from somewhere.

In both cases it is a good idea to assert! any assumptions the code is written under. I tried doing this by letting the compiler check the type structure of the internal structures in question, using functions like the following:

    fn box_layout_is_as_expected(cx: &CrateContext,
                                 member_llvm_types: &[Type],
                                 content_llvm_type: Type)
                              -> bool {
        member_llvm_types.len() == 5 &&
        member_llvm_types[0] == cx.int_type &&
        member_llvm_types[1] == cx.tydesc_type.ptr_to() &&
        member_llvm_types[2] == Type::i8().ptr_to() &&
        member_llvm_types[3] == Type::i8().ptr_to() &&
        member_llvm_types[4] == content_llvm_type
    }
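The function above depends on compiler internals (CrateContext, Type), so it cannot be run on its own. Below is a minimal self-contained sketch of the same idea in current Rust. The `Ty` enum is a hypothetical stand-in for LLVM types, and the example layout in `main` is made up to match the five fields checked above.

```rust
/// Hypothetical stand-in for an LLVM type, just rich enough for this check.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int,
    I8,
    F64,
    TyDesc,
    Ptr(Box<Ty>),
}

impl Ty {
    /// Pointer to this type, mirroring the `ptr_to()` calls above.
    fn ptr_to(&self) -> Ty {
        Ty::Ptr(Box::new(self.clone()))
    }
}

/// True if `members` has exactly the five fields the code expects,
/// in the expected order, ending with the boxed content.
fn box_layout_is_as_expected(members: &[Ty], content: &Ty) -> bool {
    members.len() == 5
        && members[0] == Ty::Int
        && members[1] == Ty::TyDesc.ptr_to()
        && members[2] == Ty::I8.ptr_to()
        && members[3] == Ty::I8.ptr_to()
        && members[4] == *content
}

/// Example layout for a box holding `content` (made up for this sketch).
fn example_box_members(content: &Ty) -> Vec<Ty> {
    vec![
        Ty::Int,
        Ty::TyDesc.ptr_to(),
        Ty::I8.ptr_to(),
        Ty::I8.ptr_to(),
        content.clone(),
    ]
}

fn main() {
    let content = Ty::F64;
    let members = example_box_members(&content);
    // Fail loudly if the assumption about the layout no longer holds.
    assert!(
        box_layout_is_as_expected(&members, &content),
        "box layout is not what the debuginfo code expects"
    );
    println!("box layout is as expected");
}
```

The point of the assertion is that a change to the box layout elsewhere triggers an immediate, explicit failure instead of silently producing wrong debug info.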

Another way I tried to make the debug info code as stable as possible is to keep it DRY by using common functionality that other parts of the compiler rely on too, like the Type type (the type Type?) from middle::trans::type_. The methods of Type can provide the type structure of some internal data types; others, like vector slices, can be found in the implementation of typeof::typeof().

The third safety net is, of course, more automated tests, which I try to add alongside each new feature right away.

With all of this in place, I hope we can have some confidence in the data the debugger will put on our screens in the (hopefully near) future.