LLVM MCA / IACA integration? #40

hdevalence · 2018-08-15T19:57:00Z

First, cargo asm is a really wonderful addition to the Rust tooling ecosystem -- thanks so much for making it.

I'm wondering whether it would make sense to integrate it with the LLVM MCA tool, or Intel's IACA. These do static analysis on machine code, to estimate throughput and port pressures on a specific microarchitecture.

I made a proof-of-concept for using IACA with Rust code. It doesn't really work super well (and IACA is proprietary anyways), but it is pretty neat:

It would be really cool if there was an easy way to use these tools on fragments of Rust code (I mean, so that it's as easy to see throughput estimates as cargo asm makes it easy to see the generated asm).

I'm not sure exactly how it would work -- AFAIK they're intended to be used for loop bodies (to estimate throughput), but cargo asm is oriented around specific functions. Maybe it would work to have a way to apply the MCA analysis to a specific function, as if that function was the body of a loop? I'm also not sure if it makes sense for that functionality to be part of cargo asm, vs a cargo mca tool or something, but it seems like a bunch of the code required to pull out a specific function would be common.


gnzlbg · 2018-08-15T21:13:53Z

Thanks, this really looks like something worth doing.

So cargo llvm-ir and cargo asm are actually the same exact identical binary... that is, they share 100% of the code 🤣 So having cargo mca doesn't mean that the functionality couldn't live here.

Having said this, cargo asm provides sort of an AST for assembly (this is very ad-hoc because it supports asm for many architectures, and they are all different). Currently, it just outputs a function, but we could make it output something else, like a loop body. It just really needs to know where it starts and where it ends.

Maybe we could emit these "delimiters" from rustc somehow? For example, using inline assembly before and after the loop we might be able to put some labels or directives that we can use to identify the loop. Also, rustc recently gained the ability to comment assembly code, so maybe we could make it spit out comments before and after the loop as well, which shouldn't inhibit as many optimizations as inline assembly would.

Once those are in the generated assembly, cargo mca / cargo iaca could find those, extract the assembly, and forward it to the tool in whatever format it accepts.
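
A minimal sketch of the "find the delimiters and extract the assembly" step, assuming the delimiters show up as comment lines in the emitted assembly. The function name `extract_marked_region` is hypothetical and not part of cargo asm; the marker strings in the usage note are the `# LLVM-MCA-BEGIN` / `# LLVM-MCA-END` comments that llvm-mca recognizes, but any pair of strings would work the same way.

```rust
/// Returns the lines strictly between the first line containing `begin`
/// and the next line containing `end`, or `None` if either marker is missing.
/// (Hypothetical helper for illustration only.)
fn extract_marked_region<'a>(asm: &'a str, begin: &str, end: &str) -> Option<Vec<&'a str>> {
    let mut lines = asm.lines();
    // Skip everything up to and including the begin marker.
    lines.by_ref().find(|line| line.contains(begin))?;
    let mut body = Vec::new();
    for line in lines {
        if line.contains(end) {
            return Some(body);
        }
        body.push(line);
    }
    // A begin marker without a matching end marker is treated as an error.
    None
}
```

Called as `extract_marked_region(asm, "LLVM-MCA-BEGIN", "LLVM-MCA-END")`, this would hand back just the marked instructions, which could then be forwarded to whichever analysis tool is in use.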

The alternative would be for cargo asm to somehow "identify" the loop. Some loops jump around many labels, and some loops are tighter, so I don't think this is easily doable, but maybe the tool could become "interactive", where the user inputs e.g. the line numbers of where the loop starts and ends.
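
The "user inputs the line numbers" variant could be sketched roughly like this. `select_lines` is a hypothetical name, and the 1-based, inclusive range is an assumption about how such line numbers would be displayed to the user.

```rust
/// Returns lines `start..=end` (1-based, inclusive) of `asm` joined with newlines,
/// or `None` if the range is empty, reversed, or runs past the end of the text.
/// (Hypothetical helper for illustration only.)
fn select_lines(asm: &str, start: usize, end: usize) -> Option<String> {
    if start == 0 || start > end {
        return None;
    }
    let wanted = end - start + 1;
    let selected: Vec<&str> = asm.lines().skip(start - 1).take(wanted).collect();
    // Reject ranges that extend beyond the last line instead of silently truncating.
    if selected.len() != wanted {
        return None;
    }
    Some(selected.join("\n"))
}
```

Rejecting out-of-range input rather than clamping it is a design choice here: a silently shortened loop body would give a misleading throughput estimate.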

No idea, how does your tool work?

hdevalence · 2018-08-20T22:40:20Z

> So cargo llvm-ir and cargo asm are actually the same exact identical binary... that is, they share 100% of the code 🤣 So having cargo mca doesn't mean that the functionality couldn't live here.

Yup, sorry I was unclear, that's what I was trying to get at with "the code would be common"... I guess the choice of how to call it is purely a UX concern.

> No idea, how does your tool work?

It works ... kind of badly 🙃

It has macros that insert inline asm, which sticks byte markers into the generated machine code, which the IACA tool then disassembles to select the loop body. But it turns out (I think, it's been a year or so) that the inline asm markers will sometimes slide around a bit, so you have to look at the generated asm anyways to check that the optimizer didn't move them.
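
For illustration, here is a rough sketch of the marker approach using today's stable `std::arch::asm!`. It is not the original proof-of-concept: it swaps IACA's byte markers for the `# LLVM-MCA-BEGIN` / `# LLVM-MCA-END` assembly comments that llvm-mca recognizes, uses plain inline functions instead of macros, and assumes an x86-64 target. The names `mca_begin`, `mca_end`, and `sum` are hypothetical.

```rust
use std::arch::asm;

/// Emits a comment-only inline asm block marking the start of the region.
/// `nomem` is deliberately left out so the compiler is less inclined to move
/// memory accesses across the marker (an assumption; placement is still not guaranteed).
#[inline(always)]
fn mca_begin() {
    unsafe { asm!("# LLVM-MCA-BEGIN", options(nostack, preserves_flags)) }
}

/// Emits a comment-only inline asm block marking the end of the region.
#[inline(always)]
fn mca_end() {
    unsafe { asm!("# LLVM-MCA-END", options(nostack, preserves_flags)) }
}

/// Example function with markers placed before and after the loop.
pub fn sum(xs: &[u64]) -> u64 {
    let mut total = 0u64;
    mca_begin();
    for &x in xs {
        total = total.wrapping_add(x);
    }
    mca_end();
    total
}
```

As with the byte markers described above, the emitted assembly would still need to be checked by hand, since the optimizer may restructure the loop around the markers, and the markers themselves may block optimizations such as vectorization.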

The usability is pretty bad: you have to add the crate as a dep, stick in the markers manually, compile and emit asm, check the asm, then find the generated binary in target/, then pass that to IACA. So it would be great to have something that was as ergonomic as cargo asm is.

As is, cargo asm is pretty ergonomic. I'm not sure exactly what a loop-selecting UX would look like. Picking out line numbers might work. I'm not totally sure how valid the analysis is for complex loops of the kind that you mention.

Maybe to start, it would be sufficient to use the same function-selection logic as cargo asm uses, and allow estimating throughput for calling the given function over and over on some inputs. I guess this discards all of the functionality about loop dependencies, and I'm not sure if the analysis is really supposed to be used that way.
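
A sketch of what that starting point might look like, assuming the function's assembly has already been extracted as text and that an `llvm-mca` binary is on the `PATH`. llvm-mca simulates the given block for a number of iterations (its `-iterations` option), which corresponds to treating the function body as if it were called over and over. The helper names `mca_args` and `run_mca` are hypothetical; `-mcpu` and `-iterations` are real llvm-mca options.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Builds the llvm-mca command-line arguments for a target CPU and iteration count.
/// (Hypothetical helper for illustration only.)
fn mca_args(cpu: &str, iterations: u32) -> Vec<String> {
    vec![format!("-mcpu={cpu}"), format!("-iterations={iterations}")]
}

/// Pipes `asm` to llvm-mca on stdin and returns whatever it prints to stdout.
/// Assumes the assembly is in a syntax llvm-mca can parse for the host triple.
fn run_mca(asm: &str, cpu: &str, iterations: u32) -> std::io::Result<String> {
    let mut child = Command::new("llvm-mca")
        .args(mca_args(cpu, iterations))
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // The stdin handle is dropped at the end of this statement,
    // which closes the pipe so llvm-mca sees end of input.
    child
        .stdin
        .take()
        .expect("stdin was configured as piped")
        .write_all(asm.as_bytes())?;
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Whether the resulting numbers mean much for a whole function, rather than a loop body, is the open question raised above; the sketch only shows the plumbing.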

PSeitz · 2023-02-12T04:44:37Z

llvm-mca integration is now supported by cargo-show-asm: pacak/cargo-show-asm#126

gnzlbg added the P-Mid label Aug 23, 2018
