pulley: Implement iadd_pairwise #9912

Merged: 4 commits, Dec 30, 2024

Contributor

@eagr eagr commented Dec 29, 2024

part of #9783

@eagr eagr requested review from a team as code owners December 29, 2024 16:30
@eagr eagr requested review from fitzgen and removed request for a team December 29, 2024 16:30
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator pulley Issues related to the Pulley interpreter labels Dec 29, 2024

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "cranelift", "pulley"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: pulley

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

let [h, t]: [_; 2] = pair.try_into().unwrap();
h.wrapping_add(t)
})
.collect::<Vec<_>>()
Member

For these instructions we'll definitely want to avoid an intermediate allocation of a Vec. It might be reasonable to just use manual indices here and/or "macro expanding" some code. I don't think the standard library helpers for all this are necessarily the best way to go.
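As a rough illustration of the "manual indices" idea, here is a minimal sketch on plain arrays rather than on the interpreter state. The function name `add_pairwise_i16x8` and the array-based signature are assumptions made for this example and are not taken from the PR; the point is only that the result lives in a fixed-size array on the stack, so no intermediate `Vec` is needed.

```rust
/// Hypothetical standalone helper (not Pulley's actual code): pairwise
/// wrapping add of two 8-lane vectors with no heap allocation.
fn add_pairwise_i16x8(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    // Fixed-size output array; lives on the stack.
    let mut result = [0i16; 8];
    for i in 0..4 {
        // Lanes 0..4 hold the sums of adjacent pairs of `a`.
        result[i] = a[2 * i].wrapping_add(a[2 * i + 1]);
        // Lanes 4..8 hold the sums of adjacent pairs of `b`.
        result[i + 4] = b[2 * i].wrapping_add(b[2 * i + 1]);
    }
    result
}
```

With a constant trip count and constant-size arrays, the compiler is free to unroll the loop, though whether it does is not guaranteed.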

Contributor

@Xuanwo Xuanwo Dec 30, 2024

We only have a limited number of items to operate on, so maybe we can just write them out like:

fn vaddpairwisei16x8_s(&mut self, operands: BinaryOperands<VReg>) -> ControlFlow<Done> {
    let a = self.state[operands.src1].get_i16x8();
    let b = self.state[operands.src2].get_i16x8();
    
    let mut result = [0i16; 8];
    
    // Sum the four adjacent pairs of a into lanes 0..4
    result[0] = a[0].wrapping_add(a[1]);
    result[1] = a[2].wrapping_add(a[3]);
    result[2] = a[4].wrapping_add(a[5]);
    result[3] = a[6].wrapping_add(a[7]);
    
    // Sum the four adjacent pairs of b into lanes 4..8
    result[4] = b[0].wrapping_add(b[1]);
    result[5] = b[2].wrapping_add(b[3]);
    result[6] = b[4].wrapping_add(b[5]);
    result[7] = b[6].wrapping_add(b[7]);

    self.state[operands.dst].set_i16x8(result);
    ControlFlow::Continue(())
}

This also lets the compiler perform better analysis, I suppose.
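If the same pattern is needed for more than one lane shape, the "macro expanding" idea from the review could be sketched as below. The macro `impl_add_pairwise!` and the generated function names are hypothetical and operate on plain arrays; the `i32x4` instantiation is purely illustrative and is not a claim about which opcodes the PR adds.

```rust
// Hypothetical macro (not from the PR): generates an allocation-free
// pairwise wrapping add for a given lane type and lane count.
macro_rules! impl_add_pairwise {
    ($name:ident, $ty:ty, $lanes:expr) => {
        fn $name(a: [$ty; $lanes], b: [$ty; $lanes]) -> [$ty; $lanes] {
            const HALF: usize = $lanes / 2;
            let mut result: [$ty; $lanes] = [0; $lanes];
            for i in 0..HALF {
                // Low half: sums of adjacent pairs of `a`.
                result[i] = a[2 * i].wrapping_add(a[2 * i + 1]);
                // High half: sums of adjacent pairs of `b`.
                result[i + HALF] = b[2 * i].wrapping_add(b[2 * i + 1]);
            }
            result
        }
    };
}

// Illustrative instantiations; the names are assumptions.
impl_add_pairwise!(pairwise_i16x8, i16, 8);
impl_add_pairwise!(pairwise_i32x4, i32, 4);
```

The trade-off is that the macro hides the per-lane indexing that the written-out version makes explicit, so for only one or two shapes the hand-written form may be easier to review.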

Contributor Author

Guess I need a mind shift for this :)

@alexcrichton alexcrichton added this pull request to the merge queue Dec 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Dec 30, 2024
@alexcrichton alexcrichton added this pull request to the merge queue Dec 30, 2024
Merged via the queue into bytecodealliance:main with commit d78544e Dec 30, 2024
37 checks passed