GH-39191: [R] throw error when string_replace is passed vector of values in pattern #39219
@thisisnic I have started this PR with my failed attempt to add the check and tests I thought might be useful. I could not figure it all out and my tests fail.
Awesome, thanks for submitting this PR @abfleishman! What's happening here is that we have some code which pulls the data into R if there is an error trying to run it in Arrow. In a lot of cases, this might be because Arrow doesn't support something that the original R function can, and so we chose to make sure the code can still run if possible. There are other functions which trigger errors, and we have tested their error and warning messages; those tests can be found here: arrow/r/tests/testthat/test-dplyr-funcs-string.R Lines 645 to 676 in a4fae02
So, if you update your test to be like those ones, you should be able to test for the error message you have created. A question for you also: to fix the reported bug, do we need to check the input length for both
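The fallback described above can be pictured with a small base-R sketch. This is not the arrow package's actual code: `eval_in_arrow()`, `eval_in_r()`, and `run_with_fallback()` are hypothetical names made up for illustration. It only shows why an error raised inside a binding may never reach a test that runs the full dplyr pipeline.

```r
# Hypothetical stand-in for an Arrow-backed evaluation that rejects its input.
eval_in_arrow <- function(x) {
  stop("`pattern` must be a length 1 character vector", call. = FALSE)
}

# Hypothetical plain-R evaluation of the same operation.
eval_in_r <- function(x) toupper(x)

# Try Arrow first; on any error, warn and fall back to evaluating in R.
run_with_fallback <- function(x) {
  tryCatch(
    eval_in_arrow(x),
    error = function(e) {
      warning("Not supported in Arrow; pulling data into R", call. = FALSE)
      eval_in_r(x)
    }
  )
}

# The error from eval_in_arrow() is swallowed: the caller only sees a warning
# and the result computed in R.
result <- suppressWarnings(run_with_fallback(c("a", "b")))
```

Under these assumptions, a test that expects an error from the whole pipeline would fail, which is why the suggestion is to test the binding more directly.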
…`call_binding` in test; add `replacement` to the error message
@thisisnic can you explain what the also, I think this is ready? LMK if there is more I should do?
Sure. Here's the code for Lines 136 to 138 in 75c6b64
In the arrow R package, we create "bindings" which create a link between the call to the R code and an expression which can be understood and run by Arrow - in your case in this PR, the binding to Here's an example of using
On its own
My apologies, I had a brain fart moment when I asked if we need to test them both, and was getting the input data column mixed up with This PR is looking excellent, and is most of the way there; just needs the extra check (for
…le_file` on edited files
expect_error(
  arrow_table(df) %>%
    transmute(x = call_binding("str_replace_all", x, c("F" = "_", "b" = ""))) %>%
    collect(),
  regexp = "`pattern` must be a length 1 character vector"
)
expect_error(
  arrow_table(df) %>%
    transmute(x = call_binding("str_replace_all", x, c("F", "b"), c("_", ""))) %>%
    collect(),
  regexp = "`pattern` must be a length 1 character vector"
)
expect_error(
  arrow_table(df) %>%
    transmute(x = call_binding("str_replace_all", x, c("F"), c("_", ""))) %>%
    collect(),
  regexp = "`replacement` must be a length 1 character vector"
)
One tiny last change to suggest, and then this will be ready to merge! You don't need the `arrow_table()` etc stuff here as `call_binding()` can be called on its own. What is needed instead is to create a field reference so `call_binding()` has something to refer to in the expression it creates, and then `expect_error()` can just wrap `call_binding()`.
For example, in that final test, you'll need something a bit shorter, like this:
x <- Expression$field_ref("x")
expect_error(
  call_binding("str_replace_all", x, c("F"), c("_", "")),
  regexp = "`replacement` must be a length 1 character vector"
)
Cool. I'll be honest. I really am lost in this whole `call_bindings`/`Expression$field_ref("x")` stuff. I guess that's what I get for being a field biologist dabbling in computer science. Thanks for walking me through this stuff!
You've dived into a really tricky bit of the codebase, but you've done a great job! A lot of my own early PRs to arrow involved drawing analogies between different bits of the codebase, but not understanding exactly what was going on (and this is still the case for any of my PRs which involve any C++).
Congratulations on your first PR to Arrow! Once the CI passes I'll merge it. Welcome to the project :)
Wonderful, thank you for making this PR!
Nice! Thanks @thisisnic! Slightly confusing but great (small) issue to tackle as a first issue! Thanks for walking me through making the edits. I imagine you spent more time responding to me than it would have taken you to make the changes yourself, but I guess that is part of the open-source philosophy!
Happy to do it again any time :)
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 64fed4e. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.
…r of values in `pattern` (apache#39219) ### Rationale for this change See apache#39191 This PR will hopefully throw an informative error message to let the user know that while the stringr::str_replace_all function can handle a named vector of values as the pattern argument, the arrow R package implementation cannot. ### What changes are included in this PR? - [ ] add tests for passing vector to the pattern argument - [ ] add check for length > 1 to the string replace bindings ### Are these changes tested? yes (though I need help!) ### Are there any user-facing changes? yes. Hopefully the user will be alerted by an informative error message that they cannot pass a vector to the pattern argument. No breaking changes are expected. * Closes: apache#39191 Authored-by: Abram B. Fleishman <[email protected]> Signed-off-by: Nic Crane <[email protected]>
hey guys! i'd like to suggest including a caveat on this page to highlight that pattern/replacement vectors are not supported. also, user feedback: i think the error message could be a little more informative, by explicitly saying that arrow cannot support multiple pattern/replacement vectors, it would be clearer to me since i thought it was a message from R and not from arrow
Hi @baarthur - as this PR has now been merged, I don't suppose you'd mind opening a new issue with this? It'd be great to include your suggestion for the updated error message. If you were interested, we'd be happy to accept a PR for those changes too :)
Rationale for this change
See #39191 This PR will hopefully throw an informative error message to let the user know that while the stringr::str_replace_all function can handle a named vector of values as the pattern argument, the arrow R package implementation cannot.
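What changes are included in this PR?
- [ ] add tests for passing vector to the pattern argument
- [ ] add check for length > 1 to the string replace bindings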
Are these changes tested?
yes (though I need help!)
Are there any user-facing changes?
yes. Hopefully the user will be alerted by an informative error message that they cannot pass a vector to the pattern argument. No breaking changes are expected.