AVRO-4026: [c++] Add new methods to CustomAttributes to allow non-string custom attributes in schemas #3266
base: main
…deprecated versions (except one case, testing the deprecated ones); add test cases specifically for deprecated methods
I'm not fully familiar with the context (and the code) yet. I agree that keeping backward compatibility is the bottom line. However, from the JIRA issue it seems that non-string attributes were supported once but were broken by fixes for other issues. Shouldn't we just fix it to revert the unexpected behavior? Apart from the alternatives listed above, is it possible to add an option to …
Most certainly. I hadn't considered it, but it would be easy to change to that. Would you like me to apply that suggestion in this PR? For backward compatibility, the "default" mode would have to be the awkward (arguably broken) string-only mode, which just wraps the input in double quotes (with no other escaping). But a new constructor could be added to enable a mode where the API expects well-formed JSON-encoded data (so string values must be quoted and escaped).
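A minimal sketch of the two conventions described above, assuming a hypothetical helper `schemaWithAttribute` (not part of Avro) that writes one custom attribute into a schema fragment. The `jsonMode` flag is illustrative and stands in for whatever switch the API ends up exposing.

```cpp
#include <string>

// Hypothetical illustration (not Avro's code) of the two value conventions,
// applied when writing one custom attribute into a schema fragment.
//  - jsonMode == false: legacy string-only behavior; the raw value is wrapped
//    in double quotes and nothing inside it is escaped.
//  - jsonMode == true: the value is already JSON-encoded text and is copied
//    verbatim, so string values must arrive quoted and escaped by the caller.
// Note: the key is not escaped here, to keep the sketch short.
std::string schemaWithAttribute(const std::string &key,
                                const std::string &value, bool jsonMode) {
    const std::string encoded = jsonMode ? value : "\"" + value + "\"";
    return "{\"type\": \"string\", \"" + key + "\": " + encoded + "}";
}
```

In the legacy convention, a value such as `say "hi"` comes out as `"note": "say "hi""`, which is not valid JSON. In the JSON convention the caller passes the escaped literal `"say \"hi\""` and the output parses, while a value like `64` stays a JSON number.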
@wgtmac, I've updated this branch so that instead of deprecating the old methods and adding new ones, there's now a constructor flag to indicate whether values are strings or arbitrary JSON values (in which case string values must be quoted). While updating tests, I realized that my code wasn't correctly validating string values: if one had a botched/unescaped quote, it wouldn't be flagged as an invalid value. So I added some stuff in the json folder so that …
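The validation gap can be illustrated with a small check that accepts a value only if it is exactly one complete JSON string literal. This is a hypothetical sketch: `isValidJsonStringLiteral` is an illustrative name, not the code that was added to the json folder in this PR.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical validator (not the PR's code): returns true only if `s` is a
// single, complete JSON string literal such as "say \"hi\"". A stray
// unescaped quote, an unknown escape, or a raw control character fails.
bool isValidJsonStringLiteral(const std::string &s) {
    const std::size_t n = s.size();
    if (n < 2 || s.front() != '"' || s.back() != '"') {
        return false;
    }
    const std::size_t end = n - 1;  // index of the closing quote
    for (std::size_t i = 1; i < end; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '"') {
            return false;  // an unescaped quote would end the string early
        }
        if (c < 0x20) {
            return false;  // raw control characters must be escaped
        }
        if (c != '\\') {
            continue;
        }
        // Backslash: the escaped character must lie before the closing quote.
        if (i + 1 >= end) {
            return false;
        }
        const char e = s[++i];
        if (e == 'u') {
            // \uXXXX needs exactly four hex digits before the closing quote.
            if (i + 4 >= end) {
                return false;
            }
            for (std::size_t k = 1; k <= 4; ++k) {
                if (!std::isxdigit(static_cast<unsigned char>(s[i + k]))) {
                    return false;
                }
            }
            i += 4;
        } else if (std::string("\"\\/bfnrt").find(e) == std::string::npos) {
            return false;  // not one of JSON's simple escapes
        }
    }
    return true;
}
```

With this check, `"say \"hi\""` is accepted while the botched `"say "hi""` is rejected, which is the kind of value the earlier code let through.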
@wgtmac, if you don't like the look of this API (using a separate constructor with a bool flag), let me know. I'm happy to roll back that commit or continue to iterate on this.
I am not a C++ dev/user, so my opinion is even more irrelevant :-)
So who is the right person to tag to review and approve this?
Sure, I can do that.
No one! The Avro team is notified anyway. If no one from the Avro team merges the PR within a reasonable time, I could help with the merge, but only after at least two approvals from the Avro community.
I can help review it :)
@wgtmac, what do you think of the current approach? If I just replace the bool constructor parameter with an enum (as mentioned above), would that be pretty close to an acceptable patch? Any other concerns or feedback?
@jhump Yes, I think so.
@wgtmac, I've updated the PR with that suggestion. Please take a look. Thanks in advance!
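A sketch of the enum-based constructor shape discussed above. `AttributesSketch`, `ValueMode::String`, and `ValueMode::Json` are hypothetical names chosen for illustration, not necessarily what the PR or Avro's `CustomAttributes` uses.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical sketch (names are illustrative, not Avro's actual API): the
// value convention is chosen once, through an enum constructor argument,
// instead of a bare bool.
class AttributesSketch {
public:
    enum class ValueMode {
        String,  // legacy: values are raw strings
        Json     // values are JSON-encoded text (strings must be quoted)
    };

    // The default keeps the legacy behavior, so existing callers are unaffected.
    AttributesSketch() : AttributesSketch(ValueMode::String) {}
    explicit AttributesSketch(ValueMode mode) : mode_(mode) {}

    ValueMode valueMode() const { return mode_; }

    // Stored as given; how the text is interpreted depends on mode_.
    void addAttribute(const std::string &name, const std::string &value) {
        attributes_[name] = value;
    }

    std::optional<std::string> getAttribute(const std::string &name) const {
        const auto it = attributes_.find(name);
        if (it == attributes_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    ValueMode mode_;
    std::map<std::string, std::string> attributes_;
};
```

One advantage of the enum over a bool is that a call site such as `AttributesSketch(AttributesSketch::ValueMode::Json)` documents itself, whereas `AttributesSketch(true)` does not; an enum also leaves room for additional modes later.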
This adds new methods to `CustomAttributes` to allow setting non-string values. These other methods work with JSON-encoded strings. Unlike #3064 and #3069, this change attempts to be backward compatible. However, from reading more comments in pull requests, it looks like the "fix" I added (to escape the keys and values in custom attributes when printing to JSON) may actually be a compatibility issue, since it seems that users were expected to escape string values themselves if they contained any characters that would be escaped in JSON (including quotes). That seems like a really terrible API, and it also meant that the values would not round-trip correctly: reading a data file would not create custom attributes with these strings properly escaped, so later writing out data with the same schema would generate an invalid schema JSON document.
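A minimal sketch of the escaping described above, assuming a hypothetical helper `toJsonStringLiteral` (not Avro's implementation). A value parsed from a schema holds the decoded text, e.g. `say "hi"`, so printing it back without re-escaping produces invalid schema JSON.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not Avro's code): escape a raw string so it can be
// embedded as a JSON string literal, e.g. when printing custom attribute
// keys and values.
std::string toJsonStringLiteral(const std::string &raw) {
    std::string out = "\"";
    for (const char ch : raw) {
        const unsigned char c = static_cast<unsigned char>(ch);
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\b': out += "\\b"; break;
            case '\f': out += "\\f"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {
                    // Other control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned int>(c));
                    out += buf;
                } else {
                    out += ch;  // UTF-8 bytes >= 0x80 pass through unchanged
                }
        }
    }
    out += '"';
    return out;
}
```

Applied on output, this keeps the round trip intact: the decoded text `say "hi"` is printed as `"say \"hi\""`, which parses back to the same string.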
In any event, this uses strings as the values even though it would be ideal if we could pass some sort of structured data as the value type. The ideal types (`json::Entity` and its accompanying `json::Object` and `json::Array` types) are defined in `json/JsonDom.hh`. But that header file is not part of the Avro include files distribution, which means we cannot `#include` it from `CustomAttributes.hh`, so it's a no-go. From a little history spelunking, I see that they indeed used to use a structured form, which was simplified to strings in #1821 purely because these JSON header files aren't available to users in the Avro distribution.
Alternatives to using JSON-encoded strings that I considered:
1. … `lang/c++/impl` and into `lang/c++/include/avro`. But then we expand the public API too much. This approach was already tried and rejected in AVRO-3601: C++ API header contains breaking include #1820.
2. … `std::any` as the value type. This can be `#include`d in `CustomAttributes.hh` but eliminates type safety in the signature. The only concrete accepted value would likely be `json::Entity`, though we could make it more sophisticated and also allow the various value types sans wrapper: `std::string`, `bool`, `int64_t`, `double`, `json::Array` (aka `std::vector<json::Entity>`), and `json::Object` (aka `std::map<std::string, json::Entity>`). But this isn't really usable by external/user code, at least not for any composite values, since users can't include the JSON headers and therefore can't produce valid `json::Entity` values.
3. … `json::Entity` to some other structured type that is defined in `CustomAttributes.hh`. This could possibly be a concrete `std::variant` that allows the various options (and could use `std::monostate` to represent JSON null values). This introduces non-trivial conversion code. It would likely perform better than converting to/from strings, but it's a non-trivial amount of brand-new code to maintain, which didn't feel right.
What is the purpose of the change
Fixes an exception thrown by the C++ library when compiling schemas with non-string custom attributes.
Verifying this change
This change added tests and can be verified as follows:
Documentation