define encoding StringIndex model #5712

Open
chunyu3 opened this issue Jan 23, 2025 · 0 comments

Contributor

chunyu3 commented Jan 23, 2025

Some Azure services return the offset and length of a substring within a string, for example the offset and length of a name, email address, or phone number.
https://github.com/microsoft/api-guidelines/blob/vNext/azure/ConsiderationsForServiceDesign.md#returning-string-offsets--lengths-substrings
The offset and length values differ depending on whether the string is measured in UTF-8 bytes, UTF-16 code units, or Unicode code points.
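To illustrate why the three values diverge, here is a minimal Python sketch that computes the offset of a substring in each unit. The helper name `string_index` and the sample text are assumptions made for illustration; they are not part of any Azure SDK or of the guidelines.

```python
def string_index(text: str, substring: str) -> dict:
    """Return the offset of `substring` in `text`, measured three ways.

    Hypothetical helper for illustration only.
    """
    # Python str is indexed by Unicode code point.
    code_point_offset = text.index(substring)
    prefix = text[:code_point_offset]
    return {
        # UTF-8 offset: number of bytes before the substring.
        "utf8": len(prefix.encode("utf-8")),
        # UTF-16 offset: number of 16-bit code units before the substring.
        "utf16": len(prefix.encode("utf-16-le")) // 2,
        # Code point offset: number of code points before the substring.
        "codePoint": code_point_offset,
    }


# "H\u00e9llo \U0001F600 Bob": the accented letter takes 2 bytes in UTF-8,
# and the emoji takes 4 bytes in UTF-8 and 2 code units in UTF-16.
sample = "H\u00e9llo \U0001F600 Bob"
offsets = string_index(sample, "Bob")
print(offsets)  # {'utf8': 12, 'utf16': 9, 'codePoint': 8}
```

For pure ASCII text all three values are equal, which is why the difference is easy to miss in testing.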

To make sure client SDKs in different languages can parse the string correctly, the service returns a model that gives the offset/length for each encoding, e.g.

@doc("String index encoding model.")
model StringIndex {
  @doc("The offset or length of the substring in UTF-8 encoding")
  utf8: int32;

  @doc("""
    The offset or length of the substring in UTF-16 encoding.
    
    Primary encoding used by .NET, Java, and JavaScript.
    """)
  utf16: int32;

  @doc("""
    The offset or length of the substring in CodePoint encoding.
    
    Primary encoding used by Python.
    """)
  codePoint: int32;
}

https://github.com/microsoft/api-guidelines/blob/685e493d38f8a3875c22336dcd177f4b54dcfb23/azure/Guidelines.md#returning-string-offsets--lengths-substrings

There is a feature request, Azure/autorest.csharp#4925, to wrap the encoding StringIndex model as an int, e.g.

// Currently
taggerOutput.Offset.Utf16 // int
taggerOutput.Offset //  {"utf8": 10, "utf16": 10, "codePoint": 10} - StringIndex model

// Desired
taggerOutput.Offset // int -- While still supporting the serialization of the StringIndex model
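As a rough sketch of the desired shape, the Python class below behaves as a plain int while keeping all three values for serialization. It is a hypothetical analog, not the autorest.csharp design: the class name, `from_wire`, and `to_wire` are invented for illustration, and the code point value is chosen as the primary one only because that is the unit Python strings use.

```python
class StringIndex(int):
    """Hypothetical wrapper: acts as an int, round-trips the full model."""

    def __new__(cls, utf8: int, utf16: int, code_point: int):
        # The int value is the code point count, Python's native string unit.
        obj = super().__new__(cls, code_point)
        obj.utf8 = utf8
        obj.utf16 = utf16
        obj.code_point = code_point
        return obj

    @classmethod
    def from_wire(cls, payload: dict) -> "StringIndex":
        """Build the wrapper from the JSON object returned by the service."""
        return cls(payload["utf8"], payload["utf16"], payload["codePoint"])

    def to_wire(self) -> dict:
        """Serialize back to the StringIndex model shape."""
        return {
            "utf8": self.utf8,
            "utf16": self.utf16,
            "codePoint": self.code_point,
        }


text = "H\u00e9llo \U0001F600 Bob"
offset = StringIndex.from_wire({"utf8": 12, "utf16": 9, "codePoint": 8})
print(text[offset:])     # Bob
print(offset.to_wire())  # {'utf8': 12, 'utf16': 9, 'codePoint': 8}
```

A generator for .NET would presumably pick the UTF-16 value as the primary one instead; either way, it first has to know which models to treat this way.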

To implement this feature, codegen needs to recognize that a model is a StringIndex model.

  • Option 1: TypeSpec defines a decorator to indicate that a model is an offset/length index model for different encodings
  • Option 2: TypeSpec defines a common model for this