Update _patch.py #39341

Open
wants to merge 1 commit into
base: main

Conversation

952446418

The current line-splitting logic does not handle input that is not valid UTF-8. Specifically, the following line:
line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8"))
decodes the element byte string as UTF-8 with the default strict error handling. Well-formed non-ASCII text decodes without problems, but if element contains a byte sequence that is not valid UTF-8, the call raises a UnicodeDecodeError.
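A minimal sketch of the failure, assuming the quoted line is wrapped in a standalone function. The names split_lines_strict and decode_fails and both payloads are hypothetical and used only for illustration; they are not taken from _patch.py.

```python
import re
from typing import List


def split_lines_strict(element: bytes) -> List[str]:
    """Mirror the quoted line: strict UTF-8 decode, then split after each newline."""
    return re.split(r"(?<=\n)", element.decode("utf-8"))


def decode_fails(element: bytes) -> bool:
    """Return True if strict decoding of `element` raises UnicodeDecodeError."""
    try:
        split_lines_strict(element)
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical payloads, not taken from the SDK.
valid_non_ascii = "caf\u00e9\n".encode("utf-8")  # b"caf\xc3\xa9\n", well-formed UTF-8
truncated = b"caf\xc3\n"  # lead byte 0xC3 with no continuation byte

print(decode_fails(valid_non_ascii))  # False
print(decode_fails(truncated))        # True
```

The first payload shows that non-ASCII text on its own is not the problem; only the malformed byte sequence triggers the exception.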

To address this issue, we should pass an errors argument to the decode call. Several handlers are available:

errors="replace": Replace each undecodable byte sequence with the replacement character U+FFFD (�).
errors="ignore": Silently drop undecodable bytes.
errors="backslashreplace": Replace undecodable bytes with \xNN escape sequences.
errors="surrogateescape": Map undecodable bytes to lone surrogate code points (U+DC80 to U+DCFF) so they can be restored by re-encoding with the same handler.
For this pull request, I propose using errors="replace" or errors="ignore".
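The four handlers can be compared side by side on one malformed input. This is a sketch under stated assumptions: split_lines is a hypothetical wrapper around the quoted line, and raw is an invented payload, so neither reflects the actual code in _patch.py.

```python
import re
from typing import Dict, List

# Hypothetical payload: 0xC3 is a UTF-8 lead byte with no continuation byte after it.
raw = b"caf\xc3\nok\n"

HANDLERS = ("replace", "ignore", "backslashreplace", "surrogateescape")

# Decode the same bytes once per handler.
results: Dict[str, str] = {h: raw.decode("utf-8", errors=h) for h in HANDLERS}

for name, text in results.items():
    print(f"{name:17} {text!r}")

# surrogateescape is the only handler here that keeps the original byte recoverable.
restored = results["surrogateescape"].encode("utf-8", errors="surrogateescape")
print(restored == raw)  # True


def split_lines(element: bytes, errors: str = "replace") -> List[str]:
    """Hypothetical variant of the quoted line with a configurable error handler."""
    return re.split(r"(?<=\n)", element.decode("utf-8", errors=errors))


print(split_lines(raw)[0] == "caf\ufffd\n")  # True
```

Note the trade-off between the two proposed options: "replace" leaves a visible U+FFFD marker where data was lost, while "ignore" drops the bytes without any trace in the output.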
@github-actions github-actions bot added the AI Model Inference, Community Contribution, and customer-reported labels Jan 22, 2025

Thank you for your contribution @952446418! We will review the pull request and get back to you soon.

@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

Labels
AI Model Inference: Issues related to the client library for Azure AI Model Inference (\sdk\ai\azure-ai-inference)
Community Contribution: Community members are working on the issue
customer-reported: Issues that are reported by GitHub users external to the Azure organization.
2 participants