[Bug]: Missing tool outputs error when using Azure OpenAI Assistants, when the Assistant planner calls the same tool multiple times in a run #1705

Open
NikhilB95 opened this issue Jun 3, 2024 · 9 comments · May be fixed by #2228
Labels
bug Something isn't working P1 Python Change/fix applies to Python. If all three, use the 'JS & dotnet & Python' label

Comments

@NikhilB95

Language

Python

Version

latest

Description

When using Azure OpenAI Assistants with custom tools, the Teams bot raises an error (shown below) in specific situations when the same tool is called multiple times.

Error:
Traceback (most recent call last):
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/botbuilder/core/bot_adapter.py", line 174, in run_pipeline
    return await self._middleware.receive_activity_with_status(
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/botbuilder/core/middleware_set.py", line 69, in receive_activity_with_status
    return await self.receive_activity_internal(context, callback)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/botbuilder/core/middleware_set.py", line 79, in receive_activity_internal
    return await callback(context)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/teams/app.py", line 663, in on_turn
    await self._start_long_running_call(context, self._on_turn)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/teams/app.py", line 813, in _start_long_running_call
    return await func(context)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/teams/app.py", line 756, in _on_turn
    is_ok = await self._ai.run(context, state)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/teams/ai/ai.py", line 187, in run
    return await self.run(context, state, started_at, step)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/teams/ai/ai.py", line 187, in run
    return await self.run(context, state, started_at, step)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/teams/ai/ai.py", line 143, in run
    plan = await self.planner.continue_task(context, state)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/teams/ai/planners/assistants_planner.py", line 187, in continue_task
    return await self._submit_action_results(state)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/teams/ai/planners/assistants_planner.py", line 279, in _submit_action_results
    run = await self._client.beta.threads.runs.submit_tool_outputs(
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/openai/resources/beta/threads/runs/runs.py", line 2979, in submit_tool_outputs
    return await self._post(
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/openai/_base_client.py", line 1790, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/openai/_base_client.py", line 1493, in request
    return await self._request(
  File "/tmp/8dc83d67fdc82b2/antenv/lib/python3.11/site-packages/openai/_base_client.py", line 1584, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "Expected tool outputs for call_ids ['call_dSFAdfcF9CLsB6LutGJleiFJ', 'call_kxqGFm4LegeYA80wdG4nX0q4', 'call_RzLxCptvGVXCD298Klh170xX'], got ['call_RzLxCptvGVXCD298Klh170xX']", 'type': 'invalid_request_error', 'param': None, 'code': None}}

Reproduction Steps

1. Set up custom tools via function calling
2. Prompt the AssistantsPlanner to call the same tool multiple times in a single prompt
3. The assistant raises an error saying that tool outputs are missing

I believe the error stems from the _generate_plan_from_tools function in assistants_planner.py.
Here, the tool_map Dict is initialized with the function name as the dictionary key. This causes an issue: when the same function is called multiple times in a run, each call overwrites the previous entry for that function in tool_map.
This in turn affects _submit_action_results, which is where the error is raised.
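
To illustrate the overwrite described above, here is a minimal standalone sketch. It does not use the Teams AI library; `FakeToolCall` and the `call_1`/`call_2`/`call_3` IDs are hypothetical stand-ins for the tool calls in a run's required action.

```python
from dataclasses import dataclass


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for one tool call requested by the assistant."""
    id: str
    name: str


tool_calls = [
    FakeToolCall("call_1", "getNickname"),
    FakeToolCall("call_2", "getNickname"),
    FakeToolCall("call_3", "getCurrentWeather"),
]

# Keyed by function name: the second getNickname call replaces the first.
tool_map = {}
for call in tool_calls:
    tool_map[call.name] = call.id

required_ids = [call.id for call in tool_calls]
submitted_ids = set(tool_map.values())
missing_ids = [call_id for call_id in required_ids if call_id not in submitted_ids]

print(tool_map)     # {'getNickname': 'call_2', 'getCurrentWeather': 'call_3'}
print(missing_ids)  # ['call_1']
```

Only one call ID per function name survives, which would match the shape of the 400 error: more call IDs expected than submitted.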
@NikhilB95 NikhilB95 added the bug Something isn't working label Jun 3, 2024
@corinagum corinagum added the Python Change/fix applies to Python. If all three, use the 'JS & dotnet & Python' label label Jun 11, 2024
@lilyydu
Contributor

lilyydu commented Jun 17, 2024

Hi @NikhilB95, do you mind indenting the stack trace line-by-line so that it is easier to see?

Do you know which call_id is associated with the function you are trying to call multiple times?

I see this error at the bottom. It looks like the outputs of the first two calls were not returned?
Error code: 400 - {'error': {'message': "Expected tool outputs for call_ids ['call_dSFAdfcF9CLsB6LutGJleiFJ', 'call_kxqGFm4LegeYA80wdG4nX0q4', 'call_RzLxCptvGVXCD298Klh170xX'], got ['call_RzLxCptvGVXCD298Klh170xX']", 'type': 'invalid_request_error', 'param': None, 'code': None}}
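
As a debugging aid, the missing call IDs can be pulled out of the error message itself. This is a rough sketch that assumes the message keeps the exact wording shown above; `missing_call_ids` is a hypothetical helper, not part of the OpenAI SDK or the Teams AI library.

```python
import ast
import re


def missing_call_ids(message: str) -> list[str]:
    """Return the call IDs the service expected but did not receive."""
    match = re.search(
        r"Expected tool outputs for call_ids (\[.*?\]), got (\[.*?\])", message
    )
    if match is None:
        return []
    # Both groups are Python-style list literals of strings.
    expected = ast.literal_eval(match.group(1))
    received = set(ast.literal_eval(match.group(2)))
    return [call_id for call_id in expected if call_id not in received]


message = (
    "Expected tool outputs for call_ids ['call_dSFAdfcF9CLsB6LutGJleiFJ', "
    "'call_kxqGFm4LegeYA80wdG4nX0q4', 'call_RzLxCptvGVXCD298Klh170xX'], "
    "got ['call_RzLxCptvGVXCD298Klh170xX']"
)
print(missing_call_ids(message))
```

For the message in this issue, the helper returns the first two call IDs, which are the ones whose outputs never reached the service.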

@lilyydu
Contributor

lilyydu commented Jun 25, 2024

Closing due to inactivity, please re-open if support is required.

@lilyydu lilyydu closed this as completed Jun 25, 2024
@andres-swax

andres-swax commented Oct 29, 2024

EDIT: Found the root cause of the issue. It happens when the same tool is called more than once on the same run. See proposed fix on next post.

Hello @lilyydu @corinagum @NikhilB95. I am facing the very same issue. I was about to open a new issue when I found this one, although it has gone dormant.

I get the exact same error (different callIDs of course).

This can be easily reproduced as follows (pseudocode):

  1. Create an assistant on Azure AI Studio -> Assistants Playground, following the source code for the Assistants sample (I would need to search for where I found it; I think it is in the Teams AI Assistants documentation, in the version current as of 2024/10/20). I am pasting the complete code of the assistant definition at the end of this post.

  2. Create a simple sample app from Teams Toolkit [version 5.10, current as of 2024/10/28] as follows: Teams Toolkit -> New Project -> Custom Engine Agent -> AI Agent -> Build with Assistants API (Preview) -> Python.

  3. Once the assistant is running test it by asking for

3a. something that will trigger two calls to sequential tools. It runs them one after the other so the output of one becomes the input of another. It will work. Example:
*USER* Give me the weather of the city nicknamed The Golden City
--- Tool call to getNickname(The Golden City) --> tool returns to LLM San Francisco
--- Tool call to getWeather(San Francisco) --> tool returns to LLM 44 C
*LLM for example* San Francisco is a little warm today, it is 44 degrees Celsius

3b. something that will trigger two calls to the same or different tools where no input depends on the output of another call. The stack trace and error message make it clear that they were called in parallel, or at least the LLM thinks they were, or so I think anyway. It will fail. 100% repeatable:

*USER* Give me the name of both cities nicknamed Sin City and The Golden City
--- Tool call to getNickname(Sin City) --> tool returns a proper response
--- Tool call to getNickname(The Golden City) --> not even certain tool managed to return anything. I did not see evidence while debugging on VS Code that this call even arrived from the LLM

Apparently, at this point the return value of call 1 reaches the LLM with the correct call ID, but the LLM expects the results of BOTH tool calls in the same response [http callback?], so it throws an HTTP 400 error. From this point forward, the Teams agent does not reply to any requests/posts, so it seems mute, stunned, or just dead until the run that triggered the 400 error expires. After that it may or may not process the other inputs, I suppose depending on whether they have expired.
Another example: How is the weather in Vegas and which city is called The Golden City? triggers the same problem.


I have experienced this ever since I started working with the Assistants version of the template... and hoped the issue would just go away [yeah, right]. Prompts such as: Where should I go this weekend, Los Angeles or Las Vegas? I prefer warm weather. or anything similar will trigger this scenario.

This is an example of the errors I get. Notice that the error is shown 3 times, I suppose while the exception bubbles up. None of the text below is mine or my code's:

[TURN ERROR] Unhandled error: Error code: 400 - {'error': {'message': "Expected tool outputs for call_ids ['call_5ceIbNb7pLs3R7aNMdNXDxmq', 'call_bq7VJMlniIUVR8Pil2tk9unH'], got ['call_bq7VJMlniIUVR8Pil2tk9unH']", 'type': 'invalid_request_error', 'param': None, 'code': None}}
Error parameters/arguments:
  Error code: 400 - {'error': {'message': "Expected tool outputs for call_ids ['call_5ceIbNb7pLs3R7aNMdNXDxmq', 'call_bq7VJMlniIUVR8Pil2tk9unH'], got ['call_bq7VJMlniIUVR8Pil2tk9unH']", 'type': 'invalid_request_error', 'param': None, 'code': None}}
Error code: 400 - {'error': {'message': "Expected tool outputs for call_ids ['call_5ceIbNb7pLs3R7aNMdNXDxmq', 'call_bq7VJMlniIUVR8Pil2tk9unH'], got ['call_bq7VJMlniIUVR8Pil2tk9unH']", 'type': 'invalid_request_error', 'param': None, 'code': None}}

My conclusion is that the LLM calls the tool -at least the first invocation reaches the tool implementation in Python- and when that tool returns [meaning posts response back?], the LLM throws a 400 error because it was expecting the return value from both calls on the same request.

It is not clear to me if the second call from the LLM to the tool entry point ever went out or if it was received. I did not see evidence of it reaching the entry point.
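
One way to see which results are missing before the service rejects the request is to compare the required call IDs with the outputs about to be submitted. This is only a sketch: `find_missing_outputs` is a hypothetical helper, the tool outputs are modeled as plain dicts with a `tool_call_id` key, and the output value is made up.

```python
def find_missing_outputs(required_call_ids, tool_outputs):
    """Return the required call IDs that have no entry in tool_outputs."""
    provided = {output["tool_call_id"] for output in tool_outputs}
    return [call_id for call_id in required_call_ids if call_id not in provided]


# Call IDs taken from the error message above; the output text is illustrative.
required = ["call_5ceIbNb7pLs3R7aNMdNXDxmq", "call_bq7VJMlniIUVR8Pil2tk9unH"]
outputs = [{"tool_call_id": "call_bq7VJMlniIUVR8Pil2tk9unH", "output": "example output"}]

missing = find_missing_outputs(required, outputs)
if missing:
    print(f"No output recorded for: {missing}")
```

Logging this just before the submit step would show whether the first call's result was lost on the bot side rather than never requested.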


Code for the Assistant definition, copied from the Assistants Playground on Azure AI Studio:

{
  "tools": [
    {
      "type": "code_interpreter"
    },
    {
      "type": "function",
      "function": {
        "name": "getCurrentWeather",
        "description": "Get the weather in location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state e.g. Seattle, WA"
            },
            "unit": {
              "type": "string",
              "enum": [
                "c",
                "f"
              ]
            }
          },
          "required": [
            "location"
          ]
        },
        "strict": false
      }
    },
    {
      "type": "function",
      "function": {
        "name": "getNickname",
        "description": "Get the nickname of a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state e.g. Seattle, WA"
            }
          },
          "required": [
            "location"
          ]
        },
        "strict": false
      }
    }
  ],
  "name": "Cities and Weather",
  "instructions": "You are an intelligent bot that can write and run code to answer math questions use the provided functions to answer questions",
  "model": "az-oai-gpt-4o",
  "tool_resources": {
    "code_interpreter": {
      "file_ids": []
    }
  },
  "temperature": 1,
  "top_p": 1
}

Code for the tool implementations (called from the LLM)



@bot_app.ai.action("getCurrentWeather")
async def get_current_weather(context: ActionTurnContext, state: TurnState):
    # Static sample data keyed by "City, ST"
    weather_data = {
        'San Francisco, CA': { 'f': '71.6F', 'c': '22C', },
        'Los Angeles, CA': { 'f': '75.2F', 'c': '24C', },
        'New York, NY': { 'f': '44.2F', 'c': '17C', },
    }
    location = context.data.get("location")
    readings = weather_data.get(location)
    if not readings:
        return f"No weather data for {location} found"
    # Default to Fahrenheit when the model omits the unit
    unit = context.data.get("unit", "f")
    return readings[unit]



@bot_app.ai.action("getNickname")
async def get_nickname(context: ActionTurnContext, state: TurnState):
    # Static sample data: city -> nickname
    nicknames = {
        'San Francisco, CA': 'The Golden City',
        'Los Angeles': 'LA',
    }
    location = context.data.get("location")
    # Fall back to a message the model can relay when the city is unknown
    return nicknames.get(location) or f"No nickname for {location} found"

@andres-swax

andres-swax commented Oct 30, 2024

I have found the problem: it is triggered when the same tool is called more than once.

The Teams AI library keeps a dictionary of the tools being called, tool_map, but it is keyed only by tool name, and the same tool can be called multiple times in the same run (each call with its own call_id). So, when preparing the response for delivery and iterating over the actual results, it would overwrite the result stored for the previous call_id, and only the last one would survive and be returned. As a result, when the results were posted back to the LLM, the LLM did not receive all of them.

I have fixed it, but I don't know how to submit the actual fix to the correct repository, or what the protocol is. Also, I fixed this scenario but did not explore any kind of impact anywhere else, testing included. Current tests should pass though, because functionality was not changed.

The fix consists of changing tool_map from a flat mapping with one call per tool name (tool_map : Dict) to a mapping from each function (tool name/action) to the list of its call_ids (tool_map : dict[str,list]), where the list holds every call_id of the same tool (str). This seems to have been the original intent of the code.

runsList[callId] became runsList[toolName][callId]
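
The idea can be sketched outside the library as follows. This is a simplified illustration, not the library code: `FakeToolCall`, `build_tool_map`, and `collect_tool_outputs` are hypothetical names, tool outputs are modeled as plain dicts, and the output strings are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for one tool call requested by the assistant."""
    id: str
    name: str


def build_tool_map(tool_calls):
    """Group call IDs by tool name so repeated calls are all kept."""
    tool_map = defaultdict(list)
    for call in tool_calls:
        tool_map[call.name].append(call.id)
    return dict(tool_map)


def collect_tool_outputs(tool_map, action_outputs):
    """Flatten action_outputs[tool_name][call_id] into one entry per call ID."""
    tool_outputs = []
    for name, call_ids in tool_map.items():
        for call_id in call_ids:
            output = action_outputs.get(name, {}).get(call_id)
            if output is not None:
                tool_outputs.append({"tool_call_id": call_id, "output": output})
    return tool_outputs


calls = [FakeToolCall("call_1", "getNickname"), FakeToolCall("call_2", "getNickname")]
tool_map = build_tool_map(calls)
action_outputs = {"getNickname": {"call_1": "first result", "call_2": "second result"}}
tool_outputs = collect_tool_outputs(tool_map, action_outputs)
print(tool_outputs)
```

With outputs nested by tool name and then call ID, both calls to the same tool produce their own entry, so nothing is overwritten.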

Fix in two places:
teams_ai-1.4.1.dist-info
|--------------------------> teams/ai/ai.py
--------------------------> teams/ai/planners/assistants_planner.py

This is the before and after exported as a git diff. Not certain if this is the best way to pass this info though.

diff --git a/src/ai_toolkit_fixes/teams.ai.ap.py b/src/ai_toolkit_fixes/teams.ai.ap.py
index 562dbfd..6bb0104 100644
--- a/src/ai_toolkit_fixes/teams.ai.ap.py
+++ b/src/ai_toolkit_fixes/teams.ai.ap.py
@@ -171,7 +171,8 @@ class AI(Generic[StateT]):
                     # Set output for action call
                     if command.action_id:
                         loop = True
-                        state.temp.action_outputs[command.action_id] = output or ""
+                        if not command.action in state.temp.action_outputs: state.temp.action_outputs[command.action] = {}       #FIXED
+                        state.temp.action_outputs[command.action][command.action_id] = output or ""                              #FIXED
                     else:
                         loop = len(output) > 0
                         state.temp.action_outputs[command.action] = output
diff --git a/src/ai_toolkit_fixes/teams.ai.planners.assistants_planner.py b/src/ai_toolkit_fixes/teams.ai.planners.assistants_planner.py
index 0df36ca..ad7c0fb 100644
--- a/src/ai_toolkit_fixes/teams.ai.planners.assistants_planner.py
+++ b/src/ai_toolkit_fixes/teams.ai.planners.assistants_planner.py
@@ -264,16 +264,17 @@ class AssistantsPlanner(Generic[StateT], _UserAgent, Planner[StateT]):
 
         # Map the action outputs to tool outputs
         action_outputs = state.temp.action_outputs
-        tool_map = state.get(SUBMIT_TOOL_OUTPUTS_MAP)
+        tool_map : dict[str,list] = state.get(SUBMIT_TOOL_OUTPUTS_MAP)                                                        #FIXED
         tool_outputs: List[ToolOutput] = []
 
-        for action in action_outputs:
-            output = action_outputs[action]
-            if tool_map:
-                tool_call_id = tool_map[action] if action in tool_map else None
-                if tool_call_id is not None:
-                    # Add required output only
-                    tool_outputs.append(ToolOutput(tool_call_id=tool_call_id, output=output))
+        if tool_map:                                                                                                          #FIXED
+            for action in action_outputs:                                                                                     #unchanged
+                if action in tool_map:                                                                                        #FIXED
+                    for tool_call_id in tool_map[action]:                                                                     #FIXED
+                        output = action_outputs[action][tool_call_id] if tool_call_id in action_outputs[action] else None     #FIXED
+                        if output is not None:                                                                                #FIXED
+                            # Add required output only
+                            tool_outputs.append(ToolOutput(tool_call_id=tool_call_id, output=output))
 
         # Submit the tool outputs
         if assistants_state.thread_id and assistants_state.run_id:
@@ -329,12 +330,14 @@ class AssistantsPlanner(Generic[StateT], _UserAgent, Planner[StateT]):
 
     def _generate_plan_from_tools(self, state: TurnState, required_action: RequiredAction) -> Plan:
         plan = Plan()
-        tool_map: Dict = {}
+        tool_map: dict[str,list] = {}                                                                     #FIXED
         for tool_call in required_action.submit_tool_outputs.tool_calls:
-            tool_map[tool_call.function.name] = tool_call.id
+            if not tool_call.function.name in tool_map : tool_map[tool_call.function.name] = []           #FIXED
+            tool_map[tool_call.function.name].append(tool_call.id)                                        #FIXED
             plan.commands.append(
                 PredictedDoCommand(
                     action=tool_call.function.name,
+                    action_id=tool_call.id,                                                               #FIXED
                     parameters=json.loads(tool_call.function.arguments),
                 )
             )

@singhk97 singhk97 reopened this Oct 30, 2024
@singhk97
Collaborator

Re-opening. Will try to reproduce it soon.

@andres-swax

andres-swax commented Oct 30, 2024

@singhk97 Hi! While you were reopening this, I was copying and pasting and created a new bug report 😁 at the very same time, as I just saw.
Do you want to merge them or close one of them? Whichever way is easier. Here is what I created at pretty much the same time.

https://github.com/microsoft/teams-ai/issues/2154

@andres-swax

andres-swax commented Jan 6, 2025

@corinagum Hello, is the fix for this scheduled for sometime in the near future? I wrote the fix, but I got completely and hopelessly lost creating the tests - no idea. So there is an open PR that contains what has to be done to fix the issue. It has been open for a while already.

thanks!

EDIT: Forgot to mention, this issue is kind of a major one, since whenever the AI decides to trigger multiple calls to the same tool in a single round it will always fail.

For example, Which will be warmer this week, Atlanta or Denver? will break 100% of the time - so definitely a major deal.

@corinagum
Collaborator

@BMS-geodev could you leave a comment on this issue?

@BMS-geodev
Collaborator

assignment comment

6 participants