Facing difficulty using Gemini Flash 2.0 with function calling #4878

Closed
gvijqb opened this issue Jan 2, 2025 · 21 comments · Fixed by #5122
Labels: 0.2 (issues related to the pre-0.4 codebase), needs-triage, proj-extensions

@gvijqb

gvijqb commented Jan 2, 2025

What happened?

Whenever there's a function call with Gemini Flash 2.0 in autogen, I get this error:
Google GenAI exception occurred while calling Gemini API: 400 * GenerateContentRequest.tools[0].function_declarations[0].parameters.properties: should be non-empty for OBJECT type
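The error text suggests that Gemini rejects a function declaration whose parameters are of type object but declare no properties. A minimal diagnostic sketch, assuming the tools are available as OpenAI-style tool dictionaries: the helper name `find_empty_object_params` and the sample schemas are hypothetical, and this only locates candidate declarations; it is not the fix that was eventually merged.

```python
from typing import Any, Dict, List


def find_empty_object_params(tools: List[Dict[str, Any]]) -> List[str]:
    """Return names of tools whose parameters are an object with no properties."""
    offenders = []
    for tool in tools:
        fn = tool.get("function", {})
        params = fn.get("parameters") or {}
        # An object schema with a missing or empty "properties" map matches the error text.
        if params.get("type") == "object" and not params.get("properties"):
            offenders.append(fn.get("name", "<unnamed>"))
    return offenders


# Hypothetical schemas for illustration only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_time",
            "parameters": {"type": "object", "properties": {}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]

print(find_empty_object_params(tools))
```

If a registered function shows up in that list, giving it at least one typed argument is one thing to try.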

What did you expect to happen?

It should be able to use the tool call properly.

How can we reproduce it (as minimally and precisely as possible)?

An AssistantAgent with a registered function call should be enough to reproduce it.

AutoGen version

0.2

Which package was this bug in

Core

Model used

Gemini flash 2.0 exp

Python version

3.11

Operating system

Ubuntu 22.04

Any additional info you think would be helpful for fixing this bug

No response

@ekzhu
Collaborator

ekzhu commented Jan 4, 2025

Gemini has added support for an OpenAI-style endpoint. Can you try that without specifying the api_type?

ekzhu added the 0.2 label Jan 4, 2025
@gvijqb
Author

gvijqb commented Jan 5, 2025

@ekzhu If I don't specify api_type then I am getting authentication error:

raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: AIza********************************VQ. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

I guess it expects an OpenAI API key when I do not specify api_type as google.

I am using OAI Config List method like this:

OAI_CONFIG_FILE:

[
    {
        "model":"gemini-2.0-flash-exp",
        "api_key": "KEY",
        "api_type": "google"
    }
]

Runtime:

config_list_gemini_flash = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gemini-2.0-flash-exp"]
    }
)

gemini_flash_config = {
    "cache_seed": 42,
    "max_retries": 5,
    "config_list": config_list_gemini_flash,
    "timeout": 30000,
}

Is there any configuration issue in this workflow?

@ekzhu
Collaborator

ekzhu commented Jan 6, 2025

Based on https://ai.google.dev/gemini-api/docs/openai#:~:text=Gemini%20models%20are%20accessible%20using%20the%20OpenAI%20libraries,recommend%20that%20you%20call%20the%20Gemini%20API%20directly.

You need to set the base_url field to "https://generativelanguage.googleapis.com/v1beta/openai/" and set the API key to your Gemini key.

ekzhu added the awaiting-op-response label Jan 6, 2025
@gvijqb
Author

gvijqb commented Jan 6, 2025

@ekzhu thanks, I followed this approach but am still getting an error.

Updated OAI_CONFIG_LIST:

[
    {
        "model": "gemini-2.0-flash-exp",
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
        "api_key": "MY_KEY"
    }
]

Getting this error:

openai.BadRequestError: Error code: 400 - [{'error': {'code': 400, 'message': 'Unable to submit request because it has an empty text parameter. Add a value to the parameter and try again. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini', 'status': 'INVALID_ARGUMENT'}}]

I guess there is an AutoGen/Gemini integration error in how the text is passed to the Gemini endpoint?

github-actions bot removed the awaiting-op-response label Jan 6, 2025
@ekzhu
Collaborator

ekzhu commented Jan 6, 2025

What is the code that led to this error? Gemini seems to reject an empty text parameter in a message.

@gvijqb
Author

gvijqb commented Jan 6, 2025

> What is the code that led to this error? Gemini seems to reject an empty text parameter in a message.

I guess I'll have to try again to keep a log of the messages. Will update you shortly.

@gvijqb
Author

gvijqb commented Jan 7, 2025

@ekzhu I am just trying to use gemini 2.0 flash exp for speaker selection. Here's my code:

config_list_gemini_flash = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gemini-2.0-flash-exp"]
    }
)

gemini_flash_config = {
    "cache_seed": 42,
    "max_retries": 5,
    "config_list": config_list_gemini_flash,
    "timeout": 30000,
}

def setup_groupchat(self):
   self.groupchat = autogen.GroupChat(
        agents=[],
        messages=[],
        max_round=250,
        select_speaker_prompt_template = "Read the above conversation. Then select the next role from {agentlist} to play. Only return the role.",)

   manager = autogen.GroupChatManager(groupchat=self.groupchat, llm_config=gemini_flash_config)

Still getting the same error.

@ekzhu
Collaborator

ekzhu commented Jan 8, 2025

Where is the rest of the code? It looks like there are no agents?

BTW, we are releasing v0.4 this week. Your problem might be fixed by simply upgrading to v0.4. Migration doc: https://aka.ms/autogen-migrate.

@gvijqb
Author

gvijqb commented Jan 8, 2025

> Where is the rest of the code? It looks like there are no agents?
>
> BTW, we are releasing v0.4 this week. Your problem might be fixed by simply upgrading to v0.4. Migration doc: https://aka.ms/autogen-migrate.

I had just redacted the agent names before sharing here. The code works; Gemini Flash is the only model causing the issue I mentioned above. To be clear, I am currently not using Gemini Flash anywhere other than speaker selection. When I use it for an agent that has a function call, the function-calling issue appears. As a speaker selector, it works fine.

Here's an example of how I am creating the agent:

def create_agent(name, system_message, llm_config):
    return autogen.AssistantAgent(name = name, system_message = system_message, llm_config = llm_config)

self.executor = autogen.UserProxyAgent(name="Executor",system_message=executor_system_message,human_input_mode="NEVER",code_execution_config={"last_n_messages": 2,"executor": self.executor},)
self.agentOne = create_agent("AgentOne", system_message = agent_one_system_message, llm_config = gemini_flash_config)

autogen.register_function(retreive_from_internet, caller=self.agentOne, executor=self.executor, name="retreive_from_internet", description="Search internet and find context from internet.")

def setup_groupchat(self):
   self.groupchat = autogen.GroupChat(
        agents=[self.agentOne, self.executor],
        messages=[],
        max_round=250,
        select_speaker_prompt_template = "Read the above conversation. Then select the next role from {agentlist} to play. Only return the role.",)

   manager = autogen.GroupChatManager(groupchat=self.groupchat, llm_config=gemini_flash_config)

@ekzhu
Collaborator

ekzhu commented Jan 8, 2025

I don't see any problem in setting up the agents. Most likely one of the messages sent to the Gemini endpoint contains an empty text field. It would be good to find out which message it is by debugging and inspecting the output.

How did the group chat start? Was there any initial message?
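One way to narrow this down is to scan the conversation history for messages with missing or blank text. The sketch below assumes the history is available as a list of OpenAI-style message dictionaries, such as the `messages` list passed to `GroupChat`; the helper name `find_empty_messages` and the sample history are hypothetical.

```python
from typing import Any, Dict, List


def find_empty_messages(messages: List[Dict[str, Any]]) -> List[int]:
    """Return indexes of messages whose text content is missing or blank."""
    empty = []
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if content is None or (isinstance(content, str) and not content.strip()):
            empty.append(i)
    return empty


# Hypothetical history: the assistant message carries only a tool call.
history = [
    {"role": "user", "content": "Find information on AutoGen"},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "content": "AutoGen is a framework.", "tool_call_id": "call_1"},
]

for i in find_empty_messages(history):
    print(i, history[i]["role"], history[i])
```

Tool-call messages are the usual suspects here, since they often carry no text of their own.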

@gvijqb
Author

gvijqb commented Jan 8, 2025

@ekzhu Yes, I use the initiate_chat function to start the group chat with an input message. Example:

manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=gemini_flash_config)

user_proxy.initiate_chat(manager, message=INITIAL_MESSAGE)

The message does pass through and works as expected for agents with function calling when the LLM is not Gemini flash. But when using any Gemini model as LLM for agents, I am observing this error.

@ekzhu
Collaborator

ekzhu commented Jan 8, 2025

Okay, thanks for narrowing it down. So, the issue happens when it is (1) gemini-flash and (2) when function calling is involved?

This is likely because the function-calling messages contain an empty field, which Gemini Flash rejects. If you want to continue working on v0.2, a minimal reproduction with function calling would be great.

For v0.4, it will not have this problem, because the function calls happen within each agent in a group chat.

@gvijqb
Author

gvijqb commented Jan 9, 2025

> Okay, thanks for narrowing it down. So, the issue happens when it is (1) gemini-flash and (2) when function calling is involved?
>
> This is likely because the function-calling messages contain an empty field, which Gemini Flash rejects. If you want to continue working on v0.2, a minimal reproduction with function calling would be great.
>
> For v0.4, it will not have this problem, because the function calls happen within each agent in a group chat.

Yes, let me try to reproduce with a minimal version. I suspect the issue will still be present, though, since whether a message is empty is not under my control, and I don't know how we could change that in AutoGen v0.2.

v0.4 sounds appealing, but the core issue is that there are so many dependencies that migration would be very time-consuming!

@ekzhu
Collaborator

ekzhu commented Jan 10, 2025

Having function calls shared in a group chat will cause more problems for you in the future, unless all your models are OpenAI models. This is a key pain point we have seen many times, so v0.4 is my suggestion. It actually has fewer dependencies if you look into it. You can use the OpenAI client for Gemini.

Migration guide: https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/migration-guide.html

For v0.2, you can debug the messages exchanged between agents during the group chat and find the message that's the culprit, and then we can address that in the client.

@gvijqb
Author

gvijqb commented Jan 10, 2025

@ekzhu
Thanks for sharing. Interestingly, in a small implementation the function call with Gemini ended up working. I'm not sure what changed other than there being fewer agents.

Another reason I am not sure we can move to v0.4 is that we have a custom UserProxy class for code execution, and from the v0.4 documentation so far we think that we won't have access to underlying classes and thus won't be able to customize them to our needs.

Please suggest if this is not the case.

@ekzhu
Collaborator

ekzhu commented Jan 10, 2025

> we won't have access to underlying classes

You can create custom agents: https://microsoft.github.io/autogen/dev/user-guide/agentchat-user-guide/tutorial/custom-agents.html

For code executor agent, you can subclass: https://microsoft.github.io/autogen/dev/reference/python/autogen_agentchat.agents.html#autogen_agentchat.agents.CodeExecutorAgent, or take it and implement your own custom agent, as the class is super slim.

@gvijqb
Author

gvijqb commented Jan 18, 2025

> > we won't have access to underlying classes
>
> You can create custom agents: https://microsoft.github.io/autogen/dev/user-guide/agentchat-user-guide/tutorial/custom-agents.html
>
> For code executor agent, you can subclass: https://microsoft.github.io/autogen/dev/reference/python/autogen_agentchat.agents.html#autogen_agentchat.agents.CodeExecutorAgent, or take it and implement your own custom agent, as the class is super slim.

Thanks, we are looking at these docs.

@guycalledseven

@ekzhu Unfortunately there is almost the same issue with gemini-2.0-flash-exp on the 0.4 branch as well.
The first thing I hit when testing Gemini with handoff/round-robin in 0.4.2 was this error, which led me to this ticket for the 0.2 branch. 😓 I couldn't get a single tool call to work.

After the POST to chat/completions, the response comes back with finish_reason "stop".

DEBUG:openai._base_client:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "200 OK"

INFO:autogen_core.events:{"type": "LLMCall", "messages": [{"content": "Use tools to solve tasks.", "role": "system"}, {"content": "Find information on AutoGen", "role": "user", "name": "user"}], "response": {"id": null, "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": null, "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": [{"id": "", "function": {"arguments": "{\"query\":\"AutoGen\"}", "name": "web_search"}, "type": "function"}]}}], "created": 1737330953, "model": "gemini-2.0-flash-exp", "object": "chat.completion", "service_tier": null, "system_fingerprint": null, "usage": {"completion_tokens": 6, "prompt_tokens": 50, "total_tokens": 56, "completion_tokens_details": null, "prompt_tokens_details": null}}, "prompt_tokens": 50, "completion_tokens": 6, "agent_id": null}

Running the same example against a local Ollama through its OpenAI-compatible endpoint does call the tool:

DEBUG:openai._base_client:HTTP Request: POST http://localhost:11434/v1/chat/completions "200 OK"

INFO:autogen_core.events:{"type": "LLMCall", "messages": [{"content": "Use tools to solve tasks.", "role": "system"}, {"content": "Find information on AutoGen", "role": "user", "name": "user"}], "response": {"id": "chatcmpl-871", "choices": [{"finish_reason": "tool_calls", "index": 0, "logprobs": null, "message": {"content": "", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": [{"id": "call_5aiuzvjb", "function": {"arguments": "{\"query\":\"AutoGen\"}", "name": "web_search"}, "type": "function", "index": 0}]}}], "created": 1737330910, "model": "llama3.1:8b", "object": "chat.completion", "service_tier": null, "system_fingerprint": "fp_ollama", "usage": {"completion_tokens": 18, "prompt_tokens": 165, "total_tokens": 183, "completion_tokens_details": null, "prompt_tokens_details": null}}, "prompt_tokens": 165, "completion_tokens": 18, "agent_id": null}

DEBUG:autogen_agentchat.events:source='assistant' models_usage=RequestUsage(prompt_tokens=165, completion_tokens=18) content=[FunctionCall(id='call_5aiuzvjb', arguments='{"query":"AutoGen"}', name='web_search')] type='ToolCallRequestEvent'

DEBUG:autogen_agentchat.events:source='assistant' models_usage=None content=[FunctionExecutionResult(content='AutoGen is a programming framework for building multi-agent applications.', call_id='call_5aiuzvjb')] type='ToolCallExecutionEvent'

example configuration

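# Imports for the v0.4 packages used below; web_search is the tool function (not shown).
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="gemini-2.0-flash-exp",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai",
    model_capabilities={
        "json_output": True,
        "vision": True,
        "function_calling": True,
    },
)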

agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
)

response = await agent.on_messages(
    [TextMessage(content="Find information on AutoGen", source="user")],
    cancellation_token=CancellationToken(),
)

@ekzhu
Collaborator

ekzhu commented Jan 20, 2025

@guycalledseven ah. okay. I hope we can address this shortly in the next v0.4 release.

> After the POST to chat/completions, the response comes back with finish_reason "stop".
>
> DEBUG:openai._base_client:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "200 OK"
>
> INFO:autogen_core.events:{"type": "LLMCall", "messages": [{"content": "Use tools to solve tasks.", "role": "system"}, {"content": "Find information on AutoGen", "role": "user", "name": "user"}], "response": {"id": null, "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": null, "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": [{"id": "", "function": {"arguments": "{\"query\":\"AutoGen\"}", "name": "web_search"}, "type": "function"}]}}], "created": 1737330953, "model": "gemini-2.0-flash-exp", "object": "chat.completion", "service_tier": null, "system_fingerprint": null, "usage": {"completion_tokens": 6, "prompt_tokens": 50, "total_tokens": 56, "completion_tokens_details": null, "prompt_tokens_details": null}}, "prompt_tokens": 50, "completion_tokens": 6, "agent_id": null}

The log indicates the culprit is the "finish_reason". When it is "stop", the client doesn't parse the "tool_calls" content:

if choice.finish_reason == "tool_calls":
    assert choice.message.tool_calls is not None
    assert choice.message.function_call is None
    # NOTE: If OAI response type changes, this will need to be updated
    content = [
        FunctionCall(
            id=x.id,
            arguments=x.function.arguments,
            name=normalize_name(x.function.name),
        )
        for x in choice.message.tool_calls
    ]
    finish_reason = "function_calls"
else:
    finish_reason = choice.finish_reason
    content = choice.message.content or ""
logprobs: Optional[List[ChatCompletionTokenLogprob]] = None

I think we can modify this to make it possible to handle tool calls by detecting the tool_calls field.

Generally speaking, many hosted APIs, including Gemini, may claim to be OpenAI-compatible, but they often differ in the details.
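A rough sketch of what detecting the tool_calls field could look like. This is not the code from the actual fix; the dataclasses are simplified stand-ins for the OpenAI response types, and `parse_choice` is a hypothetical name. The idea is to trust the presence of tool calls over the reported finish reason and to warn when the two disagree.

```python
import warnings
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Function:
    name: str
    arguments: str


@dataclass
class ToolCall:
    id: str
    function: Function


@dataclass
class Message:
    content: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = None


@dataclass
class Choice:
    finish_reason: str
    message: Message


def parse_choice(choice: Choice) -> Tuple[str, Union[str, List[ToolCall]]]:
    """Return (finish_reason, content), trusting tool_calls over finish_reason."""
    tool_calls = choice.message.tool_calls
    if tool_calls:
        if choice.finish_reason != "tool_calls":
            # Some OpenAI-compatible endpoints report "stop" even with tool calls.
            warnings.warn(
                f"Finish reason mismatch: {choice.finish_reason} != tool_calls "
                "when tool_calls are present."
            )
        return "function_calls", list(tool_calls)
    return choice.finish_reason, choice.message.content or ""


# Mirrors the Gemini response in the log: finish_reason "stop" plus a tool call.
gemini_choice = Choice(
    finish_reason="stop",
    message=Message(
        tool_calls=[ToolCall(id="", function=Function("web_search", '{"query":"AutoGen"}'))]
    ),
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    reason, content = parse_choice(gemini_choice)

print(reason, content[0].function.name)
```

Note that the Gemini response also returns an empty tool call id, which this sketch passes through unchanged.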

cc @jackgerrits

@ekzhu
Collaborator

ekzhu commented Jan 21, 2025

@guycalledseven a fix is ready: #5122

@guycalledseven

@ekzhu I see tools being called, finally, and your warning is present as well:

.../autogen_agentchat/agents/_assistant_agent.py:386: UserWarning: Finish reason mismatch: stop != tool_calls when tool_calls are present. Finish reason may not be accurate. This may be due to the API used that is not returning the correct finish reason.
  model_result = await self._model_client.create(
.../autogen_agentchat/agents/_assistant_agent.py:386: UserWarning: Both tool_calls and content are present in the message. This is unexpected. content will be ignored, tool_calls will be used.
  model_result = await self._model_client.create(

Will test this more in depth later on, thank you.
