# chore(wren-ai-service): chart eval #1192

## Conversation
**Caution**: Review failed. The pull request is closed.

**Walkthrough**: This pull request introduces enhancements to the chart generation and evaluation workflow in the Wren AI service. Key modifications include updating the …
---
Actionable comments posted: 5
🧹 Nitpick comments (8)
wren-ai-service/src/pipelines/generation/chart_generation.py (1)
`121-121`: **Consider adding type hints for the return value.**

The `preprocess_data` function could benefit from more specific type hints for its return value.

```diff
-    return chart_data_preprocessor.run(data)["sample_data"]
+    sample_data: Dict[str, Any] = chart_data_preprocessor.run(data)["sample_data"]
+    return sample_data
```

wren-ai-service/eval/llm_trace_analysis/utils.py (5)
`15-15`: **Use `find_dotenv` to dynamically locate the `.env` file.**

The hardcoded path in `load_dotenv("eval/llm_trace_analysis/.env", override=True)` may fail if the script is run from a different working directory. To ensure the `.env` file is found regardless of the execution context, consider using `find_dotenv()`.

Apply this diff:

```diff
-import os
-from dotenv import load_dotenv
+import os
+from dotenv import load_dotenv, find_dotenv

-load_dotenv("eval/llm_trace_analysis/.env", override=True)
+load_dotenv(dotenv_path=find_dotenv(), override=True)
```
`49-88`: **Limit concurrent API calls to avoid rate limiting.**

In `get_all_traces` and `get_all_observations`, using `asyncio.gather(*tasks)` to fetch all pages concurrently can lead to a large number of simultaneous API requests if there are many pages. This might overwhelm the API server or exceed rate limits.

Consider limiting the number of concurrent requests using `asyncio.Semaphore`:

```diff
 async def get_all_traces(
     client: AsyncFernLangfuse,
     name: Optional[str] = None,
     from_timestamp: Optional[datetime] = None,
     to_timestamp: Optional[datetime] = None,
     release: Optional[str] = None,
+    max_concurrency: int = 5,
 ) -> List[TraceWithDetails]:
     # Get first page to determine total pages
     first_page = await client.trace.list(
         name=name,
         page=1,
         from_timestamp=from_timestamp,
         to_timestamp=to_timestamp,
         release=release,
     )

+    # Create semaphore
+    semaphore = asyncio.Semaphore(max_concurrency)
+
+    # Define a wrapper to limit concurrency
+    async def fetch_page(page: int):
+        async with semaphore:
+            return await client.trace.list(
+                name=name,
+                page=page,
+                from_timestamp=from_timestamp,
+                to_timestamp=to_timestamp,
+                release=release,
+            )
+
     # Create tasks for all remaining pages
     tasks = [
-        client.trace.list(
-            name=name,
-            page=page,
-            from_timestamp=from_timestamp,
-            to_timestamp=to_timestamp,
-            release=release,
-        )
+        fetch_page(page)
         for page in range(2, first_page.meta.total_pages + 1)
     ]

     # Gather all pages with limited concurrency
     all_responses = [first_page]
     if tasks:  # Only gather if there are additional pages
         all_responses.extend(await asyncio.gather(*tasks))
```

Apply similar changes to `get_all_observations`.

Also applies to: 96-136
`37-40`: **Provide specific details in error messages for missing environment variables.**

The current error message lists all possible missing variables, even if only some are missing. To aid debugging, specify exactly which environment variables are not set.

Apply this diff to improve the error message:

```diff
 if not public_key or not secret_key or not host:
+    missing_vars = []
+    if not public_key:
+        missing_vars.append("LANGFUSE_PUBLIC_KEY")
+    if not secret_key:
+        missing_vars.append("LANGFUSE_SECRET_KEY")
+    if not host:
+        missing_vars.append("LANGFUSE_HOST")
     raise ValueError(
-        "Missing required Langfuse environment variables: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST"
+        f"Missing required Langfuse environment variables: {', '.join(missing_vars)}"
     )
```
`85-85`: **Simplify data aggregation by using response data directly.**

There's no need to re-instantiate `FetchTracesResponse` or `FetchObservationsResponse` when you already have the response objects. Dropping them simplifies the code and improves readability.

Apply this diff to streamline the aggregation:

```diff
 for response in all_responses:
-    traces.extend(FetchTracesResponse(data=response.data, meta=response.meta).data)
+    traces.extend(response.data)
```

```diff
 for response in all_responses:
-    observations.extend(
-        FetchObservationsResponse(data=response.data, meta=response.meta).data
-    )
+    observations.extend(response.data)
```

Also applies to: 132-134
`78-81`: **Handle cases when there is only one page of results.**

The current logic correctly skips `asyncio.gather` when there are no additional pages. However, explicitly handling the case where `total_pages` is 1 can improve code clarity.

Consider adding a comment or adjusting the logic for readability:

```python
# Gather all pages concurrently if there is more than one page
if first_page.meta.total_pages > 1:
    tasks = [
        # tasks definition...
    ]
    # Gather additional pages
    all_responses.extend(await asyncio.gather(*tasks))
# Else, proceed with only the first page
```

Also applies to: 125-128
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (2)
`264-275`: **Fetch LLM options dynamically.**

Hardcoding LLM versions in the UI will require frequent updates. Consider fetching these options dynamically from a configuration service or environment variables.

Consider this approach:

```diff
+def get_available_llms():
+    try:
+        response = requests.get(
+            f"{os.getenv('WREN_AI_SERVICE_BASE_URL')}/v1/llms",
+            timeout=30,
+        )
+        response.raise_for_status()
+        return response.json()["models"]
+    except Exception as e:
+        st.error(f"Failed to fetch LLM options: {e}")
+        return ["gpt-4o"]  # Fallback to default
+
 st.multiselect(
     "Select LLMs",
-    options=[
-        "gpt-4o",
-        "gpt-4o-2024-08-06",
-        "gpt-4o-mini",
-        "gpt-4o-mini-2024-07-18",
-    ],
+    options=get_available_llms(),
     key="llms_input",
     default=st.session_state["llms"],
     on_change=on_change_llms,
 )
```
`191-219`: **Improve state management structure and type safety.**

The current state initialization is scattered and lacks type safety. Consider a more structured approach with type hints (note: mutable defaults on a dataclass must use `field(default_factory=...)`):

```diff
+from dataclasses import dataclass, field
+from typing import Set
+
+@dataclass
+class ChartState:
+    chart_traces: List[TraceWithDetails]
+    chart_spans: List[ObservationsView]
+    chart_observations: List[ObservationsView]
+    load_num: int = 10
+    project_id: str = ""
+    chart_types: Set[str] = field(default_factory=set)
+    llms: Set[str] = field(default_factory=set)
+    language: str = "English"
+    skip_empty_chart: bool = False
+    only_empty_chart: bool = False
+    release: str = ""
+    rerun_chart_data_results: dict = field(default_factory=dict)
+
+def initialize_state() -> None:
+    if "state" not in st.session_state:
+        st.session_state.state = ChartState(
+            chart_traces=[],
+            chart_spans=[],
+            chart_observations=[],
+        )
+
 async def main():
     st.set_page_config(layout="wide")
     st.title("Chart Evaluation")

-    if "chart_traces" not in st.session_state:
-        st.session_state["chart_traces"] = []
-    # ... more state initialization ...
+    initialize_state()
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (8)

- `wren-ai-service/Justfile` (1 hunks)
- `wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py` (1 hunks)
- `wren-ai-service/eval/llm_trace_analysis/report_gen.py` (1 hunks)
- `wren-ai-service/eval/llm_trace_analysis/utils.py` (1 hunks)
- `wren-ai-service/src/pipelines/generation/chart_adjustment.py` (2 hunks)
- `wren-ai-service/src/pipelines/generation/chart_generation.py` (8 hunks)
- `wren-ai-service/src/pipelines/generation/utils/chart.py` (2 hunks)
- `wren-ai-service/src/web/v1/services/chart.py` (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: pytest
- GitHub Check: Analyze (python)
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (13)
wren-ai-service/Justfile (2)

`71-71`: **LGTM! Good practice using module syntax.**

The change to use module syntax (`python -m`) instead of direct script execution is a good practice, as it ensures proper module resolution and maintains consistency with Python's import system.

`73-74`: **LGTM! Well-structured Streamlit target.**

The new `chart-eval` target follows the established pattern for Streamlit applications in this Justfile (consistent with the `curate_eval_data` and `demo` targets).

wren-ai-service/src/web/v1/services/chart.py (3)
`22-23`: **LGTM! New attributes enhance flexibility.**

The new attributes provide good flexibility (see the sketch below):

- `data` allows direct data injection without SQL execution
- `remove_data_from_chart_schema` controls data retention in the schema
`136-136`: **LGTM! Consistent parameter passing.**

The parameters are correctly passed to the `chart_generation` pipeline, maintaining consistency with the new attributes.

Also applies to: 138-138
`119-129`: **Verify error handling for invalid data format.**

The conditional data handling looks good, but we should ensure proper error handling for invalid data formats.

Run this script to check for error handling in similar implementations:

✅ Verification successful

**Input validation and error handling are properly implemented.**

The code has comprehensive error handling through (sketched below):

- Pydantic models for automatic data validation
- Status tracking ("fetching" -> "generating" -> "failed")
- Error metadata capture in the results dictionary
- Proper handling of both direct data and SQL-fetched data paths
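A condensed sketch of that pattern; the names (`ChartResult`, `build_schema`) and the error code are illustrative assumptions, not the service's actual code:

```python
# Illustrative status-tracking error handling: failures are recorded as
# metadata on the result object instead of being raised to the caller.
from typing import Any, Dict, Literal, Optional

from pydantic import BaseModel


class ChartResult(BaseModel):
    status: Literal["fetching", "generating", "finished", "failed"] = "fetching"
    chart_schema: Optional[Dict[str, Any]] = None
    error: Optional[Dict[str, str]] = None


def build_schema(data: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the real chart-generation pipeline.
    return {"mark": "bar", "data": data}


def generate_chart(
    data: Optional[Dict[str, Any]], sql_result: Dict[str, Any]
) -> ChartResult:
    result = ChartResult()
    try:
        result.status = "generating"
        # Direct data takes precedence; otherwise fall back to SQL-fetched rows.
        result.chart_schema = build_schema(data if data is not None else sql_result)
        result.status = "finished"
    except Exception as e:
        # Record error metadata rather than raising, so callers can poll status.
        result.status = "failed"
        result.error = {"code": "OTHERS", "message": str(e)}
    return result
```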
🏁 Scripts executed

The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Search for error handling patterns in data processing
rg -A 5 "try:|except:|raise" "wren-ai-service/src/web/v1/services/"
```

Length of output: 9262

```bash
#!/bin/bash
# Get complete error handling context from chart.py
rg -A 15 -B 5 "try:|except:|raise" "wren-ai-service/src/web/v1/services/chart.py"
```

Length of output: 781

```bash
#!/bin/bash
# Get extended context to see complete try-except blocks
rg -A 30 -B 10 "try:|except:|raise" "wren-ai-service/src/web/v1/services/chart.py"
```

Length of output: 1567

```bash
#!/bin/bash
# Search for ChartResultResponse class and its usage
ast-grep --pattern 'class ChartResultResponse'
# Search for validation related code
rg -B 5 -A 10 "validator|@validate|@dataclass|pydantic" "wren-ai-service/src/web/v1/services/"
```

Length of output: 19425
wren-ai-service/src/pipelines/generation/chart_generation.py (3)

`75-78`: **LGTM! Clean data handling implementation.**

The conditional data removal is well-implemented with a clear boolean check (sketched below).
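A guess at the shape of that check; the function and key names here are hypothetical:

```python
# Hypothetical boolean-guarded removal; the actual code in
# chart_generation.py may differ in names and structure.
from typing import Any, Dict


def prune_chart_schema(
    chart_schema: Dict[str, Any], remove_data_from_chart_schema: bool
) -> Dict[str, Any]:
    if remove_data_from_chart_schema:
        # Drop the inline data payload so the schema stays lightweight.
        chart_schema.pop("data", None)
    return chart_schema
```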
`152-154`: **LGTM! Proper parameter propagation.**

The parameters are correctly propagated through the pipeline stages.

`205-205`: **LGTM! Consistent default value.**

The optional parameter is properly defined with a sensible default value that matches the `ChartRequest` class.

Also applies to: 215-215
wren-ai-service/src/pipelines/generation/chart_adjustment.py (2)

`141-141`: **LGTM! Simplified data access.**

The direct return of `sample_data` aligns with the changes in chart.py and improves code clarity.

`159-159`: **LGTM! Consistent data handling.**

The prompt function correctly uses the preprocessed data, maintaining consistency with other changes.
wren-ai-service/src/pipelines/generation/utils/chart.py (3)

`258-258`: **LGTM! Improved type safety.**

The output type is now more specific and better reflects the actual return value.

`263-263`: **LGTM! Simplified data access pattern.**

The direct access to columns and data improves code readability and reduces nesting.

Also applies to: 265-265

`275-275`: **LGTM! Consistent return structure.**

The return structure aligns with the changes in other files, maintaining consistency across the codebase.
---
Actionable comments posted: 0
🧹 Nitpick comments (5)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (5)
`1-16`: **Add environment variable validation.**

The code uses environment variables (`WREN_AI_SERVICE_BASE_URL`, `LANGFUSE_HOST`) without validation. Missing environment variables could cause runtime errors.

Consider adding environment variable validation at startup:

```python
def validate_env_vars():
    required_vars = ["WREN_AI_SERVICE_BASE_URL", "LANGFUSE_HOST"]
    missing_vars = [var for var in required_vars if not os.getenv(var)]
    if missing_vars:
        raise ValueError(
            f"Missing required environment variables: {', '.join(missing_vars)}"
        )
```
`97-123`: **Add input validation in event handlers.**

The event handlers directly update session state without validating or sanitizing input values.

Consider adding validation:

```diff
 def on_change_project_id():
+    project_id = st.session_state["project_id_input"].strip()
+    if not project_id.isalnum():
+        st.error("Project ID must contain only alphanumeric characters")
+        return
     st.session_state["project_id"] = st.session_state["project_id_input"]
```
`374-380`: **Make pagination configurable.**

The load-more functionality uses a hardcoded value of 10 items.

Consider making this configurable:

```diff
+# At the top of the file
+DEFAULT_PAGE_SIZE = 10
+
 async def main():
     if "load_num" not in st.session_state:
-        st.session_state["load_num"] = 10
+        st.session_state["load_num"] = DEFAULT_PAGE_SIZE

-    num = 10
+    num = DEFAULT_PAGE_SIZE
     st.button(
         f"Load {num} more chart data",
```
`277-288`: **Move language configuration to a separate module.**

The available languages list is hardcoded in the main UI code.

Consider moving this to a configuration module:

```python
# config.py
AVAILABLE_LANGUAGES = [
    "English",
    "Traditional Chinese",
    "Simplified Chinese",
    # ... rest of the languages
]

# Import in main.py
from config import AVAILABLE_LANGUAGES
```
`191-194`: **Add a docstring to document the main function.**

The main function lacks documentation about its purpose and functionality.

Add a docstring:

```diff
 async def main():
+    """
+    Initialize and render the Streamlit chart evaluation application.
+
+    This function sets up the page configuration, initializes session state
+    variables, and renders the UI components for chart evaluation and filtering.
+    """
     st.set_page_config(layout="wide")
     st.title("Chart Evaluation")
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

- `wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py` (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1)
`37-95`: 🛠️ **Refactor suggestion: Improve error handling and data validation.**

The function has several issues:

- Broad exception handling without logging makes debugging difficult
- Missing validation for required fields in the input data

Consider adding proper error handling and validation:

```diff
 def get_chart_data(
     chart_traces: List[TraceWithDetails],
     chart_spans: List[ObservationsView],
     chart_observations: List[ObservationsView],
     release: str = "",
     project_id: str = "",
     chart_types: set[str] = set(),
     llms: set[str] = set(),
     skip_empty_chart: bool = False,
     only_empty_chart: bool = False,
 ) -> List[Dict[str, Any]]:
+    if not all([chart_traces, chart_spans, chart_observations]):
+        raise ValueError("Missing required input data")
+
     chart_data = []
     tz = pytz.timezone("Asia/Taipei")

     for chart_trace, chart_span, chart_observation in zip(
         *match_traces_spans_observations(chart_traces, chart_observations, chart_spans)
     ):
         try:
             chart_output = orjson.loads(chart_observation.output["replies"][0])
             chart_trace_input = orjson.loads(chart_trace.input)
             # ... rest of the processing ...
-        except Exception:
+        except Exception as e:
+            st.warning(f"Failed to process chart data: {str(e)}")
             continue
```

Likely invalid or redundant comment.
---
Actionable comments posted: 1
♻️ Duplicate comments (1)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1)
`19-34`: 🛠️ **Refactor suggestion: Improve matching efficiency with a hash-based approach.**

The current implementation uses a linear search, which could be inefficient for large datasets. Consider using a dictionary for O(1) lookups.

```diff
 def match_traces_spans_observations(
     traces: List[TraceWithDetails],
     observations: List[ObservationsView],
     spans: List[TraceWithDetails],
 ) -> Tuple[List[TraceWithDetails], List[TraceWithDetails], List[ObservationsView]]:
     filtered_traces = []
     filtered_spans = []
-    _id = 0
+    span_map = {span.id: (i, span) for i, span in enumerate(spans)}
+    trace_map = {i: trace for i, trace in enumerate(traces)}

     for observation in observations:
-        while observation.parent_observation_id != spans[_id].id:
-            _id += 1
-        filtered_spans.append(spans[_id])
-        filtered_traces.append(traces[_id])
-        _id += 1
+        if observation.parent_observation_id in span_map:
+            idx, span = span_map[observation.parent_observation_id]
+            filtered_spans.append(span)
+            filtered_traces.append(trace_map[idx])

     return filtered_traces, filtered_spans, observations
```
🧹 Nitpick comments (2)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (2)
`173-192`: **Add a retry mechanism for API calls.**

Consider implementing a retry mechanism with exponential backoff for the API calls, to handle temporary network issues or service unavailability.

```diff
+from tenacity import retry, stop_after_attempt, wait_exponential
+
+@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
 async def get_chart_traces_spans_and_observations():
     client = get_langfuse_client()
     chart_traces, chart_spans, chart_observations = await asyncio.gather(
```
`272-277`: **Externalize configuration values.**

Consider moving hardcoded LLM options and language choices to a configuration file for easier maintenance and updates.

```diff
+from typing import List
+import yaml
+
+def load_config() -> dict:
+    with open("config.yaml") as f:
+        return yaml.safe_load(f)
+
+config = load_config()
+available_languages: List[str] = config["languages"]
+llm_options: List[str] = config["llm_options"]
```

Also applies to: 283-294
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

- `wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py` (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (2)
`97-123`: **LGTM! Well-structured state management.**

The state management functions are concise, focused, and follow a consistent pattern.

`158-162`: **Consider potential race conditions in session state updates.**

Multiple concurrent chart generations could lead to race conditions when updating `rerun_chart_data_results`. Consider using a lock mechanism or ensuring sequential updates, as in the sketch below.
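One way to serialize those updates with `asyncio.Lock`; the key name follows the review, while the helper and its signature are assumptions:

```python
# Sketch: guard the shared results dict so only one coroutine mutates it at a
# time. The helper name and dict-based session_state are illustrative.
import asyncio

_results_lock = asyncio.Lock()


async def record_rerun_result(session_state: dict, key: str, result: dict) -> None:
    async with _results_lock:
        session_state.setdefault("rerun_chart_data_results", {})[key] = result
```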
---
Actionable comments posted: 1
🧹 Nitpick comments (3)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (3)
`37-95`: **Make the timezone configurable.**

The timezone is hardcoded to "Asia/Taipei". Consider making it configurable through environment variables or application settings.

```diff
+def get_timezone() -> str:
+    return os.getenv("CHART_EVAL_TIMEZONE", "UTC")
+
 def get_chart_data(
     chart_traces: List[TraceWithDetails],
     chart_spans: List[ObservationsView],
     chart_observations: List[ObservationsView],
     release: str = "",
     project_id: str = "",
     chart_types: set[str] = set(),
     llms: set[str] = set(),
     skip_empty_chart: bool = False,
     only_empty_chart: bool = False,
 ) -> List[Dict[str, Any]]:
     chart_data = []
-    tz = pytz.timezone("Asia/Taipei")
+    tz = pytz.timezone(get_timezone())
```
`224-224`: **Move LOAD_NUM to top-level constants.**

Consider moving the `LOAD_NUM` constant to the top of the file with other configuration constants for better maintainability.

```diff
+# Configuration constants
+LOAD_NUM = 10
+
 async def main():
     st.set_page_config(layout="wide")
     st.title("Chart Evaluation")
-    LOAD_NUM = 10
```
`272-277`: **Consider loading LLM options dynamically.**

The LLM options are hardcoded and might need frequent updates. Consider loading them from a configuration file or environment variables.

```diff
+def get_available_llms() -> List[str]:
+    """Load available LLM options from configuration."""
+    return os.getenv("AVAILABLE_LLMS", "gpt-4o,gpt-4o-2024-08-06").split(",")
+
 st.multiselect(
     "Select LLMs",
-    options=[
-        "gpt-4o",
-        "gpt-4o-2024-08-06",
-        "gpt-4o-mini",
-        "gpt-4o-mini-2024-07-18",
-    ],
+    options=get_available_llms(),
     key="llms_input",
     default=st.session_state["llms"],
     on_change=on_change_llms,
 )
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

- `wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py` (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1)
`19-34`: ⚠️ **Potential issue: Add bounds checking to prevent IndexError.**

The while loop could exceed the bounds of the `spans` list if a matching `parent_observation_id` is not found.

```diff
 def match_traces_spans_observations(
     traces: List[TraceWithDetails],
     observations: List[ObservationsView],
     spans: List[TraceWithDetails],
 ) -> Tuple[List[TraceWithDetails], List[TraceWithDetails], List[ObservationsView]]:
     filtered_traces = []
     filtered_spans = []
     _id = 0

     for observation in observations:
-        while observation.parent_observation_id != spans[_id].id:
+        while _id < len(spans) and observation.parent_observation_id != spans[_id].id:
             _id += 1
+        if _id >= len(spans):
+            continue
         filtered_spans.append(spans[_id])
         filtered_traces.append(traces[_id])
         _id += 1
```

Likely invalid or redundant comment.
`797fa86` to `6fb776e` (Compare)
---

**Summary by CodeRabbit**

- **New Features**
  - `ChartRequest` for custom data handling.
- **Refactor**
  - `ChartDataPreprocessor` class.
  - `chart` method.
  - `ChartGenerationPostProcessor` class.
- **Chores**