
chore(wren-ai-service): chart eval #1192

Closed
wants to merge 14 commits into from

Conversation

cyyeh (Member) commented Jan 20, 2025

Summary by CodeRabbit

  • New Features

    • Added a Streamlit application for chart evaluation.
    • Enhanced chart generation pipeline with more flexible data handling.
    • Introduced new utility functions for trace and observation management.
    • Added optional attributes to ChartRequest for custom data handling.
  • Refactor

    • Restructured trace and observation retrieval logic.
    • Simplified data preprocessing in the chart generation pipeline.
    • Updated report generation command syntax in Justfile.
    • Streamlined data handling in the ChartDataPreprocessor class.
    • Improved the logic for handling SQL data in the chart method.
    • Updated the parameter handling in the ChartGenerationPostProcessor class.
  • Chores

    • Updated build and evaluation scripts in Justfile.
    • Improved code organization in evaluation modules.

@cyyeh added the module/ai-service and ci/ai-service labels Jan 20, 2025
@cyyeh requested a review from paopa January 20, 2025 00:29

coderabbitai bot (Contributor) commented Jan 20, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request enhances the chart generation and evaluation workflow in the Wren AI service. Key changes include updating the Justfile's report-generation command and adding a new Streamlit application for chart evaluation. Trace and observation handling is refactored into a new utility module. Changes to the chart generation and adjustment pipelines improve data handling and flexibility, and new attributes on the ChartRequest class allow custom data to be supplied.

Changes

File / Change Summary

  • wren-ai-service/Justfile - Updated the report-gen target to use module syntax; added a new chart-eval target to run the Streamlit chart evaluation app
  • wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py - New Streamlit application for chart evaluation; adds functions for data matching, retrieval, and user interaction
  • wren-ai-service/eval/llm_trace_analysis/report_gen.py - Removed the Langfuse API client functions; now imports trace-related functions from the new utils.py module
  • wren-ai-service/eval/llm_trace_analysis/utils.py - New utility module for Langfuse API interactions; adds functions for initializing the client and fetching traces and observations
  • wren-ai-service/src/pipelines/generation/chart_adjustment.py - Modified data handling in the preprocess_data and prompt functions
  • wren-ai-service/src/pipelines/generation/chart_generation.py - Added the remove_data_from_chart_schema parameter; enhanced flexibility in the chart generation process
  • wren-ai-service/src/pipelines/generation/utils/chart.py - Updated the ChartDataPreprocessor output structure; simplified data extraction logic
  • wren-ai-service/src/web/v1/services/chart.py

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e741a28 and 6fb776e.

📒 Files selected for processing (8)
  • wren-ai-service/Justfile (1 hunks)
  • wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1 hunks)
  • wren-ai-service/eval/llm_trace_analysis/report_gen.py (1 hunks)
  • wren-ai-service/eval/llm_trace_analysis/utils.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/chart_adjustment.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/chart_generation.py (8 hunks)
  • wren-ai-service/src/pipelines/generation/utils/chart.py (2 hunks)
  • wren-ai-service/src/web/v1/services/chart.py (3 hunks)

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🧹 Nitpick comments (8)
wren-ai-service/src/pipelines/generation/chart_generation.py (1)

121-121: Consider adding type hints for the return value.

The preprocess_data function could benefit from more specific type hints for its return value.

-    return chart_data_preprocessor.run(data)["sample_data"]
+    sample_data: Dict[str, Any] = chart_data_preprocessor.run(data)["sample_data"]
+    return sample_data
wren-ai-service/eval/llm_trace_analysis/utils.py (5)

15-15: Use find_dotenv to dynamically locate the .env file

The hardcoded path in load_dotenv("eval/llm_trace_analysis/.env", override=True) may fail if the script is run from a different working directory. To ensure the .env file is found regardless of the execution context, consider using find_dotenv().

Apply this diff to use find_dotenv:

-import os
-from dotenv import load_dotenv
+import os
+from dotenv import load_dotenv, find_dotenv

-load_dotenv("eval/llm_trace_analysis/.env", override=True)
+load_dotenv(dotenv_path=find_dotenv(), override=True)

49-88: Limit concurrent API calls to avoid rate limiting

In get_all_traces and get_all_observations, using asyncio.gather(*tasks) to fetch all pages concurrently can lead to a large number of simultaneous API requests if there are many pages. This might overwhelm the API server or exceed rate limits.

Consider limiting the number of concurrent requests using asyncio.Semaphore:

 async def get_all_traces(
     client: AsyncFernLangfuse,
     name: Optional[str] = None,
     from_timestamp: Optional[datetime] = None,
     to_timestamp: Optional[datetime] = None,
     release: Optional[str] = None,
+    max_concurrency: int = 5,
 ) -> List[TraceWithDetails]:
     # Get first page to determine total pages
     first_page = await client.trace.list(
         name=name,
         page=1,
         from_timestamp=from_timestamp,
         to_timestamp=to_timestamp,
         release=release,
     )

     # Create semaphore
+    semaphore = asyncio.Semaphore(max_concurrency)

     # Define a wrapper to limit concurrency
+    async def fetch_page(page: int):
+        async with semaphore:
+            return await client.trace.list(
+                name=name,
+                page=page,
+                from_timestamp=from_timestamp,
+                to_timestamp=to_timestamp,
+                release=release,
+            )

     # Create tasks for all remaining pages
     tasks = [
-        client.trace.list(
-            name=name,
-            page=page,
-            from_timestamp=from_timestamp,
-            to_timestamp=to_timestamp,
-            release=release,
-        )
+        fetch_page(page)
         for page in range(2, first_page.meta.total_pages + 1)
     ]

     # Gather all pages with limited concurrency
     all_responses = [first_page]
     if tasks:  # Only gather if there are additional pages
         all_responses.extend(await asyncio.gather(*tasks))

Apply similar changes to get_all_observations.

Also applies to: 96-136
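
As a compact, standalone illustration of the same idea, here is a generic bounded-concurrency pager. It is a hedged sketch, not the actual utils.py code; fetch_page is a stand-in for a wrapper around a paginated call such as client.trace.list.

import asyncio
from typing import Any, Awaitable, Callable, List


async def fetch_all_pages(
    fetch_page: Callable[[int], Awaitable[Any]],
    total_pages: int,
    max_concurrency: int = 5,
) -> List[Any]:
    # At most max_concurrency page requests are in flight at any time
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(page: int) -> Any:
        async with semaphore:
            return await fetch_page(page)

    # Fetch every page concurrently, but bounded by the semaphore
    return await asyncio.gather(*(bounded_fetch(p) for p in range(1, total_pages + 1)))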


37-40: Provide specific details in error messages for missing environment variables

The current error message lists all possible missing variables, even if only some are missing. To aid debugging, specify exactly which environment variables are not set.

Apply this diff to improve the error message:

 if not public_key or not secret_key or not host:
+    missing_vars = []
+    if not public_key:
+        missing_vars.append("LANGFUSE_PUBLIC_KEY")
+    if not secret_key:
+        missing_vars.append("LANGFUSE_SECRET_KEY")
+    if not host:
+        missing_vars.append("LANGFUSE_HOST")
     raise ValueError(
-        "Missing required Langfuse environment variables: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST"
+        f"Missing required Langfuse environment variables: {', '.join(missing_vars)}"
     )

85-85: Simplify data aggregation by using response data directly

There's no need to re-instantiate FetchTracesResponse or FetchObservationsResponse when you already have the response objects. This can simplify the code and improve readability.

Apply this diff to streamline the aggregation:

 for response in all_responses:
-    traces.extend(FetchTracesResponse(data=response.data, meta=response.meta).data)
+    traces.extend(response.data)
 for response in all_responses:
-    observations.extend(
-        FetchObservationsResponse(data=response.data, meta=response.meta).data
-    )
+    observations.extend(response.data)

Also applies to: 132-134


78-81: Handle cases when there is only one page of results

The current logic correctly skips asyncio.gather when there are no additional pages. However, explicitly handling when total_pages is 1 can improve code clarity.

Consider adding a comment or adjusting the logic for readability:

# Gather all pages concurrently if there is more than one page
if first_page.meta.total_pages > 1:
    tasks = [
        # tasks definition...
    ]
    # Gather additional pages
    all_responses.extend(await asyncio.gather(*tasks))
# Else, proceed with only the first page

Also applies to: 125-128

wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (2)

264-275: Fetch LLM options dynamically.

Hardcoding LLM versions in the UI will require frequent updates. Consider fetching these options dynamically from a configuration service or environment variables.

Consider this approach:

+def get_available_llms():
+    try:
+        response = requests.get(
+            f"{os.getenv('WREN_AI_SERVICE_BASE_URL')}/v1/llms",
+            timeout=30,
+        )
+        response.raise_for_status()
+        return response.json()["models"]
+    except Exception as e:
+        st.error(f"Failed to fetch LLM options: {e}")
+        return ["gpt-4o"]  # Fallback to default

 st.multiselect(
     "Select LLMs",
-    options=[
-        "gpt-4o",
-        "gpt-4o-2024-08-06",
-        "gpt-4o-mini",
-        "gpt-4o-mini-2024-07-18",
-    ],
+    options=get_available_llms(),
     key="llms_input",
     default=st.session_state["llms"],
     on_change=on_change_llms,
 )

191-219: Improve state management structure and type safety.

The current state initialization is scattered and lacks type safety. Consider using a more structured approach with type hints.

Consider this improvement:

+from dataclasses import dataclass, field
+from typing import List, Set
+
+@dataclass
+class ChartState:
+    chart_traces: List[TraceWithDetails]
+    chart_spans: List[ObservationsView]
+    chart_observations: List[ObservationsView]
+    load_num: int = 10
+    project_id: str = ""
+    chart_types: Set[str] = field(default_factory=set)
+    llms: Set[str] = field(default_factory=set)
+    language: str = "English"
+    skip_empty_chart: bool = False
+    only_empty_chart: bool = False
+    release: str = ""
+    rerun_chart_data_results: dict = field(default_factory=dict)
+
+def initialize_state() -> None:
+    if "state" not in st.session_state:
+        st.session_state.state = ChartState(
+            chart_traces=[],
+            chart_spans=[],
+            chart_observations=[],
+        )

 async def main():
     st.set_page_config(layout="wide")
     st.title("Chart Evaluation")
 
-    if "chart_traces" not in st.session_state:
-        st.session_state["chart_traces"] = []
-    # ... more state initialization ...
+    initialize_state()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c99825f and db81322.

📒 Files selected for processing (8)
  • wren-ai-service/Justfile (1 hunks)
  • wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1 hunks)
  • wren-ai-service/eval/llm_trace_analysis/report_gen.py (1 hunks)
  • wren-ai-service/eval/llm_trace_analysis/utils.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/chart_adjustment.py (2 hunks)
  • wren-ai-service/src/pipelines/generation/chart_generation.py (8 hunks)
  • wren-ai-service/src/pipelines/generation/utils/chart.py (2 hunks)
  • wren-ai-service/src/web/v1/services/chart.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: pytest
  • GitHub Check: Analyze (python)
  • GitHub Check: pytest
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (13)
wren-ai-service/Justfile (2)

71-71: LGTM! Good practice using module syntax.

The change to use module syntax (python -m) instead of direct script execution is a good practice as it ensures proper module resolution and maintains consistency with Python's import system.


73-74: LGTM! Well-structured Streamlit target.

The new chart-eval target follows the established pattern for Streamlit applications in this Justfile (consistent with curate_eval_data and demo targets).

wren-ai-service/src/web/v1/services/chart.py (3)

22-23: LGTM! New attributes enhance flexibility.

The new attributes provide good flexibility (see the sketch after this list):

  • data allows direct data injection without SQL execution
  • remove_data_from_chart_schema controls data retention in schema
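
As a rough illustration of how these two attributes might sit on the request model, here is a minimal sketch; the surrounding fields and defaults are assumptions, not the actual ChartRequest definition.

from typing import Any, Dict, Optional

from pydantic import BaseModel


class ChartRequest(BaseModel):
    query: str  # assumed existing field
    sql: str  # assumed existing field
    # New: pre-fetched data to use directly, skipping SQL execution
    data: Optional[Dict[str, Any]] = None
    # New: when True, strip raw data values from the generated chart schema
    remove_data_from_chart_schema: Optional[bool] = True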

136-136: LGTM! Consistent parameter passing.

The parameters are correctly passed to the chart_generation pipeline, maintaining consistency with the new attributes.

Also applies to: 138-138


119-129: Verify error handling for invalid data format.

The conditional data handling looks good, but we should ensure proper error handling for invalid data formats.

Run this script to check for error handling in similar implementations:

✅ Verification successful

Input validation and error handling are properly implemented

The code has comprehensive error handling through the following (see the sketch after this list):

  • Pydantic models for automatic data validation
  • Status tracking ("fetching" -> "generating" -> "failed")
  • Error metadata capture in the results dictionary
  • Proper handling of both direct data and SQL-fetched data paths
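
Putting the points above together, a minimal sketch of this pattern might look like the following; the model, field, and helper names are assumptions for illustration, not the actual service code.

from typing import Any, Dict, Optional

from pydantic import BaseModel


class ChartResultResponse(BaseModel):
    # Status moves "fetching" -> "generating" -> "finished", or "failed" on error
    status: str
    error: Optional[Dict[str, Any]] = None


async def fetch_sql_data(sql: str) -> Dict[str, Any]:
    # Placeholder for the real SQL execution path
    return {"columns": [], "data": []}


async def generate_chart(request: Any, results: Dict[str, ChartResultResponse]) -> None:
    results[request.query_id] = ChartResultResponse(status="fetching")
    try:
        # Use injected data when provided, otherwise fall back to SQL execution
        data = request.data or await fetch_sql_data(request.sql)
        results[request.query_id] = ChartResultResponse(status="generating")
        # ... run the chart generation pipeline on `data` ...
        results[request.query_id] = ChartResultResponse(status="finished")
    except Exception as e:
        # Capture error metadata in the results dictionary
        results[request.query_id] = ChartResultResponse(
            status="failed",
            error={"code": "OTHERS", "message": str(e)},
        )
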
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for error handling patterns in data processing
rg -A 5 "try:|except:|raise" "wren-ai-service/src/web/v1/services/"

Length of output: 9262


Script:

#!/bin/bash
# Get complete error handling context from chart.py
rg -A 15 -B 5 "try:|except:|raise" "wren-ai-service/src/web/v1/services/chart.py"

Length of output: 781


Script:

#!/bin/bash
# Get extended context to see complete try-except blocks
rg -A 30 -B 10 "try:|except:|raise" "wren-ai-service/src/web/v1/services/chart.py"

Length of output: 1567


Script:

#!/bin/bash
# Search for ChartResultResponse class and its usage
ast-grep --pattern 'class ChartResultResponse'

# Search for validation related code
rg -B 5 -A 10 "validator|@validate|@dataclass|pydantic" "wren-ai-service/src/web/v1/services/"

Length of output: 19425

wren-ai-service/src/pipelines/generation/chart_generation.py (3)

75-78: LGTM! Clean data handling implementation.

The conditional data removal is well-implemented with a clear boolean check.
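
For readers without the diff open, the pattern being praised is roughly the following hedged sketch; the function and key names are assumptions based on the review summary, not the exact pipeline code.

from typing import Any, Dict


def post_process_chart_schema(
    chart_schema: Dict[str, Any],
    remove_data_from_chart_schema: bool = True,
) -> Dict[str, Any]:
    # A single boolean check decides whether raw data values stay inside the
    # generated (Vega-Lite style) chart schema or are stripped out.
    if remove_data_from_chart_schema:
        chart_schema.pop("data", None)
    return chart_schema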


152-154: LGTM! Proper parameter propagation.

The parameters are correctly propagated through the pipeline stages.


205-205: LGTM! Consistent default value.

The optional parameter is properly defined with a sensible default value that matches the ChartRequest class.

Also applies to: 215-215

wren-ai-service/src/pipelines/generation/chart_adjustment.py (2)

141-141: LGTM! Simplified data access.

The direct return of sample_data aligns with the changes in chart.py and improves code clarity.


159-159: LGTM! Consistent data handling.

The prompt function correctly uses the preprocessed data, maintaining consistency with other changes.

wren-ai-service/src/pipelines/generation/utils/chart.py (3)

258-258: LGTM! Improved type safety.

The output type is now more specific and better reflects the actual return value.


263-263: LGTM! Simplified data access pattern.

The direct access to columns and data improves code readability and reduces nesting.

Also applies to: 265-265


275-275: LGTM! Consistent return structure.

The return structure aligns with the changes in other files, maintaining consistency across the codebase.
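
Taken together, the simplified preprocessor described in these comments might look roughly like the sketch below; the input/output key names and the sample size are assumptions for illustration, not the actual implementation.

from typing import Any, Dict


class ChartDataPreprocessor:
    def run(self, data: Dict[str, Any], sample_size: int = 50) -> Dict[str, Any]:
        # Access columns and rows directly instead of through nested wrappers
        columns = data.get("columns", [])
        rows = data.get("data", [])[:sample_size]
        # Return a single sample_data dict consumed by both chart pipelines
        return {"sample_data": {"columns": columns, "data": rows}}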


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (5)

1-16: Add environment variable validation.

The code uses environment variables (WREN_AI_SERVICE_BASE_URL, LANGFUSE_HOST) without validation. Missing environment variables could cause runtime errors.

Consider adding environment variable validation at startup:

def validate_env_vars():
    required_vars = ['WREN_AI_SERVICE_BASE_URL', 'LANGFUSE_HOST']
    missing_vars = [var for var in required_vars if not os.getenv(var)]
    if missing_vars:
        raise ValueError(f"Missing required environment variables: {', '.join(missing_vars)}")

97-123: Add input validation in event handlers.

The event handlers directly update session state without validating or sanitizing input values.

Consider adding validation:

 def on_change_project_id():
+    project_id = st.session_state["project_id_input"].strip()
+    if not project_id.isalnum():
+        st.error("Project ID must contain only alphanumeric characters")
+        return
     st.session_state["project_id"] = st.session_state["project_id_input"]

374-380: Make pagination configuration configurable.

The load more functionality uses a hardcoded value of 10 items.

Consider making this configurable:

+# At the top of the file
+DEFAULT_PAGE_SIZE = 10
+
 async def main():
     if "load_num" not in st.session_state:
-        st.session_state["load_num"] = 10
+        st.session_state["load_num"] = DEFAULT_PAGE_SIZE

-    num = 10
+    num = DEFAULT_PAGE_SIZE
     st.button(
         f"Load {num} more chart data",

277-288: Move language configuration to a separate module.

The available languages list is hardcoded in the main UI code.

Consider moving this to a configuration module:

# config.py
AVAILABLE_LANGUAGES = [
    "English",
    "Traditional Chinese",
    "Simplified Chinese",
    # ... rest of the languages
]

# Import in main.py
from config import AVAILABLE_LANGUAGES

191-194: Add docstring to document the main function.

The main function lacks documentation about its purpose and functionality.

Add a docstring:

 async def main():
+    """
+    Initialize and render the Streamlit chart evaluation application.
+    
+    This function sets up the page configuration, initializes session state variables,
+    and renders the UI components for chart evaluation and filtering.
+    """
     st.set_page_config(layout="wide")
     st.title("Chart Evaluation")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db81322 and a651161.

📒 Files selected for processing (1)
  • wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: pytest
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1)

37-95: 🛠️ Refactor suggestion

Improve error handling and data validation.

The function has several issues:

  1. Broad exception handling without logging makes debugging difficult
  2. Missing validation for required fields in the input data

Consider adding proper error handling and validation:

 def get_chart_data(
     chart_traces: List[TraceWithDetails],
     chart_spans: List[ObservationsView],
     chart_observations: List[ObservationsView],
     release: str = "",
     project_id: str = "",
     chart_types: set[str] = set(),
     llms: set[str] = set(),
     skip_empty_chart: bool = False,
     only_empty_chart: bool = False,
 ) -> List[Dict[str, Any]]:
+    if not all([chart_traces, chart_spans, chart_observations]):
+        raise ValueError("Missing required input data")
+    
     chart_data = []
     tz = pytz.timezone("Asia/Taipei")

     for chart_trace, chart_span, chart_observation in zip(
         *match_traces_spans_observations(chart_traces, chart_observations, chart_spans)
     ):
         try:
             chart_output = orjson.loads(chart_observation.output["replies"][0])
             chart_trace_input = orjson.loads(chart_trace.input)
             # ... rest of the processing ...
-        except Exception:
+        except Exception as e:
+            st.warning(f"Failed to process chart data: {str(e)}")
             continue

Likely invalid or redundant comment.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1)

19-34: 🛠️ Refactor suggestion

Improve matching efficiency with a hash-based approach.

The current implementation uses a linear search which could be inefficient for large datasets. Consider using a dictionary for O(1) lookups.

 def match_traces_spans_observations(
     traces: List[TraceWithDetails],
     observations: List[ObservationsView],
     spans: List[TraceWithDetails],
 ) -> Tuple[List[TraceWithDetails], List[TraceWithDetails], List[ObservationsView]]:
     filtered_traces = []
     filtered_spans = []
-    _id = 0
+    span_map = {span.id: (i, span) for i, span in enumerate(spans)}
+    trace_map = {i: trace for i, trace in enumerate(traces)}
     
     for observation in observations:
-        while observation.parent_observation_id != spans[_id].id:
-            _id += 1
-        filtered_spans.append(spans[_id])
-        filtered_traces.append(traces[_id])
-        _id += 1
+        if observation.parent_observation_id in span_map:
+            idx, span = span_map[observation.parent_observation_id]
+            filtered_spans.append(span)
+            filtered_traces.append(trace_map[idx])
 
     return filtered_traces, filtered_spans, observations
🧹 Nitpick comments (2)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (2)

173-192: Add retry mechanism for API calls.

Consider implementing a retry mechanism with exponential backoff for the API calls to handle temporary network issues or service unavailability.

+from tenacity import retry, stop_after_attempt, wait_exponential
+
+@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
 async def get_chart_traces_spans_and_observations():
     client = get_langfuse_client()
     chart_traces, chart_spans, chart_observations = await asyncio.gather(

272-277: Externalize configuration values.

Consider moving hardcoded LLM options and language choices to a configuration file for easier maintenance and updates.

+from typing import List
+import yaml
+
+def load_config() -> dict:
+    with open("config.yaml") as f:
+        return yaml.safe_load(f)
+
+config = load_config()
+available_languages: List[str] = config["languages"]
+llm_options: List[str] = config["llm_options"]

Also applies to: 283-294

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a651161 and 0380ced.

📒 Files selected for processing (1)
  • wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: pytest
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (2)

97-123: LGTM! Well-structured state management.

The state management functions are concise, focused, and follow a consistent pattern.


158-162: Consider potential race conditions in session state updates.

Multiple concurrent chart generations could lead to race conditions when updating rerun_chart_data_results. Consider using a lock mechanism or ensuring sequential updates.
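
One way to make those updates sequential is shown in this hedged sketch; the dictionary name follows the review comment, while the lock and helper function are assumptions rather than existing app code.

import asyncio
from typing import Any, Dict

# A single lock guarding writes to the rerun results dictionary
_rerun_results_lock = asyncio.Lock()


async def store_rerun_result(
    rerun_chart_data_results: Dict[str, Any],
    trace_id: str,
    result: Dict[str, Any],
) -> None:
    # Serialize updates so concurrent chart generations cannot interleave writes
    async with _rerun_results_lock:
        rerun_chart_data_results[trace_id] = result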


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (3)

37-95: Make timezone configurable.

The timezone is hardcoded to "Asia/Taipei". Consider making it configurable through environment variables or application settings.

+def get_timezone() -> str:
+    return os.getenv("CHART_EVAL_TIMEZONE", "UTC")

 def get_chart_data(
     chart_traces: List[TraceWithDetails],
     chart_spans: List[ObservationsView],
     chart_observations: List[ObservationsView],
     release: str = "",
     project_id: str = "",
     chart_types: set[str] = set(),
     llms: set[str] = set(),
     skip_empty_chart: bool = False,
     only_empty_chart: bool = False,
 ) -> List[Dict[str, Any]]:
     chart_data = []
-    tz = pytz.timezone("Asia/Taipei")
+    tz = pytz.timezone(get_timezone())

224-224: Move LOAD_NUM to top-level constants.

Consider moving the LOAD_NUM constant to the top of the file with other configuration constants for better maintainability.

+# Configuration constants
+LOAD_NUM = 10
+
 async def main():
     st.set_page_config(layout="wide")
     st.title("Chart Evaluation")
-    LOAD_NUM = 10

272-277: Consider loading LLM options dynamically.

The LLM options are hardcoded and might need frequent updates. Consider loading them from a configuration file or environment variables.

+def get_available_llms() -> List[str]:
+    """Load available LLM options from configuration."""
+    return os.getenv("AVAILABLE_LLMS", "gpt-4o,gpt-4o-2024-08-06").split(",")

 st.multiselect(
     "Select LLMs",
-    options=[
-        "gpt-4o",
-        "gpt-4o-2024-08-06",
-        "gpt-4o-mini",
-        "gpt-4o-mini-2024-07-18",
-    ],
+    options=get_available_llms(),
     key="llms_input",
     default=st.session_state["llms"],
     on_change=on_change_llms,
 )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0380ced and e741a28.

📒 Files selected for processing (1)
  • wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: pytest
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
wren-ai-service/eval/llm_trace_analysis/chart_eval_app.py (1)

19-34: ⚠️ Potential issue

Add bounds checking to prevent IndexError.

The while loop could potentially exceed the bounds of the spans list if a matching parent_observation_id is not found.

 def match_traces_spans_observations(
     traces: List[TraceWithDetails],
     observations: List[ObservationsView],
     spans: List[TraceWithDetails],
 ) -> Tuple[List[TraceWithDetails], List[TraceWithDetails], List[ObservationsView]]:
     filtered_traces = []
     filtered_spans = []
     _id = 0
     for observation in observations:
-        while observation.parent_observation_id != spans[_id].id:
+        while _id < len(spans) and observation.parent_observation_id != spans[_id].id:
             _id += 1
+        if _id >= len(spans):
+            continue
         filtered_spans.append(spans[_id])
         filtered_traces.append(traces[_id])
         _id += 1

Likely invalid or redundant comment.

@cyyeh force-pushed the chore/ai-service/chart-eval branch from 797fa86 to 6fb776e on January 21, 2025 00:57
@cyyeh closed this Jan 21, 2025