Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(wren-ai-service): improve chart generation #1101

Merged
merged 4 commits into from
Jan 14, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions wren-ai-service/src/pipelines/generation/chart_adjustment.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,20 +25,20 @@
chart_adjustment_system_prompt = f"""
### TASK ###

You are a data analyst great at visualizing data using vega-lite! Given the data using the 'columns' formatted JSON from pandas.DataFrame APIs,
original question, SQL query, vega-lite schema and the adjustment options, you need to regenerate vega-lite schema in JSON and provide suitable chart type according to the adjustment options;
Besides, you need to give a concise and easy-to-understand reasoning to describe why you provide such vega-lite schema.
You are a data analyst great at visualizing data using vega-lite! Given the user's question, SQL, data, vega-lite schema and adjustment options,
you need to re-generate vega-lite schema in JSON and provide suitable chart type.
Besides, you need to give a concise and easy-to-understand reasoning within 20 words to describe why you provide such vega-lite schema.

{chart_generation_instructions}
- If you think the adjustment options are not suitable for the data, you can return an empty string for the schema and chart type and give reasoning to explain why.

### OUTPUT FORMAT ###

Please provide your chain of thought reasoning and the vega-lite schema in JSON format.
Please provide your chain of thought reasoning, chart type and the vega-lite schema in JSON format.

{{
"reasoning": <REASON_TO_CHOOSE_THE_SCHEMA_IN_STRING_FORMATTED_IN_LANGUAGE_PROVIDED_BY_USER>,
"chart_type": "line" | "bar" | "pie" | "grouped_bar" | "stacked_bar" | "area" | "",
"chart_type": "line" | "multi_line" | "bar" | "pie" | "grouped_bar" | "stacked_bar" | "area" | "",
"chart_schema": <VEGA_LITE_JSON_SCHEMA>
}}
"""
Expand Down
9 changes: 4 additions & 5 deletions wren-ai-service/src/pipelines/generation/chart_generation.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,19 +24,18 @@
chart_generation_system_prompt = f"""
### TASK ###

You are a data analyst great at visualizing data using vega-lite! Given the data using the 'columns' formatted JSON from pandas.DataFrame APIs,
you need to generate vega-lite schema in JSON and provide suitable chart type; we will also give you the question and sql to understand the data.
Besides, you need to give a concise and easy-to-understand reasoning to describe why you provide such vega-lite schema and a within 20 words description of the chart.
You are a data analyst great at visualizing data using vega-lite! Given the user's question, SQL and data, you need to generate vega-lite schema in JSON and provide suitable chart type.
Besides, you need to give a concise and easy-to-understand reasoning within 20 words to describe why you provide such vega-lite schema.

{chart_generation_instructions}

### OUTPUT FORMAT ###

Please provide your chain of thought reasoning, the vega-lite schema in JSON format and the chart type.
Please provide your chain of thought reasoning, chart type and the vega-lite schema in JSON format.

{{
"reasoning": <REASON_TO_CHOOSE_THE_SCHEMA_IN_STRING_FORMATTED_IN_LANGUAGE_PROVIDED_BY_USER>,
"chart_type": "line" | "bar" | "pie" | "grouped_bar" | "stacked_bar" | "area" | "",
"chart_type": "line" | "multi_line" | "bar" | "pie" | "grouped_bar" | "stacked_bar" | "area" | "",
Comment on lines +34 to +38
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Missing MultiLineChartSchema implementation for the new "multi_line" chart type

The "multi_line" option has been added to ChartGenerationResults.chart_type, but the corresponding schema implementation is missing:

  • No MultiLineChartSchema class exists in the codebase
  • "multi_line" is not included in ChartSchema.ChartType's supported types
  • Other chart types (line, bar, pie, etc.) all have their corresponding schema implementations
🔗 Analysis chain

Verify schema updates and consider documenting chart types.

The addition of "multi_line" chart type aligns with the PR objectives. Let's verify the schema updates and consider documenting the chart types.

Let's verify that the new chart type is properly supported in related files:

Consider adding docstrings or comments to document:

  1. The purpose and use case for each chart type
  2. Required data format for each chart type
  3. Example schema for the new "multi_line" chart type
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if ChartGenerationResults and related files are updated
echo "Checking for ChartGenerationResults definition..."
rg -A 10 "class ChartGenerationResults" src/

echo "Checking for other occurrences of chart types..."
rg "chart_type.*=.*(?:line|multi_line|bar|pie|grouped_bar|stacked_bar|area)" src/

echo "Checking for any potential missed chart type validations..."
rg "chart_type.*:.*(?:str|Literal)" src/

Length of output: 653


Script:

#!/bin/bash
# Check schema and chart type validations with correct path
echo "Checking for ChartGenerationResults definition..."
rg -A 10 "class ChartGenerationResults" wren-ai-service/src/

echo "Checking for other occurrences of chart types..."
rg "chart_type.*=.*(?:line|multi_line|bar|pie|grouped_bar|stacked_bar|area)" wren-ai-service/src/

echo "Checking for chart type validations..."
rg "chart_type.*:.*(?:str|Literal)" wren-ai-service/src/

# Additional check for any schema or type definitions
echo "Checking for schema definitions..."
rg -A 5 "schema.*=.*{" wren-ai-service/src/

Length of output: 2528


Script:

#!/bin/bash
# Check for MultiLineChartSchema definition and usage
echo "Checking for MultiLineChartSchema..."
rg -A 5 "MultiLineChartSchema" wren-ai-service/src/

echo "Checking for chart schema types..."
rg -A 5 "class.*ChartSchema" wren-ai-service/src/

Length of output: 4958

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyyeh Thanks for fixing the issue! I can confirm that the MultiLineChartSchema has been properly implemented with all necessary components. Great work!

🔗 Analysis chain
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for MultiLineChartSchema implementation
echo "=== Checking for MultiLineChartSchema class ==="
rg -A 10 "class MultiLineChartSchema" 

echo -e "\n=== Checking for ChartType enum/literal with multi_line ==="
rg "ChartType.*=.*multi_line"

echo -e "\n=== Checking for MultiLineChartSchema usage ==="
rg "MultiLineChartSchema"

Length of output: 1642


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

"chart_schema": <VEGA_LITE_JSON_SCHEMA>
}}
"""
Expand Down
39 changes: 37 additions & 2 deletions wren-ai-service/src/pipelines/generation/utils/chart.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
### INSTRUCTIONS ###

- Please generate vega-lite schema using v5 version, which is https://vega.github.io/schema/vega-lite/v5.json
- Chart types: Bar chart, Line chart, Area chart, Pie chart, Stacked bar chart, Grouped bar chart
- Chart types: Bar chart, Line chart, Multi line chart, Area chart, Pie chart, Stacked bar chart, Grouped bar chart
- You can only use the chart types provided in the instructions
- If the sample data is not suitable for visualization, you must return an empty string for the schema and chart type
- If the sample data is empty, you must return an empty string for the schema and chart type
Expand All @@ -30,6 +30,7 @@
- For daily question, the time unit should be "yearmonthdate".
- Default time unit is "yearmonth".
- For each axis, generate the corresponding human-readable title based on the language provided by the user.
- Make sure all of the fields(x, y, xOffset, color, etc.) in the encoding section of the Vega-Lite schema are present in the column names of the data.

### GUIDELINES TO PLOT CHART ###

Expand Down Expand Up @@ -57,6 +58,12 @@
- One temporal or ordinal variable (x-axis).
- One quantitative variable (y-axis).
- Example: Tracking monthly revenue over a year.
- Multi Line Chart
- Use When: Displaying trends over continuous data, especially time.
- Data Requirements:
- One temporal or ordinal variable (x-axis).
- Two or more quantitative variables (y-axis and color).
- Example: Tracking monthly click rate and read rate over a year.
- Area Chart
- Use When: Similar to line charts but emphasizing the volume of change over time.
- Data Requirements:
Expand Down Expand Up @@ -212,6 +219,32 @@
"color": {"field": "Product", "type": "nominal", "title": "<TITLE_IN_LANGUAGE_PROVIDED_BY_USER>"}
}
}
7. Multi Line Chart
- Vega-Lite Spec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>,
"data": {
"values": [
{"Date": "2022-01-01", "readCount": 100, "clickCount": "10"},
{"Date": "2022-01-02", "readCount": 200, "clickCount": "30"},
{"Date": "2022-01-03", "readCount": 300, "clickCount": "20"},
{"Date": "2022-01-04", "readCount": 400, "clickCount": "40"}
]
},
"mark": {"type": "line"},
"transform": [
{
"fold": ["readCount", "clickCount"],
"as": ["Metric", "Value"]
}
],
"encoding": {
"x": {"field": "Date", "type": "temporal", "title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>},
"y": {"field": "Value", "type": "quantitative", "title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>},
"color": {"field": "Metric", "type": "nominal", "title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>}
}
}
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Several improvements needed in the multi-line chart example.

  1. The clickCount values are strings instead of numbers, which may cause type issues.
  2. The transform section usage isn't documented in the instructions.

Apply these improvements:

 {
     "data": {
         "values": [
-            {"Date": "2022-01-01", "readCount": 100, "clickCount": "10"},
-            {"Date": "2022-01-02", "readCount": 200, "clickCount": "30"},
-            {"Date": "2022-01-03", "readCount": 300, "clickCount": "20"},
-            {"Date": "2022-01-04", "readCount": 400, "clickCount": "40"}
+            {"Date": "2022-01-01", "readCount": 100, "clickCount": 10},
+            {"Date": "2022-01-02", "readCount": 200, "clickCount": 30},
+            {"Date": "2022-01-03", "readCount": 300, "clickCount": 20},
+            {"Date": "2022-01-04", "readCount": 400, "clickCount": 40}
         ]
     },

Also, add documentation for the transform section in the instructions:

 - Multi Line Chart
     - Use When: Displaying trends over continuous data, especially time.
     - Data Requirements:
         - One temporal or ordinal variable (x-axis).
         - Two or more quantitative variables (y-axis and color).
+    - Implementation Notes:
+        - Uses `transform` with `fold` to combine multiple metrics into a single series
+        - The folded metrics are distinguished using the color encoding
     - Example: Tracking monthly click rate and read rate over a year.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
7. Multi Line Chart
- Vega-Lite Spec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>,
"data": {
"values": [
{"Date": "2022-01-01", "readCount": 100, "clickCount": "10"},
{"Date": "2022-01-02", "readCount": 200, "clickCount": "30"},
{"Date": "2022-01-03", "readCount": 300, "clickCount": "20"},
{"Date": "2022-01-04", "readCount": 400, "clickCount": "40"}
]
},
"mark": {"type": "line"},
"transform": [
{
"fold": ["readCount", "clickCount"],
"as": ["Metric", "Value"]
}
],
"encoding": {
"x": {"field": "Date", "type": "temporal", "title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>},
"y": {"field": "Value", "type": "quantitative", "title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>},
"color": {"field": "Metric", "type": "nominal", "title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>}
}
}
7. Multi Line Chart
- Use When: Displaying trends over continuous data, especially time.
- Data Requirements:
- One temporal or ordinal variable (x-axis).
- Two or more quantitative variables (y-axis and color).
- Implementation Notes:
- Uses `transform` with `fold` to combine multiple metrics into a single series
- The folded metrics are distinguished using the color encoding
- Example: Tracking monthly click rate and read rate over a year.
- Vega-Lite Spec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>,
"data": {
"values": [
{"Date": "2022-01-01", "readCount": 100, "clickCount": 10},
{"Date": "2022-01-02", "readCount": 200, "clickCount": 30},
{"Date": "2022-01-03", "readCount": 300, "clickCount": 20},
{"Date": "2022-01-04", "readCount": 400, "clickCount": 40}
]
},
"mark": {"type": "line"},
"transform": [
{
"fold": ["readCount", "clickCount"],
"as": ["Metric", "Value"]
}
],
"encoding": {
"x": {"field": "Date", "type": "temporal", "title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>},
"y": {"field": "Value", "type": "quantitative", "title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>},
"color": {"field": "Metric", "type": "nominal", "title": <TITLE_IN_LANGUAGE_PROVIDED_BY_USER>}
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyyeh Thanks for fixing the issues! The multi-line chart example is now more robust with proper numeric types and clearer documentation.


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

"""


Expand Down Expand Up @@ -349,7 +382,9 @@ class AreaChartEncoding(BaseModel):

class ChartGenerationResults(BaseModel):
reasoning: str
chart_type: Literal["line", "bar", "pie", "grouped_bar", "stacked_bar", "area", ""]
chart_type: Literal[
"line", "multi_line", "bar", "pie", "grouped_bar", "stacked_bar", "area", ""
]
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding schema validation for multi-line charts.

While the chart_type has been updated, there's no corresponding schema validation class for multi-line charts like there is for other chart types (LineChartSchema, BarChartSchema, etc.).

Consider adding a MultiLineChartSchema class to validate the specific requirements of multi-line charts, including the transform section:

class MultiLineChartSchema(ChartSchema):
    class MultiLineChartMark(BaseModel):
        type: Literal["line"] = Field(default="line")

    class MultiLineChartTransform(BaseModel):
        fold: list[str]
        as_: list[str] = Field(alias="as")

    class MultiLineChartEncoding(BaseModel):
        x: TemporalChartEncoding | ChartSchema.ChartEncoding
        y: ChartSchema.ChartEncoding
        color: ChartSchema.ChartEncoding

    mark: MultiLineChartMark
    transform: list[MultiLineChartTransform]
    encoding: MultiLineChartEncoding

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyyeh Perfect! The schema validation for multi-line charts has been properly implemented.


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

chart_schema: (
LineChartSchema
| BarChartSchema
Expand Down
4 changes: 3 additions & 1 deletion wren-ai-service/src/web/v1/services/chart.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,9 @@ class ChartResultRequest(BaseModel):

class ChartResult(BaseModel):
reasoning: str
chart_type: Literal["line", "bar", "pie", "grouped_bar", "stacked_bar", "area", ""]
chart_type: Literal[
"line", "bar", "pie", "grouped_bar", "stacked_bar", "area", "multi_line", ""
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiously, why did we leave a empty string in types?

chart_schema: dict


Expand Down
8 changes: 6 additions & 2 deletions wren-ai-service/src/web/v1/services/chart_adjustment.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,9 @@

# POST /v1/chart-adjustments
class ChartAdjustmentOption(BaseModel):
chart_type: Literal["bar", "grouped_bar", "line", "pie", "stacked_bar", "area"]
chart_type: Literal[
"bar", "grouped_bar", "line", "pie", "stacked_bar", "area", "multi_line"
]
x_axis: Optional[str] = None
y_axis: Optional[str] = None
x_offset: Optional[str] = None
Expand Down Expand Up @@ -75,7 +77,9 @@ class ChartAdjustmentResultRequest(BaseModel):

class ChartAdjustmentResult(BaseModel):
reasoning: str
chart_type: Literal["line", "bar", "pie", "grouped_bar", "stacked_bar", "area", ""]
chart_type: Literal[
"line", "bar", "pie", "grouped_bar", "stacked_bar", "area", "multi_line", ""
]
chart_schema: dict


Expand Down
Loading