
[GLUTEN-8479][CORE][Part-3] Split backend configs to its corresponding modules #8586

Open, wants to merge 1 commit into base: main

yikf
Contributor

@yikf yikf commented Jan 21, 2025

What changes were proposed in this pull request?

Fixes #8479. This PR splits the backend configurations into the modules they belong to.

This PR mainly accomplishes two things:

  1. It splits the configurations defined in GlutenConfig, moving Velox-related configurations to VeloxConfig and CH-related configurations to CHConf.
  2. It deletes configuration retrieval methods that are no longer used.
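The first change can be pictured with a minimal, self-contained Scala sketch. `ConfEntry`, the three holder objects, and every key below are hypothetical stand-ins, not Gluten's actual `GlutenConfig`, `VeloxConfig`, or `CHConf` API. The only point is that backend-agnostic entries stay in the core module while Velox-only and CH-only entries are defined in their own modules.

```scala
// Hypothetical, simplified typed config entry (not Gluten's real ConfigEntry API).
final case class ConfEntry[T](key: String, default: T, parse: String => T) {
  // Reads the entry from raw key/value settings, falling back to the default.
  def readFrom(settings: Map[String, String]): T =
    settings.get(key).map(parse).getOrElse(default)
}

// Core module: only backend-agnostic entries remain here (hypothetical key).
object CoreConfEntries {
  val ExampleCommonFlag: ConfEntry[Boolean] =
    ConfEntry("spark.gluten.example.common.enabled", true, _.toBoolean)
}

// Velox backend module: Velox-only entries are defined here (hypothetical key).
object VeloxConfEntries {
  val ExampleVeloxBufferSize: ConfEntry[Int] =
    ConfEntry("spark.gluten.example.velox.bufferSize", 4096, _.toInt)
}

// ClickHouse backend module: CH-only entries are defined here (hypothetical key).
object CHConfEntries {
  val ExampleCHSpillThreshold: ConfEntry[Long] =
    ConfEntry("spark.gluten.example.ch.spillThreshold", 0L, _.toLong)
}
```

With this layout, the core module never needs to know about backend-only keys; each backend module only references its own holder plus the core one.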

How was this patch tested?

GA (GitHub Actions CI).

github-actions bot added the CORE (works for Gluten Core), VELOX, and CLICKHOUSE labels on Jan 21, 2025

#8479

@yikf
Contributor Author

yikf commented Jan 21, 2025

@jackylee-ch @baibaichen Would you please take a look, thanks!


Run Gluten Clickhouse CI on x86

@jackylee-ch
Contributor

It seems that this PR splits configs by backend rather than by module. Shouldn't .backend() handle this case?

@yikf
Contributor Author

yikf commented Jan 22, 2025

@jackylee-ch This PR moves each backend's configuration into that backend's own module. The backendType identifies the configuration's type and has nothing to do with where the configuration is defined. I'm not sure whether splitting it into multiple configuration classes makes sense.

@jackylee-ch
Contributor

> @jackylee-ch Move the configuration of a specific backend to its own module. The backendType is used to identify the configuration type and has nothing to do with where the configuration is located. It's not certain whether dividing into multiple configuration classes makes sense.

I'd prefer putting the configs in GlutenConfigs and using backend() or module() to separate them; maintaining these configurations separately is not easy. cc @zhztheplayer @PHILO-HE
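The alternative suggested here, keeping every entry in one place and tagging each with `backend()`, might look roughly like this. `BackendTag`, `EntryBuilder`, `SingleGlutenConfig`, and all keys are hypothetical illustrations, not Gluten's actual builder; the sketch only shows that the tag, rather than the file an entry lives in, records which backend uses it.

```scala
// Hypothetical backend tags, for illustration only.
sealed trait BackendTag
object BackendTag {
  case object Common extends BackendTag
  case object Velox extends BackendTag
  case object ClickHouse extends BackendTag
}

// A config entry that carries its backend tag alongside key and default.
final case class TaggedEntry(key: String, default: String, backend: BackendTag)

// Hypothetical builder mirroring the "buildConf(...).backend(...)" idea.
final class EntryBuilder(key: String, tag: BackendTag) {
  def backend(b: BackendTag): EntryBuilder = new EntryBuilder(key, b)
  def createWithDefault(default: String): TaggedEntry = TaggedEntry(key, default, tag)
}

// All entries live in a single object, distinguished only by their tag.
object SingleGlutenConfig {
  private def buildConf(key: String): EntryBuilder = new EntryBuilder(key, BackendTag.Common)

  val CommonFlag: TaggedEntry =
    buildConf("spark.gluten.example.common.enabled").createWithDefault("true")
  val VeloxOnly: TaggedEntry =
    buildConf("spark.gluten.example.velox.bufferSize").backend(BackendTag.Velox).createWithDefault("4096")
  val CHOnly: TaggedEntry =
    buildConf("spark.gluten.example.ch.spillThreshold").backend(BackendTag.ClickHouse).createWithDefault("0")

  val all: Seq[TaggedEntry] = Seq(CommonFlag, VeloxOnly, CHOnly)

  // Entries relevant to one backend: its own tagged entries plus the common ones.
  def entriesFor(b: BackendTag): Seq[TaggedEntry] =
    all.filter(e => e.backend == b || e.backend == BackendTag.Common)
}
```

The trade-off under discussion is visible here: one file is easy to scan, but the core module ends up defining keys that only one backend reads.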

@zhztheplayer
Member

I am inclined to place them in separate modules, but I would also like to hear more views here.

@yikf
Contributor Author

yikf commented Jan 22, 2025

@jackylee-ch @zhztheplayer If they are separated, the end state looks quite good: backend-specific configurations will only be used by something like VeloxBackendApi, which makes things clearer. I lean toward separating them, but let's see if anyone else has ideas.


@jackylee-ch
Contributor

BTW, is our goal to split the configuration of all modules, or only of the backends? I only see backend config splitting in this PR.

@yikf
Contributor Author

yikf commented Jan 22, 2025

> BTW, Is our goal to split the configuration of all modules or just the backends? I only see backend config splitting in this PR.

Sorry for the confusion. The purpose of this PR is to split the backend configurations into their corresponding modules. I have also updated the PR title and description.

yikf changed the title from "[GLUTEN-8479][CORE][Part-3] Split config to multiple modules" to "[GLUTEN-8479][CORE][Part-3] Split backend configs to its corresponding modules" on Jan 22, 2025

import org.apache.gluten.execution.ColumnarNativeIterator
import org.apache.gluten.memory.CHThreadGroup
import org.apache.gluten.vectorized._

Contributor

This line should not be removed.

import org.apache.spark._
import org.apache.spark.scheduler.MapStatus
import org.apache.spark.shuffle.celeborn.CelebornShuffleHandle
import org.apache.spark.sql.vectorized.ColumnarBatch

Contributor

ditto


@@ -80,10 +79,10 @@ class CHCelebornColumnarShuffleWriter[K, V](
nativeBufferSize,
capitalizedCompressionCodec,
compressionLevel,
GlutenConfig.get.chColumnarShuffleSpillThreshold,
Contributor

Can we keep using GlutenConfig.get after the refactor? I prefer a uniform API. That would require GlutenConfig to somehow cover either the Velox conf or the CH conf at runtime.
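One possible way to keep a single `get` entry point while still defining backend keys inside the backend module is a Scala extension (implicit class). This is only a sketch of that idea, not what the PR implements: `CoreConf`, `CHConfSyntax`, and the key are hypothetical, and settings are passed in explicitly instead of being read from the active Spark session.

```scala
// Hypothetical core-module class: holds raw settings and common getters only.
final class CoreConf(val settings: Map[String, String]) {
  // A missing key falls back to `true`.
  def exampleCommonEnabled: Boolean =
    settings.get("spark.gluten.example.common.enabled").forall(_.toBoolean)
}

object CoreConf {
  // Stand-in for a session-aware `get`; the settings are passed explicitly here.
  def get(settings: Map[String, String]): CoreConf = new CoreConf(settings)
}

// Hypothetical CH-module object: the CH key is defined here, not in core.
object CHConfSyntax {
  val SpillThresholdKey = "spark.gluten.example.ch.spillThreshold"

  // Once imported, CH call sites can keep writing `CoreConf.get(...).chExampleSpillThreshold`.
  implicit class CHConfOps(conf: CoreConf) {
    def chExampleSpillThreshold: Long =
      conf.settings.get(SpillThresholdKey).map(_.toLong).getOrElse(0L)
  }
}
```

The core class stays unaware of CH keys; only code that imports `CHConfSyntax._` sees the CH getter.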

Contributor Author

So you mean the configurations are defined independently, but all of them are still retrieved through GlutenConfig?

Member

Perhaps we can make VeloxConfig inherit from GlutenConfig? Then VeloxConfig could be used uniformly in the Velox backend for both Velox-specific and common configurations.

Contributor Author

The current implementation already uses an inheritance relationship.
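The inheritance layout discussed here can be sketched with simplified stand-in classes. `GlutenConfigSketch`, `VeloxConfigSketch`, and the keys are hypothetical and not Gluten's real classes; the sketch only shows how a subclass exposes both common and backend-specific getters through one object.

```scala
// Hypothetical base class: common, backend-agnostic getters.
class GlutenConfigSketch(protected val settings: Map[String, String]) {
  // A missing key falls back to `true`.
  def exampleCommonEnabled: Boolean =
    settings.get("spark.gluten.example.common.enabled").forall(_.toBoolean)
}

// Hypothetical Velox subclass: adds Velox-only getters and inherits the common ones.
class VeloxConfigSketch(conf: Map[String, String]) extends GlutenConfigSketch(conf) {
  def exampleVeloxBufferSize: Int =
    settings.get("spark.gluten.example.velox.bufferSize").map(_.toInt).getOrElse(4096)
}

object VeloxConfigSketch {
  // A real implementation would read the active session's settings; here they are passed in.
  def get(settings: Map[String, String]): VeloxConfigSketch = new VeloxConfigSketch(settings)
}
```

Velox-side code then needs only one handle, for example `VeloxConfigSketch.get(settings).exampleCommonEnabled` alongside `exampleVeloxBufferSize`.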

Member

Aha, then I feel it's already convenient to developers.

Contributor

> So that is to say, the definition of the configuration is independent, but all the acquisitions are from GlutenConfig?

@yikf, yes. It's not a mature idea yet, so for now maybe just keep your current changes.

@PHILO-HE
Contributor

@baibaichen, please take a look.

@jackylee-ch
Contributor

@yikf There are some code errors in the CH backend; could you fix them?


Successfully merging this pull request may close these issues.

[CORE] Split the configuration into individual modules that belong to each other.
4 participants