configurable filter chains #169

Open · Kuromesi wants to merge 6 commits into main from kuromesi/plugablefilter
Conversation


@Kuromesi Kuromesi commented Jan 8, 2025

Motivation

Recently, I have been conducting tests in my environment based on your efforts. I wanted to evaluate the performance of different filter logics, but it is quite time-consuming to modify the filters through source code changes, followed by rebuilding and redeploying the application. During this process, I also noticed two TODOs in the code that align well with my needs:

// TODO(https://github.com/kubernetes-sigs/llm-instance-gateway/issues/16) Make this configurable.
kvCacheThreshold = 0.8

and

// TODO: Compare this strategy with other strategies such as top K.
func leastKVCacheFilterFunc(req *LLMRequest, pods []*backend.PodMetrics) ([]*backend.PodMetrics, error)

As a result, I wrote some code to address these points and would like to know if it meets your approval.

Design

First, I renamed the Filter interface to FilterChain and filterFunc to filter, since I think this better describes their functionality (maybe this rename is not needed).

Then I added some interfaces.

Filters generation

FilterGen is responsible for generating filters from filter options by calling Get; the options should be validated beforehand by calling Validate.

// FilterGen generates a filter from a filter option
type FilterGen interface {
	Name() string
	Get(*FilterOption) filter
	Validate(*FilterOption) error
}

type FilterOption struct {
	KvCacheThreshold *float64 `json:"kvCacheThreshold,omitempty"`

	QueueThresholdCritical *int `json:"queueThresholdCritical,omitempty"`
	QueueingThresholdLoRA  *int `json:"queueingThresholdLoRA,omitempty"`

	// TopK defines the top k pods to be selected, if not specified, 3 will be used
	TopK *int `json:"topK,omitempty"`
	// TopKBy defines the metric to be used for top k selection,
	// `waiting_queue_size` and `kv_cache_usage_percent` are available
	TopKBy *string `json:"topKBy,omitempty"`
}
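The PR relies on Validate to catch bad options before Get runs. A hypothetical sketch of what that validation could look like for the topk filter (the function name validateTopK and the exact rules are my own illustration, not code from the PR):

```go
package main

import "fmt"

// FilterOption is trimmed down to the fields the topk filter reads.
type FilterOption struct {
	TopK   *int
	TopKBy *string
}

// validateTopK rejects unknown TopKBy values and non-positive TopK
// before the getter dereferences them.
func validateTopK(opt *FilterOption) error {
	if opt.TopKBy == nil {
		return fmt.Errorf("topKBy is required for the topk filter")
	}
	switch *opt.TopKBy {
	case "waiting_queue_size", "kv_cache_usage_percent":
		// supported metrics
	default:
		return fmt.Errorf("unsupported topKBy: %q", *opt.TopKBy)
	}
	if opt.TopK != nil && *opt.TopK <= 0 {
		return fmt.Errorf("topK must be positive, got %d", *opt.TopK)
	}
	return nil
}

func main() {
	by := "waiting_queue_size"
	fmt.Println(validateTopK(&FilterOption{TopKBy: &by}) == nil) // true
}
```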

And I added a topk filter, though the performance of this filter has not been tested yet.

func filterTopKGetter(opt *FilterOption) filter {
	// defaults to 3
	k := DefaultTopK
	if opt.TopK != nil {
		k = *opt.TopK
	}
	var cmp func(a, b *backend.PodMetrics) bool
	switch *opt.TopKBy {
	case TopKByKVCacheUsagePercent:
		cmp = func(a, b *backend.PodMetrics) bool {
			return a.KVCacheUsagePercent > b.KVCacheUsagePercent
		}
	case TopKByWaitingQueueSize:
		cmp = func(a, b *backend.PodMetrics) bool {
			return a.WaitingQueueSize > b.WaitingQueueSize
		}
	}
	topk := algorithms.NewHeapTopK(cmp)
	return func(req *LLMRequest, pods []*backend.PodMetrics) ([]*backend.PodMetrics, error) {
		return topk.TopK(pods, k), nil
	}
}
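The `algorithms.NewHeapTopK` helper is not shown in the PR. A self-contained sketch of the underlying idea, a bounded min-heap that keeps the k best pods under a comparator (with a simplified local `PodMetrics` stand-in for `backend.PodMetrics`), might look like:

```go
package main

import (
	"container/heap"
	"fmt"
)

// PodMetrics is a simplified stand-in for backend.PodMetrics.
type PodMetrics struct {
	Name                string
	WaitingQueueSize    int
	KVCacheUsagePercent float64
}

// podHeap is a min-heap whose root is the lowest-ranked of the
// currently kept pods, so evicting the root keeps the top K.
type podHeap struct {
	pods []*PodMetrics
	cmp  func(a, b *PodMetrics) bool // cmp(a, b) == true: a ranks above b
}

func (h *podHeap) Len() int           { return len(h.pods) }
func (h *podHeap) Less(i, j int) bool { return h.cmp(h.pods[j], h.pods[i]) }
func (h *podHeap) Swap(i, j int)      { h.pods[i], h.pods[j] = h.pods[j], h.pods[i] }
func (h *podHeap) Push(x any)         { h.pods = append(h.pods, x.(*PodMetrics)) }
func (h *podHeap) Pop() any {
	old := h.pods
	n := len(old)
	x := old[n-1]
	h.pods = old[:n-1]
	return x
}

// TopK returns up to k pods ranked highest by cmp, in O(n log k).
func TopK(pods []*PodMetrics, k int, cmp func(a, b *PodMetrics) bool) []*PodMetrics {
	h := &podHeap{cmp: cmp}
	for _, p := range pods {
		heap.Push(h, p)
		if h.Len() > k {
			heap.Pop(h) // evict the current worst
		}
	}
	return h.pods
}

func main() {
	pods := []*PodMetrics{
		{Name: "a", WaitingQueueSize: 5},
		{Name: "b", WaitingQueueSize: 1},
		{Name: "c", WaitingQueueSize: 9},
	}
	top := TopK(pods, 2, func(a, b *PodMetrics) bool {
		return a.WaitingQueueSize > b.WaitingQueueSize
	})
	fmt.Println(len(top)) // 2
}
```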

Orchestrate filter chains

FilterOrchestrator is responsible for orchestrating a filter chain from a FilterOrchestration configuration, which is loaded from a ConfigMap for now (maybe use a CRD?).

type FilterOrchestrator interface {
	Orchestrate() FilterChain
}

type FilterOrchestration struct {
	Name                   string               `json:"name"`
	NextOnSuccess          *FilterOrchestration `json:"nextOnSuccess,omitempty"`
	NextOnFailure          *FilterOrchestration `json:"nextOnFailure,omitempty"`
	NextOnSuccessOrFailure *FilterOrchestration `json:"nextOnSuccessOrFailure,omitempty"`
	FilterOption           *FilterOption        `json:"filterOption,omitempty"`
}
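A minimal sketch of how an orchestrator could walk this config tree recursively (simplified local types; in the real PR the Name field would be resolved to an actual filter via a FilterGen registry, which is omitted here):

```go
package main

import "fmt"

// FilterOrchestration mirrors the config tree shape from the PR,
// trimmed to the branching fields.
type FilterOrchestration struct {
	Name          string
	NextOnSuccess *FilterOrchestration
	NextOnFailure *FilterOrchestration
}

// filterNode stands in for the PR's filter chain node.
type filterNode struct {
	name          string
	nextOnSuccess *filterNode
	nextOnFailure *filterNode
}

// build recursively converts the config tree into a filter chain.
func build(cfg *FilterOrchestration) *filterNode {
	if cfg == nil {
		return nil
	}
	return &filterNode{
		name:          cfg.Name,
		nextOnSuccess: build(cfg.NextOnSuccess),
		nextOnFailure: build(cfg.NextOnFailure),
	}
}

func main() {
	cfg := &FilterOrchestration{
		Name:          "low-latency",
		NextOnSuccess: &FilterOrchestration{Name: "least-kv-cache"},
		NextOnFailure: &FilterOrchestration{Name: "least-queuing"},
	}
	chain := build(cfg)
	fmt.Println(chain.nextOnSuccess.name) // least-kv-cache
}
```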

I also added a reconciler for ConfigMaps that stores the ConfigMap in the datastore. It is quite simple and needs to be improved.

func (c *FilterConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	if req.NamespacedName.Name != "filter-config" || req.NamespacedName.Namespace != "default" {
		return ctrl.Result{}, nil
	}
	cm := &corev1.ConfigMap{}
	if err := c.Get(ctx, req.NamespacedName, cm); err != nil {
		klog.Errorf("unable to get ConfigMap, err: %v", err)
		return ctrl.Result{}, err
	}
	klog.Infof("updating filter config to: %+v", cm.Data)
	c.Datastore.filterConfigMap = cm.DeepCopy()
	return ctrl.Result{}, nil
}

Configurable filter chains

With this in place, the scheduler can be initialized with a FilterOrchestrator, and its filters can now be configured dynamically.

func WithOrchestrator(orchestrator FilterOrchestrator) SchedulerOption {
	return func(s *Scheduler) {
		s.filterOrchestrator = orchestrator
	}
}

type SchedulerOption func(*Scheduler)

type Scheduler struct {
	podMetricsProvider PodMetricsProvider
	filter             FilterChain
	filterOrchestrator FilterOrchestrator
}
pods, err := s.filterOrchestrator.Orchestrate().Filter(req, s.podMetricsProvider.AllPodMetrics())

A ConfigMap demo that can be orchestrated into the default filter is shown in pkg/ext-proc/scheduling/orchestrate_test.go.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Kuromesi
Once this PR has been reviewed and has the lgtm label, please assign smarterclayton for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 8, 2025
@k8s-ci-robot
Contributor

Welcome @Kuromesi!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 8, 2025
@k8s-ci-robot
Contributor

Hi @Kuromesi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 8, 2025
@kfswain
Collaborator

kfswain commented Jan 8, 2025

Hey there @Kuromesi! Thanks for the contribution! This looks like a sizable PR, so it may take some time to review and digest. But I am looking at it. 👍🏻

@liu-cong
Contributor

liu-cong commented Jan 8, 2025

Thanks for the contribution and many great ideas! A few high level thoughts:

  1. I think this PR can be broken down into two parts: (1) the topK filter, and (2) dynamically reconfiguring the filters via a ConfigMap-backed FilterOrchestrator. It would be easier to review if you split this into at least two PRs.
  2. For the topK filter, I did try a little bit before and didn't get good benchmark results. I didn't go further though so it's probably worth digging more. We do need to run some benchmarks to validate its effectiveness before we can merge it.
  3. A ConfigMap-based dynamic config is a great tool for dev/debug, but it adds both cognitive overhead and a runtime dependency. Just out of curiosity, do you find modifying the ConfigMap easier than modifying the Go code? The Go code is a bit more verbose, but with IDE support you get autocompletion and error checks; I am afraid directly editing ConfigMap text is error-prone. Making the ConfigMap approach optional and not enabled by default is probably OK if people find it convenient. If we find it useful for operators to dynamically configure some knobs, then we can consider exposing those knobs via the InferencePool or a new CRD.
  4. I do agree that the filters are getting more complex. There is an issue tracking whether we can simplify this: "Explore if we can simplify the filter tree in pkg/ext_proc/scheduling/scheduler.go" #89

@kfswain
Collaborator

kfswain commented Jan 8, 2025

I generally agree with @liu-cong here.

Especially points 3 & 4.

I totally understand using a ConfigMap for rapid dev, since you don't need to rebuild an image and can just update a ConfigMap to set the algo you want to test and iterate on. So from a development and experimentation standpoint I think that is very strong.

But from a supportability standpoint I think that could be problematic, for reasons that Cong mentioned. Debugging a live service that has its algorithm specified in code sounds brutal.

Could we perhaps make this a flag that can be passed in on startup? Perhaps it can be a --dev flag so that you can also have more verbose logging. Or it can be independent. I'm open to either.

I do want to emphasize that I think this is very powerful for development and experimentation so I think we want this, but need a way to keep it separate from the way we configure a production algo.

@Kuromesi
Author

Kuromesi commented Jan 9, 2025

@kfswain @liu-cong Thanks for your reviews! I will break down my PR, and I do agree that configuring filters in a ConfigMap may introduce overhead and instability; I have made some optimizations to address that.

I added a storedFilter field to store the last orchestrated filter, so if the ConfigMap has not been updated, we don't need to regenerate the filter every time.

type FilterOrchestratorImpl struct {
	datastore    *backend.K8sDatastore
	lastUpdated  string
	storedFilter *filterChainImpl
}

And whenever an error occurs during filter orchestration, I just return the default filter, so it should still work fine.

filter, err := o.orchestrate(f)
if err != nil {
	klog.Errorf("error orchestrating filters: %v", err)
	filter = defaultFilter
}
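The caching idea described above can be sketched as follows (string-valued stand-ins for the real filter chain and config; using the ConfigMap's ResourceVersion as lastUpdated is my assumption about the comparison key, not something the PR states):

```go
package main

import "fmt"

// orchestrator re-orchestrates only when the ConfigMap's version has
// changed since the last build; otherwise it reuses the cached chain.
type orchestrator struct {
	lastUpdated  string
	storedFilter string // stand-in for *filterChainImpl
	builds       int    // counts how many times we actually rebuilt
}

func (o *orchestrator) Orchestrate(resourceVersion, config string) string {
	if resourceVersion == o.lastUpdated && o.storedFilter != "" {
		return o.storedFilter // ConfigMap unchanged: reuse cached chain
	}
	o.builds++
	o.lastUpdated = resourceVersion
	o.storedFilter = "chain(" + config + ")"
	return o.storedFilter
}

func main() {
	o := &orchestrator{}
	o.Orchestrate("v1", "topk")
	o.Orchestrate("v1", "topk") // cached, no rebuild
	o.Orchestrate("v2", "topk") // version changed, rebuilt
	fmt.Println(o.builds) // 2
}
```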

For the question "do you find modifying the configmap easier than modifying the go code?": for me it makes testing much easier, since I'm trying various filter strategies to find the one most suitable for my environment. Modifying Go code with an IDE is definitely easier than editing a ConfigMap, but the rebuild and redeploy is time-consuming (and my test environment is deployed at a cloud provider, which makes that process even harder). I also think ConfigMap-based configuration is not the final solution (I just use it for tests); if you agree that configuring filters is necessary, maybe we should consider designing a new CRD to standardize and simplify the configuration process.

BTW, I do think we should provide a configurable way to compose filters (like Envoy does) and also offer different filter strategies as candidates, since different environments may require different strategies. I'm not sure about that yet, but I'm running tests to find out.

I agree that the solution may be problematic and needs more testing; I will add a flag to enable this feature and run more tests in my environment!

Also, I used llmperf to test performance in my environment, and I can share my results with you if needed. I am also trying to integrate this with the Istio service mesh, and so far it works pretty well. Thanks again for your comments! Please feel free to ask if you have any questions!

@Kuromesi Kuromesi force-pushed the kuromesi/plugablefilter branch from 7de96f5 to 87aeaac Compare January 9, 2025 03:41