v0.1 API Review #154

Open · wants to merge 3 commits into base: v0.0

Changes from 2 commits
45 changes: 45 additions & 0 deletions api/groupversion_info.go
@@ -0,0 +1,45 @@
/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package v1alpha1 contains API Schema definitions for the gateway v1alpha1 API group
// +kubebuilder:object:generate=true
// +groupName=inference.networking.x-k8s.io
package v1alpha1

import (
"k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/controller-runtime/pkg/scheme"
)

var (
// GroupVersion is the group version used to register these objects
GroupVersion = schema.GroupVersion{Group: "inference.networking.x-k8s.io", Version: "v1alpha1"}

// SchemeGroupVersion is alias to GroupVersion for client-go libraries.
// It is required by pkg/client/informers/externalversions/...
SchemeGroupVersion = GroupVersion

// SchemeBuilder is used to add go types to the GroupVersionKind scheme
SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

// AddToScheme adds the types in this group-version to the given scheme.
AddToScheme = SchemeBuilder.AddToScheme
)

// Resource is required by pkg/client/listers/...
func Resource(resource string) schema.GroupResource {
return GroupVersion.WithResource(resource).GroupResource()
}
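
A minimal sketch (not part of this diff) of how a consumer might register these types into a controller-runtime scheme; the import path is assumed from the repository layout and the rest is standard scaffolding:

package main

import (
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"

	// Assumed import path for the api/ package shown above.
	v1alpha1 "sigs.k8s.io/gateway-api-inference-extension/api"
)

// newScheme builds a runtime.Scheme containing both the core Kubernetes types
// and the InferenceModel/InferencePool types registered by AddToScheme above.
func newScheme() (*runtime.Scheme, error) {
	scheme := runtime.NewScheme()
	if err := clientgoscheme.AddToScheme(scheme); err != nil {
		return nil, err
	}
	if err := v1alpha1.AddToScheme(scheme); err != nil {
		return nil, err
	}
	return scheme, nil
}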
204 changes: 204 additions & 0 deletions api/inferencemodel_types.go
@@ -0,0 +1,204 @@
/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InferenceModel is the Schema for the InferenceModels API.
//
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +genclient
type InferenceModel struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec InferenceModelSpec `json:"spec,omitempty"`
Status InferenceModelStatus `json:"status,omitempty"`
}

// InferenceModelList contains a list of InferenceModel.
//
// +kubebuilder:object:root=true
type InferenceModelList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []InferenceModel `json:"items"`
}

// InferenceModelSpec represents the desired state of a specific model use case. This resource is
// managed by the "Inference Workload Owner" persona.
//
// The Inference Workload Owner persona is someone that trains, verifies, and
// leverages a large language model from a model frontend, drives the lifecycle
// and rollout of new versions of those models, and defines the specific
// performance and latency goals for the model. These workloads are
// expected to operate within an InferencePool sharing compute capacity with other
// InferenceModels, defined by the Inference Platform Admin.
//
// InferenceModel's modelName (not the ObjectMeta name) is unique for a given InferencePool;
// if the name is reused, an error will be shown on the status of the
// InferenceModel that attempted to reuse it. The oldest InferenceModel, based on
// creation timestamp, will be selected to remain valid. In the event of a race
// condition, one will be selected at random.
type InferenceModelSpec struct {
// ModelName is the name of the model as users set it in the "model" parameter of their requests.
// The name should be unique among the workloads that reference the same backend pool.
// This is the parameter that the request will be matched on. In the future, we may
// allow matching on other request parameters. The alternative approach to supporting matching
// on other request parameters is to use a different ModelName per HTTPFilter.
// Names can be reserved without implementing an actual model in the pool.
// This can be done by specifying a target model and setting the weight to zero;
// an error will be returned specifying that no valid target model is found.
//
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Required
ModelName string `json:"modelName"`
kfswain marked this conversation as resolved.

// Criticality defines how important it is to serve the model compared to other models referencing the same pool.
//
// +optional
// +kubebuilder:default="Default"
@danehans (Contributor), Jan 9, 2025:

Based on feedback from @smarterclayton, this default should be removed to allow this concept to evolve by introducing oneOf (union types). If this default is being removed, should we also remove the associated "Default" enum?

Member:

That's a good point - if we're no longer defaulting, we can't really call the middle value "default", so we likely need at least one new name here. Any ideas?

cc @ahg-g

Contributor:

Agreed, based on today's discussion we should remove this defaulting.

Regarding "Default", I think we have two options:

  1. Rename to Medium or BestEffort
  2. Remove it completely and stick with two only: Critical and Sheddable. This is perhaps acceptable now that we have a clearer path to extending the scope in the future.

wdyt?

Collaborator (author):

If we are boiling this down to two values, do we want to make the Criticality type an indirection to a bool rather than a string, and then just rename it to Critical?

@candita mentions the same: #154 (comment)

Contributor:

I am still trying to understand the intent of criticality. Can someone help me understand what "serve" means in the godocs? Does "serve" mean if the Node does not have enough resources, i.e. memory, model servers of an InferenceModel with criticality: BestEffort will be evicted, or is criticality a networking classification concept?

Collaborator (author):

Fair question. We used to have a glossary to define some of these terms (we call it priority there).

To quickly summarize: Criticality is intended to be a load-balancing concept. If there is always enough capacity to place a request on any model server, criticality does nothing. But should we need to queue requests, criticality will prioritize the requests that come from a critical workload. We can't offer the exact guarantees we can with scheduling on a node, as traffic can wax and wane much more quickly.

Contributor:

@kfswain thanks for the glossary reference. Consider including points from the glossary or the summary above in the Criticality godocs to better define the concept. Without the additional details, I can see someone interpreting Criticality as pod priority, network priority, etc.

Criticality *Criticality `json:"criticality,omitempty"`

// TargetModels allow multiple versions of a model for traffic splitting.
// If not specified, the target model name is defaulted to the modelName parameter.
// modelName is often in reference to a LoRA adapter.
//
// +optional
// +kubebuilder:validation:MaxItems=10
TargetModels []TargetModel `json:"targetModels,omitempty"`

// PoolRef is a reference to the inference pool; the pool must exist in the same namespace.
//
// +kubebuilder:validation:Required
PoolRef PoolObjectReference `json:"poolRef"`
}
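
A hedged sketch of how the spec fields above compose, written as if it lived in this package; the model, adapter, and pool names are placeholders (the pool name is borrowed from the sample manifest later in this review):

// exampleInferenceModel wires a user-facing model name to two LoRA adapters in
// one pool, split 90/10 by weight. Illustrative only.
func exampleInferenceModel() InferenceModel {
	criticality := Critical // enum value defined later in this file
	return InferenceModel{
		ObjectMeta: metav1.ObjectMeta{Name: "tweet-summary", Namespace: "default"},
		Spec: InferenceModelSpec{
			ModelName:   "tweet-summary",
			Criticality: &criticality,
			TargetModels: []TargetModel{
				{Name: "tweet-summary-v1", Weight: 90},
				{Name: "tweet-summary-v2", Weight: 10},
			},
			// Group and Kind carry defaults, so only Name is required here.
			PoolRef: PoolObjectReference{Name: "vllm-llama2-7b-pool"},
		},
	}
}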

// PoolObjectReference identifies an API object within the namespace of the
// referrer.
type PoolObjectReference struct {
// Group is the group of the referent.
//
// +optional
// +kubebuilder:default="inference.networking.x-k8s.io"
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
Group string `json:"group,omitempty"`

// Kind is the kind of the referent. For example, "InferencePool".
//
// +optional
// +kubebuilder:default="InferencePool"
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:Pattern=`^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$`
Kind string `json:"kind,omitempty"`

// Name is the name of the referent.
//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Required
Name string `json:"name"`
}

// Criticality defines how important it is to serve the model compared to other models.
// +kubebuilder:validation:Enum=Critical;Default;Sheddable
type Criticality string

const (
// Critical defines the highest level of criticality. Requests to this band will be shed last.
Critical Criticality = "Critical"

// Default defines the default criticality level and is more important than Sheddable but less
// important than Critical. Requests in this band will be shed before critical traffic.
Default Criticality = "Default"

// Sheddable defines the lowest level of criticality. Requests to this band will be shed before
// all other bands.
Sheddable Criticality = "Sheddable"
Comment on lines +134 to +136

nit: Using the name Sheddable exposes the result of this Criticality type rather than a descriptive type. I suggest using NonCritical or some other descriptive type name.

Collaborator (author):

I do agree that it describes what the criticality mechanism does.

I'm having a hard time coming up with an adequate name that wouldn't be synonymous. Non-Critical could be viewed as anything that isn't at the Critical level, which would include both Default and Sheddable.

I suppose we could use BestEffort to be in line with the Pod QoS classes. But we don't have an equivalent for Guaranteed, and only a loose equivalent for Burstable.

Definitely open to better names though. LMKWYT

Member:

Also open to other name ideas here. Currently we have:

  • Critical
  • Default
  • Sheddable

I don't really love Criticality: NonCritical or Criticality: Critical; maybe there are 3 different terms that clearly indicate 3 distinct layers of criticality? A naive example would be High, Medium, Low, but that's not the best.

cc @ahg-g

Contributor:

Thoughts on changing Criticality to Priority or introducing a CriticalityClass concept similar to PriorityClass, where the level of criticality can be user-defined? For example:

apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: CriticalityClass
metadata:
  name: non-essential
value: 100
---
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: inferencemodel-sample
spec:
  criticalityClassName: non-essential
  modelName: tweet-summary
  poolRef:
    name: vllm-llama2-7b-pool
  targetModels:
  - name: tweet-summary-0

Collaborator (author):

I'm so torn here. I agree with the idea of a user-defined priority system... However:

I worry how that might interact with an algo. If we open the gates to user-defined priorities, we are essentially letting untested priority sets run in prod and implicitly supporting that. I'm no load-balancing algo expert, but from my limited experience, it seems it could be incredibly challenging to guarantee that it works correctly.
If we keep a discrete set, we maintain the bounds on what is possible, and are able to provide stronger promises.

I also could be worrying about nothing. LMKWYT

@ahg-g (Contributor), Jan 9, 2025:

Just to summarize today's discussion during the community meeting:

  1. There is some agreement on continuing to use an enum and a predefined set of criticality levels.
  2. Change it to a reference and don't default it in the API, to allow extending this in the future with another parameter, forming oneOf semantics.

We have another comment thread on changing the name for Default, but regarding Sheddable, I feel it offers clear semantics as to what it means: that it is the first to be dropped when capacity is constrained. I don't have better suggestions, though, to address @candita's point, which is completely valid!


Throwing out an idea. Instead of Criticality there could be a simple Sheddable boolean, described with caveats like: when model workloads exceed resources, some model workloads may be dropped (or moved to the end of a scheduling queue) in order to avoid starvation of higher value model workloads. If this model allows for best-effort scheduling, marking it as Sheddable will help obtain results faster from higher value model workloads.

This raises the question of whether the scheduling priority belongs on workloads rather than models. Is there an API object for workloads?

Collaborator (author):

The InferenceModel object is intended to represent a model as an (inference) workload. We elected not to use the term workload, as K8s already has a pretty well-established definition: https://kubernetes.io/docs/concepts/workloads/

All that to say, I think we are in agreement.

)
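
To make the load-balancing framing from the discussion above concrete, here is a rough, non-normative sketch of how an endpoint picker could order these bands when it has to queue or shed requests (the helper name and the fallback behavior are invented for illustration):

// shedOrder maps a Criticality band to a shedding priority: lower values are
// queued or dropped first when capacity is constrained. When there is enough
// capacity to place every request, criticality has no effect.
func shedOrder(c Criticality) int {
	switch c {
	case Sheddable:
		return 0 // shed before all other bands
	case Default:
		return 1 // shed before Critical traffic
	case Critical:
		return 2 // shed last
	default:
		return 1 // unknown values treated like Default (an assumption)
	}
}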

// TargetModel represents a deployed model or a LoRA adapter. The
// Name field is expected to match the name of the LoRA adapter
// (or base model) as it is registered within the model server. Inference
// Gateway assumes that the model exists on the model server and it's the
// responsibility of the user to validate a correct match. Should a model fail
// to exist at request time, the error is processed by the Inference Gateway
// and emitted on the appropriate InferenceModel object.
type TargetModel struct {
// Name is the name of the adapter or base model, as expected by the ModelServer.
//
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Required
Name string `json:"name"`

// Weight is used to determine the proportion of traffic that should be
// sent to this model when multiple target models are specified.
//
// Weight defines the proportion of requests forwarded to the specified
// model. This is computed as weight/(sum of all weights in this
// TargetModels list). For non-zero values, there may be some epsilon from
// the exact proportion defined here depending on the precision an
// implementation supports. Weight is not a percentage and the sum of
// weights does not need to equal 100.
//
// If only one model is specified and it has a weight greater than 0, 100%
// of the traffic is forwarded to that model. If weight is set to 0, no
// traffic should be forwarded for this model. If unspecified, weight
// defaults to 1.
//
// +optional
// +kubebuilder:default=1
@danehans (Contributor), Jan 9, 2025:

Based on API review feedback from @smarterclayton, this default should be removed. Defaults are easier to add later as the API matures.

Contributor:

Additionally, weight is a concept specific to a load-balancing/scheduling extension. During the API review, we discussed the need to document this dependency.

Member:

+1 on all points. I think we can tackle all of these items with smaller individual PRs instead of trying to solve everything in one go, so if anyone's looking for ways to jump in, feel free to grab some of these actionable comments and turn them into PRs.

Collaborator (author):

Ah, I was going to capture them here and then just copy the contents of the files back into their original homes. Handling them piecewise works also. Open to whatever.

// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=1000000
Weight int32 `json:"weight,omitempty"`
Contributor:

If defaulting is removed, should this field be a pointer since it's optional? API conventions state this should be a pointer if it's optional.

Member:

Yep, agree on both points - default should be removed and this should become a pointer.

}
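
The weight arithmetic described in the godoc above (weight divided by the sum of weights in the TargetModels list) can be sketched as a small helper; this is illustrative and not part of the API:

// trafficSplit returns the approximate fraction of requests each target model
// should receive, computed as weight / (sum of all weights in the list).
// Weights of 90 and 10 therefore yield 0.9 and 0.1.
func trafficSplit(targets []TargetModel) map[string]float64 {
	var total int64
	for _, t := range targets {
		total += int64(t.Weight)
	}
	split := make(map[string]float64, len(targets))
	if total == 0 {
		// All weights are zero: no traffic should be forwarded.
		return split
	}
	for _, t := range targets {
		split[t.Name] = float64(t.Weight) / float64(total)
	}
	return split
}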

// InferenceModelStatus defines the observed state of InferenceModel
type InferenceModelStatus struct {
// Conditions track the state of the InferenceModel.
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
@candita, Jan 8, 2025:

I suggest using a specifically typed condition as the type of the array, rather than a generic one. This helps to set expectations about what types of conditions are generated. For example, start with a set that describes the availability and degradation of the InferenceModel?

Collaborator (author):

Ack! I left the array as metav1.Condition to match Gateway's prior art: https://github.com/kubernetes-sigs/gateway-api/blob/0e465a76f43137e1d39771ea1a75b6190e7b4ac6/apis/v1/gateway_types.go#L735

But have created typed condition types/reasons.

Member:

@candita I completely agree that some sample typed conditions would be helpful here so we can have consistent status across implementations.

As far as a new Condition type, I don't think we've done that before. In Gateway API, we've exclusively used metav1.Condition and then added typed conditions as recommendations for what we'd expect to be set within those conditions. As far as I know, we've never actually added custom validation or types for these conditions though.


@kfswain @robscott sounds good. Having a default/starting condition like set in https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1/gateway_types.go#L734 would be great.

Member:

+1, completely agree, this also feels like something we should try to get in before releasing v0.1


// InferenceModelConditionType is a type of condition for the InferenceModel.
type InferenceModelConditionType string

// InferenceModelConditionReason is the reason for a given InferenceModelConditionType.
type InferenceModelConditionReason string

const (
// This condition indicates whether the model is ready for traffic or not, and why.
ModelConditionReady InferenceModelConditionType = "Ready"

// Desired state. Model is ready for serving with no conflicts or issues.
ModelReasonReady InferenceModelConditionReason = "Ready"

// This reason is used when a given ModelName already exists within the pool.
// Details about naming conflict resolution are on the ModelName field itself.
ModelReasonNameInUse InferenceModelConditionReason = "ModelNameInUse"

// This reason is the initial state, and indicates that the controller has not yet reconciled the InferenceModel.
ModelReasonPending InferenceModelConditionReason = "Pending"
)
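
Following the typed-conditions discussion above, a hedged sketch of how a controller might record the initial Pending state using these constants; it assumes the standard apimachinery helper k8s.io/apimachinery/pkg/api/meta is imported as meta:

// markPending records the "not yet reconciled" state on an InferenceModel.
// Illustrative only; a real controller would set this during reconciliation.
func markPending(m *InferenceModel) {
	meta.SetStatusCondition(&m.Status.Conditions, metav1.Condition{
		Type:               string(ModelConditionReady),
		Status:             metav1.ConditionFalse,
		Reason:             string(ModelReasonPending),
		Message:            "controller has not yet reconciled this InferenceModel",
		ObservedGeneration: m.Generation,
	})
}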

func init() {
SchemeBuilder.Register(&InferenceModel{}, &InferenceModelList{})
}
135 changes: 135 additions & 0 deletions api/inferencepool_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InferencePool is the Schema for the InferencePools API.
//
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +genclient
type InferencePool struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec InferencePoolSpec `json:"spec,omitempty"`
Status InferencePoolStatus `json:"status,omitempty"`
}

// InferencePoolList contains a list of InferencePool.
//
// +kubebuilder:object:root=true
type InferencePoolList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []InferencePool `json:"items"`
}

// InferencePoolSpec defines the desired state of InferencePool
type InferencePoolSpec struct {
// Selector defines a map of labels to watch model server pods
// that should be included in the InferencePool. Model servers should not
// be shared with any other Service or InferencePool; that behavior is not supported
// and will result in sub-optimal utilization.
kfswain marked this conversation as resolved.
// In some cases, implementations may translate this to a Service selector, so this matches the simple
// map used for Service selectors instead of the full Kubernetes LabelSelector type.
//
// +kubebuilder:validation:Required
Selector map[LabelKey]LabelValue `json:"selector"`

// TargetPortNumber defines the port number to access the selected model servers.
// The number must be in the range 1 to 65535.
//
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=65535
Comment on lines +56 to +59

If applicable, I suggest reducing this port range to a smaller list of well-known ports that users can rely on for firewall configuration purposes. Also, don't allow overlap with other well-known ports like those used for dns, http/s, etc.

Collaborator (author):

Holding off on changing this one, just to gather consensus on what range we should limit it to. But I do agree with the idea.

It's possible that we could start with a small range and relax it as needed, as going the other direction would be nigh impossible.

Member:

@candita this is meant to be a reference to a port number on a Pod; I can't think of any reasonable way to limit that, since Kubernetes has likely scaled far enough that there's probably at least one case of each individual port being in use across the many clusters that exist.

// +kubebuilder:validation:Required
TargetPortNumber int32 `json:"targetPortNumber"`
}
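
For illustration, a hypothetical pool that selects vLLM pods by label and targets the port those servers listen on; the label and port values are placeholders, and the pool name is borrowed from the sample manifest earlier in this review:

// examplePool selects model server pods labeled app=vllm-llama2-7b and routes
// to port 8000 on those pods. Illustrative only.
func examplePool() InferencePool {
	return InferencePool{
		ObjectMeta: metav1.ObjectMeta{Name: "vllm-llama2-7b-pool", Namespace: "default"},
		Spec: InferencePoolSpec{
			Selector: map[LabelKey]LabelValue{
				"app": "vllm-llama2-7b",
			},
			TargetPortNumber: 8000,
		},
	}
}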

// LabelKey was originally copied from: https://github.com/kubernetes-sigs/gateway-api/blob/99a3934c6bc1ce0874f3a4c5f20cafd8977ffcb4/apis/v1/shared_types.go#L694-L731
// Duplicated so as not to take an unexpected dependency on the Gateway API.
//
// LabelKey is the key of a label. This is used for validation
// of maps. This matches the Kubernetes "qualified name" validation that is used for labels.
//
// Valid values include:
//
// * example
// * example.com
// * example.com/path
// * example.com/path.html
//
// Invalid values include:
//
// * example~ - "~" is an invalid character
// * example.com. - can not start or end with "."
//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$`
type LabelKey string

// LabelValue is the value of a label. This is used for validation
// of maps. This matches the Kubernetes label validation rules:
// * must be 63 characters or less (can be empty),
// * unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
// * could contain dashes (-), underscores (_), dots (.), and alphanumerics between.
kfswain marked this conversation as resolved.
// Labels are case sensitive, so: my-label and My-Label are considered distinct.
//
// Valid values include:
//
// * MyValue
// * my.name
// * 123-my-value
//
// +kubebuilder:validation:MinLength=0
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:Pattern=`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`
type LabelValue string
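
A quick way to sanity-check the valid/invalid examples above, using the patterns from the kubebuilder markers; this is a test-style sketch, not part of the API:

import "regexp"

// labelKeyRE and labelValueRE mirror the Pattern markers on LabelKey and LabelValue.
var (
	labelKeyRE   = regexp.MustCompile(`^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$`)
	labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)
)

// isValidLabelKey enforces the 253-character limit plus the pattern, e.g.
// "example.com/path" is valid while "example~" is not.
func isValidLabelKey(k string) bool { return len(k) <= 253 && labelKeyRE.MatchString(k) }

// isValidLabelValue enforces the 63-character limit plus the pattern; the
// empty string is allowed.
func isValidLabelValue(v string) bool { return len(v) <= 63 && labelValueRE.MatchString(v) }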

// InferencePoolStatus defines the observed state of InferencePool
type InferencePoolStatus struct {
// Conditions track the state of the InferencePool.
Conditions []metav1.Condition `json:"conditions,omitempty"`
Collaborator (author):

Likewise with this section also

}

// InferencePoolConditionType is a type of condition for the InferencePool
type InferencePoolConditionType string

// InferencePoolConditionReason is the reason for a given InferencePoolConditionType
type InferencePoolConditionReason string

const (
// This condition indicates whether the pool is ready for traffic or not, and why.
PoolConditionReady InferencePoolConditionType = "Ready"

// Desired state. The pool and its components are initialized and ready for traffic.
PoolReasonReady InferencePoolConditionReason = "Ready"

// This reason is used when the EPP has not yet passed health checks, or has started failing them.
PoolReasonEPPNotHealthy InferencePoolConditionReason = "EndpointPickerNotHealthy"

// This reason is the initial state, and indicates that the controller has not yet reconciled this pool.
PoolReasonPending InferencePoolConditionReason = "Pending"
)

func init() {
SchemeBuilder.Register(&InferencePool{}, &InferencePoolList{})
}