Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v0.1 API Review #154
base: v0.0
Are you sure you want to change the base?
v0.1 API Review #154
Changes from 2 commits
3c6bd6e
3f971ca
613fd37
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on feedback from @smarterclayton, this default should be removed to allow this concept to evolve by introducing
oneOf
(union types). If this default is being removed, should we also remove the associated "Default" enum?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point - if we're no longer defaulting, we can't really call the middle value "default", so we likely need at least one new name here. Any ideas?
cc @ahg-g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, based on today's discussion we should remove this defaulting.
Regarding "Default", I think we have two options:
Medium
orBestEffort
Critical
andSheddable
. This is perhaps acceptable now that we have a clearer path to extending the scope in the future.wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are boiling this to two values, do we want to make the Criticality type be an indirection to bool rather than string, and then just rename to
Critical
?@candita mentions the same: #154 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still trying to understand the intent of
criticality
. Can someone help me understand what "serve" means in the godocs? Does "serve" mean if the Node does not have enough resources, i.e. memory, model servers of an InferenceModel withcriticality: BestEffort
will be evicted, or iscriticality
a networking classification concept?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair question. We used to have a glossary to define some of these terms. (we call it priority here)
But to quickly summarize.
Criticality
is intended to be a load-balancing concept. So if there is always enough capacity to place a request on any model server, criticality does nothing. But should we need to queue requests, criticality will prioritize the requests that come from acritical
workload. WE cant exactly guarantee like we can with scheduling on a node as traffic can wax and wane much more quicklyThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kfswain thanks for the glossary reference. Consider including points from the glossary or ^ to the
Criticality
godocs to better define the concept. Without the additional details, I can see someone interpretingCriticality
as pod priority, network priority, etc.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Using the name
Sheddable
exposes the result of this Criticality type rather than a descriptive type. I suggest usingNonCritical
or some other descriptive type name.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that it describes what the criticality mechanism does.
I'm having a hard time coming up with an adequate name that wouldn't be synonymous.
Non-Critical
could be viewed as anything that isn't of theCritical
level, which is would include theDefault
andSheddable
.I suppose we could use
BestEffort
to be in line with the Pod QoS classes. But we don't have an equivalent forGuaranteed
and maybe a loose equivalent forBurstable
.Definitely open to better names though. LMKWYT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also open to other name ideas here. Currently we have:
I don't really love
Criticality: NonCritical
orCriticality: Critical
, maybe there are 3 different terms that clearly indicate a 3 distinct layers of criticality? A naive example would be High, Medium, Low, but that's not the best.cc @ahg-g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on changing
Criticality
toPriority
or introducing a CriticalityClass concept similar to PriorityClass, where the level of criticality can be user-defined? For example:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm so torn here. I agree with the idea of a user defined priority system... However:
I worry how that might interact with an algo. If we open the gates to user defined prio, we are essentially letting untested priority sets run in prod and implicitly supporting that. I'm no Load Balancing algo expert, but from my limited experience, it seems that could be incredibly challenging to guarantee that it works correctly.
If we keep a discrete set, we maintain the bounds on what is possible, and are able to provide stronger promises.
I also could be worrying about nothing. LMKWYT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to summarize today's discussion during the community meeting:
We have another comment thread on changing the name for Default, but regarding Sheddable I feel it offers clear semantics as to what it means: that it is the first to be dropped when capacity is constrained. I don't have better suggestions though to address @candita which is completely valid!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing out an idea. Instead of
Criticality
there could be a simpleSheddable
boolean, described with caveats like: when model workloads exceed resources, some model workloads may be dropped (or moved to the end of a scheduling queue) in order to avoid starvation of higher value model workloads. If this model allows for best-effort scheduling, marking it as Sheddable will help obtain results faster from higher value model workloads.This begs the question about whether the scheduling priority belongs on workloads rather than models. Is there an API object for workloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
InferenceModel
object is intended to represent a model as a workload(inference). The term workload was elected to not be used, as K8s already has a pretty well established definition: https://kubernetes.io/docs/concepts/workloads/All that to say, I think we are in agreement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on API review feedback from @smarterclayton, this default should be removed. Defaults are easier to add later as the API matures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally,
weight
is a concept specific to a load-balancing/scheduling extension. During the API review, we discussed the need to document this dependency.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on all points. I think we can tackle all of these items with smaller individual PRs instead of trying to solve everything in one go, so if anyone's looking for ways to jump in, feel free to grab some of these actionable comments and turn them into PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was going to capture them here and then just copy the contents of the files back in to their original homes. Handling them piecewise works also. Open to whatever
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If defaulting is removed, should this field be a pointer since it's optional? API conventions state this should be a pointer if it's optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, agree on both points - default should be removed and this should become a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using a specifically typed condition as the type of the array, rather than a generic one. This helps to set expectations about what types of conditions are generated. For example, start with a set that describes the availability and degradation of the InferenceModell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack! I left the array as
metav1.Condition
to match Gateway's prior art: https://github.com/kubernetes-sigs/gateway-api/blob/0e465a76f43137e1d39771ea1a75b6190e7b4ac6/apis/v1/gateway_types.go#L735But have created typed condition types/reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@candita I completely agree that some sample typed conditions would be helpful here so we can have consistent status across implementations.
As far as a new Condition type, I don't think we've done that before. In Gateway API, we've exclusively used
metav1.Condition
and then added typed conditions as recommendations for what we'd expect to be set within those conditions. As far as I know, we've never actually added custom validation or types for these conditions though.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kfswain @robscott sounds good. Having a default/starting condition like set in https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1/gateway_types.go#L734 would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, completely agree, this also feels like something we should try to get in before releasing v0.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If applicable, I suggest reducing this port range to a smaller list of well-known ports that users can rely on for firewall configuration purposes. Also, don't allow overlap with other well-known ports like those used for dns, http/s, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holding off on changing this one, just to gather consensus on what range we should limit it to. But I do agree with the idea.
It's possible that we could start with a small range and relax as needed. As the other direction would be nigh impossible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@candita this is meant to be a reference to a port number on a Pod, I can't think of any reasonable way to limit that since I think Kubernetes has likely scaled far enough that there's probably at least one case of each individual port being in use across the many Kubernetes that exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/154/files#r1907775587
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise with this section also