Adding generated API docs + basic API docs and diagram
robscott committed Dec 31, 2024
1 parent af1b216 commit 4b4eb91
Showing 7 changed files with 303 additions and 5 deletions.
8 changes: 8 additions & 0 deletions Makefile
@@ -162,6 +162,14 @@ live-docs:
docker build -t gaie/mkdocs hack/mkdocs/image
docker run --rm -it -p 3000:3000 -v ${PWD}:/docs gaie/mkdocs

.PHONY: api-ref-docs
api-ref-docs:
crd-ref-docs \
--source-path=${PWD}/api \
--config=crd-ref-docs.yaml \
--renderer=markdown \
--output-path=${PWD}/site-src/reference/spec.md

##@ Deployment

ifndef ignore-not-found
10 changes: 10 additions & 0 deletions crd-ref-docs.yaml
@@ -0,0 +1,10 @@
processor:
ignoreTypes:
- "(InferencePool|InferenceModel)List$"
# RE2 regular expressions describing type fields that should be excluded from the generated documentation.
ignoreFields:
- "TypeMeta$"

render:
# Version of Kubernetes to use when generating links to Kubernetes API documentation.
kubernetesVersion: 1.31
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -29,6 +29,7 @@ plugins:
- mermaid2
markdown_extensions:
- admonition
- markdown.extensions.nl2br
- meta
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji
@@ -59,10 +60,10 @@
- Getting started: guides/index.md
- Implementer's Guide: guides/implementers.md
- Reference:
- API Reference: reference/spec.md
- API Types:
- InferencePool: api-types/inferencepool.md
- InferenceModel: api-types/inferencemodel.md
- API specification: reference/spec.md
- Enhancements:
- Overview: gieps/overview.md
- Contributing:
6 changes: 5 additions & 1 deletion site-src/api-types/inferencepool.md
@@ -7,7 +7,11 @@

## Background

TODO
InferencePool has some similarities to a Kubernetes Service (a way to select Pods and specify a port), but adds inference-specific capabilities, as the comparison below illustrates:

<!-- Source: https://docs.google.com/presentation/d/11HEYCgFi-aya7FS91JvAfllHiIlvfgcp7qpi_Azjk4E/edit#slide=id.g292839eca6d_1_0 -->
<img src="/images/inferencepool-vs-service.png" alt="Comparing InferencePool with Service" class="center" width="550" />


## Spec

Binary file added site-src/images/inferencepool-vs-service.png
16 changes: 16 additions & 0 deletions site-src/index.md
@@ -15,8 +15,24 @@ they are expected to manage:

### InferencePool

InferencePool represents a set of Inference-focused Pods and an extension that
will be used to route to them. Within the broader Gateway API resource model,
this resource is considered a "backend". In practice, that means that you'd
replace a Kubernetes Service with an InferencePool. This resource has some
similarities to Service (a way to select Pods and specify a port), but has some
unique capabilities. With InferencePool, you can configure a routing extension
as well as inference-specific routing optimizations. For more information on
this resource, refer to our [InferencePool documentation](/api-types/inferencepool).
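
For illustration, a minimal InferencePool might look like the sketch below
(names and values are hypothetical; see the [API reference](/reference/spec)
for the full schema):

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferencePool
metadata:
  name: my-pool
spec:
  # Label selector for the model server Pods in this pool,
  # analogous to a Service selector.
  selector:
    app: my-model-server
  # The port the model servers expect traffic on.
  targetPortNumber: 8000
```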

### InferenceModel

An InferenceModel represents a model or adapter, and configuration associated
with that model. This resource enables you to configure the relative criticality
of a model, and allows you to seamlessly translate the requested model name to
one or more backend model names. Multiple InferenceModels can be attached to an
InferencePool. For more information on this resource, refer to our
[InferenceModel documentation](/api-types/inferencemodel).
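
A minimal InferenceModel sketch (again with hypothetical names) could look
like this:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: my-model
spec:
  # Matches the "model" parameter that clients send in requests.
  modelName: my-model
  criticality: Default
  # The InferencePool serving this model, in the same namespace.
  poolRef:
    name: my-pool
```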

## Composable Layers

This project aims to develop an ecosystem of implementations that are fully
265 changes: 262 additions & 3 deletions site-src/reference/spec.md
@@ -1,5 +1,264 @@
# API Specification
# API Reference

## Packages
- [inference.networking.x-k8s.io/v1alpha1](#inferencenetworkingx-k8siov1alpha1)


## inference.networking.x-k8s.io/v1alpha1

Package v1alpha1 contains API Schema definitions for the gateway v1alpha1 API group

### Resource Types
- [InferenceModel](#inferencemodel)
- [InferencePool](#inferencepool)



#### Criticality

_Underlying type:_ _string_

Defines how important it is to serve the model compared to other models.

_Validation:_
- Enum: [Critical Default Sheddable]

_Appears in:_
- [InferenceModelSpec](#inferencemodelspec)

| Field | Description |
| --- | --- |
| `Critical` | Most important. Requests to this band will be shed last.<br /> |
| `Default` | More important than Sheddable, less important than Critical.<br />Requests in this band will be shed before critical traffic.<br /> |
| `Sheddable` | Least important. Requests to this band will be shed before all other bands.<br /> |
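
As a hypothetical sketch, criticality is set on an InferenceModel spec using
one of the enum values above:

```yaml
spec:
  # Sheddable requests are dropped before Default and Critical
  # requests when capacity is constrained.
  criticality: Sheddable
```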


#### InferenceModel



InferenceModel is the Schema for the InferenceModels API





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `inference.networking.x-k8s.io/v1alpha1` | | |
| `kind` _string_ | `InferenceModel` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _[InferenceModelSpec](#inferencemodelspec)_ | | | |
| `status` _[InferenceModelStatus](#inferencemodelstatus)_ | | | |


#### InferenceModelSpec



InferenceModelSpec represents a specific model use case. This resource is
managed by the "Inference Workload Owner" persona.


The Inference Workload Owner persona is a team that trains, verifies, and
leverages a large language model from a model frontend, drives the lifecycle
and rollout of new versions of those models, and defines the specific
performance and latency goals for the model. These workloads are
expected to operate within an InferencePool sharing compute capacity with other
InferenceModels, defined by the Inference Platform Admin.


InferenceModel's modelName (not the ObjectMeta name) is unique for a given InferencePool.
If the name is reused, an error will be shown on the status of the
InferenceModel that attempted to reuse it. The oldest InferenceModel, based on
creation timestamp, will be selected to remain valid. In the event of a race
condition, one will be selected at random.



_Appears in:_
- [InferenceModel](#inferencemodel)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | The name of the model as users set it in the "model" parameter in requests.<br />The name should be unique among the workloads that reference the same backend pool.<br />This is the parameter that the request will be matched on. In the future, we may<br />allow matching on other request parameters. The other approach to support matching<br />on other request parameters is to use a different ModelName per HTTPFilter.<br />Names can be reserved without implementing an actual model in the pool.<br />This can be done by specifying a target model and setting the weight to zero;<br />an error will be returned specifying that no valid target model is found. | | MaxLength: 253 <br /> |
| `criticality` _[Criticality](#criticality)_ | Defines how important it is to serve the model compared to other models referencing the same pool. | Default | Enum: [Critical Default Sheddable] <br /> |
| `targetModels` _[TargetModel](#targetmodel) array_ | Allow multiple versions of a model for traffic splitting.<br />If not specified, the target model name is defaulted to the modelName parameter.<br />modelName is often in reference to a LoRA adapter. | | MaxItems: 10 <br /> |
| `poolRef` _[PoolObjectReference](#poolobjectreference)_ | Reference to the inference pool, the pool must exist in the same namespace. | | Required: \{\} <br /> |
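
As a non-normative sketch of these fields, the hypothetical InferenceModel
below splits traffic between two target models:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: chat-model
spec:
  modelName: chat-model
  criticality: Critical
  poolRef:
    name: my-pool
  targetModels:
    # Roughly 90% of traffic goes to the stable adapter and 10% to the
    # canary (proportion = weight / sum of weights).
    - name: chat-model-v1
      weight: 90
    - name: chat-model-v2
      weight: 10
```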


#### InferenceModelStatus



InferenceModelStatus defines the observed state of InferenceModel



_Appears in:_
- [InferenceModel](#inferencemodel)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#condition-v1-meta) array_ | Conditions track the state of the InferencePool. | | |


#### InferencePool



InferencePool is the Schema for the InferencePools API





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `inference.networking.x-k8s.io/v1alpha1` | | |
| `kind` _string_ | `InferencePool` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _[InferencePoolSpec](#inferencepoolspec)_ | | | |
| `status` _[InferencePoolStatus](#inferencepoolstatus)_ | | | |


#### InferencePoolSpec



InferencePoolSpec defines the desired state of InferencePool



_Appears in:_
- [InferencePool](#inferencepool)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `selector` _object (keys:[LabelKey](#labelkey), values:[LabelValue](#labelvalue))_ | Selector uses a map of labels to watch model server pods<br />that should be included in the InferencePool. Model servers should not<br />be selected by any other Service or InferencePool; that behavior is not supported<br />and will result in sub-optimal utilization.<br />In some cases, implementations may translate this to a Service selector, so this matches the simple<br />map used for Service selectors instead of the full Kubernetes LabelSelector type. | | Required: \{\} <br /> |
| `targetPortNumber` _integer_ | TargetPortNumber is the port number that the model servers within the pool expect<br />to receive traffic from.<br />This maps to the TargetPort in: https://pkg.go.dev/k8s.io/api/core/v1#ServicePort | | Maximum: 65535 <br />Minimum: 0 <br />Required: \{\} <br /> |
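
A hypothetical spec illustrating both fields (values are placeholders):

```yaml
spec:
  # Simple label map, as in a Service selector. These Pods must not be
  # selected by any other Service or InferencePool.
  selector:
    app: vllm-llama
  # The port the model servers listen on.
  targetPortNumber: 8000
```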


#### InferencePoolStatus



InferencePoolStatus defines the observed state of InferencePool



_Appears in:_
- [InferencePool](#inferencepool)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#condition-v1-meta) array_ | Conditions track the state of the InferencePool. | | |


#### LabelKey

_Underlying type:_ _string_

Originally copied from: https://github.com/kubernetes-sigs/gateway-api/blob/99a3934c6bc1ce0874f3a4c5f20cafd8977ffcb4/apis/v1/shared_types.go#L694-L731
Duplicated so as not to take an unexpected dependency on the Gateway API.


LabelKey is the key of a label. This is used for validation
of maps. This matches the Kubernetes "qualified name" validation that is used for labels.


Valid values include:


* example
* example.com
* example.com/path
* example.com/path.html


Invalid values include:


* example~ - "~" is an invalid character
* example.com. - can not start or end with "."

_Validation:_
- MaxLength: 253
- MinLength: 1
- Pattern: `^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$`

_Appears in:_
- [InferencePoolSpec](#inferencepoolspec)



#### LabelValue

_Underlying type:_ _string_

LabelValue is the value of a label. This is used for validation
of maps. This matches the Kubernetes label validation rules:
* must be 63 characters or less (can be empty),
* unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
* could contain dashes (-), underscores (_), dots (.), and alphanumerics between.


Valid values include:


* MyValue
* my.name
* 123-my-value

_Validation:_
- MaxLength: 63
- MinLength: 0
- Pattern: `^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`

_Appears in:_
- [InferencePoolSpec](#inferencepoolspec)



#### PoolObjectReference



PoolObjectReference identifies an API object within the namespace of the
referrer.



_Appears in:_
- [InferenceModelSpec](#inferencemodelspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `group` _string_ | Group is the group of the referent. | inference.networking.x-k8s.io | MaxLength: 253 <br />Pattern: `^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$` <br /> |
| `kind` _string_ | Kind is kind of the referent. For example "InferencePool". | InferencePool | MaxLength: 63 <br />MinLength: 1 <br />Pattern: `^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$` <br /> |
| `name` _string_ | Name is the name of the referent. | | MaxLength: 253 <br />MinLength: 1 <br />Required: \{\} <br /> |
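
For example, a poolRef referencing an InferencePool in the same namespace,
with `group` and `kind` shown at their defaults (the name is hypothetical):

```yaml
poolRef:
  group: inference.networking.x-k8s.io
  kind: InferencePool
  name: my-pool
```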


#### TargetModel



TargetModel represents a deployed model or a LoRA adapter. The
Name field is expected to match the name of the LoRA adapter
(or base model) as it is registered within the model server. The Inference
Gateway assumes that the model exists on the model server; it is the
responsibility of the user to validate a correct match. Should a model fail
to exist at request time, the error is processed by the Inference Gateway
and then emitted on the appropriate InferenceModel object.



_Appears in:_
- [InferenceModelSpec](#inferencemodelspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | The name of the adapter as expected by the ModelServer. | | MaxLength: 253 <br /> |
| `weight` _integer_ | Weight is used to determine the proportion of traffic that should be<br />sent to this target model when multiple versions of the model are specified. | 1 | Maximum: 1e+06 <br />Minimum: 0 <br /> |
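
As noted in the modelName description above, a model name can be reserved
without deploying a backing model by pointing at a target model with weight
zero; a hypothetical sketch:

```yaml
targetModels:
  # Weight 0 reserves the name; requests receive an error stating that
  # no valid target model was found.
  - name: reserved-model
    weight: 0
```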

This page contains the API field specification for Gateway API.

REPLACE_WITH_GENERATED_CONTENT
