Updated: 4/16/2015
This document is aimed at users who want a deeper understanding of the Kubernetes API structure, and at developers who want to extend the Kubernetes API. An introduction to using resources with kubectl can be found in working_with_resources.md.
The conventions of the Kubernetes API (and related APIs in the ecosystem) are intended to ease client development and ensure that configuration mechanisms can be implemented that work across a diverse set of use cases consistently.
The general style of the Kubernetes API is RESTful - clients create, update, delete, or retrieve a description of an object via the standard HTTP verbs (POST, PUT, DELETE, and GET) - and those APIs preferentially accept and return JSON. Kubernetes also exposes additional endpoints for non-standard verbs and allows alternative content types. All of the JSON accepted and returned by the server has a schema, identified by the "kind" and "apiVersion" fields. Where relevant HTTP header fields exist, they should mirror the content of JSON fields, but the information should not be represented only in the HTTP header.
The following terms are defined:
Each resource typically accepts and returns data of a single kind. A kind may be accepted or returned by multiple resources that reflect specific use cases. For instance, the kind "pod" is exposed as a "pods" resource that allows end users to create, update, and delete pods, while a separate "pod status" resource (that acts on "pod" kind) allows automated processes to update a subset of the fields in that resource. A "restart" resource might be exposed for a number of different resources to allow the same action to have different results for each object.
Resource collections should be all lowercase and plural, whereas kinds are CamelCase and singular.
Kinds are grouped into three categories:
Creating an API object is a record of intent - once created, the system will work to ensure that resource exists. All API objects have common metadata.
An object may have multiple resources that clients can use to perform specific actions that create, update, delete, or get.
Examples: Pods, ReplicationControllers, Services, Namespaces, Nodes
Lists have a limited set of common metadata. All lists use the "items" field to contain the array of objects they return.
Most objects defined in the system should have an endpoint that returns the full set of resources, as well as zero or more endpoints that return subsets of the full list. Some objects may be singletons (the current user, the system defaults) and may not have lists.
In addition, all lists that return objects with labels should support label filtering (see docs/user-guide/labels.md), and most lists should support filtering by fields.
Examples: PodLists, ServiceLists, NodeLists
TODO: Describe field filtering below or in a separate doc.
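As an illustration of label filtering over a list's "items" array, here is a minimal sketch that handles only equality matches. The function name `filter_by_labels` and the sample data are hypothetical, and the selector syntax described in docs/user-guide/labels.md is richer than this.

```python
def filter_by_labels(list_obj, selector):
    """Return the items whose metadata.labels contain every key/value in selector."""
    matched = []
    for item in list_obj.get("items", []):
        labels = item.get("metadata", {}).get("labels", {})
        if all(labels.get(key) == value for key, value in selector.items()):
            matched.append(item)
    return matched


# Hypothetical list object; only the fields needed for the sketch are present.
pod_list = {
    "kind": "PodList",
    "apiVersion": "v1",
    "items": [
        {"metadata": {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}}},
        {"metadata": {"name": "db-1", "labels": {"app": "db"}}},
    ],
}

names = [pod["metadata"]["name"] for pod in filter_by_labels(pod_list, {"app": "web"})]
print(names)
```

An empty selector matches every item, which mirrors an unfiltered list request.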
Given their limited scope, simple kinds have the same set of limited common metadata as lists.
The "size" action may accept a simple resource that has only a single field as input (the number of things). The "status" kind is returned when errors occur and is not persisted in the system.
Examples: Binding, Status
The standard REST verbs (defined below) MUST return singular JSON objects. Some API endpoints may deviate from the strict REST pattern and return resources that are not singular JSON objects, such as streams of JSON objects or unstructured text log data.
The term "kind" is reserved for these "top-level" API types. The term "type" should be used for distinguishing sub-categories within objects or subobjects.
All JSON objects returned by an API MUST have the following fields:
These fields are required for proper decoding of the object. They may be populated by the server by default from the specified URL path, but the client likely needs to know the values in order to construct the URL path.
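A client-side check for the schema-identifying fields might look like the sketch below. It assumes the required fields are the "kind" and "apiVersion" fields named earlier in this document; the helper name `decode_header` is hypothetical.

```python
import json


def decode_header(raw):
    """Parse a JSON document and return (kind, apiVersion, object).

    Raises ValueError when either schema-identifying field is missing or empty.
    """
    obj = json.loads(raw)
    missing = [field for field in ("kind", "apiVersion") if not obj.get(field)]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    return obj["kind"], obj["apiVersion"], obj


kind, api_version, _ = decode_header('{"kind": "Pod", "apiVersion": "v1", "metadata": {}}')
print(kind, api_version)
```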
Every object kind MUST have the following metadata in a nested object field called "metadata":
Every object SHOULD have the following metadata in a nested object field called "metadata":
Labels are intended for organizational purposes by end users (select the pods that match this label query). Annotations enable third-party automation and tooling to decorate objects with additional metadata for their own use.
By convention, the Kubernetes API makes a distinction between the specification of the desired state of an object (a nested object field called "spec") and the status of the object at the current time (a nested object field called "status"). The specification is a complete description of the desired state, including configuration settings provided by the user, default values expanded by the system, and properties initialized or otherwise changed after creation by other ecosystem components (e.g., schedulers, auto-scalers), and is persisted in stable storage with the API object. If the specification is deleted, the object will be purged from the system. The status summarizes the current state of the object in the system, and is usually persisted with the object by automated processes but may be generated on the fly. At some cost and perhaps some temporary degradation in behavior, the status could be reconstructed by observation if it were lost.
When a new version of an object is POSTed or PUT, the "spec" is updated and available immediately. Over time the system will work to bring the "status" into line with the "spec". The system will drive toward the most recent "spec" regardless of previous versions of that stanza. For example, if a value is changed from 2 to 5 in one PUT and then back down to 3 in another PUT, the system is not required to 'touch base' at 5 before changing the "status" to 3. In other words, the system's behavior is level-based rather than edge-based. This enables robust behavior in the presence of missed intermediate state changes.
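The level-based behavior can be sketched as a reconcile step that looks only at the latest desired value and the observed value, never at the history of changes. The function `reconcile` and the replica-count framing are hypothetical and not part of any Kubernetes component.

```python
def reconcile(desired, observed):
    """Return the action that moves the observed count toward the latest desired count."""
    delta = desired - observed
    if delta > 0:
        return ("create", delta)
    if delta < 0:
        return ("delete", -delta)
    return ("noop", 0)


# The spec went 2 -> 5 -> 3 between two reconcile passes. Only the most recent
# value is consulted, so the intermediate value 5 is never acted on.
spec_history = [2, 5, 3]
observed = 2
action = reconcile(spec_history[-1], observed)
print(action)
```

Because the decision depends only on current levels, a controller that misses the intermediate PUT still converges on the same state.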
The Kubernetes API also serves as the foundation for the declarative configuration schema for the system. In order to facilitate level-based operation and expression of declarative configuration, fields in the specification should have declarative rather than imperative names and semantics -- they represent the desired state, not actions intended to yield the desired state.
The PUT and POST verbs on objects will ignore the "status" values. A /status subresource is provided to enable system components to update statuses of resources they manage.
Otherwise, PUT expects the whole object to be specified. Therefore, if a field is omitted it is assumed that the client wants to clear that field's value. The PUT verb does not accept partial updates. Modification of just part of an object may be achieved by GETting the resource, modifying part of the spec, labels, or annotations, and then PUTting it back. See concurrency control, below, regarding read-modify-write consistency when using this pattern. Some objects may expose alternative resource representations that allow mutation of the status, or performing custom actions on the object.
All objects that represent a physical resource whose state may vary from the user's desired intent SHOULD have a "spec" and a "status". Objects whose state cannot vary from the user's desired intent MAY have only "spec", and MAY rename "spec" to a more appropriate name.
Objects that contain both spec and status should not contain additional top-level fields other than the standard metadata fields.
Typical phase values are Pending (not yet fully physically realized), Running or Active (fully realized and active, but not necessarily operating correctly), and Terminated (no longer active), but may vary slightly for different types of objects. New phase values should not be added to existing objects in the future. Like other status fields, it must be possible to ascertain the lifecycle phase by observation. Additional details regarding the current phase may be contained in other fields.
Condition values may be True, False, or Unknown. Unlike the phase, conditions are not expected to be monotonic -- their values may change back and forth. A typical condition type is Ready, which indicates the object was believed to be fully operational at the time it was last probed. Conditions may carry additional information, such as the last probe time or last transition time. TODO(@vishh): Reason and Message.
Phases and conditions are observations and not, themselves, state machines, nor do we define comprehensive state machines for objects with behaviors associated with state transitions. The system is level-based and should assume an Open World. Additionally, new observations and details about these observations may be added over time.
In order to preserve extensibility, in the future, we intend to explicitly convey properties that users and components care about rather than requiring those properties to be inferred from observations.
Note that historical status information (e.g., last transition time, failure counts) is only provided on a best-effort basis and may be lost.
Status information that may be large (especially unbounded in size, such as lists of references to other objects -- see below) and/or rapidly changing, such as resource usage, should be put into separate objects, with possibly a reference from the original object. This helps to ensure that GETs and watch remain reasonably efficient for the majority of clients, which may not need that data.
References to loosely coupled sets of objects, such as pods overseen by a replication controller, are usually best referred to using a label selector. In order to ensure that GETs of individual objects remain bounded in time and space, these sets may be queried via separate API queries, but will not be expanded in the referring object's status.
References to specific objects, especially specific resource versions and/or specific fields of those objects, are specified using the ObjectReference type. Unlike partial URLs, the ObjectReference type facilitates flexible defaulting of fields from the referring object or other contextual information.
References in the status of the referee to the referrer may be permitted, when the references are one-to-one and do not need to be frequently updated, particularly in an edge-based manner.
Discussed in #2004 and elsewhere. There are no maps of subobjects in any API objects. Instead, the convention is to use a list of subobjects containing name fields.
For example:
ports:
  - name: www
    containerPort: 80
vs.
ports:
  www:
    containerPort: 80
This rule maintains the invariant that all JSON/YAML keys are fields in API objects. The only exceptions are pure maps in the API (currently, labels, selectors, and annotations), as opposed to sets of subobjects.
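Clients that want map-like access to a list of named subobjects can build an index locally. A minimal sketch, assuming every subobject carries a unique "name" field; the helper `index_by_name` is hypothetical.

```python
def index_by_name(subobjects):
    """Build a name -> subobject lookup from a list of named subobjects."""
    index = {}
    for sub in subobjects:
        name = sub["name"]
        if name in index:
            raise ValueError("duplicate name: " + name)
        index[name] = sub
    return index


ports = [
    {"name": "www", "containerPort": 80},
    {"name": "metrics", "containerPort": 9090},
]
by_name = index_by_name(ports)
print(by_name["www"]["containerPort"])
```

The wire format stays a list, so every JSON key remains a field name, while the client still gets constant-time lookup by name.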
Some fields will have a list of allowed values (enumerations). These values will be strings, and they will be in CamelCase, with an initial uppercase letter. Examples: "ClusterFirst", "Pending", "ClientIP".
Every list or simple kind SHOULD have the following metadata in a nested object field called "metadata":
Every simple kind returned by the server, and any simple kind sent to the server that must support idempotency or optimistic concurrency should return this value. Since simple resources are often used as input to alternate actions that modify objects, the resource version of the simple resource should correspond to the resource version of the object.
An API may represent a single entity in different ways for different clients, or transform an object after certain transitions in the system occur. In these cases, one request object may have two representations available as different resources, or different kinds.
An example is a Service, which represents the intent of the user to group a set of pods with common behavior on common ports. When Kubernetes detects a pod matches the service selector, the IP address and port of the pod are added to an Endpoints resource for that Service. The Endpoints resource exists only if the Service exists, but exposes only the IPs and ports of the selected pods. The full service is represented by two distinct resources - under the original Service resource the user created, as well as in the Endpoints resource.
As another example, a "pod status" resource may accept a PUT with the "pod" kind, with different rules about what fields may be changed.
Future versions of Kubernetes may allow alternative encodings of objects beyond JSON.
API resources should use the traditional REST pattern:
Kubernetes by convention exposes additional verbs as new root endpoints with singular names. Examples:
These are verbs which change the fundamental type of data returned (watch returns a stream of JSON instead of a single JSON object). Support of additional verbs is not required for all object types.
Two additional verbs, redirect and proxy, provide access to cluster resources as described in docs/user-guide/accessing-the-cluster.md.
When resources wish to expose alternative actions that are closely coupled to a single resource, they should do so using new sub-resources. An example is allowing automated processes to update the "status" field of a Pod. The /pods endpoint only allows updates to "metadata" and "spec", since those reflect end-user intent. An automated process should be able to modify status for users to see by sending an updated Pod kind to the server at the "/pods/<name>/status" endpoint - the alternate endpoint allows different rules to be applied to the update, and access to be appropriately restricted. Likewise, some actions like "stop" or "scale" are best represented as REST sub-resources that are POSTed to. The POST action may require a simple kind to be provided if the action requires parameters, or function without a request body.
TODO: more documentation of Watch
The API supports three different PATCH operations, determined by their corresponding Content-Type header:
JSON Patch, Content-Type: application/json-patch+json. Example operation: {"op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ]}. For more details on how to use JSON Patch, see the RFC.
Merge Patch, Content-Type: application/merge-json-patch+json
Strategic Merge Patch, Content-Type: application/strategic-merge-patch+json
In the standard JSON merge patch, JSON objects are always merged but lists are always replaced. Often that isn't what we want. Let's say we start with the following Pod:
spec:
  containers:
    - name: nginx
      image: nginx-1.0
...and we POST that to the server (as JSON). Then let's say we want to add a container to this Pod.
PATCH /api/v1/namespaces/default/pods/pod-name
spec:
  containers:
    - name: log-tailer
      image: log-tailer-1.0
If we were to use standard Merge Patch, the entire container list would be replaced with the single log-tailer container. However, our intent is for the container lists to merge together based on the name field.
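To make the list-replacement behavior concrete, here is a minimal sketch of the standard JSON merge patch algorithm (objects merge recursively, a null value deletes a key, anything else replaces). The function name `merge_patch` is hypothetical and this is not the server's implementation.

```python
def merge_patch(target, patch):
    """Apply a JSON merge patch to target and return the result as a new value."""
    if not isinstance(patch, dict):
        # Scalars and lists in the patch replace the target outright.
        return patch
    if not isinstance(target, dict):
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


pod = {"spec": {"containers": [{"name": "nginx", "image": "nginx-1.0"}]}}
patch = {"spec": {"containers": [{"name": "log-tailer", "image": "log-tailer-1.0"}]}}
merged = merge_patch(pod, patch)
print(merged["spec"]["containers"])
```

After the patch, only the log-tailer container remains, which is the outcome the paragraph above warns about.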
To solve this problem, Strategic Merge Patch uses metadata attached to the API objects to determine what lists should be merged and which ones should not. Currently the metadata is available as struct tags on the API objects themselves, but will become available to clients as Swagger annotations in the future. In the above example, the patchStrategy metadata for the containers field would be merge and the patchMergeKey would be name.
Note: If the patch results in merging two lists of scalars, the scalars are first deduplicated and then merged.
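The merge-by-key idea for lists of subobjects can be sketched as follows. This is a simplified illustration only: it merges matching items shallowly, ignores the special $patch operations, and the helper name `merge_list_by_key` is hypothetical.

```python
def merge_list_by_key(original, patch, merge_key):
    """Merge two lists of dicts, matching items on merge_key.

    Items present in both lists are updated in place (shallowly); items only in
    the patch are appended. Original ordering is preserved.
    """
    merged = [dict(item) for item in original]
    positions = {item[merge_key]: i for i, item in enumerate(merged)}
    for patch_item in patch:
        key = patch_item[merge_key]
        if key in positions:
            merged[positions[key]].update(patch_item)
        else:
            positions[key] = len(merged)
            merged.append(dict(patch_item))
    return merged


containers = [{"name": "nginx", "image": "nginx-1.0"}]
patch = [{"name": "log-tailer", "image": "log-tailer-1.0"}]
result = merge_list_by_key(containers, patch, "name")
print([c["name"] for c in result])
```

With "name" as the merge key, the patch adds log-tailer alongside nginx instead of replacing the list.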
Strategic Merge Patch also supports special operations as listed below.
To override the container list to be strictly replaced, regardless of the default:
containers:
  - name: nginx
    image: nginx-1.0
  - $patch: replace # any further $patch operations nested in this list will be ignored
To delete an element of a list that should be merged:
containers:
  - name: nginx
    image: nginx-1.0
  - $patch: delete
    name: log-tailer # merge key and value goes here
To indicate that a map should not be merged and instead should be taken literally:
$patch: replace # recursive and applies to all fields of the map it's in
containers:
  - name: nginx
    image: nginx-1.0
To delete a field of a map:
name: nginx
image: nginx-1.0
labels:
  live: null # set the value of the map key to null
All compatible Kubernetes APIs MUST support "name idempotency" and respond with an HTTP status code 409 when a request is made to POST an object that has the same name as an existing object in the system. See docs/user-guide/identifiers.md for details.
Names generated by the system may be requested using metadata.generateName. GenerateName indicates that the name should be made unique by the server prior to persisting it. A non-empty value for the field indicates the name will be made unique (and the name returned to the client will be different than the name passed). The value of this field will be combined with a unique suffix on the server if the Name field has not been provided. The provided value must be valid within the rules for Name, and may be truncated by the length of the suffix required to make the value unique on the server. If this field is specified, and Name is not present, the server will NOT return a 409 if the generated name exists - instead, it will either return 201 Created or 504 with Reason ServerTimeout indicating a unique name could not be found in the time allotted, and the client should retry (optionally after the time indicated in the Retry-After header).
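The naming rules above can be sketched with an in-memory store standing in for the server. Everything here is illustrative: the exception classes stand in for the 409 and 504 responses, and the suffix length, retry count, and maximum name length are assumptions rather than documented values.

```python
import random
import string


class AlreadyExists(Exception):
    """Stands in for an HTTP 409 response."""


class ServerTimeout(Exception):
    """Stands in for an HTTP 504 response with Reason ServerTimeout."""


def create(store, metadata, suffix_len=5, max_attempts=8, max_len=63):
    """Store an object under metadata.name, or under a generated unique name."""
    name = metadata.get("name")
    if name:
        if name in store:
            raise AlreadyExists(name)  # name idempotency
        store[name] = dict(metadata)
        return name
    prefix = metadata.get("generateName")
    if not prefix:
        raise ValueError("either name or generateName is required")
    prefix = prefix[: max_len - suffix_len]  # leave room for the suffix
    alphabet = string.ascii_lowercase + string.digits
    for _ in range(max_attempts):
        candidate = prefix + "".join(random.choice(alphabet) for _ in range(suffix_len))
        if candidate not in store:
            store[candidate] = dict(metadata, name=candidate)
            return candidate
    raise ServerTimeout(prefix)  # caller should retry later


store = {}
generated = create(store, {"generateName": "web-"})
print(generated)
```

A collision on a generated name is retried internally and never surfaces as a conflict; only an explicit, already-taken name does.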
Default resource values are API version-specific, and they are applied during the conversion from API-versioned declarative configuration to internal objects representing the desired state (Spec) of the resource. Subsequent GETs of the resource will include the default values explicitly.
Incorporating the default values into the Spec ensures that Spec depicts the full desired state so that it is easier for the system to determine how to achieve the state, and for the user to know what to anticipate.
API version-specific default values are set by the API server.
Late initialization is when resource fields are set by a system controller after an object is created/updated.
For example, the scheduler sets the pod.spec.nodeName field after the pod is created.
Late-initializers should only make the following types of modifications:
patchStrategy:"merge" attribute in the type definition).
These conventions:
Although the apiserver Admission Control stage acts prior to object creation, Admission Control plugins should follow the Late Initialization conventions too, to allow their implementation to be later moved to a 'controller', or to client libraries.
Kubernetes leverages the concept of resource versions to achieve optimistic concurrency. All Kubernetes resources have a "resourceVersion" field as part of their metadata. This resourceVersion is a string that identifies the internal version of an object that can be used by clients to determine when objects have changed. When a record is about to be updated, its version is checked against a pre-saved value, and if it doesn't match, the update fails with a StatusConflict (HTTP status code 409).
The resourceVersion is changed by the server every time an object is modified. If resourceVersion is included with the PUT operation the system will verify that there have not been other successful mutations to the resource during a read/modify/write cycle, by verifying that the current value of resourceVersion matches the specified value.
The resourceVersion is currently backed by etcd's modifiedIndex. However, it's important to note that the application should not rely on the implementation details of the versioning system maintained by Kubernetes. We may change the implementation of resourceVersion in the future, such as to change it to a timestamp or per-object counter.
The only way for a client to know the expected value of resourceVersion is to have received it from the server in response to a prior operation, typically a GET. This value MUST be treated as opaque by clients and passed unmodified back to the server. Clients should not assume that the resource version has meaning across namespaces, different kinds of resources, or different servers. Currently, the value of resourceVersion is set to match etcd's sequencer. You could think of it as a logical clock the API server can use to order requests. However, we expect the implementation of resourceVersion to change in the future, such as in the case we shard the state by kind and/or namespace, or port to another storage system.
In the case of a conflict, the correct client action at this point is to GET the resource again, apply the changes afresh, and try submitting again. This mechanism can be used to prevent races like the following:
Client #1                  Client #2
GET Foo                    GET Foo
Set Foo.Bar = "one"        Set Foo.Baz = "two"
PUT Foo                    PUT Foo
When these sequences occur in parallel, either the change to Foo.Bar or the change to Foo.Baz can be lost.
On the other hand, when specifying the resourceVersion, one of the PUTs will fail, since whichever write succeeds changes the resourceVersion for Foo.
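The race and its resolution can be sketched with a small in-memory store. This is an illustration under stated assumptions: the `Store` class, the `Conflict` exception (standing in for a 409 response), and `update_with_retry` are hypothetical, and the counter used to produce resourceVersion is an implementation detail that callers only compare and pass back.

```python
import copy


class Conflict(Exception):
    """Stands in for an HTTP 409 StatusConflict response."""


class Store:
    def __init__(self):
        self._objects = {}
        self._counter = 0

    def get(self, name):
        return copy.deepcopy(self._objects[name])

    def put(self, name, obj):
        current = self._objects.get(name)
        sent = obj.get("metadata", {}).get("resourceVersion")
        if current is not None and sent is not None:
            if sent != current["metadata"]["resourceVersion"]:
                raise Conflict(name)
        self._counter += 1
        stored = copy.deepcopy(obj)
        stored.setdefault("metadata", {})["resourceVersion"] = str(self._counter)
        self._objects[name] = stored
        return copy.deepcopy(stored)


def update_with_retry(store, name, mutate, max_attempts=5):
    """GET, apply the change afresh, PUT; repeat on conflict."""
    for _ in range(max_attempts):
        obj = store.get(name)
        mutate(obj)
        try:
            return store.put(name, obj)
        except Conflict:
            continue
    raise Conflict(name)


store = Store()
store.put("foo", {"metadata": {}, "bar": "", "baz": ""})
client1, client2 = store.get("foo"), store.get("foo")
client1["bar"] = "one"
client2["baz"] = "two"
store.put("foo", client1)            # succeeds and changes resourceVersion
try:
    store.put("foo", client2)        # stale resourceVersion
    second_put_conflicted = False
except Conflict:
    second_put_conflicted = True
final = update_with_retry(store, "foo", lambda obj: obj.update(baz="two"))
print(second_put_conflicted, final["bar"], final["baz"])
```

The second client's change is rejected rather than silently overwriting the first, and the retry applies it on top of the current object.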
resourceVersion may be used as a precondition for other operations (e.g., GET, DELETE) in the future, such as for read-after-write consistency in the presence of caching.
"Watch" operations specify resourceVersion using a query parameter. It is used to specify the point at which to begin watching the specified resources. This may be used to ensure that no mutations are missed between a GET of a resource (or list of resources) and a subsequent Watch, even if the current version of the resource is more recent. This is currently the main reason that list operations (GET on a collection) return resourceVersion.
APIs may return alternative representations of any resource in response to an Accept header or under alternative endpoints, but the default serialization for input and output of API responses MUST be JSON.
All dates should be serialized as RFC3339 strings.
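A minimal sketch of producing such a string with the Python standard library; the helper name `to_rfc3339` is hypothetical, and normalizing to UTC with a trailing "Z" is one valid RFC3339 form rather than a documented requirement.

```python
from datetime import datetime, timezone


def to_rfc3339(moment):
    """Format a timezone-aware datetime as an RFC3339 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


stamp = to_rfc3339(datetime(2015, 5, 20, 18, 10, 42, tzinfo=timezone.utc))
print(stamp)
```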
Units must either be explicit in the field name (e.g., timeoutSeconds), or must be specified as part of the value (e.g., resource.Quantity). Which approach is preferred is TBD.
Some APIs may need to identify which field in a JSON object is invalid, or to reference a value to extract from a separate resource. The current recommendation is to use standard JavaScript syntax for accessing that field, assuming the JSON object was transformed into a JavaScript object.
Examples:
fields[0].state.current
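A path in this form can be resolved against a decoded JSON object with a small helper. This is a simplified sketch that supports only dotted field names and numeric indexes; the function name `resolve` and the sample object are hypothetical.

```python
import re

# Matches either an identifier (group 1) or a bracketed numeric index (group 2).
_TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(obj, path):
    """Follow a JavaScript-style accessor path through nested dicts and lists."""
    current = obj
    pos = 0
    while pos < len(path):
        if path[pos] == ".":
            pos += 1
            continue
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError("cannot parse path at offset %d" % pos)
        if match.group(1) is not None:
            current = current[match.group(1)]
        else:
            current = current[int(match.group(2))]
        pos = match.end()
    return current


sample = {"fields": [{"state": {"current": "Running"}}]}
value = resolve(sample, "fields[0].state.current")
print(value)
```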
TODO: Plugins, extensions, nested kinds, headers
The server will respond with HTTP status codes that match the HTTP spec. See the section below for a breakdown of the types of status codes the server will send.
The following HTTP status codes may be returned by the API.
200 StatusOK
201 StatusCreated
204 StatusNoContent
307 StatusTemporaryRedirect
400 StatusBadRequest
401 StatusUnauthorized
403 StatusForbidden
404 StatusNotFound
405 StatusMethodNotAllowed
409 StatusConflict
See Conflict from the status response section below on how to retrieve more information about the nature of the conflict.
ResourceVersion).
422 StatusUnprocessableEntity
429 StatusTooManyRequests
Read the Retry-After HTTP header from the response, and wait at least that long before retrying.
500 StatusInternalServerError
503 StatusServiceUnavailable
504 StatusServerTimeout
Kubernetes will always return the Status kind from any API endpoint when an error occurs.
Clients SHOULD handle these types of objects when appropriate.
A Status kind will be returned by the API in two cases: when an error occurs, and when a DELETE call is successful.
The status object is encoded as JSON and provided as the body of the response. The status object contains fields for humans and machine consumers of the API to get more detailed information for the cause of the failure. The information in the status object supplements, but does not override, the HTTP status code's meaning. When fields in the status object have the same meaning as generally defined HTTP headers and that header is returned with the response, the header should be considered as having higher priority.
Example:
$ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana
> GET /api/v1/namespaces/default/pods/grafana HTTP/1.1
> User-Agent: curl/7.26.0
> Host: 10.240.122.184
> Accept: */*
> Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc
>
< HTTP/1.1 404 Not Found
< Content-Type: application/json
< Date: Wed, 20 May 2015 18:10:42 GMT
< Content-Length: 232
<
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "pods \"grafana\" not found",
  "reason": "NotFound",
  "details": {
    "name": "grafana",
    "kind": "pods"
  },
  "code": 404
}
The status field contains one of two possible values: Success or Failure.
message may contain a human-readable description of the error.
reason may contain a machine-readable description of why this operation is in the Failure status. If this value is empty there is no information available. The reason clarifies an HTTP status code but does not override it.
details may contain extended data associated with the reason. Each reason may define its own extended details. This field is optional and the data returned is not guaranteed to conform to any schema except that defined by the reason type.
Possible values for the reason and details fields:
BadRequest
  status reason Invalid above which indicates that the API call could possibly succeed, but the data was invalid.
  HTTP status code: 400 StatusBadRequest
Unauthorized
  Details: kind string, name string
  HTTP status code: 401 StatusUnauthorized
Forbidden
  Details: kind string, name string
  HTTP status code: 403 StatusForbidden
NotFound
  Details: kind string, name string
  HTTP status code: 404 StatusNotFound
AlreadyExists
  Details: kind string, name string
  HTTP status code: 409 StatusConflict
Conflict
  HTTP status code: 409 StatusConflict
Invalid
  Details: kind string, name string, causes
  StatusCause entries indicating the data in the provided resource that was invalid. The reason, message, and field attributes will be set.
  HTTP status code: 422 StatusUnprocessableEntity
Timeout
  HTTP status code: 429 TooManyRequests
  Retry-After HTTP header and return retryAfterSeconds in the details field of the object. A value of 0 is the default.
ServerTimeout
  Details: kind string, name string
  Retry-After HTTP header and return retryAfterSeconds in the details field of the object. A value of 0 is the default.
  HTTP status code: 504 StatusServerTimeout
MethodNotAllowed
  HTTP status code: 405 StatusMethodNotAllowed
InternalError
  Details: causes
  HTTP status code: 500 StatusInternalServerError
code may contain the suggested HTTP return code for this status.
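A client might summarize a Status body along the lines of the sketch below. The helper `describe_failure` and its return shape are hypothetical. It assumes Retry-After carries a number of seconds, and it prefers the header over retryAfterSeconds in details, following the header-priority rule stated earlier.

```python
import json


def describe_failure(body, headers=None):
    """Extract the machine- and human-readable parts of a Status response."""
    headers = headers or {}
    status = json.loads(body)
    if status.get("kind") != "Status":
        raise ValueError("response body is not a Status object")
    details = status.get("details") or {}
    # The HTTP header wins over the equivalent field in the body.
    retry_after = headers.get("Retry-After", details.get("retryAfterSeconds"))
    return {
        "failed": status.get("status") == "Failure",
        "reason": status.get("reason", ""),
        "message": status.get("message", ""),
        "code": status.get("code"),
        "retry_after": None if retry_after is None else int(retry_after),
    }


body = json.dumps({
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Failure",
    "message": "pods \"grafana\" not found",
    "reason": "NotFound",
    "details": {"name": "grafana", "kind": "pods"},
    "code": 404,
})
summary = describe_failure(body)
print(summary["reason"], summary["code"])
```

Callers would typically branch on the HTTP status code first and use reason only to refine that decision, since reason clarifies the code but does not override it.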
TODO: Document events (refer to another doc for details)