* Release 0.11.0rc1
* [test] fix test_jobs_launch_and_logs - increase request timeout (#8150)
* [Test] Fix `test_pool_down_single_pool` with Timeout to Check Pool Status (#8148)
Add timeout to check pool status.
* [docker] fix docker on nebius (#8151)
Upstream updated the error messages in moby/moby#50285; Nebius appears to have picked up the new version first.
* [example] fix min-gpt train-rdzv on Nebius (#8152)
* [example] fix min-gpt train-rdzv on Nebius
* update comment
* [Tests] Fix serve failure on gcp (#8165)
* fix
* don't at least allow one service
* default resources
* more memory for jobs
* revert
This fixes two issues preventing GCP B200 (a4-highgpu-8g) spot instances
from being recognized:
1. A4 VM pricing missing: GCP doesn't provide separate CPU/RAM pricing
for A4 instances in their SKUs API. The B200 GPU pricing includes
the full VM cost. Added special handling to set A4 VM price to 0 so
entries aren't dropped.
2. B200 spot pricing bug: Some B200 spot SKUs in GCP's API have
usageType='OnDemand' even though the description says "Spot
Preemptible". Added logic to match on description when usageType
doesn't match for B200 spot queries.
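A rough sketch of fix (2) is shown below (field names follow the GCP Cloud Billing SKU schema mentioned above; the actual helper in the codebase may differ):

```bash
python3 - <<'EOF'
# Hypothetical illustration: treat a B200 SKU as spot if usageType says so,
# or if the description says 'Spot Preemptible' despite usageType='OnDemand'.
def matches_spot(sku: dict) -> bool:
    usage_type = sku['category']['usageType']
    description = sku['description']
    if usage_type == 'Preemptible':
        return True
    # Fallback for the mislabeled B200 spot SKUs described above.
    return 'B200' in description and 'Spot Preemptible' in description

print(matches_spot({
    'category': {'usageType': 'OnDemand'},
    'description': 'Nvidia B200 GPU attached to Spot Preemptible VMs',
}))  # True
EOF
```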
Fixes #8102
In some cases, we were _actually_ closing an fd >100, which seems to
sometimes be used by FileLock. Bump the fake fd values to a much
higher number to avoid the conflict.
* [client] remove client-side cache for request payload env vars
This causes issues with sdk usage, which may invoke the client
multiple times with different env vars within the same process.
Correctness here is worth the minuscule performance hit.
This fixes the smoke test `test_managed_jobs_config_labels_isolation`.
That test technically regressed in #7966, but would only fail when
both jobs were claimed by the same job controller process, which is
rare. After #7332, the test would only have a single job controller
process and started consistently failing.
* fix test
* Fix: Suppress FutureWarning from google.api_core about Python 3.10 support
Suppress the FutureWarning from google.api_core._python_version_support
that appears when GCP modules are imported. This warning is informational
and does not affect functionality.
Fixes #7886
* Fix: Use raw string for regex pattern in warning filter
Use raw string (r'...') for the regex pattern in warnings.filterwarnings
to follow Python best practices for regex patterns.
* Fix: Format warning filter to comply with line length limit
Split the message parameter across lines to fix:
- Line too long error (85/80 characters)
- YAPF formatting requirements
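For illustration, the suppression described in these commits looks roughly like the following (the module and message regexes here are assumptions for the sketch, not the exact strings in the codebase):

```bash
python3 - <<'EOF'
import warnings

# Hypothetical filter mirroring the fix: ignore the informational
# FutureWarning about Python 3.10 support emitted from google.api_core.
warnings.filterwarnings(
    'ignore',
    message=r'.*Python 3\.10.*',  # raw string, per the follow-up regex fix
    category=FutureWarning,
    module=r'google\.api_core.*')
EOF
```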
* ignore restart file on the first run
* avoid crashing the server on inconsistent consolidation mode config
* Revert "avoid crashing the server on inconsistent consolidation mode config"
This reverts commit dfa985e61d.
* only use a warning for inconsistent consolidation mode
# Convert PEP440 version to SemVer if needed for Helm versioning
# Handle cases like:
# 1.0.0.dev20250218 -> 1.0.0-dev.20250218
# 0.11.0rc0 -> 0.11.0-rc.0
# 0.11.0a1 -> 0.11.0-alpha.1
# 0.11.0b2 -> 0.11.0-beta.2
# 0.11.0.post1 -> 0.11.0+post.1
semversion=$(echo "$version" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)\.dev([0-9]+)/\1-dev.\2/')
semversion=$(echo "$semversion" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)rc([0-9]+)/\1-rc.\2/')
semversion=$(echo "$semversion" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)a([0-9]+)/\1-alpha.\2/')
semversion=$(echo "$semversion" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)b([0-9]+)/\1-beta.\2/')
# Post-releases use build metadata (+) since SemVer has no direct equivalent to PEP440's .post
# PEP440 .post means "after release", but SemVer ignores build metadata for precedence, so 0.11.0+post.1 ranks equal to 0.11.0.
# TODO(romilb): If both 0.11.0 and 0.11.0+post.1 exist, Helm's "latest" behavior is undefined - some sources claim the newer one wins. Need to verify this.
semversion=$(echo "$semversion" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)\.post([0-9]+)/\1+post.\2/')
# Update the version and name in the main skypilot chart
sed -i "s/^version:.*$/version: ${semversion}/" src/charts/skypilot/Chart.yaml
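As a quick sanity check, the conversion rules above can be exercised against the sample versions from the comment (a hypothetical standalone snippet; in the workflow, `$version` comes from the package metadata):

```bash
for version in 1.0.0.dev20250218 0.11.0rc0 0.11.0a1 0.11.0b2 0.11.0.post1; do
  semversion=$(echo "$version" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)\.dev([0-9]+)/\1-dev.\2/')
  semversion=$(echo "$semversion" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)rc([0-9]+)/\1-rc.\2/')
  semversion=$(echo "$semversion" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)a([0-9]+)/\1-alpha.\2/')
  semversion=$(echo "$semversion" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)b([0-9]+)/\1-beta.\2/')
  semversion=$(echo "$semversion" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)\.post([0-9]+)/\1+post.\2/')
  echo "$version -> $semversion"  # matches the expected outputs listed above
done
```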
- [Nov 2025] Serve **Kimi K2 Thinking** with reasoning capabilities on your Kubernetes or clouds: [**example**](./llm/kimi-k2-thinking/)
- [Oct 2025] Run **RL training for LLMs** with SkyRL on your Kubernetes or clouds: [**example**](./llm/skyrl/)
- [Oct 2025] Train and serve [Andrej Karpathy's](https://x.com/karpathy/status/1977755427569111362) **nanochat** - the best ChatGPT that $100 can buy: [**example**](./llm/nanochat)
- [Oct 2025] Run large-scale **LLM training with TorchTitan** on any AI infra: [**example**](./examples/training/torchtitan)
- [Jul 2025] Finetune **Llama4** on any distributed cluster/cloud: [**example**](./llm/llama-4-finetuning/)
- [Jul 2025] Two-part blog series, `The Evolution of AI Job Orchestration`: (1) [Running AI jobs on GPU Neoclouds](https://blog.skypilot.co/ai-job-orchestration-pt1-gpu-neoclouds/), (2) [The AI-Native Control Plane & Orchestration that Finally Works for ML](https://blog.skypilot.co/ai-job-orchestration-pt2-ai-control-plane/)
- [Apr 2025] Spin up **Qwen3** on your cluster/cloud: [**example**](./llm/qwen/)
- [Feb 2025] Prepare and serve **Retrieval Augmented Generation (RAG) with DeepSeek-R1**: [**blog post**](https://blog.skypilot.co/deepseek-rag), [**example**](./llm/rag/)
**LLM Finetuning Cookbooks**: Finetuning Llama 2 / Llama 3.1 in your own cloud environment, privately: Llama 2 [**example**](./llm/vicuna-llama-2/) and [**blog**](https://blog.skypilot.co/finetuning-llama2-operational-guide/); Llama 3.1 [**example**](./llm/llama-3_1-finetuning/) and [**blog**](https://blog.skypilot.co/finetune-llama-3_1-on-your-infra/)
| AI apps | [RAG](https://docs.skypilot.co/en/latest/examples/applications/rag.html), [vector databases](https://docs.skypilot.co/en/latest/examples/applications/vector_database.html) (ChromaDB, CLIP) |
| Common frameworks | [Airflow](https://docs.skypilot.co/en/latest/examples/frameworks/airflow.html), [Jupyter](https://docs.skypilot.co/en/latest/examples/frameworks/jupyter.html), [marimo](https://docs.skypilot.co/en/latest/examples/frameworks/marimo.html) |
Source files can be found in [`llm/`](https://github.com/skypilot-org/skypilot/tree/master/llm) and [`examples/`](https://github.com/skypilot-org/skypilot/tree/master/examples).
By default, we maintain two SkyPilot container images for use on Kubernetes clusters:
1. ``us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot``: used for CPU-only clusters (`Dockerfile <https://github.com/skypilot-org/skypilot/blob/master/Dockerfile_k8s>`__).
2. ``us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot-gpu``: used for GPU clusters (`Dockerfile <https://github.com/skypilot-org/skypilot/blob/master/Dockerfile_k8s_gpu>`__).
These images are pre-installed with SkyPilot dependencies for fast startup.
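For example, either image can be pre-pulled to verify registry access (assuming the registry is reachable from your machine):

.. code-block:: bash

   docker pull us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot:latest
   docker pull us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot-gpu:latest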
Step A1 - Can you create pods and services?
As a sanity check, we will now try creating a simple pod running an HTTP server and a service to verify that your cluster and its networking are functional.
We will use the SkyPilot default image :code:`us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot:latest` to verify that the image can be pulled from the registry.
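A minimal version of this check might look like the following (pod, service, and port names are illustrative; ``python3`` is assumed to be on the image's ``PATH``):

.. code-block:: bash

   # Run a simple HTTP server in a pod using the SkyPilot default image.
   kubectl run skypilot-http-test --port=8080 \
     --image=us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot:latest \
     --command -- python3 -m http.server 8080
   # Expose it as a service and probe it from inside the cluster.
   kubectl expose pod skypilot-http-test --port=8080
   kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
     curl -s http://skypilot-http-test:8080
   # Clean up.
   kubectl delete pod skypilot-http-test
   kubectl delete service skypilot-http-test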
For multi-node clusters, volumes are mounted to all nodes. You must configure ``config.access_mode`` to ``ReadWriteMany`` and use a ``storage_class_name`` that supports the ``ReadWriteMany`` access mode. Otherwise, SkyPilot will fail to launch the cluster.
.. _volumes-on-kubernetes-manage:
Managing volumes
NAME TYPE INFRA SIZE USER WORKSPACE AGE STATUS LAST_USE USED_BY IS_EPHEMERAL
For multi-node clusters, ephemeral volumes are mounted to all nodes. You must configure ``config.access_mode`` to ``ReadWriteMany`` and use a ``storage_class_name`` that supports the ``ReadWriteMany`` access mode. Otherwise, SkyPilot will fail to launch the cluster.
When you terminate the cluster, the ephemeral volumes are automatically deleted:
In this example, create the following repository secrets:
- ``SKYPILOT_API_URL``: URL of the SkyPilot API server, in the format ``http(s)://url-or-ip``.
  If using basic auth, the URL should also include the credentials, in the format ``http(s)://username:password@url-or-ip``.
- ``SKYPILOT_SERVICE_ACCOUNT_TOKEN``: Only required if using OAuth. Service account token for the GitHub Actions user generated above.
- ``SLACK_BOT_TOKEN``: Optional. Create a [Slack App](https://api.slack.com/apps) and get a Slack App-Level Token with the `connections:write` permission to send a summary message. If not provided, no Slack message is sent after a job is queued.
- ``SLACK_CHANNEL_ID``: Optional. Slack channel ID for the summary message. If not provided, no Slack message is sent after a job is queued.
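These secrets can be added in the repository settings UI, or with the GitHub CLI (all values below are placeholders):

```bash
gh secret set SKYPILOT_API_URL --body "https://user:password@skypilot.example.com"
gh secret set SKYPILOT_SERVICE_ACCOUNT_TOKEN --body "<service-account-token>"  # only if using OAuth
gh secret set SLACK_BOT_TOKEN --body "<app-level-token>"  # optional
gh secret set SLACK_CHANNEL_ID --body "C0123456789"       # optional
```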
Run a personal [marimo](https://marimo.io/) server on a SkyPilot cluster.

## Launch with CLI
Launch a marimo cluster with the command:
```bash
sky launch -c marimo-example marimo.yaml
```
Next, run this command to get the endpoint to connect via the browser:
```bash
sky status marimo-example --endpoints
```
## Customization
The `marimo.yaml` file can be customized to change the port, password, and other options. Check the [docs](https://docs.marimo.io/cli/#marimo-edit) for more information.
[TorchTitan](https://github.com/pytorch/torchtitan) is a PyTorch native platform for large-scale LLM training, featuring multi-dimensional parallelisms (FSDP2, Tensor/Pipeline/Context Parallel), distributed checkpointing, torch.compile, and Float8 support.
This example demonstrates how to run [TorchTitan](https://github.com/pytorch/torchtitan) on your Kubernetes clusters, or any hyperscalers, neoclouds using SkyPilot, in addition to the instructions for running on [Slurm](https://github.com/pytorch/torchtitan?tab=readme-ov-file#multi-node-training).
## Quick start
Here is how to finetune Llama 3.1 on 2 nodes with 8 H100 (or 8 H200):
Please edit the YAMLs as you like.
To run disk tests, run `sky launch e2e_disk.yaml -c e2e_disk --env HF_TOKEN="YOUR TOKEN"`
Requirements for the disk benchmark: 2 S3 buckets (one for mount and one for mount cached) and 1 PVC (check out [volumes](https://docs.skypilot.co/en/stable/reference/volumes.html)).
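For instance, the two buckets can be created with the AWS CLI (bucket names are placeholders; the PVC is created via SkyPilot volumes, see the link above):

```bash
aws s3 mb s3://my-e2e-disk-mount
aws s3 mb s3://my-e2e-disk-mount-cached
```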
[Kimi K2 Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking) is an advanced large language model created by [Moonshot AI](https://www.moonshot.ai/).
This recipe shows how to run Kimi K2 Thinking with reasoning capabilities on your Kubernetes or any cloud. It includes two modes:
- **Low Latency (TP8)**: Best for interactive applications requiring quick responses
- **High Throughput (TP8+DCP8)**: Best for batch processing and high-volume serving scenarios
## Prerequisites
- Check that you have installed SkyPilot ([docs](https://docs.skypilot.co/en/latest/getting-started/installation.html)).
- Check that `sky check` shows clouds or Kubernetes is enabled.
- **Note**: This model requires 8x H200 or H20 GPUs.
## Run Kimi K2 Thinking (Low Latency Mode)
For low-latency scenarios, use tensor parallelism:
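First, launch the cluster (the YAML filename below matches the one used with SkyServe later in this recipe; any mode-specific settings live in that YAML):

```bash
sky launch kimi-k2-thinking.sky.yaml -c kimi-k2-thinking
```

Once the model is up, query the OpenAI-compatible endpoint: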
```bash
ENDPOINT=$(sky status --endpoint 8081 kimi-k2-thinking)
curl http://$ENDPOINT/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/Kimi-K2-Thinking",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant with deep reasoning capabilities."
},
{
"role": "user",
"content": "Explain how to solve the traveling salesman problem for 10 cities."
}
]
}' | jq .
```
The model will provide its reasoning process in the response, showing its chain-of-thought approach.
## Clean up resources
To shut down all resources:
```bash
sky down kimi-k2-thinking
```
## Serving Kimi-K2-Thinking: scaling up with SkyServe
With no change to the YAML, launch a fully managed service with autoscaling replicas and load-balancing on your infra:
```bash
sky serve up kimi-k2-thinking.sky.yaml -n kimi-k2-thinking
```
Wait until the service is ready:
```bash
watch -n10 sky serve status kimi-k2-thinking
```
Get a single endpoint that load-balances across replicas:
```bash
ENDPOINT=$(sky serve status --endpoint kimi-k2-thinking)
```
> **Tip:** SkyServe fully manages the lifecycle of your replicas. For example, if a spot replica is preempted, the controller will automatically replace it. This significantly reduces the operational burden while saving costs.
To curl the endpoint:
```bash
curl http://$ENDPOINT/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "moonshotai/Kimi-K2-Thinking",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant with deep reasoning capabilities."
},
{
"role": "user",
"content": "Design a distributed system for real-time analytics."
}
]
}' | jq .
```
To shut down all resources:
```bash
sky serve down kimi-k2-thinking
```
See more details in [SkyServe docs](https://docs.skypilot.co/en/latest/serving/sky-serve.html).
image: us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot-gpu:latest # Using this image also serves as a way to "pre-pull" the image onto nodes