5 Commits

Author | SHA1 | Message | Date
------ | ---- | ------- | ----
Alex Kim | 54ee820143 | [docs] Add NVIDIA Dynamo serving example (#7333) | 16 hours ago
Aylei | 8572b31924 | Fixed plugin load in metrics process (#8318) | 22 hours ago
Zhanghao Wu | 82a0ea1051 | [Template] Add `--address` for list nodes to avoid warning for multiple ray cluster and fix a race in ray template (#8306) | 1 day ago
Zhanghao Wu | 1aa2398db3 | [k8s] Update the instruction for dealing with exec-based kubeconfig (#8210) | 1 day ago
lloyd-brown | 3161f0d2b3 | [Docs] Add Tip to Restart API Server if Credential Setup Fails (#8314) | 1 day ago
13 changed files with 423 additions and 44 deletions
1. docs/source/examples/serving/index.rst (+1, -0)
2. docs/source/examples/serving/nvidia-dynamo.md (+1, -0)
3. docs/source/generate_examples.py (+3, -2)
4. docs/source/getting-started/installation.rst (+5, -0)
5. docs/source/reference/api-server/api-server-admin-deploy.rst (+9, -5)
6. examples/serve/nvidia-dynamo/README.md (+198, -0)
7. examples/serve/nvidia-dynamo/nvidia-dynamo-multinode.sky.yaml (+60, -0)
8. examples/serve/nvidia-dynamo/nvidia-dynamo.sky.yaml (+29, -0)
9. sky/server/plugins.py (+1, -0)
10. sky/server/uvicorn.py (+5, -0)
11. sky/utils/kubernetes/generate_kubeconfig.sh (+42, -33)
12. sky_templates/ray/start_cluster (+8, -4)
13. tests/unit_tests/test_sky/server/test_plugins.py (+61, -0)

docs/source/examples/serving/index.rst (+1, -0)

@@ -6,6 +6,7 @@ Serving

   vLLM <vllm>
   SGLang <sglang>
   Nvidia Dynamo <nvidia-dynamo>
   Ollama <ollama>
   Hugging Face TGI <tgi>
   LoRAX <lorax>


docs/source/examples/serving/nvidia-dynamo.md (+1, -0)

@@ -0,0 +1 @@
../../generated-examples/nvidia-dynamo.md

docs/source/generate_examples.py (+3, -2)

@@ -241,10 +241,11 @@ def _work(example_dir: pathlib.Path):
    globs = [example_dir.glob(pattern) for pattern in _GLOB_PATTERNS]
    for path in itertools.chain(*globs):
        examples.append(Example(path))
    # Find examples in subdirectories (search up to 2 levels deep)
    # Find examples in subdirectories (up to 3 levels deep)
    for path in example_dir.glob("*/*.md"):
        examples.append(Example(path.parent))
    # Also search 2 levels deep for nested examples like training/torchtitan
    for path in example_dir.glob("*/*/*.md"):
        examples.append(Example(path.parent))



docs/source/getting-started/installation.rst (+5, -0)

@@ -268,6 +268,11 @@ section :ref:`below <cloud-account-setup>`.

To check credentials only for specific clouds, pass the clouds as arguments: :code:`sky check aws gcp`

.. tip::

   If you are having trouble setting up credentials, it may be because the API server started before they were
   configured. Try restarting the API server by running :code:`sky api stop` and then :code:`sky api start`.
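
   A minimal restart sequence, for reference (:code:`sky check` re-verifies the credentials afterwards):

   .. code-block:: bash

      sky api stop
      sky api start
      sky check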

.. _cloud-account-setup:

Set up Kubernetes or clouds


docs/source/reference/api-server/api-server-admin-deploy.rst (+9, -5)

@@ -213,16 +213,20 @@ Following tabs describe how to configure credentials for different clouds on the

.. tip::

   If you are using a kubeconfig file that contains `exec-based authentication <https://kubernetes.io/docs/reference/access-authn-authz/authentication/#configuration>`_ (e.g., GKE's default ``gke-gcloud-auth-plugin`` based authentication), you will need to strip the path information from the ``command`` field in the exec configuration.
   You can use the ``exec_kubeconfig_converter.py`` script to do this.
   If you are using a kubeconfig file that contains `exec-based authentication <https://kubernetes.io/docs/reference/access-authn-authz/authentication/#configuration>`_ (e.g., GKE's default ``gke-gcloud-auth-plugin``, Nebius Managed Kubernetes, OCI, etc.), you will need to generate a kubeconfig with static authentication instead.
   You can use the ``generate_kubeconfig.sh`` script to do this.

   .. code-block:: bash

      python -m sky.utils.kubernetes.exec_kubeconfig_converter --input ~/.kube/config --output ~/.kube/config.converted
      # Download the script
      curl -O https://raw.githubusercontent.com/skypilot-org/skypilot/refs/heads/master/sky/utils/kubernetes/generate_kubeconfig.sh && chmod +x generate_kubeconfig.sh

   Then create the Kubernetes secret with the converted kubeconfig file ``~/.kube/config.converted``.

      # Generate the kubeconfig
      export KUBECONFIG=$HOME/.kube/config # or the path to your kubeconfig file
      ./generate_kubeconfig.sh

   Then create the Kubernetes secret with the generated kubeconfig file ``./kubeconfig``.
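
   For example, a minimal way to create that secret with ``kubectl`` (the secret name ``kubeconfig``, the key ``config``, and the namespace below are placeholders; use the values your API server deployment expects):

   .. code-block:: bash

      kubectl create secret generic kubeconfig \
        --from-file=config=./kubeconfig \
        --namespace skypilot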

   The specific cloud's credential for the exec-based authentication also needs to be configured. For example, to enable exec-based authentication for GKE, you also need to set up GCP credentials (see the GCP tab above).

.. dropdown:: Update Kubernetes credentials



examples/serve/nvidia-dynamo/README.md (+198, -0)

@@ -0,0 +1,198 @@
# Run NVIDIA Dynamo on any cloud or Kubernetes with SkyPilot

<p align="center">
<picture>
<img src="https://i.imgur.com/CBb1Yyi.png" width=75%>
</picture>
</p>


This recipe shows how to deploy and serve models using [NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo) on any cloud provider or Kubernetes cluster with [SkyPilot](https://docs.skypilot.co/en/latest/docs/index.html). Run Dynamo seamlessly across AWS, GCP, Azure, Lambda Labs, Nebius, and more - or bring your own Kubernetes infrastructure.

Together, SkyPilot and Dynamo offer developers unparalleled flexibility: deploy any LLM, on any cloud, using any inference framework, all with minimal effort and operational overhead.

## What is NVIDIA Dynamo?

NVIDIA Dynamo is a high-performance inference framework designed for serving generative AI and reasoning models in multi-node distributed environments. Built in Rust for performance and Python for extensibility, Dynamo solves the computational challenges of large language models that exceed single GPU capabilities.

### Core Features
- **Disaggregated Prefill & Decode**: Separates inference phases for optimal resource utilization
- **Dynamic GPU Scheduling**: Intelligent workload distribution across available GPUs
- **LLM-Aware Request Routing**: Smart routing based on model characteristics and cache states
- **Accelerated Data Transfer**: High-performance data movement between nodes
- **KV Cache Offloading**: Multi-tiered memory management for efficient cache utilization

## Launching NVIDIA Dynamo with SkyPilot

### Single-Node Example (`nvidia-dynamo.sky.yaml`)
- ✅ **SGLang Backend**: High-performance inference engine. Can be swapped with vLLM if required.
- ✅ **OpenAI-Compatible API**: Drop-in replacement for OpenAI endpoints
- ✅ **Basic Load Balancing**: Round-robin request distribution
- ✅ **Auto-Discovery**: Dynamic worker registration

### Multi-Node Example (`nvidia-dynamo-multinode.sky.yaml`)
- ✅ **KV-Aware Routing**: Intelligent cache-based request routing (`--router-mode kv`)
- ✅ **Multi-Node Distribution**: 2 nodes × 8 H100 GPUs (16 total GPUs)
- ✅ **Data Parallel Attention**: DP=2 across nodes (`--enable-dp-attention`)
- ✅ **Tensor Parallelism**: TP=8 per node for large model support
- ✅ **Disaggregated Transfer**: NIXL backend for KV cache transfers

**Model**: `Qwen/Qwen3-8B` (8B parameter reasoning model)

**Architecture**: 2 nodes, each with 8×H100 GPUs, TP=8, DP=2
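
How the multi-node recipe arrives at these numbers (a worked version of the arithmetic in the `run` section of `nvidia-dynamo-multinode.sky.yaml` below):

```bash
TOTAL_GPUS=$((2 * 8))        # SKYPILOT_NUM_NODES * SKYPILOT_NUM_GPUS_PER_NODE = 16
TP_SIZE=$((TOTAL_GPUS / 2))  # 8-way tensor parallelism per replica
DP_SIZE=2                    # 2 data-parallel attention groups; TP_SIZE * DP_SIZE = TOTAL_GPUS
```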

## Launch Cluster

Once SkyPilot is set up (see [Appendix: Preparation](#appendix-preparation)), launch the example with:

```bash
sky launch -c dynamo nvidia-dynamo.sky.yaml
```

## Test Endpoint

```bash
export ENDPOINT=$(sky status --endpoint 8080 dynamo)

curl http://$ENDPOINT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "stream": false,
    "max_tokens": 300
  }' | jq
...
{
  "id": "chatcmpl-e2b5b2bd-59fb-4321-8afc-3b5bb4a717a7",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "<think>\nOkay, the user greeted me with \"Hello, how are you?\" I should respond in a friendly and natural way. Let me think about the appropriate response.\n\nFirst, I need to acknowledge their greeting. Maybe start with a cheerful \"Hello!\" to match their tone. Then, I should mention that I'm just a virtual assistant, so I don't have feelings, but I'm here to help. It's important to keep it conversational.\n\nI should make sure to invite them to ask questions or share what they need help with. That way, it's open-ended and encourages further interaction. Also, adding an emoji like 😊 can make the response more friendly and approachable.\n\nWait, should I mention my name again? Maybe not necessary since the user already knows. Just keep it simple and welcoming. Let me check the example response they provided. Yes, it's similar to that. I think that's all. Keep the tone positive and helpful.\n</think>\n\nHello! 😊 I'm just a virtual assistant, so I don't have feelings, but I'm here to help you with whatever you need! What can I assist you with today?",
        "role": "assistant",
        "reasoning_content": null
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1758497220,
  "model": "Qwen/Qwen3-8B",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 235,
    "total_tokens": 249
  }
}
```
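
To print only the assistant's reply, the same request can be piped through `jq -r` (the payload is identical to the call above, just with a different `jq` filter):

```bash
curl -s http://$ENDPOINT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-8B", "messages": [{"role": "user", "content": "Hello, how are you?"}], "max_tokens": 300}' \
  | jq -r '.choices[0].message.content'
```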

## Multi-Node Serving

### Launch Multi-Node Cluster

```bash
sky launch -c dynamo-multi nvidia-dynamo-multinode.sky.yaml
```

### Test Multi-Node Endpoint

```bash
export ENDPOINT=$(sky status --endpoint 8080 dynamo-multi)

curl http://$ENDPOINT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "stream": false,
    "max_tokens": 300
  }' | jq
```

Example output:
```json
{
  "id": "chatcmpl-5524560e-aecd-4b63-a41b-23d0a787c9b0",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "<think>\nOkay, the user greeted me with \"Hello, how are you?\" I need to respond appropriately. Let me start by acknowledging their greeting. I should mention that I'm an AI assistant, so I don't have feelings, but I'm here to help.\n\nI should keep the response friendly and open-ended. Maybe ask them how they're doing to encourage a conversation. Let me check if there's anything specific they might need. Oh, maybe they have a question or need assistance with something. I should make sure to invite them to ask for help if needed. Also, keep the tone positive and approachable. Alright, putting it all together now.\n</think>\n\nHello! I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help! How are you today? 😊 If you have any questions or need assistance, feel free to ask!",
        "role": "assistant",
        "reasoning_content": null
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1758501329,
  "model": "Qwen/Qwen3-8B",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 181,
    "total_tokens": 195
  }
}
```

## Verifying KV-Aware Routing

Check logs for these indicators:

```
INFO dynamo_llm::kv_router: KV Routing initialized
INFO dynamo_llm::kv_router::scheduler: Formula for 7587889683284143912 with 0 cached blocks: 0.875 = 1.0 * prefill_blocks + decode_blocks = 1.0 * 0.875 + 0.000
INFO dynamo_llm::kv_router::scheduler: Selected worker: 7587889683284143912, logit: 0.875, cached blocks: 0, total blocks: 109815
```

The routing formula shows worker selection based on KV cache hits and load balancing.
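
To verify this from your workstation, one option is to stream the cluster logs and filter for the router lines (`sky logs` streams the job output; the `grep` pattern here is just a convenience, not a Dynamo flag):

```bash
sky logs dynamo-multi | grep kv_router
```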

## Appendix: Preparation

1. Install SkyPilot for launching the serving cluster:
```bash
pip install "skypilot-nightly[aws,gcp,kubernetes]"
# or other clouds you have set up (17+ clouds and Kubernetes are supported)
# See: https://docs.skypilot.co/en/latest/getting-started/installation.html
```

2. Check your infra setup:
```bash
sky check

🎉 Enabled clouds 🎉
✔ AWS
✔ GCP
✔ Azure
...
✔ Kubernetes
```

3. Set `HF_TOKEN` if you're using a [gated model](https://huggingface.co/docs/hub/en/models-gated) and then pass it to the `sky launch` command:
```bash
export HF_TOKEN="xxxx"
sky launch -c dynamo nvidia-dynamo.sky.yaml --env MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct --env HF_TOKEN
```

## What's next

SkyServe support for NVIDIA Dynamo is coming soon.

More resources:

* [AI on Kubernetes Without the Pain](https://blog.skypilot.co/ai-on-kubernetes/)
* [SkyPilot AI Gallery](https://docs.skypilot.co/en/latest/gallery/index.html)
* [SkyPilot Docs](https://docs.skypilot.co)
* [SkyPilot GitHub](https://github.com/skypilot-org/skypilot)

examples/serve/nvidia-dynamo/nvidia-dynamo-multinode.sky.yaml (+60, -0)

@@ -0,0 +1,60 @@
# Multi-node serving with NVIDIA Dynamo and SGLang in disaggregation mode.
#
# Usage:
#
# sky launch -c dynamo-multi nvidia-dynamo-multinode.sky.yaml
#
# This config uses 2 nodes with 8x H100 GPUs each for disaggregated serving.
# Optionally override the model:
#
# sky launch -c dynamo-multi nvidia-dynamo-multinode.sky.yaml --env MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct --env HF_TOKEN

resources:
  accelerators: H100:8
  ports: 8080

num_nodes: 2

envs:
  MODEL_NAME: Qwen/Qwen3-8B
  DIST_INIT_PORT: 29500
  HF_TOKEN: "" # needed if a model is gated in HF Hub. Pass the value with `--env HF_TOKEN`

setup: |
  sudo usermod -aG docker $USER
  sudo chmod 666 /var/run/docker.sock
  uv pip install "ai-dynamo[sglang]==0.5.0" accelerate --system --prerelease=allow
  uv pip install "sglang[all]==0.5.2" --system --prerelease=allow
  curl -fsSL -o docker-compose.yml https://raw.githubusercontent.com/ai-dynamo/dynamo/v0.5.0/deploy/docker-compose.yml
  docker compose -f docker-compose.yml up -d

run: |
  export GLOO_SOCKET_IFNAME=$(ip -o -4 route show to default | awk '{print $5}')
  HEAD_IP=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  TOTAL_GPUS=$((SKYPILOT_NUM_NODES * SKYPILOT_NUM_GPUS_PER_NODE))

  # For disaggregation mode, we need dp-size > 1
  # Setting TP to half of total GPUs and DP to 2 for proper distribution
  TP_SIZE=$((TOTAL_GPUS / 2))
  DP_SIZE=2

  if [ "${SKYPILOT_NODE_RANK}" == "0" ]; then
    # Start frontend with KV-aware routing enabled
    python -m dynamo.frontend --router-mode kv --http-port 8080 &
  fi

  python -m dynamo.sglang \
    --model-path $MODEL_NAME \
    --tp $TP_SIZE \
    --dp-size $DP_SIZE \
    --dist-init-addr $HEAD_IP:$DIST_INIT_PORT \
    --nnodes ${SKYPILOT_NUM_NODES} \
    --node-rank ${SKYPILOT_NODE_RANK} \
    --host 0.0.0.0 \
    --port 8081 \
    --enable-dp-attention \
    --trust-remote-code \
    --mem-fraction-static 0.82 \
    --disaggregation-transfer-backend nixl \
    --disaggregation-bootstrap-port 30001 \
    --page-size 16

examples/serve/nvidia-dynamo/nvidia-dynamo.sky.yaml (+29, -0)

@@ -0,0 +1,29 @@
# Single-node serving with NVIDIA Dynamo and SGLang.
#
# Usage:
#
# sky launch -c dynamo nvidia-dynamo.sky.yaml
#
# Optionally override the model:
#
# sky launch -c dynamo nvidia-dynamo.sky.yaml --env MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct --env HF_TOKEN

resources:
  accelerators: H100:1
  ports: 8080

envs:
  MODEL_NAME: Qwen/Qwen3-8B
  HF_TOKEN: "" # needed if a model is gated in HF Hub. Pass the value with `--env HF_TOKEN`

setup: |
  sudo usermod -aG docker $USER
  sudo chmod 666 /var/run/docker.sock

  uv pip install "ai-dynamo[sglang]==0.4.1" accelerate --system --prerelease=allow
  curl -fsSL -o docker-compose.yml https://raw.githubusercontent.com/ai-dynamo/dynamo/release/0.4.1/deploy/docker-compose.yml
  docker compose -f docker-compose.yml up -d

run: |
  python -m dynamo.frontend &
  python -m dynamo.sglang --model $MODEL_NAME

sky/server/plugins.py (+1, -0)

@@ -164,6 +164,7 @@ def load_plugins(extension_context: ExtensionContext):

    for plugin_config in config.get('plugins', []):
        class_path = plugin_config['class']
        logger.debug(f'Loading plugins: {class_path}')
        module_path, class_name = class_path.rsplit('.', 1)
        try:
            module = importlib.import_module(module_path)


sky/server/uvicorn.py (+5, -0)

@@ -20,6 +20,7 @@ from uvicorn.supervisors import multiprocess
from sky import sky_logging
from sky.server import daemons
from sky.server import metrics as metrics_lib
from sky.server import plugins
from sky.server import state
from sky.server.requests import requests as requests_lib
from sky.skylet import constants
@@ -237,6 +238,10 @@ def run(config: uvicorn.Config, max_db_connections: Optional[int] = None):
    server = Server(config=config, max_db_connections=max_db_connections)
    try:
        if config.workers is not None and config.workers > 1:
            # When workers > 1, uvicorn does not run server app in the main
            # process. In this case, plugins are not loaded at this point, so
            # load plugins here without uvicorn app.
            plugins.load_plugins(plugins.ExtensionContext())
        sock = config.bind_socket()
        SlowStartMultiprocess(config, target=server.run,
                              sockets=[sock]).run()


sky/utils/kubernetes/generate_kubeconfig.sh (+42, -33)

@@ -12,20 +12,20 @@
# * Specify SKYPILOT_NAMESPACE env var to override the default namespace where the service account is created.
# * Specify SKYPILOT_SA_NAME env var to override the default service account name.
# * Specify SKIP_SA_CREATION=1 to skip creating the service account and use an existing one
# * Specify SUPER_USER=1 to create a service account with cluster-admin permissions
# * Specify SUPER_USER=0 to create a service account with minimal permissions
#
# Usage:
# # Create "sky-sa" service account with minimal permissions in "default" namespace and generate kubeconfig
# # Create "sky-sa" service account in "default" namespace and generate kubeconfig
# $ ./generate_kubeconfig.sh
#
# # Create "my-sa" service account with minimal permissions in "my-namespace" namespace and generate kubeconfig
# # Create "my-sa" service account in "my-namespace" namespace and generate kubeconfig
# $ SKYPILOT_SA_NAME=my-sa SKYPILOT_NAMESPACE=my-namespace ./generate_kubeconfig.sh
#
# # Use an existing service account "my-sa" in "my-namespace" namespace and generate kubeconfig
# $ SKIP_SA_CREATION=1 SKYPILOT_SA_NAME=my-sa SKYPILOT_NAMESPACE=my-namespace ./generate_kubeconfig.sh
#
# # Create "sky-sa" service account with cluster-admin permissions in "default" namespace
# $ SUPER_USER=1 ./generate_kubeconfig.sh
# # Create "sky-sa" service account with minimal permissions in "default" namespace (manual setup may be required)
# $ SUPER_USER=0 ./generate_kubeconfig.sh

set -eu -o pipefail

@@ -33,11 +33,18 @@ set -eu -o pipefail
# use default.
SKYPILOT_SA=${SKYPILOT_SA_NAME:-sky-sa}
NAMESPACE=${SKYPILOT_NAMESPACE:-default}
SUPER_USER=${SUPER_USER:-0}
SUPER_USER=${SUPER_USER:-1}

echo "Service account: ${SKYPILOT_SA}"
echo "Namespace: ${NAMESPACE}"
echo "Super user permissions: ${SUPER_USER}"
echo "=========================================="
echo "SkyPilot Kubeconfig Generation"
echo "=========================================="
echo "Service Account: ${SKYPILOT_SA}"
echo "Namespace: ${NAMESPACE}"
if [ "${SUPER_USER}" != "1" ]; then
echo "Permissions: Minimal (manual setup may be required)"
SUPER_USER=0
fi
echo ""

# Set OS specific values.
if [[ "$OSTYPE" == "linux-gnu" ]]; then
@@ -53,7 +60,7 @@ fi

# If the user has set SKIP_SA_CREATION=1, skip creating the service account.
if [ -z ${SKIP_SA_CREATION+x} ]; then
echo "Creating the Kubernetes Service Account with ${SUPER_USER:+super user}${SUPER_USER:-minimal} RBAC permissions."
echo "[1/3] Creating Kubernetes Service Account and RBAC permissions..."
if [ "${SUPER_USER}" = "1" ]; then
# Create service account with cluster-admin permissions
kubectl apply -f - <<EOF
@@ -219,7 +226,8 @@ roleRef:
EOF
fi
# Apply optional ingress-related roles, but don't make the script fail if it fails
kubectl apply -f - <<EOF || echo "Failed to apply optional ingress-related roles. Nginx ingress is likely not installed. This is not critical and the script will continue."
echo " → Applying optional ingress permissions (skipped if ingress-nginx not installed)..."
kubectl apply -f - 2>/dev/null <<EOF || true
# Optional: Role for accessing ingress resources
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
@@ -253,8 +261,13 @@ roleRef:
name: ${SKYPILOT_SA}-role-ingress-nginx # Use the same name as the role at line 119
apiGroup: rbac.authorization.k8s.io
EOF
else
  echo "[1/3] Skipping service account creation (using existing account)..."
fi

echo ""
echo "[2/3] Creating service account token..."

# Checks if a secret entry was defined for the Service account. If defined, it means the Kubernetes server has a
# version below 1.24; otherwise, one must manually create the secret and bind it to the Service account to have a non-expiring token.
# After Kubernetes v1.24 Service accounts no longer generate automatic tokens/secrets.
@@ -293,7 +306,9 @@ CURRENT_CONTEXT=$(kubectl config current-context)
CURRENT_CLUSTER=$(kubectl config view -o jsonpath="{.contexts[?(@.name == \"${CURRENT_CONTEXT}\"})].context.cluster}")
CURRENT_CLUSTER_ADDR=$(kubectl config view -o jsonpath="{.clusters[?(@.name == \"${CURRENT_CLUSTER}\"})].cluster.server}")

echo "Writing kubeconfig."
echo ""
echo "[3/3] Generating kubeconfig file..."

cat > kubeconfig <<EOF
apiVersion: v1
clusters:
@@ -316,24 +331,18 @@ users:
token: ${SA_TOKEN}
EOF

echo "---
Done!

Kubeconfig using service account '${SKYPILOT_SA}' in namespace '${NAMESPACE}' written at $(pwd)/kubeconfig

Copy the generated kubeconfig file to your ~/.kube/ directory to use it with
kubectl and skypilot:

# Backup your existing kubeconfig file
mv ~/.kube/config ~/.kube/config.bak
cp kubeconfig ~/.kube/config

# Verify that you can access the cluster
kubectl get pods

Also add this to your ~/.sky/config.yaml to use the new service account:

# ~/.sky/config.yaml
kubernetes:
  remote_identity: ${SKYPILOT_SA}
"
echo ""
echo "=========================================="
echo "✓ SUCCESS!"
echo "=========================================="
echo ""
echo "Kubeconfig file created successfully!"
echo ""
echo " Service Account: ${SKYPILOT_SA}"
echo " Namespace: ${NAMESPACE}"
echo " Location: $(pwd)/kubeconfig"
echo ""
echo "Next steps:"
echo " Refer to this page for setting up the credential for remote API server:"
echo " https://docs.skypilot.co/en/latest/reference/api-server/api-server-admin-deploy.html#optional-configure-cloud-accounts"
echo ""

sky_templates/ray/start_cluster (+8, -4)

@@ -77,14 +77,18 @@ if ! run_ray --version > /dev/null; then
fi
echo -e "${GREEN}Ray $(run_ray --version | cut -d' ' -f3) is installed.${NC}"

RAY_ADDRESS="127.0.0.1:${RAY_HEAD_PORT}"
LOCAL_RAY_ADDRESS="127.0.0.1:${RAY_HEAD_PORT}"
RAY_ADDRESS=${LOCAL_RAY_ADDRESS}
if [ "${SKYPILOT_NODE_RANK}" -ne 0 ]; then
HEAD_IP=$(echo "${SKYPILOT_NODE_IPS}" | head -n1)
RAY_ADDRESS="${HEAD_IP}:${RAY_HEAD_PORT}"
fi

# Check if user-space Ray is already running
if run_ray status --address="${RAY_ADDRESS}" &> /dev/null; then
# Check if user-space Ray is already running. Use local address to check, as
# if we use the head node address, the check will succeed even if the Ray
# cluster is started on the head node but not started on the current worker
# node.
if run_ray status --address="${LOCAL_RAY_ADDRESS}" &> /dev/null; then
echo -e "${YELLOW}Ray cluster is already running.${NC}"
run_ray status --address="${RAY_ADDRESS}"
exit 0
@@ -140,7 +144,7 @@ if [ "${SKYPILOT_NODE_RANK}" -eq 0 ]; then
echo -e "${RED}Error: Timeout waiting for nodes.${NC}" >&2
exit 1
fi
ready_nodes=$(run_ray list nodes --format=json | python3 -c "import sys, json; print(len(json.load(sys.stdin)))")
ready_nodes=$(run_ray list nodes --address="${RAY_ADDRESS}" --format=json | python3 -c "import sys, json; print(len(json.load(sys.stdin)))")
if [ "${ready_nodes}" -ge "${SKYPILOT_NUM_NODES}" ]; then
break
fi


tests/unit_tests/test_sky/server/test_plugins.py (+61, -0)

@@ -1,12 +1,15 @@
"""Unit tests for the SkyPilot API server plugins."""

import importlib
import sys
import types
from unittest import mock

from fastapi import FastAPI
import yaml

from sky.server import plugins
from sky.server import uvicorn as skyuvicorn


def test_load_plugins_registers_and_installs(monkeypatch, tmp_path):
@@ -50,3 +53,61 @@ def test_load_plugins_registers_and_installs(monkeypatch, tmp_path):
    assert isinstance(plugin, DummyPlugin)
    assert plugin.value == 42
    assert installed['ctx'] is ctx


def test_server_import_loads_plugins(monkeypatch):
    load_mock = mock.MagicMock()
    monkeypatch.setattr(plugins, 'load_plugins', load_mock)

    server_module = importlib.import_module('sky.server.server')
    load_mock.reset_mock()

    importlib.reload(server_module)

    load_mock.assert_called_once()
    ctx = load_mock.call_args.args[0]
    assert isinstance(ctx, plugins.ExtensionContext)
    assert ctx.app is server_module.app


def test_uvicorn_run_loads_plugins_for_multiple_workers(monkeypatch):
    load_mock = mock.MagicMock()
    monkeypatch.setattr(plugins, 'load_plugins', load_mock)

    class DummyServer:

        def __init__(self, config, max_db_connections=None):
            del config, max_db_connections

        def run(self, *args, **kwargs):
            del args, kwargs

    class DummyMultiprocess:

        def __init__(self, config, target, sockets):
            self.config = config
            self.target = target
            self.sockets = sockets
            self.run_called = False

        def run(self):
            self.run_called = True

    class DummyConfig:
        reload = False
        workers = 2
        uds = None

        def bind_socket(self):
            return object()

    monkeypatch.setattr(skyuvicorn, 'Server', DummyServer)
    monkeypatch.setattr(skyuvicorn, 'SlowStartMultiprocess', DummyMultiprocess)

    dummy_config = DummyConfig()
    skyuvicorn.run(dummy_config)

    load_mock.assert_called_once()
    ctx = load_mock.call_args.args[0]
    assert isinstance(ctx, plugins.ExtensionContext)
    assert ctx.app is None
