@@ -224,6 +224,30 @@ Following tabs describe how to configure credentials for different clouds on the
The specific cloud's credentials for exec-based authentication also need to be configured. For example, to enable exec-based authentication for GKE, you also need to set up GCP credentials (see the GCP tab above).
.. dropdown:: Update Kubernetes credentials
After Kubernetes credentials are enabled, you can update the kubeconfig file stored in the ``kube-credentials`` secret as follows:
1. Replace the existing secret in place:
.. code-block:: bash
kubectl delete secret kube-credentials --namespace $NAMESPACE
kubectl create secret generic kube-credentials \
--namespace $NAMESPACE \
--from-file=config=$HOME/.kube/config
2. The change takes tens of seconds to propagate to the API server. You can then verify the updated credentials in the API server pod:
.. code-block:: bash
# The NAMESPACE and RELEASE_NAME should be consistent with the API server deployment
API_SERVER_POD_NAME=$(kubectl get pods -n $NAMESPACE -l app=${RELEASE_NAME}-api -o jsonpath='{.items[0].metadata.name}')
To use multiple Kubernetes clusters, you will need to add the context names to ``allowed_contexts`` in the SkyPilot config. An example config file that allows using the hosting Kubernetes cluster and two additional Kubernetes clusters is shown below:
.. code-block:: yaml
@@ -267,6 +291,52 @@ Following tabs describe how to configure credentials for different clouds on the
If your ``ssh_node_pools.yaml`` requires SSH keys, create a secret that contains the keys and set the :ref:`apiService.sshKeySecret <helm-values-apiService-sshKeySecret>` to the secret name:
.. code-block:: bash
@@ -551,6 +849,28 @@ Following tabs describe how to configure credentials for different clouds on the
--reuse-values \
--set apiService.sshKeySecret=$SECRET_NAME
.. dropdown:: Update SSH key credentials
After SSH key credentials are enabled, you can update the SSH keys stored in the ``$SECRET_NAME`` secret as follows:
1. Replace the existing secret in place:
.. code-block:: bash
kubectl delete secret $SECRET_NAME --namespace $NAMESPACE
kubectl create secret generic $SECRET_NAME \
--namespace $NAMESPACE \
--from-file=id_rsa=/path/to/id_rsa \
--from-file=other_id_rsa=/path/to/other_id_rsa
2. The change takes tens of seconds to propagate to the API server. You can then verify the updated credentials in the API server pod:
.. code-block:: bash
# The NAMESPACE and RELEASE_NAME should be consistent with the API server deployment
API_SERVER_POD_NAME=$(kubectl get pods -n $NAMESPACE -l app=${RELEASE_NAME}-api -o jsonpath='{.items[0].metadata.name}')
kubectl exec $API_SERVER_POD_NAME -n $NAMESPACE -- ls -lart /root/.ssh/
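
A common alternative to deleting and re-creating the secret is to render the new secret locally and apply it in one step, so that the secret never disappears in between. The following is a minimal sketch using standard ``kubectl`` flags rather than a procedure taken from the SkyPilot docs; it assumes ``kubectl`` 1.18 or newer (for ``--dry-run=client``) and reuses ``$SECRET_NAME``, ``$NAMESPACE``, and the key paths from the steps above:

.. code-block:: bash

   # Render the secret manifest locally without creating it on the cluster,
   # then apply it so the existing secret is updated rather than deleted.
   kubectl create secret generic $SECRET_NAME \
     --namespace $NAMESPACE \
     --from-file=id_rsa=/path/to/id_rsa \
     --from-file=other_id_rsa=/path/to/other_id_rsa \
     --dry-run=client -o yaml | kubectl apply -f -

The same verification command shown in step 2 can be used afterwards to confirm that the updated keys are visible in the API server pod.
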
After the API server is deployed, use the ``sky ssh up`` command to set up the SSH Node Pools. Refer to :ref:`existing-machines` for more details.
.. note::
@@ -582,6 +902,53 @@ Following tabs describe how to configure credentials for different clouds on the
--set r2Credentials.enabled=true \
--set r2Credentials.r2SecretName=r2-credentials
.. dropdown:: Update Cloudflare R2 credentials
After Cloudflare R2 credentials are enabled, you can update the credentials file in ``r2-credentials`` using either approach:
On July 23, 2024, Meta released the [Llama 3.1 model family](https://ai.meta.com/blog/meta-llama-3-1/), including a 405B parameter model in both base model and instruction-tuned forms. Llama 3.1 405B became _the first open LLM that closely rivals top proprietary models_ like GPT-4o and Claude 3.5 Sonnet.
This guide shows how to use [SkyPilot](https://github.com/skypilot-org/skypilot) and [torchtune](https://pytorch.org/torchtune/stable/index.html) to **finetune Llama 3.1 on your own data and infra**. Everything is packaged in a simple [SkyPilot YAML](https://docs.skypilot.co/en/latest/getting-started/quickstart.html), that can be launched with one command on your infra:
This guide shows how to use [SkyPilot](https://github.com/skypilot-org/skypilot) and [torchtune](https://meta-pytorch.org/torchtune/stable/index.html) to **finetune Llama 3.1 on your own data and infra**. Everything is packaged in a simple [SkyPilot YAML](https://docs.skypilot.co/en/latest/getting-started/quickstart.html), that can be launched with one command on your infra:
@@ -20,7 +20,7 @@ This guide shows how to use [SkyPilot](https://github.com/skypilot-org/skypilot)
## Let's finetune Llama 3.1
We will use [torchtune](https://pytorch.org/torchtune/stable/index.html) to finetune Llama 3.1. The example below uses the [`yahma/alpaca-cleaned`](https://huggingface.co/datasets/yahma/alpaca-cleaned) dataset, which you can replace with your own dataset later.
We will use [torchtune](https://meta-pytorch.org/torchtune/stable/index.html) to finetune Llama 3.1. The example below uses the [`yahma/alpaca-cleaned`](https://huggingface.co/datasets/yahma/alpaca-cleaned) dataset, which you can replace with your own dataset later.
To set up the environment for launching the finetuning job, finish the [Appendix: Preparation](#appendix-preparation) section first.
# Increment the following for catching performance bugs easier:
# current num items (num SSH connections): 1
setup_commands:
# Disable `unattended-upgrades` to prevent apt-get from hanging. It should be called at the beginning before the process started to avoid being blocked. (This is a temporary fix.)
# Create ~/.ssh/config file in case the file does not exist in the image.
# Line 'rm ..': there is another installation of pip.
# Line 'sudo bash ..': set the ulimit as suggested by ray docs for performance. https://docs.ray.io/en/latest/cluster/vms/user-guides/large-cluster-best-practices.html#system-configuration
# Line 'sudo grep ..': set the number of threads per process to unlimited to avoid `ray job submit` getting stuck when the number of running ray jobs increases.
# Line 'mkdir -p ..': disable host key check
- {%- for initial_setup_command in initial_setup_commands %}
@@ -68,8 +68,10 @@ echo "Deploying SkyPilot API server..."
if [ "$HELM_VERSION" = "latest" ]; then
extra_flag="--devel"
else
# Convert PEP440 version to SemVer if needed (e.g., 1.0.0.dev20250609 -> 1.0.0-dev.20250609)
SEMVER_VERSION=$(echo "$HELM_VERSION" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)\.dev([0-9]+)/\1-dev.\2/')
# Convert PEP440 version to SemVer if needed
# 0.11.0rc1 -> 0.11.0-rc.1
# 1.0.0.dev20250609 -> 1.0.0-dev.20250609
SEMVER_VERSION=$(echo "$HELM_VERSION" | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)rc([0-9]+)/\1-rc.\2/' | sed -E 's/([0-9]+\.[0-9]+\.[0-9]+)\.dev([0-9]+)/\1-dev.\2/')
f's=$(SKYPILOT_DEBUG=0 sky api info | tee /dev/stderr) && echo "\n===Validating endpoint output===" && echo "$s" | grep "Endpoint set to default local API server."',