# Worker Registration Guide

## Overview
When you provision a GPU worker from Prime Intellect (or any other provider), you'll receive SSH credentials. Use the `/workers/register` endpoint to register the worker with the VPS control plane.
## Step-by-Step Process

### 1. Provision GPU Worker
Get SSH access from your GPU provider (e.g., Prime Intellect):

```text
SSH Host: 192.168.1.100
SSH User: root
SSH Key:  ~/.ssh/prime_intellect_key
GPU:      1x A100-40GB
```
### 2. Register Worker via API

```bash
curl -X POST http://localhost:8000/workers/register \
  -H "Content-Type: application/json" \
  -d '{
    "ssh_host": "192.168.1.100",
    "ssh_user": "root",
    "ssh_key_path": "~/.ssh/prime_intellect_key",
    "gpu_count": 1,
    "gpu_type": "A100-40GB",
    "provider": "prime_intellect",
    "metadata": {
      "instance_id": "pi-12345",
      "region": "us-east-1"
    }
  }'
```
**Response:**

```json
{
  "worker_id": "worker-a1b2c3d4",
  "status": "bootstrapping",
  "message": "Worker worker-a1b2c3d4 registered and bootstrapping"
}
```
### 3. Monitor Bootstrap Progress
```bash
curl http://localhost:8000/workers
```
**Response:**

```json
{
  "workers": [
    {
      "worker_id": "worker-a1b2c3d4",
      "ssh_host": "192.168.1.100",
      "gpu_count": 1,
      "gpu_type": "A100-40GB",
      "status": "idle",
      "current_job_id": null,
      "last_heartbeat": 1700000000.0
    }
  ]
}
```

Possible `status` values are `bootstrapping`, `idle`, `busy`, and `error`.
### 4. Submit Job
Once worker status is `idle`, submit a training job:
```bash
curl -X POST http://localhost:8000/train \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_id": "my_dataset",
    "model": "gpt2",
    "epochs": 1
  }'
```
The job will automatically dispatch to the available worker.
## Python SDK Usage
```python
import time

from src.nexa.client import NexaClient

client = NexaClient(base_url="http://localhost:8000")

# Register worker
response = client.request("POST", "/workers/register", {
    "ssh_host": "192.168.1.100",
    "ssh_user": "root",
    "gpu_count": 1,
    "gpu_type": "A100-40GB",
    "provider": "prime_intellect",
})
worker_id = response["worker_id"]
print(f"Worker registered: {worker_id}")

# Wait for bootstrap to complete
while True:
    workers = client.request("GET", "/workers")
    worker = next((w for w in workers["workers"] if w["worker_id"] == worker_id), None)
    if worker and worker["status"] == "idle":
        print("Worker ready!")
        break
    time.sleep(5)

# Submit training job
job = client.train(dataset_id="my_dataset", model="gpt2", epochs=1)
print(f"Job submitted: {job['job_id']}")
```
## Bootstrap Process
When you register a worker, the API automatically:
1. **SSH Connection**: Connects to the worker via SSH
2. **Upload Bootstrap Script**: Uploads and executes bootstrap script
3. **Install Dependencies**: Installs system packages, Python, Docker, NVIDIA drivers
4. **Clone Repository**: Clones Nexa Compute repository
5. **Install Python Deps**: Installs requirements from `requirements.in`
6. **Mark Ready**: Sets worker status to `idle`
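The sequence above amounts to assembling one shell script and pushing it over SSH. The sketch below shows that assembly; the package list, repository URL, and paths are illustrative assumptions, not the contents of the real bootstrap script.

```python
# Illustrative bootstrap assembly following the numbered steps above.
# The repository URL and install paths are assumptions for illustration only.
BOOTSTRAP_STEPS = [
    "apt-get update && apt-get install -y python3 python3-pip docker.io",  # step 3
    "git clone https://github.com/example/nexa-compute.git /opt/nexa",     # step 4
    "pip3 install -r /opt/nexa/requirements.in",                           # step 5
]


def render_bootstrap_script(steps: list[str]) -> str:
    """Join the steps into one script; 'set -e' aborts on the first failure."""
    return "#!/bin/bash\nset -e\n" + "\n".join(steps) + "\n"


script = render_bootstrap_script(BOOTSTRAP_STEPS)
```

The rendered script is what ends up at `/tmp/nexa_bootstrap.sh` on the worker (see Troubleshooting below); only after it exits successfully does the control plane flip the worker to `idle`.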
## Artifact Storage
Results are automatically uploaded to configured storage:
- **DigitalOcean Spaces** (recommended): Set `DO_SPACES_KEY`, `DO_SPACES_SECRET` in `.env`
- **AWS S3**: Set `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` in `.env`
- **Local**: Falls back to local `artifacts/` directory (not recommended for production)
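The fallback order can be sketched as a simple environment-variable check. The variable names come from the list above; the selection function itself is an assumption about how the precedence might be implemented, not the actual storage code.

```python
def pick_artifact_backend(env: dict[str, str]) -> str:
    """Choose an artifact storage backend from environment variables,
    mirroring the precedence above: Spaces, then S3, then local artifacts/."""
    if env.get("DO_SPACES_KEY") and env.get("DO_SPACES_SECRET"):
        return "do_spaces"
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "s3"
    return "local"  # artifacts/ directory; not recommended for production
```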
## Troubleshooting
### Worker stuck in bootstrapping
Check API logs for errors:
```bash
# API logs will show bootstrap progress
tail -f logs/api.log
```
### Worker status is `error`
SSH manually to debug:
```bash
ssh -i ~/.ssh/prime_intellect_key root@192.168.1.100
cat /tmp/nexa_bootstrap.sh
```