# CI/CD with clore-ai SDK

Integrate GPU testing and deployment into your CI/CD pipelines. This chapter covers secrets management, GitHub Actions, GitLab CI, Docker, and cleanup safeguards, with complete example configs.

***

## Secrets Management

Before configuring any pipeline, store your Clore API key securely.

### GitHub Actions

1. Go to your repo → **Settings → Secrets and variables → Actions**
2. Click **New repository secret**
3. Name: `CLORE_API_KEY`, Value: your API key

### GitLab CI

1. Go to your project → **Settings → CI/CD → Variables**
2. Add variable: Key = `CLORE_API_KEY`, Value = your API key
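3. Check **Mask variable** and **Protect variable**. Protected variables are only exposed to pipelines running on protected branches and tags, so leave **Protect variable** unchecked if merge request pipelines from unprotected branches need the key.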

### General Rules

* **Never** hardcode API keys in source code or CI configs
* Use environment variables or secrets managers
* Rotate keys periodically
* Restrict key scope: use a dedicated API key for CI (not your main account key)
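
These rules can be enforced at the top of any CI script. The following is a minimal sketch, assuming the key is exposed through the `CLORE_API_KEY` environment variable as in the pipeline configs in this chapter. `require_api_key` and `mask` are hypothetical helper names, not part of the SDK.

```python
import os


def require_api_key(env=None, name="CLORE_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Illustrative helper, not an SDK function.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it as a CI secret")
    return key


def mask(secret, visible=4):
    """Mask a secret for log output, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    demo_env = {"CLORE_API_KEY": "example-key-1234"}  # placeholder value
    print(mask(require_api_key(demo_env)))
```

Failing fast with a clear message makes a missing secret easier to diagnose than an authentication error later in the job, and masking keeps the full key out of build logs.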

***

## GitHub Actions

### Basic: GPU Smoke Test

Rent the cheapest available Clore GPU and confirm that the instance starts, on every push to `main`. The order is cancelled as soon as the check finishes.

```yaml
# .github/workflows/gpu-test.yml
name: GPU Smoke Test

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  CLORE_API_KEY: ${{ secrets.CLORE_API_KEY }}

jobs:
  gpu-test:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install SDK
        run: pip install clore-ai

      - name: Run GPU test
        run: |
          python << 'EOF'
          import time
          from clore_ai import CloreAI
          from clore_ai.exceptions import CloreAPIError

          client = CloreAI()
          order_id = None

          try:
              # Find cheapest GPU
              servers = client.marketplace(max_price_usd=1.0)
              servers.sort(key=lambda s: s.price_usd or float("inf"))

              if not servers:
                  print("::warning::No GPU servers available")
                  exit(0)

              best = servers[0]
              print(f"Using server {best.id}: {best.gpu_model} @ ${best.price_usd:.4f}/h")

              # Create order
              order = client.create_order(
                  server_id=best.id,
                  image="cloreai/ubuntu22.04-cuda12",
                  type="on-demand",
                  currency="bitcoin",
                  ssh_password="CITest123",
                  ports={"22": "tcp"},
              )
              order_id = order.id
              print(f"Order {order_id} created")

              # Wait for instance (poll for IP)
              for _ in range(24):  # 2 minutes
                  time.sleep(5)
                  orders = client.my_orders()
                  active = next((o for o in orders if o.id == order_id), None)
                  if active and active.pub_cluster:
                      print(f"Instance ready: {active.pub_cluster}")
                      break
              else:
                  print("::error::Instance did not start in time")
                  exit(1)

              print("✅ GPU test passed")

          except CloreAPIError as e:
              print(f"::error::Clore API error: {e}")
              exit(1)

          finally:
              if order_id:
                  try:
                      client.cancel_order(order_id, issue="CI test complete")
                      print(f"Order {order_id} cancelled")
                  except Exception:
                      pass
          EOF
```

### Advanced: Matrix GPU Testing

Test your code on multiple GPU types in parallel.

```yaml
# .github/workflows/gpu-matrix.yml
name: GPU Matrix Test

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  CLORE_API_KEY: ${{ secrets.CLORE_API_KEY }}

jobs:
  gpu-test:
    runs-on: ubuntu-latest
    timeout-minutes: 20

    strategy:
      fail-fast: false
      matrix:
        # Pair each GPU with its own price cap. Listing both keys as arrays
        # would expand to all 9 GPU/price combinations instead of 3 jobs.
        include:
          - gpu: "RTX 4090"
            max_price: 1.0
          - gpu: "RTX 3090"
            max_price: 1.5
          - gpu: "A100"
            max_price: 3.0

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          pip install clore-ai
          pip install -r requirements.txt

      - name: Run tests on ${{ matrix.gpu }}
        run: |
          python ci/run_gpu_test.py \
            --gpu "${{ matrix.gpu }}" \
            --max-price ${{ matrix.max_price }} \
            --script "pytest tests/gpu/ -v"
```

Supporting script `ci/run_gpu_test.py`:

```python
#!/usr/bin/env python3
"""Run a test script on a rented Clore GPU."""

import argparse
import subprocess
import sys
import time

from clore_ai import CloreAI
from clore_ai.exceptions import CloreAPIError


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", required=True)
    parser.add_argument("--max-price", type=float, default=1.0)
    parser.add_argument("--script", required=True)
    parser.add_argument("--image", default="cloreai/pytorch")
    parser.add_argument("--timeout", type=int, default=600)
    args = parser.parse_args()

    client = CloreAI()
    order_id = None

    try:
        # Find server
        servers = client.marketplace(gpu=args.gpu, max_price_usd=args.max_price)
        if not servers:
            print(f"::warning::No {args.gpu} servers available under ${args.max_price}")
            sys.exit(0)

        servers.sort(key=lambda s: s.price_usd or float("inf"))
        best = servers[0]
        print(f"Server {best.id}: {best.gpu_count}x {best.gpu_model} @ ${best.price_usd:.4f}/h")

        # Create order
        order = client.create_order(
            server_id=best.id,
            image=args.image,
            type="on-demand",
            currency="bitcoin",
            ssh_password="CIMatrix123",
            ports={"22": "tcp"},
        )
        order_id = order.id
        print(f"Order {order_id} created, waiting for SSH...")

        # Wait for SSH
        time.sleep(30)
        orders = client.my_orders()
        active = next((o for o in orders if o.id == order_id), None)

        if not active or not active.pub_cluster:
            print("::error::Instance did not start")
            sys.exit(1)

        host = active.pub_cluster
        port = 22
        if active.tcp_ports and "22" in active.tcp_ports:
            port = active.tcp_ports["22"]

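        # Run the test script over SSH. Plain `ssh` cannot supply the password
        # non-interactively, so the runner needs key-based auth or a helper
        # such as sshpass for this step to succeed unattended.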
        ssh_cmd = [
            "ssh", "-o", "StrictHostKeyChecking=no",
            "-p", str(port), f"root@{host}",
            args.script,
        ]
        result = subprocess.run(ssh_cmd, timeout=args.timeout)
        sys.exit(result.returncode)

    except CloreAPIError as e:
        print(f"::error::API error: {e}")
        sys.exit(1)
    finally:
        if order_id:
            try:
                client.cancel_order(order_id, issue="CI complete")
            except Exception:
                pass


if __name__ == "__main__":
    main()
```
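
The script above sleeps for a fixed 30 seconds and then checks the order once. A polling loop with a deadline, like the one in the smoke test, tends to cope better with slow instance starts. The following is a minimal sketch: `wait_for` is a hypothetical helper, and `fake_lookup` stands in for the `client.my_orders()` lookup so the example runs without an API key.

```python
import time


def wait_for(check, timeout=120.0, interval=5.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call `check()` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or None if the deadline is reached first.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)


if __name__ == "__main__":
    attempts = {"n": 0}

    def fake_lookup():
        # Pretend the instance gets an address on the third poll
        attempts["n"] += 1
        return "203.0.113.10" if attempts["n"] >= 3 else None

    host = wait_for(fake_lookup, timeout=1.0, interval=0.01)
    print(host, attempts["n"])
```

In `run_gpu_test.py`, the `check` callable could wrap the existing lookup, returning the matching order once its `pub_cluster` field is set.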

***

## GitLab CI

### Basic Pipeline

```yaml
# .gitlab-ci.yml
stages:
  - gpu-test

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.pip-cache"

gpu-smoke-test:
  stage: gpu-test
  image: python:3.11-slim
  timeout: 15 minutes

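  before_script:
    # python:3.11-slim ships without an SSH client, which run_gpu_test.py calls
    - apt-get update && apt-get install -y --no-install-recommends openssh-client
    - pip install clore-ai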

  script:
    - python ci/run_gpu_test.py --gpu "RTX 4090" --max-price 1.0 --script "nvidia-smi"

  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

  variables:
    CLORE_API_KEY: $CLORE_API_KEY
```

### Parallel GPU Jobs

```yaml
# .gitlab-ci.yml
stages:
  - gpu-test

.gpu-test-template: &gpu-test
  stage: gpu-test
  image: python:3.11-slim
  timeout: 20 minutes
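  before_script:
    # python:3.11-slim ships without an SSH client, which run_gpu_test.py calls
    - apt-get update && apt-get install -y --no-install-recommends openssh-client
    - pip install clore-ai
    - pip install -r requirements.txt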
  variables:
    CLORE_API_KEY: $CLORE_API_KEY

gpu-test-4090:
  <<: *gpu-test
  script:
    - python ci/run_gpu_test.py --gpu "RTX 4090" --max-price 1.0 --script "pytest tests/gpu/"

gpu-test-3090:
  <<: *gpu-test
  script:
    - python ci/run_gpu_test.py --gpu "RTX 3090" --max-price 1.5 --script "pytest tests/gpu/"
  allow_failure: true

gpu-test-a100:
  <<: *gpu-test
  script:
    - python ci/run_gpu_test.py --gpu "A100" --max-price 3.0 --script "pytest tests/gpu/"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```

***

## Docker

### SDK Script Container

Package your SDK automation scripts in a Docker image.

```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install SDK
RUN pip install --no-cache-dir clore-ai

# Install SSH client (for remote execution)
RUN apt-get update && apt-get install -y --no-install-recommends openssh-client \
    && rm -rf /var/lib/apt/lists/*

# Copy your scripts
COPY scripts/ ./scripts/

# Default entrypoint
ENTRYPOINT ["python"]
CMD ["scripts/main.py"]
```

### Docker Compose for Local Development

```yaml
# docker-compose.yml
version: "3.8"

services:
  gpu-manager:
    build: .
    environment:
      - CLORE_API_KEY=${CLORE_API_KEY}
    volumes:
      - ./scripts:/app/scripts
      - ./results:/app/results
    command: python scripts/training_pipeline.py

  spot-bot:
    build: .
    environment:
      - CLORE_API_KEY=${CLORE_API_KEY}
    command: python scripts/spot_bidder.py
    restart: unless-stopped

  health-checker:
    build: .
    environment:
      - CLORE_API_KEY=${CLORE_API_KEY}
    command: python scripts/health_checker.py
    restart: unless-stopped
```

Run:

```bash
# Set your API key
echo "CLORE_API_KEY=your_key" > .env

# Start all services
docker compose up -d

# View logs
docker compose logs -f gpu-manager
```

### Multi-Stage Build for Production

```dockerfile
# Dockerfile.prod
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir --target=/deps clore-ai -r requirements.txt

FROM python:3.11-slim
WORKDIR /app

# Copy only the installed packages
COPY --from=builder /deps /usr/local/lib/python3.11/site-packages/

# Install runtime deps only
RUN apt-get update && apt-get install -y --no-install-recommends openssh-client \
    && rm -rf /var/lib/apt/lists/*

# Non-root user
RUN useradd -m appuser
USER appuser

COPY scripts/ ./scripts/

ENTRYPOINT ["python"]
```

***

## Cleanup & Safety

### Always Cancel Orders in CI

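Every CI job must cancel its orders, either in a `finally` block or in a post-job step that runs even when earlier steps fail. The step below cancels every order returned by `my_orders()`, so use it only with an API key dedicated to CI: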

```yaml
# GitHub Actions — post-run cleanup
- name: Cleanup GPU orders
  if: always()
  run: |
    python << 'EOF'
    from clore_ai import CloreAI
    from clore_ai.exceptions import CloreAPIError

    client = CloreAI()
    try:
        orders = client.my_orders()
        for o in orders:
            client.cancel_order(o.id, issue="CI cleanup")
            print(f"Cancelled order {o.id}")
    except CloreAPIError as e:
        print(f"Cleanup error: {e}")
    EOF
```
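
Where the CI key cannot be isolated from other workloads, a narrower variant cancels only the order IDs that the current job recorded. The following is a minimal sketch, assuming `cancel_order(order_id, issue=...)` behaves as in the examples in this chapter. `cancel_tracked` and `FakeClient` are hypothetical names, and `FakeClient` only exists so the sketch runs offline.

```python
def cancel_tracked(client, order_ids, issue="CI cleanup"):
    """Cancel only the given order IDs; return (cancelled, failed) lists."""
    cancelled, failed = [], []
    for order_id in order_ids:
        try:
            client.cancel_order(order_id, issue=issue)
            cancelled.append(order_id)
        except Exception as exc:
            # Keep going so one failure does not leave the other orders running
            print(f"Could not cancel order {order_id}: {exc}")
            failed.append(order_id)
    return cancelled, failed


class FakeClient:
    """Stand-in for the SDK client (assumption: same cancel_order signature)."""

    def __init__(self, broken=()):
        self.broken = set(broken)
        self.calls = []

    def cancel_order(self, order_id, issue=""):
        self.calls.append((order_id, issue))
        if order_id in self.broken:
            raise RuntimeError("simulated API failure")


if __name__ == "__main__":
    client = FakeClient(broken={2})
    print(cancel_tracked(client, [1, 2, 3]))
```

Returning the failed IDs lets the job exit non-zero when something could not be cancelled, so a leaked order shows up as a red build.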

### Budget Guard for CI

Prevent runaway CI costs:

```python
# ci/budget_guard.py
"""Check budget before allowing GPU operations."""

from clore_ai import CloreAI

MAX_ACTIVE_ORDERS = 3
MAX_HOURLY_SPEND = 5.0  # USD


def check_budget() -> bool:
    client = CloreAI()
    orders = client.my_orders()

    if len(orders) >= MAX_ACTIVE_ORDERS:
        print(f"::error::Too many active orders ({len(orders)}/{MAX_ACTIVE_ORDERS})")
        return False

    # Estimate hourly spend
    total_hourly = sum(o.price or 0 for o in orders)
    if total_hourly >= MAX_HOURLY_SPEND:
        print(f"::error::Hourly spend too high (${total_hourly:.2f}/${MAX_HOURLY_SPEND:.2f})")
        return False

    print(f"✅ Budget OK: {len(orders)} orders, ${total_hourly:.2f}/h")
    return True


if __name__ == "__main__":
    import sys
    sys.exit(0 if check_budget() else 1)
```

Use it as a pre-step:

```yaml
- name: Budget check
  run: python ci/budget_guard.py
```
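
To choose sensible limits, it helps to estimate the worst-case cost of one pipeline run from the price caps and job timeouts used earlier in this chapter. The following is a minimal sketch: `worst_case_cost` is a hypothetical helper, and it assumes cost is proportional to runtime, which may not match the actual billing granularity.

```python
def worst_case_cost(jobs):
    """Upper bound in USD for one pipeline run.

    `jobs` is an iterable of (max_price_usd_per_hour, timeout_minutes) pairs.
    """
    return sum(price * minutes / 60.0 for price, minutes in jobs)


if __name__ == "__main__":
    # Price caps and the 20-minute timeout from the matrix workflow
    matrix_jobs = [(1.0, 20), (1.5, 20), (3.0, 20)]
    print(f"${worst_case_cost(matrix_jobs):.2f}")  # prints $1.83
```

The bound is deliberately pessimistic: it treats each job as renting at its price cap for the full timeout, even though the timeout also covers setup steps that run before any order exists.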

***

## See Also

* [SDK API Reference](/reference/python-sdk.md) — complete method documentation
* [SDK Quick Start](/getting-started/python-sdk-quickstart.md) — getting started tutorial
* [Automation Recipes](/advanced-use-cases/sdk-automation-recipes.md) — auto-scaler, spot bot, training pipeline
* [Auto-Provisioning from GitHub Actions](/devops-and-automation/github-actions.md) — existing GitHub Actions guide

