# Multi-Cloud GPU Orchestrator

## What We're Building

A unified GPU orchestration layer that spans Clore.ai and other cloud providers, automatically selecting the most cost-effective option for your workloads. Route jobs to the cheapest available GPUs across multiple providers with a single API.

**Key Features:**

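* Unified API designed to span Clore.ai, AWS, GCP, Azure, and Lambda Labs (this guide implements the Clore.ai and AWS providers; the others follow the same interface)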
* Automatic cost optimization and provider selection
* Failover and redundancy across providers
* Consistent job submission interface
* Real-time price comparison
* Workload-aware scheduling
* Provider health monitoring

## Prerequisites

* Accounts and API credentials for the cloud providers you want to use (for example, a Clore.ai API key and AWS credentials for `boto3`)
* Python 3.10+

```bash
pip install requests boto3 google-cloud-compute azure-mgmt-compute
```

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                    Multi-Cloud Orchestrator                      │
├─────────────────────────────────────────────────────────────────┤
│                     Unified API Layer                            │
│    submit_job() | get_status() | cancel_job() | get_prices()    │
├─────────────────────────────────────────────────────────────────┤
│                    Cost Optimizer                                │
│         Compare prices → Select provider → Route job            │
├────────────┬────────────┬────────────┬────────────┬─────────────┤
│  Clore.ai  │    AWS     │    GCP     │   Azure    │ Lambda Labs │
│  Provider  │  Provider  │  Provider  │  Provider  │   Provider  │
└────────────┴────────────┴────────────┴────────────┴─────────────┘
        │            │            │            │            │
        ▼            ▼            ▼            ▼            ▼
   ┌─────────────────────────────────────────────────────────┐
   │                      GPU Resources                       │
   │   RTX 4090 | A100 | V100 | H100 | RTX 3090 | A6000      │
   └─────────────────────────────────────────────────────────┘
```

## Step 1: Provider Abstraction Layer

```python
# providers/base.py
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Dict, Any
from enum import Enum

class GPUType(Enum):
    RTX_4090 = "rtx_4090"
    RTX_4080 = "rtx_4080"
    RTX_3090 = "rtx_3090"
    RTX_3080 = "rtx_3080"
    A100_80GB = "a100_80gb"
    A100_40GB = "a100_40gb"
    A6000 = "a6000"
    V100 = "v100"
    H100 = "h100"

@dataclass
class GPUInstance:
    """Represents a GPU instance from any provider."""
    provider: str
    instance_id: str
    gpu_type: GPUType
    gpu_count: int
    price_per_hour: float
    region: str
    status: str
    is_spot: bool = False
    ssh_host: Optional[str] = None
    ssh_port: int = 22
    ssh_user: str = "root"
    metadata: Optional[Dict[str, Any]] = None

@dataclass
class ProviderCapacity:
    """Available capacity at a provider."""
    provider: str
    gpu_type: GPUType
    available_count: int
    price_per_hour: float
    is_spot: bool
    region: str

@dataclass
class JobRequest:
    """Request to launch a GPU job."""
    gpu_type: GPUType
    gpu_count: int = 1
    image: str = "nvidia/cuda:12.8.0-base-ubuntu22.04"
    command: Optional[str] = None
    env: Optional[Dict[str, str]] = None
    max_price_per_hour: Optional[float] = None
    prefer_spot: bool = True
    min_runtime_hours: float = 1.0
    preferred_regions: Optional[List[str]] = None
    
    def __post_init__(self):
        self.env = self.env or {}
        self.preferred_regions = self.preferred_regions or []

@dataclass
class JobResult:
    """Result of launching a job."""
    success: bool
    job_id: str
    provider: str
    instance: Optional[GPUInstance] = None
    error: Optional[str] = None


class CloudProvider(ABC):
    """Abstract base class for cloud providers."""
    
    @property
    @abstractmethod
    def name(self) -> str:
        """Provider name."""
        pass
    
    @abstractmethod
    def get_available_gpus(self) -> List[ProviderCapacity]:
        """Get available GPU capacity."""
        pass
    
    @abstractmethod
    def launch_instance(self, request: JobRequest) -> JobResult:
        """Launch a GPU instance."""
        pass
    
    @abstractmethod
    def terminate_instance(self, instance_id: str) -> bool:
        """Terminate an instance."""
        pass
    
    @abstractmethod
    def get_instance_status(self, instance_id: str) -> Optional[GPUInstance]:
        """Get instance status."""
        pass
    
    @abstractmethod
    def list_instances(self) -> List[GPUInstance]:
        """List all running instances."""
        pass
    
    def is_healthy(self) -> bool:
        """Check if provider API is healthy."""
        try:
            self.get_available_gpus()
            return True
        except Exception:
            return False
```
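
The `JobRequest` dataclass above avoids shared mutable defaults by using `None` plus `__post_init__`. An alternative, sketched here on the assumption that you are free to change the field declarations, is `dataclasses.field(default_factory=...)`, which gives every instance its own dict or list. `JobRequestSketch` is a hypothetical, trimmed-down stand-in for illustration, not the class from `providers/base.py`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JobRequestSketch:
    """Hypothetical, trimmed-down variant of JobRequest for illustration."""
    gpu_type: str
    gpu_count: int = 1
    # default_factory builds a fresh container for each instance
    env: Dict[str, str] = field(default_factory=dict)
    preferred_regions: List[str] = field(default_factory=list)
    max_price_per_hour: Optional[float] = None


a = JobRequestSketch(gpu_type="rtx_4090")
b = JobRequestSketch(gpu_type="rtx_4090")
a.env["HF_TOKEN"] = "example"

# Each request owns its own dict, so b is unaffected by the change to a
print(a.env, b.env)
```

Either approach works; `default_factory` simply removes the need for a `__post_init__` hook when the only goal is an empty container.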

## Step 2: Clore.ai Provider

```python
# providers/clore.py
import requests
import time
import secrets
from typing import List, Optional
from .base import (
    CloudProvider, GPUInstance, ProviderCapacity, 
    JobRequest, JobResult, GPUType
)

class CloreProvider(CloudProvider):
    """Clore.ai provider implementation."""
    
    BASE_URL = "https://api.clore.ai"
    
    # GPU type mapping
    GPU_MAP = {
        "RTX 4090": GPUType.RTX_4090,
        "RTX 4080": GPUType.RTX_4080,
        "RTX 3090": GPUType.RTX_3090,
        "RTX 3080": GPUType.RTX_3080,
        "A100": GPUType.A100_40GB,
        "A100-80GB": GPUType.A100_80GB,
        "A6000": GPUType.A6000,
    }
    
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.headers = {"auth": api_key}
        self._last_request = 0
    
    @property
    def name(self) -> str:
        return "clore"
    
    def _request(self, method: str, endpoint: str, **kwargs):
        """Make rate-limited API request."""
        now = time.time()
        if now - self._last_request < 1:
            time.sleep(1 - (now - self._last_request))
        self._last_request = time.time()
        
        response = requests.request(
            method,
            f"{self.BASE_URL}{endpoint}",
            headers=self.headers,
            timeout=30,
            **kwargs
        )
        response.raise_for_status()  # Surface HTTP-level errors (4xx/5xx) before parsing
        data = response.json()
        
        if data.get("code") != 0:
            raise RuntimeError(f"Clore API Error: {data}")
        
        return data
    
    def _normalize_gpu(self, gpu_name: str) -> Optional[GPUType]:
        """Normalize GPU name to GPUType."""
        for pattern, gpu_type in self.GPU_MAP.items():
            if pattern.lower() in gpu_name.lower():
                return gpu_type
        return None
    
    def get_available_gpus(self) -> List[ProviderCapacity]:
        """Get available GPU capacity from Clore.ai marketplace."""
        data = self._request("GET", "/v1/marketplace")
        
        capacity = {}
        for server in data.get("servers", []):
            if server.get("rented"):
                continue
            
            gpu_array = server.get("gpu_array", [])
            if not gpu_array:
                continue
            
            gpu_type = self._normalize_gpu(gpu_array[0])
            if not gpu_type:
                continue
            
            spot_price = server.get("price", {}).get("usd", {}).get("spot")
            if not spot_price:
                continue
            
            key = (gpu_type, True)  # Clore is effectively all spot
            if key not in capacity:
                capacity[key] = ProviderCapacity(
                    provider="clore",
                    gpu_type=gpu_type,
                    available_count=0,
                    price_per_hour=float('inf'),
                    is_spot=True,
                    region="global"
                )
            
            capacity[key].available_count += len(gpu_array)
            capacity[key].price_per_hour = min(
                capacity[key].price_per_hour,
                spot_price
            )
        
        return list(capacity.values())
    
    def launch_instance(self, request: JobRequest) -> JobResult:
        """Launch a Clore.ai instance."""
        
        # Find matching server
        data = self._request("GET", "/v1/marketplace")
        
        matching_servers = []
        for server in data.get("servers", []):
            if server.get("rented"):
                continue
            
            gpu_array = server.get("gpu_array", [])
            if not gpu_array:
                continue
            
            gpu_type = self._normalize_gpu(gpu_array[0])
            if gpu_type != request.gpu_type:
                continue
            
            if len(gpu_array) < request.gpu_count:
                continue
            
            spot_price = server.get("price", {}).get("usd", {}).get("spot")
            if not spot_price:
                continue  # No spot price listed, so there is nothing to bid against
            if request.max_price_per_hour and spot_price > request.max_price_per_hour:
                continue
            
            matching_servers.append({
                "id": server["id"],
                "price": spot_price,
                "gpus": gpu_array
            })
        
        if not matching_servers:
            return JobResult(
                success=False,
                job_id="",
                provider="clore",
                error=f"No {request.gpu_type.value} available"
            )
        
        # Select cheapest
        server = min(matching_servers, key=lambda x: x["price"])
        
        # Create order
        ssh_password = secrets.token_urlsafe(16)
        
        order_data = {
            "renting_server": server["id"],
            "type": "spot",
            "currency": "CLORE-Blockchain",
            "image": request.image,
            "ports": {"22": "tcp"},
            "env": {**request.env, "NVIDIA_VISIBLE_DEVICES": "all"},
            "ssh_password": ssh_password,
            "spotprice": server["price"] * 1.1
        }
        
        result = self._request("POST", "/v1/create_order", json=order_data)
        order_id = result["order_id"]
        
        # Wait for ready
        for _ in range(90):
            orders = self._request("GET", "/v1/my_orders").get("orders", [])
            order = next((o for o in orders if o["order_id"] == order_id), None)
            
            if order and order.get("status") == "running":
                conn = order.get("connection", {})
                ssh_str = conn.get("ssh", "")
                
                # Parse SSH connection
                ssh_host = ssh_str.split("@")[1].split()[0] if "@" in ssh_str else ""
                ssh_port = 22
                if "-p" in ssh_str:
                    # Take only the first token after -p so trailing options don't break int()
                    ssh_port = int(ssh_str.split("-p")[1].split()[0])
                
                instance = GPUInstance(
                    provider="clore",
                    instance_id=str(order_id),
                    gpu_type=request.gpu_type,
                    gpu_count=len(server["gpus"]),
                    price_per_hour=server["price"],
                    region="global",
                    status="running",
                    is_spot=True,
                    ssh_host=ssh_host,
                    ssh_port=ssh_port,
                    ssh_user="root",
                    metadata={"ssh_password": ssh_password}
                )
                
                return JobResult(
                    success=True,
                    job_id=str(order_id),
                    provider="clore",
                    instance=instance
                )
            
            time.sleep(2)
        
        return JobResult(
            success=False,
            job_id=str(order_id),
            provider="clore",
            error="Timeout waiting for instance"
        )
    
    def terminate_instance(self, instance_id: str) -> bool:
        """Terminate a Clore.ai order."""
        try:
            self._request("POST", "/v1/cancel_order", json={"id": int(instance_id)})
            return True
        except Exception:
            return False
    
    def get_instance_status(self, instance_id: str) -> Optional[GPUInstance]:
        """Get status of a Clore.ai order."""
        orders = self._request("GET", "/v1/my_orders").get("orders", [])
        order = next((o for o in orders if o["order_id"] == int(instance_id)), None)
        
        if not order:
            return None
        
        return GPUInstance(
            provider="clore",
            instance_id=instance_id,
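            gpu_type=GPUType.RTX_4090,  # Placeholder: the GPU type is not tracked per order here
            gpu_count=1,  # Placeholder: assumes a single GPU
            price_per_hour=order.get("price", 0) * 60,  # Assumes the API reports a per-minute price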
            region="global",
            status=order.get("status", "unknown"),
            is_spot=True
        )
    
    def list_instances(self) -> List[GPUInstance]:
        """List all Clore.ai orders."""
        orders = self._request("GET", "/v1/my_orders").get("orders", [])
        
        return [
            GPUInstance(
                provider="clore",
                instance_id=str(o["order_id"]),
                gpu_type=GPUType.RTX_4090,
                gpu_count=1,
                price_per_hour=o.get("price", 0) * 60,
                region="global",
                status=o.get("status", "unknown"),
                is_spot=True
            )
            for o in orders
            if o.get("status") in ("running", "creating_order")
        ]
```
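
The SSH parsing inside `launch_instance` relies on string splitting. A regex-based helper is one way to make that step easier to test in isolation. The sketch below assumes the connection string looks like `ssh <user>@<host> -p <port>` (the exact format returned by the API is not confirmed here), and `parse_ssh_string` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Assumed format: "ssh <user>@<host> -p <port>"; the port part is optional
_SSH_RE = re.compile(
    r"(?P<user>[^\s@]+)@(?P<host>[^\s@]+)(?:.*?-p\s*(?P<port>\d+))?"
)


def parse_ssh_string(ssh_str: str) -> Optional[Tuple[str, str, int]]:
    """Return (user, host, port), or None if no user@host pair is found."""
    match = _SSH_RE.search(ssh_str)
    if match is None:
        return None
    # Fall back to the standard SSH port when no -p option is present
    port = int(match.group("port")) if match.group("port") else 22
    return match.group("user"), match.group("host"), port


print(parse_ssh_string("ssh root@node1.example.com -p 2222"))
print(parse_ssh_string("root@10.0.0.5"))
print(parse_ssh_string("connection pending"))
```

Returning `None` for an unparseable string lets the caller decide whether to keep polling or fail the launch, instead of silently storing an empty host.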

## Step 3: AWS Provider (Example)

```python
# providers/aws.py
import boto3
from typing import List, Optional
from .base import (
    CloudProvider, GPUInstance, ProviderCapacity,
    JobRequest, JobResult, GPUType
)

class AWSProvider(CloudProvider):
    """AWS EC2 provider for GPU instances."""
    
    # AWS instance type mapping
    GPU_INSTANCES = {
        GPUType.A100_40GB: "p4d.24xlarge",  # Note: one instance carries 8x A100 40GB
        GPUType.V100: "p3.2xlarge",
        GPUType.RTX_4090: None,  # Not available on AWS
    }
    
    INSTANCE_PRICES = {
        "p4d.24xlarge": 32.77,
        "p3.2xlarge": 3.06,
        "p3.8xlarge": 12.24,
        "g5.xlarge": 1.006,
        "g5.2xlarge": 1.212,
    }
    
    def __init__(self, region: str = "us-east-1"):
        self.region = region
        self.ec2 = boto3.client("ec2", region_name=region)
    
    @property
    def name(self) -> str:
        return "aws"
    
    def get_available_gpus(self) -> List[ProviderCapacity]:
        """Get AWS GPU capacity (simplified)."""
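        # Static placeholder values; a real implementation would query
        # spot prices and capacity from the EC2 API
        return [
            ProviderCapacity(
                provider="aws",
                gpu_type=GPUType.A100_40GB,
                available_count=100,  # Placeholder, not a real capacity query
                price_per_hour=32.77,  # On-demand price of a full p4d.24xlarge (8 GPUs)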
                is_spot=False,
                region=self.region
            ),
            ProviderCapacity(
                provider="aws",
                gpu_type=GPUType.V100,
                available_count=100,
                price_per_hour=3.06,
                is_spot=False,
                region=self.region
            )
        ]
    
    def launch_instance(self, request: JobRequest) -> JobResult:
        """Launch AWS EC2 GPU instance."""
        instance_type = self.GPU_INSTANCES.get(request.gpu_type)
        
        if not instance_type:
            return JobResult(
                success=False,
                job_id="",
                provider="aws",
                error=f"{request.gpu_type.value} not available on AWS"
            )
        
        # Launch instance (simplified)
        try:
            response = self.ec2.run_instances(
                ImageId="ami-0abcdef1234567890",  # Placeholder: replace with a real GPU AMI ID for your region
                InstanceType=instance_type,
                MinCount=1,
                MaxCount=1,
                # ... more config
            )
            
            instance_id = response["Instances"][0]["InstanceId"]
            
            return JobResult(
                success=True,
                job_id=instance_id,
                provider="aws",
                instance=GPUInstance(
                    provider="aws",
                    instance_id=instance_id,
                    gpu_type=request.gpu_type,
                    gpu_count=request.gpu_count,
                    price_per_hour=self.INSTANCE_PRICES.get(instance_type, 0),
                    region=self.region,
                    status="pending",
                    is_spot=False
                )
            )
        except Exception as e:
            return JobResult(
                success=False,
                job_id="",
                provider="aws",
                error=str(e)
            )
    
    def terminate_instance(self, instance_id: str) -> bool:
        try:
            self.ec2.terminate_instances(InstanceIds=[instance_id])
            return True
        except Exception:
            return False
    
    def get_instance_status(self, instance_id: str) -> Optional[GPUInstance]:
        try:
            response = self.ec2.describe_instances(InstanceIds=[instance_id])
            instance = response["Reservations"][0]["Instances"][0]
            
            return GPUInstance(
                provider="aws",
                instance_id=instance_id,
                gpu_type=GPUType.V100,  # Would need to determine
                gpu_count=1,
                price_per_hour=3.06,
                region=self.region,
                status=instance["State"]["Name"],
                is_spot=False,
                ssh_host=instance.get("PublicIpAddress")
            )
        except Exception:
            return None
    
    def list_instances(self) -> List[GPUInstance]:
        # Not implemented in this example; a full version would call
        # describe_instances() and keep only GPU instance types
        return []
```
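
`get_available_gpus` above returns hard-coded prices. One way to replace them with live data is the EC2 `describe_spot_price_history` call. The sketch below is a minimal illustration, not part of the original provider: `cheapest_spot_prices` is a hypothetical helper, it takes the client as a parameter so it can be tested with a stub, and it ignores pagination (`NextToken`).

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def cheapest_spot_prices(ec2_client: Any, instance_types: List[str]) -> Dict[str, float]:
    """Return the lowest recent Linux spot price (USD/hour) per instance type."""
    response = ec2_client.describe_spot_price_history(
        InstanceTypes=instance_types,
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc),  # Ask only for the most recent prices
    )
    cheapest: Dict[str, float] = {}
    for entry in response.get("SpotPriceHistory", []):
        price = float(entry["SpotPrice"])  # The API returns the price as a string
        itype = entry["InstanceType"]
        # Keep the minimum across availability zones
        cheapest[itype] = min(price, cheapest.get(itype, float("inf")))
    return cheapest
```

With real credentials you would pass `boto3.client("ec2", region_name="us-east-1")` as `ec2_client` and feed the result into `ProviderCapacity.price_per_hour` with `is_spot=True`.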

## Step 4: Multi-Cloud Orchestrator

```python
# orchestrator.py
from typing import List, Dict, Optional
from dataclasses import dataclass
import logging

from providers.base import (
    CloudProvider, GPUInstance, ProviderCapacity,
    JobRequest, JobResult, GPUType
)
from providers.clore import CloreProvider
from providers.aws import AWSProvider

logger = logging.getLogger(__name__)


@dataclass
class PriceComparison:
    """Price comparison across providers."""
    gpu_type: GPUType
    providers: List[Dict]  # [{provider, price, available, is_spot}]
    cheapest_provider: str
    cheapest_price: float


class MultiCloudOrchestrator:
    """Orchestrate GPU jobs across multiple cloud providers."""
    
    def __init__(self):
        self.providers: Dict[str, CloudProvider] = {}
        self._active_jobs: Dict[str, str] = {}  # job_id -> provider
    
    def add_provider(self, provider: CloudProvider):
        """Add a cloud provider."""
        self.providers[provider.name] = provider
        logger.info(f"Added provider: {provider.name}")
    
    def remove_provider(self, name: str):
        """Remove a cloud provider."""
        if name in self.providers:
            del self.providers[name]
    
    def get_all_capacity(self) -> List[ProviderCapacity]:
        """Get capacity from all providers."""
        all_capacity = []
        
        for name, provider in self.providers.items():
            try:
                capacity = provider.get_available_gpus()
                all_capacity.extend(capacity)
            except Exception as e:
                logger.warning(f"Failed to get capacity from {name}: {e}")
        
        return all_capacity
    
    def compare_prices(self, gpu_type: GPUType) -> PriceComparison:
        """Compare prices across all providers for a GPU type."""
        capacity = self.get_all_capacity()
        
        matching = [c for c in capacity if c.gpu_type == gpu_type]
        
        providers = []
        for c in matching:
            providers.append({
                "provider": c.provider,
                "price": c.price_per_hour,
                "available": c.available_count,
                "is_spot": c.is_spot,
                "region": c.region
            })
        
        # Sort by price
        providers.sort(key=lambda x: x["price"])
        
        return PriceComparison(
            gpu_type=gpu_type,
            providers=providers,
            cheapest_provider=providers[0]["provider"] if providers else "",
            cheapest_price=providers[0]["price"] if providers else float('inf')
        )
    
    def select_provider(self, request: JobRequest) -> Optional[str]:
        """Select best provider for a job request."""
        comparison = self.compare_prices(request.gpu_type)
        
        for p in comparison.providers:
            # Check price limit
            if request.max_price_per_hour and p["price"] > request.max_price_per_hour:
                continue
            
            # Check spot preference: when spot is preferred, skip on-demand
            # offers that cost more than 1.5x the cheapest offer
            if request.prefer_spot and not p["is_spot"]:
                if p["price"] > comparison.cheapest_price * 1.5:
                    continue
            
            # Check availability
            if p["available"] < request.gpu_count:
                continue
            
            return p["provider"]
        
        return None
    
    def submit_job(self, request: JobRequest, provider: Optional[str] = None) -> JobResult:
        """Submit a job, optionally to a specific provider."""
        
        # Select provider if not specified
        if not provider:
            provider = self.select_provider(request)
        
        if not provider:
            return JobResult(
                success=False,
                job_id="",
                provider="",
                error=f"No provider available for {request.gpu_type.value}"
            )
        
        if provider not in self.providers:
            return JobResult(
                success=False,
                job_id="",
                provider=provider,
                error=f"Provider {provider} not configured"
            )
        
        logger.info(f"Submitting job to {provider}: {request.gpu_type.value}")
        
        # Launch on selected provider
        result = self.providers[provider].launch_instance(request)
        
        if result.success:
            self._active_jobs[result.job_id] = provider
        
        return result
    
    def submit_with_failover(self, request: JobRequest) -> JobResult:
        """Submit job with automatic failover to other providers."""
        
        comparison = self.compare_prices(request.gpu_type)
        
        for p in comparison.providers:
            if request.max_price_per_hour and p["price"] > request.max_price_per_hour:
                continue
            
            provider_name = p["provider"]
            
            if provider_name not in self.providers:
                continue
            
            logger.info(f"Trying provider {provider_name}...")
            result = self.providers[provider_name].launch_instance(request)
            
            if result.success:
                self._active_jobs[result.job_id] = provider_name
                return result
            
            logger.warning(f"Provider {provider_name} failed: {result.error}")
        
        return JobResult(
            success=False,
            job_id="",
            provider="",
            error="All providers failed"
        )
    
    def terminate_job(self, job_id: str) -> bool:
        """Terminate a job."""
        provider_name = self._active_jobs.get(job_id)
        
        if not provider_name:
            logger.warning(f"Unknown job: {job_id}")
            return False
        
        provider = self.providers.get(provider_name)
        if not provider:
            return False
        
        success = provider.terminate_instance(job_id)
        
        if success:
            del self._active_jobs[job_id]
        
        return success
    
    def get_job_status(self, job_id: str) -> Optional[GPUInstance]:
        """Get status of a job."""
        provider_name = self._active_jobs.get(job_id)
        
        if not provider_name:
            return None
        
        provider = self.providers.get(provider_name)
        if not provider:
            return None
        
        return provider.get_instance_status(job_id)
    
    def list_all_jobs(self) -> List[GPUInstance]:
        """List all jobs across all providers."""
        all_jobs = []
        
        for name, provider in self.providers.items():
            try:
                jobs = provider.list_instances()
                all_jobs.extend(jobs)
            except Exception as e:
                logger.warning(f"Failed to list jobs from {name}: {e}")
        
        return all_jobs
    
    def terminate_all_jobs(self) -> int:
        """Terminate all active jobs."""
        count = 0
        
        for job_id in list(self._active_jobs.keys()):
            if self.terminate_job(job_id):
                count += 1
        
        return count
    
    def get_total_cost_per_hour(self) -> float:
        """Get total cost per hour of all active jobs."""
        jobs = self.list_all_jobs()
        return sum(j.price_per_hour for j in jobs if j.status == "running")
```
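
The selection rules in `select_provider` are easier to reason about on plain data. The sketch below mirrors those rules (price cap, spot preference with the 1.5x threshold, availability) in a standalone function; `pick_provider` is a hypothetical name and the offers are illustrative numbers, not real quotes.

```python
from typing import Dict, List, Optional


def pick_provider(offers: List[Dict], gpu_count: int = 1,
                  max_price: Optional[float] = None,
                  prefer_spot: bool = True) -> Optional[str]:
    """Apply the select_provider() rules to a list of plain offer dicts."""
    offers = sorted(offers, key=lambda o: o["price"])
    if not offers:
        return None
    cheapest = offers[0]["price"]
    for o in offers:
        if max_price and o["price"] > max_price:
            continue  # Over budget
        if prefer_spot and not o["is_spot"] and o["price"] > cheapest * 1.5:
            continue  # On-demand and more than 1.5x the cheapest offer
        if o["available"] < gpu_count:
            continue  # Not enough GPUs
        return o["provider"]
    return None


# Illustrative numbers only
offers = [
    {"provider": "clore", "price": 1.20, "available": 2, "is_spot": True},
    {"provider": "aws", "price": 32.77, "available": 100, "is_spot": False},
]

print(pick_provider(offers, gpu_count=1))                     # clore
print(pick_provider(offers, gpu_count=4))                     # None
print(pick_provider(offers, gpu_count=4, prefer_spot=False))  # aws
```

The second call shows a consequence worth knowing: with `prefer_spot=True`, a request can go unserved even though an on-demand provider has capacity, because that offer is priced above the 1.5x threshold.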

## Step 5: Complete Multi-Cloud Script

```python
#!/usr/bin/env python3
"""
Multi-Cloud GPU Orchestrator

Usage:
    python multi_cloud.py --action compare --gpu RTX_4090
    python multi_cloud.py --action submit --gpu A100_40GB --max-price 2.0
    python multi_cloud.py --action list
    python multi_cloud.py --action terminate --job-id abc123
"""

import argparse
from orchestrator import MultiCloudOrchestrator, GPUType, JobRequest
from providers.clore import CloreProvider
from providers.aws import AWSProvider


def setup_orchestrator(clore_key: str = None) -> MultiCloudOrchestrator:
    """Set up orchestrator with all providers."""
    orch = MultiCloudOrchestrator()
    
    # Add Clore.ai
    if clore_key:
        orch.add_provider(CloreProvider(clore_key))
    
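    # Add AWS. boto3 creates the client without validating credentials,
    # so missing credentials only surface on the first real API call.
    try:
        orch.add_provider(AWSProvider())
    except Exception as e:
        print(f"⚠️  Skipping AWS provider: {e}")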
    
    return orch


def main():
    parser = argparse.ArgumentParser(description="Multi-Cloud GPU Orchestrator")
    parser.add_argument("--action", required=True, 
                       choices=["compare", "submit", "list", "terminate", "status"])
    parser.add_argument("--gpu", type=str.upper, choices=[g.name for g in GPUType],
                       help="GPU type (e.g., RTX_4090, A100_40GB)")
    parser.add_argument("--max-price", type=float, help="Max price per hour")
    parser.add_argument("--provider", help="Specific provider to use")
    parser.add_argument("--job-id", help="Job ID for status/terminate")
    parser.add_argument("--clore-key", help="Clore.ai API key")
    parser.add_argument("--image", default="nvidia/cuda:12.8.0-base-ubuntu22.04")
    args = parser.parse_args()
    
    orch = setup_orchestrator(args.clore_key)
    
    if not orch.providers:
        print("❌ No providers configured!")
        return
    
    print(f"✅ Providers: {list(orch.providers.keys())}")
    print()
    
    if args.action == "compare":
        if not args.gpu:
            print("--gpu required for compare")
            return
        
        gpu_type = GPUType[args.gpu.upper()]
        comparison = orch.compare_prices(gpu_type)
        
        print(f"💰 Price Comparison: {gpu_type.value}")
        print("-" * 60)
        
        for p in comparison.providers:
            spot_label = "🟢 Spot" if p["is_spot"] else "🔵 On-Demand"
            print(f"  {p['provider']:10} ${p['price']:.3f}/hr  {p['available']:3} avail  {spot_label}")
        
        print("-" * 60)
        print(f"🏆 Cheapest: {comparison.cheapest_provider} @ ${comparison.cheapest_price:.3f}/hr")
    
    elif args.action == "submit":
        if not args.gpu:
            print("--gpu required for submit")
            return
        
        gpu_type = GPUType[args.gpu.upper()]
        
        request = JobRequest(
            gpu_type=gpu_type,
            gpu_count=1,
            image=args.image,
            max_price_per_hour=args.max_price,
            prefer_spot=True
        )
        
        if args.provider:
            result = orch.submit_job(request, provider=args.provider)
        else:
            result = orch.submit_with_failover(request)
        
        if result.success:
            print(f"✅ Job submitted successfully!")
            print(f"   Job ID: {result.job_id}")
            print(f"   Provider: {result.provider}")
            if result.instance:
                print(f"   SSH: {result.instance.ssh_user}@{result.instance.ssh_host}:{result.instance.ssh_port}")
                print(f"   Price: ${result.instance.price_per_hour:.3f}/hr")
        else:
            print(f"❌ Job failed: {result.error}")
    
    elif args.action == "list":
        jobs = orch.list_all_jobs()
        
        if not jobs:
            print("No active jobs")
            return
        
        print(f"📋 Active Jobs ({len(jobs)})")
        print("-" * 70)
        
        total_cost = 0
        for job in jobs:
            print(f"  {job.instance_id:15} {job.provider:8} {job.gpu_type.value:12} "
                  f"${job.price_per_hour:.3f}/hr  {job.status}")
            if job.status == "running":
                total_cost += job.price_per_hour
        
        print("-" * 70)
        print(f"💵 Total: ${total_cost:.2f}/hr")
    
    elif args.action == "status":
        if not args.job_id:
            print("--job-id required for status")
            return
        
        instance = orch.get_job_status(args.job_id)
        
        if instance:
            print(f"📊 Job Status: {args.job_id}")
            print(f"   Provider: {instance.provider}")
            print(f"   GPU: {instance.gpu_type.value}")
            print(f"   Status: {instance.status}")
            print(f"   Price: ${instance.price_per_hour:.3f}/hr")
            if instance.ssh_host:
                print(f"   SSH: {instance.ssh_user}@{instance.ssh_host}:{instance.ssh_port}")
        else:
            print(f"❌ Job not found: {args.job_id}")
    
    elif args.action == "terminate":
        if not args.job_id:
            # Terminate all
            count = orch.terminate_all_jobs()
            print(f"🛑 Terminated {count} jobs")
        else:
            success = orch.terminate_job(args.job_id)
            if success:
                print(f"✅ Job {args.job_id} terminated")
            else:
                print(f"❌ Failed to terminate {args.job_id}")


if __name__ == "__main__":
    main()
```
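
`MultiCloudOrchestrator` keeps `_active_jobs` in memory, so every CLI invocation starts with an empty map, and `--action status` or `--action terminate` in a later run cannot find a job submitted earlier. One possible remedy is to persist the map between runs. The sketch below is an assumption-laden illustration: `JobRegistry`, its JSON layout, and the file location are all hypothetical and not part of the original script.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


class JobRegistry:
    """Hypothetical helper: persists the job_id -> provider map as JSON."""

    def __init__(self, path: Path):
        self.path = path

    def load(self) -> Dict[str, str]:
        # A missing file simply means no jobs have been recorded yet
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def add(self, job_id: str, provider: str) -> None:
        jobs = self.load()
        jobs[job_id] = provider
        self.path.write_text(json.dumps(jobs, indent=2))

    def remove(self, job_id: str) -> None:
        jobs = self.load()
        jobs.pop(job_id, None)
        self.path.write_text(json.dumps(jobs, indent=2))


# Simulate two separate CLI runs sharing one registry file
registry_path = Path(tempfile.mkdtemp()) / "jobs.json"
JobRegistry(registry_path).add("12345", "clore")  # Run 1: submit
print(JobRegistry(registry_path).load())          # Run 2: sees the earlier job
```

The orchestrator could load this map at startup and call `add` and `remove` wherever it currently mutates `_active_jobs`. A plain JSON file is not safe for concurrent writers, so treat it as a starting point for single-user use.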

## Cost Comparison Table

| GPU Type  | Clore.ai (Spot) | AWS (On-Demand) | GCP (On-Demand) | Lambda Labs |
| --------- | --------------- | --------------- | --------------- | ----------- |
| RTX 4090  | **$0.25-0.50**  | N/A             | N/A             | $0.50       |
| RTX 3090  | **$0.20-0.35**  | N/A             | N/A             | $0.45       |
| A100 40GB | **$1.00-1.50**  | $32.77          | $3.67           | $1.10       |
| A100 80GB | $1.50-2.00      | $40.00          | $4.00           | **$1.29**   |
| V100      | $0.80           | $3.06           | $2.48           | $0.80       |
| H100      | **$2.00-3.00**  | N/A             | $6.98           | $2.49       |

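**In this table, Clore.ai offers the lowest prices for consumer GPUs (RTX series).**

Prices are indicative, in USD per hour. The AWS A100 40GB figure is the on-demand price of a whole `p4d.24xlarge` instance, which carries eight GPUs, so it is not a per-GPU price.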
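
When comparing rows in the table, it helps to normalize everything to a per-GPU hourly price first. The arithmetic below is a small worked example; it assumes the $32.77 AWS figure covers a `p4d.24xlarge` with eight A100 40GB GPUs and uses the upper end of the Clore.ai range from the table. The function names are illustrative.

```python
def per_gpu_hourly(instance_price: float, gpus_per_instance: int) -> float:
    """Hourly price of one GPU, given the price of the whole instance."""
    return instance_price / gpus_per_instance


def job_cost(price_per_gpu_hour: float, gpu_count: int, hours: float) -> float:
    """Total cost of running gpu_count GPUs for the given number of hours."""
    return price_per_gpu_hour * gpu_count * hours


aws_a100 = per_gpu_hourly(32.77, 8)  # Assumes 8 GPUs per p4d.24xlarge
clore_a100 = 1.50                     # Upper end of the Clore.ai range in the table

print(f"AWS A100 per GPU: ${aws_a100:.2f}/hr")
print(f"8 GPUs x 24 h on AWS:   ${job_cost(aws_a100, 8, 24):.2f}")
print(f"8 GPUs x 24 h on Clore: ${job_cost(clore_a100, 8, 24):.2f}")
```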

## Next Steps

* [Auto-Scaling Workers](/inference-and-deployment/auto-scaling-workers.md)
* [Prometheus Monitoring](/devops-and-automation/prometheus-monitoring.md)
* [Cost Optimization](/devops-and-automation/cost-optimization.md)


