
# Pricing

> Learn how Vast.ai Serverless pricing works - GPU recruitment, endpoint suspension, and stopping.

<script
  type="application/ld+json"
  dangerouslySetInnerHTML={{
__html: JSON.stringify({
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Vast.ai Serverless Pricing",
  "description": "Understanding Vast.ai Serverless pay-as-you-go pricing including GPU recruitment, endpoint suspension and stopping, and billing by instance state (Ready, Loading, Creating, Inactive).",
  "author": {
    "@type": "Organization",
    "name": "Vast.ai"
  },
  "articleSection": "Serverless Documentation",
  "keywords": ["pricing", "billing", "pay-as-you-go", "GPU costs", "serverless", "vast.ai", "endpoints"]
})
}}
/>

Vast.ai Serverless offers pay-as-you-go pricing for all workloads at the same rates as Vast.ai's non-Serverless GPU instances. Each instance accrues cost on a per-second basis.
This guide explains how pricing works.
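Per-second accrual can be sketched with a little arithmetic. This is a hypothetical illustration: the `$0.40/hr` rate is a placeholder, not an actual Vast.ai price.

```python
# Hypothetical example of per-second cost accrual for one GPU instance.
# The hourly rate below is illustrative only, not a real Vast.ai rate.
hourly_rate = 0.40       # assumed $/hr for a recruited GPU instance
seconds_running = 90     # how long the instance has been running

# Cost accrues each second at 1/3600 of the hourly rate.
cost = hourly_rate / 3600 * seconds_running
print(f"${cost:.4f}")    # 90 seconds at $0.40/hr -> $0.0100
```

The same per-second accrual applies to every instance the engine recruits; the endpoint's total rate is the sum of the individual instances' rates.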

## GPU Recruitment

As the Serverless engine takes requests, it automatically scales its number of workers up or down based on incoming and forecasted demand. When scaling up,
the engine searches the Vast.ai marketplace for GPU instances that offer the best performance/price ratio. Once found, those GPU instances are recruited into
the Serverless engine, and their cost (\$/hr) is added to the running sum of all GPU instances running on your Serverless engine.

As request demand falls off, the engine removes GPU instances, and your credit account immediately stops being charged for those instances.

Visit the [Billing Help](/documentation/reference/billing#ugwiY) page to see details on GPU instance costs.

## Suspending an Endpoint

When an Endpoint is **suspended**:

* The Serverless Engine will no longer manage the GPU instances contained within the Endpoint.
* GPU instances in this Endpoint will still be able to receive requests.

## Stopping an Endpoint

**Stopping** an Endpoint will:

* Cause the Serverless Engine to no longer manage the GPU instances contained within the Endpoint.
* Put all existing GPU instances into the Inactive state.

An **Inactive** GPU instance will:

* Not receive any work.
* Not charge GPU compute costs.
* Charge the user's account for **storage** and **bandwidth**.

## Billing by Instance State

The specific charges depend on the instance's state:

| State    | GPU compute | Storage | Bandwidth in | Bandwidth out |
| -------- | ----------- | ------- | ------------ | ------------- |
| Ready    | Billed      | Billed  | Billed       | Billed        |
| Loading  | Billed      | Billed  | Billed       | Billed        |
| Creating | Not billed  | Billed  | Billed       | Billed        |
| Inactive | Not billed  | Billed  | Billed       | Billed        |
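The table above can be read as a lookup from instance state to the set of billed cost components. A minimal sketch, with state and component names taken from the table (the function name is hypothetical):

```python
# Billed cost components per instance state, per the table above.
BILLED = {
    "Ready":    {"gpu_compute", "storage", "bandwidth_in", "bandwidth_out"},
    "Loading":  {"gpu_compute", "storage", "bandwidth_in", "bandwidth_out"},
    "Creating": {"storage", "bandwidth_in", "bandwidth_out"},
    "Inactive": {"storage", "bandwidth_in", "bandwidth_out"},
}

def billed_components(state: str) -> set:
    """Return the cost components billed for an instance in the given state."""
    return BILLED[state]

# Inactive instances accrue storage and bandwidth costs but no GPU compute.
print("gpu_compute" in billed_components("Inactive"))  # False
```

Note that storage and bandwidth are billed in every state, so an Inactive instance is not free, only exempt from compute charges.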

GPU compute refers to the per-second GPU rental charges. See the [Billing Help](/documentation/reference/billing#ugwiY) page for rate details.
