# Serverless Parameters

> Learn about the parameters that can be configured for Vast.ai Serverless endpoints and worker groups.

The Vast.ai Serverless engine exposes parameters that control its scaling behavior.

# Endpoint Parameters

## Cold Multiplier

A multiplier applied to your target capacity for longer-term planning (1+ hours). This parameter controls how much extra capacity the serverless engine will plan for in the future compared to immediate needs. For example, if your current target capacity is 100 tokens/sec and cold\_mult is 2.0, the engine will plan to have capacity for 200 tokens/sec for longer-term scenarios.

This helps ensure your endpoint has sufficient "cold" (stopped but ready) workers available to handle future load spikes without delay. A higher value means more aggressive capacity planning and better preparedness for sudden traffic increases, while a lower value reduces costs from maintaining stopped instances.

If not specified during endpoint creation, the default value is 3.
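
The arithmetic in the example above can be sketched as follows. This only illustrates the multiplication described in this section, not the engine's actual planning code; the function name `long_term_capacity` is hypothetical.

```python
DEFAULT_COLD_MULT = 3.0  # default stated in this section


def long_term_capacity(target_capacity: float,
                       cold_mult: float = DEFAULT_COLD_MULT) -> float:
    """Capacity planned for the longer-term (1+ hour) horizon.

    Hypothetical helper: multiplies the current target capacity by cold_mult.
    """
    if cold_mult < 0:
        raise ValueError("cold_mult must be non-negative")
    return target_capacity * cold_mult


# 100 tokens/sec of target capacity with cold_mult = 2.0
print(long_term_capacity(100, cold_mult=2.0))  # 200.0
```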

## Minimum Workers

The minimum number of workers that must be kept in the endpoint at all times.

If not specified during endpoint creation, the default value is 5.

## Max Workers

A hard upper limit on the total number of workers that the endpoint can have at any given time.

If not specified during endpoint creation, the default value is 16.
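
Taken together, Minimum Workers and Max Workers bound the number of workers in an endpoint. The sketch below assumes the bounds act as a simple clamp on whatever count the engine would otherwise choose; the real engine may apply them differently, and `clamp_worker_count` is a hypothetical name.

```python
def clamp_worker_count(desired: int,
                       min_workers: int = 5,
                       max_workers: int = 16) -> int:
    """Clamp a desired worker count to [min_workers, max_workers].

    Defaults are the documented defaults; the clamping rule itself
    is an assumption made for illustration.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers cannot exceed max_workers")
    return max(min_workers, min(desired, max_workers))


print(clamp_worker_count(2))   # 5
print(clamp_worker_count(40))  # 16
```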

## Minimum Load

A minimum baseline load (measured in perf units/second) that the serverless engine always plans capacity for, regardless of actual measured traffic. This acts as a "floor" for load predictions across all time horizons (1 second to 24+ hours).

For example, if your Minimum Load is set to 100 tokens/second, but your endpoint currently has zero traffic, the serverless engine will still plan capacity as if you need to handle at least 100 tokens/second. This prevents the endpoint from scaling down to zero capacity and ensures you're always ready for incoming requests.

If not specified during endpoint creation, the default value is 1.
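
The "floor" behavior can be pictured as taking the larger of the predicted load and the configured minimum. This is a minimal sketch under that assumption; `planned_load` is a hypothetical name, not part of any Vast.ai API.

```python
def planned_load(predicted_load: float, min_load: float = 1.0) -> float:
    """Load the engine plans for: never below min_load.

    Hypothetical helper illustrating the floor described above.
    """
    return max(predicted_load, min_load)


# Zero traffic, but a Minimum Load of 100 tokens/second
print(planned_load(0.0, min_load=100.0))    # 100.0
# Traffic above the floor is used as-is
print(planned_load(250.0, min_load=100.0))  # 250.0
```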

## Minimum Cold Load

The minimum baseline load (measured in perf units/second) that the serverless engine will maintain with loaded workers. While Minimum Load guarantees capacity from "Ready" workers, Minimum Cold Load guarantees capacity from workers that have fully loaded the model, whether or not they are currently running.

Workers that count toward this minimum are:

* Actively serving requests (status = "Ready")
* Stopped but ready to serve (status = "Inactive" with model loaded)

These workers can start serving requests within seconds because they don't need to download the model or benchmark GPU performance. This parameter
is particularly useful for maintaining low-latency response times during traffic spikes or after periods of low activity.

If not specified during endpoint creation, the default value is 0.
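
To make the counting rule concrete, the sketch below sums the performance of workers that would count toward Minimum Cold Load. The `Worker` class, its fields, and the "Loading" status are hypothetical placeholders; only the "Ready" and "Inactive" statuses come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    status: str         # e.g. "Ready" or "Inactive"
    model_loaded: bool  # True once the model is fully loaded
    perf: float         # measured perf units/second


def counts_toward_cold_load(worker: Worker) -> bool:
    """Ready workers, or Inactive workers that already loaded the model."""
    if worker.status == "Ready":
        return True
    return worker.status == "Inactive" and worker.model_loaded


def loaded_capacity(workers: list[Worker]) -> float:
    """Total perf of workers that count toward Minimum Cold Load."""
    return sum(w.perf for w in workers if counts_toward_cold_load(w))


workers = [
    Worker("Ready", True, 120.0),
    Worker("Inactive", True, 100.0),
    Worker("Inactive", False, 100.0),  # stopped, model not loaded
    Worker("Loading", False, 90.0),    # hypothetical status, not counted
]
print(loaded_capacity(workers))  # 220.0
```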

## Target Utilization

The Target Utilization ratio determines how much spare capacity (headroom) the serverless engine maintains. For example, if your predicted load is 900 tokens/second and target\_util is 0.9, the serverless engine will plan for 1000 tokens/second of capacity (900 ÷ 0.9 = 1000), leaving 100 tokens/second (about 11% of the predicted load) as a buffer for traffic spikes.

A lower target\_util means more headroom:

* target\_util = 0.9 → 11.1% spare capacity relative to load
* target\_util = 0.8 → 25% spare capacity relative to load
* target\_util = 0.5 → 100% spare capacity relative to load
* target\_util = 0.4 → 150% spare capacity relative to load

If not specified during endpoint creation, the default value is 0.9.
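
The percentages listed above follow from dividing the predicted load by `target_util`. The sketch below reproduces that arithmetic; the function names are hypothetical and the code is an illustration of the formula, not the engine's implementation.

```python
def required_capacity(predicted_load: float, target_util: float = 0.9) -> float:
    """Capacity to plan for so that load / capacity equals target_util."""
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    return predicted_load / target_util


def spare_fraction(target_util: float) -> float:
    """Spare capacity relative to load: 1 / target_util - 1."""
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    return 1 / target_util - 1


# Print the spare capacity for each value listed above
for util in (0.9, 0.8, 0.5, 0.4):
    print(f"target_util = {util}: {spare_fraction(util):.1%} spare capacity")
```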

# Workergroup Parameters

The parameters below apply only to Workergroups, not Endpoints. Pre-configured serverless templates from Vast.ai already have these values set.

## gpu\_ram

The amount of GPU memory (VRAM) in gigabytes that your model or workload requires to run. This parameter tells the serverless engine how much GPU memory your model needs.

If not specified during workergroup creation, the default value is 24.

## launch\_args

A command-line style string containing additional parameters for instance creation that will be parsed and applied when the serverless engine creates new workers. This allows you to customize instance configuration beyond what's specified in templates.

There is no default value for launch\_args.

## search\_params

A query string, list, or dictionary that specifies the hardware and performance criteria for filtering GPU offers in the Vast.ai marketplace. It uses a simple query syntax to define the requirements a machine must meet before your Workergroup will consider it when creating workers.

Example:

```python
# Dictionary form: only verified offers that are rentable and not currently rented
search_params = {"verified": {"eq": True}, "rentable": {"eq": True}, "rented": {"eq": False}}
```

There is no default value for search\_params. For the full list of available search filters, see the [CLI docs](https://docs.vast.ai/cli/commands).

## template\_hash

A unique hexadecimal identifier that references a pre-configured template containing all the configuration needed to create instances. Templates are comprehensive specifications that include the Docker image, environment variables, onstart scripts, resource requirements, and other deployment settings.

There is no default value for template\_hash.

## template\_id

A numeric (integer) identifier that uniquely references a template in the Vast.ai database. This is an alternative way to reference the same template that `template_hash` points to, but using the template's database primary key instead of its hash string.

There is no default value for template\_id.
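
As a summary of the defaults in this section, the sketch below assembles a workergroup configuration in which only `gpu_ram` has a default and every other parameter is omitted unless set. The helper `build_workergroup_config` and the shape of the returned dictionary are hypothetical; this is not the payload format of any Vast.ai API, and the hash value is a placeholder.

```python
from typing import Any, Optional, Union


def build_workergroup_config(
    template_hash: Optional[str] = None,
    template_id: Optional[int] = None,
    gpu_ram: float = 24,  # documented default, in gigabytes
    search_params: Optional[Union[str, list, dict]] = None,
    launch_args: Optional[str] = None,
) -> dict[str, Any]:
    """Collect workergroup parameters, dropping those with no value."""
    config = {
        "gpu_ram": gpu_ram,
        "template_hash": template_hash,
        "template_id": template_id,
        "search_params": search_params,
        "launch_args": launch_args,
    }
    # Parameters without a default are simply left out when unset.
    return {key: value for key, value in config.items() if value is not None}


# "abc123" is a placeholder, not a real template hash
print(build_workergroup_config(template_hash="abc123"))
```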
