Run

A run is the most basic and frequent type of Spacelift workload, the only other one being the task. The ultimate purpose of a run is to execute the right Terraform commands in the right environment, with virtually everything else serving to help this process along.

Where do runs come from?

Though runs can be triggered manually using the Trigger action button, most runs are actually triggered by GitHub push webhooks. Every push to a stack's repository will trigger a run, though depending on the target branch of the push the run will be treated (and thus executed) differently.

The magic trigger button

Manually triggering a run is still useful in a few types of scenarios like intermittent API failures, race conditions or changes to the stack environment.

If the target branch of the push matches the tracked branch of the stack, the run will be able to make actual changes to managed resources - in Terraform parlance it can be applied. In Spacelift, we call this type of run tracked. On the other hand, if the target branch of the push is different, Spacelift will still generate the plan but merely report its results to GitHub. We call these runs proposed.
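The branch rule above can be sketched in a few lines. This is only an illustration of the rule as described here, not Spacelift's implementation; the function names classify_run and can_apply are hypothetical.

```python
def classify_run(push_branch: str, tracked_branch: str) -> str:
    """Return "tracked" when the push targets the stack's tracked branch, else "proposed"."""
    return "tracked" if push_branch == tracked_branch else "proposed"


def can_apply(run_type: str) -> bool:
    # Only tracked runs may change managed resources; proposed runs stop at the plan.
    return run_type == "tracked"
```

For a stack tracking main, a push to main yields a tracked run, while a push to any other branch yields a proposed run that only reports its plan.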

You can think of the two types of runs as the two elements of the CI/CD dyad. Proposed runs are the CI bit - test runs that allow the user to determine how the proposed changes would affect the stack, ensuring that the code is correct and does what you expect it to do. Tracked runs, on the other hand, are the CD bit - based on your decision to merge the changes to the tracked branch, Spacelift safely implements those changes, while giving you one last chance to change your mind.

You can read a bit more about the suggested workflow in the dedicated section of the GitHub integration walkthrough.

Run state machine

Given the importance of runs in Spacelift, it's useful to understand the various states of a run, and the entire state machine. Below you will also find the detailed explanation of each state, along with the operations (if any) that take place in each state.

The exact state machine looks like this:

Queued

Queued means that no worker has picked up the run yet. This may be for two reasons - first, no workers are available. We're running a highly dynamic, self-scaling system so this shouldn't frequently be an issue, but it's still possible.

The second - more important - reason, is that the run may be blocked by something else (a tracked run or a task) holding an exclusive lock on the stack. All potential changes to the Terraform state are strongly serialized by Spacelift, which means that tracked runs and tasks (which can run arbitrary commands, including those changing the state) are never allowed to run simultaneously. They'll be executed one after another, in the same order as they were submitted.
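The serialization described above behaves like a first-in, first-out exclusive lock per stack. The following is a minimal in-memory sketch of that idea, assuming one lock per stack; the StackLock class and its method names are hypothetical and say nothing about how Spacelift actually stores or enforces the lock.

```python
from collections import deque
from typing import Deque, Optional


class StackLock:
    """FIFO exclusive lock for workloads that may change Terraform state."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.waiting: Deque[str] = deque()

    def submit(self, workload_id: str) -> None:
        # A workload takes the lock if it is free, otherwise it queues behind the holder.
        if self.holder is None:
            self.holder = workload_id
        else:
            self.waiting.append(workload_id)

    def blocker_of(self, workload_id: str) -> Optional[str]:
        # Anything still waiting is blocked by the current holder.
        return self.holder if workload_id in self.waiting else None

    def release(self) -> Optional[str]:
        # Hand the lock to the oldest waiting workload, preserving submission order.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Submitting a tracked run, then a task, then another tracked run lets the first run proceed while the other two wait, and each release hands the lock to the next one in submission order.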

If your run or task is currently blocked by something else holding the lock on the stack, you'll see the link to the blocker in the header:

Queued is a passive state meaning no operations are performed while a run is in this state. When a worker picks up the run, its state will automatically transition to Initializing. The user can cancel the run while it still hasn't been picked up for execution, transitioning it to the terminal Canceled state.

Preparing

Though the preparing state appears twice in the state machine, these are actually two separate states (preparing and preparing apply). These are intermediate steps that prepare the job package for the remote worker to consume. Unlike Initializing and the following steps that occur on the worker, this one is performed by Spacelift and won't fail unless there's a serious misconfiguration or a failing policy check.

Canceled

Canceled state means that the user has manually stopped a Queued run or task before it had the chance to be picked up by a worker. Cancellation is not possible for runs that are already being processed, though some of these may be Stopped.

Canceled is a passive state meaning no operations are performed while a run is in this state. It's also a terminal state meaning that no further state can supersede it.

Initializing

From Spacelift's point of view, Initializing is the busiest and most interesting of all states. Sure, Planning and Applying may take more time on average, but Initializing is all about preparing the workspace for Terraform and this is ultimately the main purpose of Spacelift. Just look at the logs to appreciate how much is going on here:

...and that's just the beginning

Here's a step-by-step breakdown of what happens during this stage:

  1. Temporary credentials are generated for integrations, for example AWS (see above for an example);

  2. Runtime configuration is pulled from GitHub and parsed;

  3. Unless it's cached locally, the right version of Terraform is downloaded and verified;

  4. Secrets are decrypted and the environment gets calculated;

  5. Runner Docker image is pulled if necessary;

  6. Files are mounted, if any;

  7. Source code is pulled from GitHub and put in /spacelift/project/source;

  8. If Spacelift manages the state, temporary state credentials are injected into the project root directory;

  9. If defined, before_init actions get executed;

  10. terraform init command gets executed;

  11. Current state size is calculated in order to later determine the delta - this does not apply to tasks;

If the initialization phase succeeds, the run transitions to the Planning state. If it fails - to the Failed state. The run can also be manually stopped during this phase, in which case it transitions to the Stopped state.
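The outcome rule for this phase (all steps succeed, or the first failure fails the run) can be sketched as an ordered, fail-fast pipeline. This is an illustrative sketch only: the run_initializing function and the Step type are hypothetical, the steps are stand-in callables rather than real Spacelift operations, and manual stopping is not modeled.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action raises an exception on failure.
Step = Tuple[str, Callable[[], None]]


def run_initializing(steps: List[Step]) -> Tuple[str, List[str]]:
    """Run steps in order; return the next run state and the names of completed steps."""
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception:
            # Any failing step ends the phase and fails the run.
            return "Failed", completed
        completed.append(name)
    return "Planning", completed
```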

Planning

Once the workspace is prepared by the Initializing phase, Planning simply runs terraform plan. For the tracked branch, the output of the plan is captured to a file that can later be used by the Applying phase to make sure that what's being changed is what has been reviewed by the user.
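In Terraform CLI terms, capturing a plan and later applying exactly that plan corresponds to the -out option of terraform plan and to passing the saved file to terraform apply. The sketch below only builds the argument lists; the helper names and the plan file name are hypothetical, and any additional flags Spacelift may pass are not covered here.

```python
from typing import List


def plan_command(plan_file: str) -> List[str]:
    # -out saves the plan so that a later apply executes exactly what was reviewed.
    return ["terraform", "plan", f"-out={plan_file}"]


def apply_command(plan_file: str) -> List[str]:
    # Given a saved plan file, apply executes that plan instead of creating a new one.
    return ["terraform", "apply", plan_file]
```

Either list could be handed to subprocess.run in a directory where terraform init has already succeeded.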

The Planning phase can safely be Stopped by the user. In fact, proposed changes will automatically get stopped when a newer version of the code is pushed to their branch. This is mainly designed to limit the number of unnecessary API calls to your resource providers, though it saves us a few bucks on EC2, too.

Delta

If the planning phase is successful, Spacelift analyses the output of the Terraform plan and counts the resources that would be added, changed and deleted if the plan were to be applied. Here's one example of such delta being reported:

This is also summarised as a colorful strip below every run on the runs page, which visualizes the state change, also in relation to its previous size calculated during the Initializing phase:

The delta is not a cosmetic thing either. While a proposed run will always finish after the Planning phase, the next step for a tracked run depends on whether the plan contains any changes. If it doesn't, the run will finish here, too. If it does, Spacelift uploads the entire workspace to S3 and transitions the run to the Unconfirmed state for user review before the changes are applied.
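This page does not say how Spacelift derives the delta from the plan. As one possible approach, the sketch below counts actions in Terraform's JSON plan representation (the output of terraform show -json for a saved plan file), where each entry in resource_changes carries a change.actions list, and then applies the decision rule described above. The function names count_delta and state_after_planning are hypothetical.

```python
from typing import Dict, Tuple


def count_delta(plan: Dict) -> Tuple[int, int, int]:
    """Return (added, changed, deleted) from a plan in Terraform's JSON representation."""
    added = changed = deleted = 0
    for resource in plan.get("resource_changes", []):
        actions = resource["change"]["actions"]
        # A replacement lists both "delete" and "create", so it counts as one of each.
        added += actions.count("create")
        changed += actions.count("update")
        deleted += actions.count("delete")
    return added, changed, deleted


def state_after_planning(run_type: str, delta: Tuple[int, int, int]) -> str:
    # Proposed runs always finish; tracked runs need review only when something changes.
    if run_type == "proposed" or delta == (0, 0, 0):
        return "Finished"
    return "Unconfirmed"
```

Entries whose actions are ["no-op"] or ["read"] do not contribute to any of the three counters, so a plan made up only of those leaves a tracked run in Finished.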

Failed

Failed state means that something went wrong. While technically it's possible to transition from every state to Failed due to an application error on Spacelift's end, in practice most failures are caused by issues with the code or configuration in one of the active states: Initializing, Planning and Applying.

If the failure occurs during the Applying phase, the associated GitHub deployment is marked as failure, too:

Failed is a passive state meaning no operations are performed while the run is in this state. It's also a terminal state meaning that no further state can supersede it.

Finished

Finished state means that the run was successful. Depending on the run type, it may follow different states. For proposed runs, the Finished state will occur after Planning, once the outcome and delta are reported to GitHub. The same thing happens for runs on a tracked branch if no changes are detected.

For tracked runs with changes, Finished state will follow a successful Applying phase. In those cases the associated GitHub deployment will also be marked as Active:

If this all sounds somewhat confusing, feel free to refer to the state machine diagram to understand the exact flow.

Finished is a passive state meaning no operations are performed while a run is in this state. It's also a terminal state meaning that no further state can supersede it.

Unconfirmed

Unconfirmed state means that the Planning phase has finished successfully but changes have been detected. The resulting plan is thus shown to the user for a final approval:

If the user approves the plan, the run transitions to the temporary Confirmed state and waits for a worker to pick it up. If the user doesn't like the plan and discards it, the run transitions to the terminal Discarded state.

Note that the transition to Unconfirmed state also creates a GitHub deployment and sets it to the initial Pending state:

As a safety precaution, the uploaded workspace tarball has a 7 day expiration policy set in S3. An attempt to confirm and apply the run after 7 days will immediately fail.
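The expiration rule amounts to a simple age check on the uploaded workspace. A minimal sketch, assuming the upload time is known and that a workspace counts as expired once it is strictly older than 7 days (the exact boundary behavior is an assumption); the names WORKSPACE_TTL and workspace_expired are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Expiration window for the uploaded workspace, as described above.
WORKSPACE_TTL = timedelta(days=7)


def workspace_expired(uploaded_at: datetime, now: datetime) -> bool:
    """True when the workspace is older than the expiration window."""
    return now - uploaded_at > WORKSPACE_TTL
```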

Unconfirmed is a passive state meaning no operations are performed while a run is in this state.

Discarded

Discarded state follows Unconfirmed and indicates that the user did not like the changes detected by the Planning phase. The transition to Discarded state will immediately fail the associated GitHub deployment:

Discarded is a passive state meaning no operations are performed while a run is in this state. It's also a terminal state meaning that no further state can supersede it.

Confirmed

Confirmed state follows Unconfirmed and indicates that a user has accepted the plan generated in the Planning phase and wants to apply it, but no worker has picked up the job yet. This state is similar to Queued in the sense that it is shown only temporarily, until one of the workers picks up the associated job and changes the state to Applying. On the other hand, there is no way to stop a run once it's confirmed.

Confirmed is a passive state meaning no operations are performed while a run is in this state.

Applying

Applying state can only happen after the run is Confirmed. This phase involves the following steps:

  1. Run environment is built, including regenerating any temporary tokens (e.g. AWS credentials);

  2. Workspace temporarily saved to S3 during Planning phase is now downloaded and immediately deleted;

  3. terraform apply is executed with the plan file as input;

The Applying phase will hopefully succeed, in which case the run transitions to the Finished state. On the other hand, if anything goes wrong, the run is marked as Failed.

Stopped

Stopped state indicates that a run has been stopped while Initializing or Planning, either manually by the user or - for proposed changes - also by Spacelift. Proposed changes will automatically get stopped when a newer version of the code is pushed to their branch. This is mainly designed to limit the number of unnecessary API calls to your resource providers, though it saves us a few bucks on EC2, too.
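The automatic stopping of superseded proposed runs can be sketched as a filter over the active runs for a stack. This is an illustration under assumptions: the Run record and the runs_to_stop function are hypothetical, and only runs in the Initializing or Planning states are considered, since those are the states this page names as stoppable.

```python
from dataclasses import dataclass
from typing import List

# States in which a run can be stopped, per the description above.
STOPPABLE_STATES = {"Initializing", "Planning"}


@dataclass
class Run:
    id: str
    branch: str
    run_type: str  # "proposed" or "tracked"
    state: str


def runs_to_stop(active: List[Run], pushed_branch: str) -> List[str]:
    """IDs of proposed runs superseded by a new push to the same branch."""
    return [
        run.id
        for run in active
        if run.run_type == "proposed"
        and run.branch == pushed_branch
        and run.state in STOPPABLE_STATES
    ]
```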

Here's an example of a run manually stopped while Initializing:

Stopped is a passive state meaning no operations are performed while a run is in this state. It's also a terminal state meaning that no further state can supersede it.
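The transitions described throughout this page can be collected into a small lookup table. This is a sketch built only from the prose above, not an export of the actual state machine: it leaves out the two Preparing states, whose exact position is not spelled out here, and the application-error transitions to Failed that are technically possible from every state. The names TRANSITIONS, is_terminal and transition are hypothetical.

```python
from typing import Dict, Set

# Outgoing transitions per state, as described in the sections above.
TRANSITIONS: Dict[str, Set[str]] = {
    "Queued": {"Initializing", "Canceled"},
    "Initializing": {"Planning", "Failed", "Stopped"},
    "Planning": {"Finished", "Unconfirmed", "Failed", "Stopped"},
    "Unconfirmed": {"Confirmed", "Discarded"},
    "Confirmed": {"Applying"},
    "Applying": {"Finished", "Failed"},
    # Terminal states have no outgoing transitions.
    "Canceled": set(),
    "Discarded": set(),
    "Failed": set(),
    "Finished": set(),
    "Stopped": set(),
}


def is_terminal(state: str) -> bool:
    return not TRANSITIONS[state]


def transition(current: str, target: str) -> str:
    """Return the target state, or raise if the move is not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current} to {target}")
    return target
```

A tracked run with changes would walk Queued, Initializing, Planning, Unconfirmed, Confirmed, Applying and Finished, while an attempt to move a Confirmed run to Stopped raises an error, matching the note that a confirmed run cannot be stopped.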