GitLab CI¶

Before we start diving into writing CI configuration files we're going to cover what a CI configuration looks like and go over some basics.

We'll cover concepts like:

Pipelines versus Stages versus Jobs
Pipeline configuration
artifacts: and cache: blocks
rules: and script: blocks
The dependencies: block

And we'll use the Terraform .gitlab-ci.yml file as the example, along with some other examples and some visuals to help us along the way.

Let's get started.

Pipelines & Stages & Jobs¶

In GitLab CI we have three things we need to be aware of: pipelines, stages and jobs.

A pipeline contains stages. Stages contain jobs. Jobs contain configuration that tells GitLab CI what it is you want to do.

We define a single pipeline by providing a .gitlab-ci.yml file. The file itself represents the pipeline. We then define stages in this pipeline (by adding them to the file) and each stage in turn has one or more jobs added to it. Each job defines the stage it belongs to, the rules that decide if the job should execute, and the script that is executed.

It's important to understand the difference between these elements, so let's visualise this with a simple example:

graph LR
    a1 --> discord1
    a2 --> b1
    b1 --> c1
    c1 --> d1

    subgraph Stage A
        a1[Discord Notification]
        a2[Test Code]
    end

    subgraph Stage B
        b1[Compile Code]
    end

    subgraph Stage C
        c1[Package Code]
    end

    subgraph Stage D
        d1[Deploy Code]
    end

    subgraph Discord API
        discord1[API]
    end

Here we have four stages:

Stage A
Stage B
Stage C
Stage D

In stage A we have two jobs: Discord Notification and Test Code. These jobs run in parallel under certain conditions, but we're not going to cover that at this point in time. In a future update to the book we will cover parallel execution. For now let's keep things simple.

Now we have stages B, C, and D. These are going to run in that order, precisely, and each execute a single job. Each stage depends on the previous to do some work or produce some artefact that we need in the next stage(s).

So once the Test Code job in Stage A has completed, GitLab CI starts the next job (Compile Code) in the next stage (Stage B). This repeats: B -> C and finally C -> D, until the whole pipeline is completed.

The whole diagram represents a complete pipeline.

So remember: a pipeline contains stages, stages contain jobs and jobs contain configuration instructing GitLab CI to execute stuff for us.

As YAML¶

To put this into an example more closely aligned with reality, let's write out the above as actual YAML configuration:

stages:
    - stage_a
    - stage_b
    - stage_c
    - stage_d

Discord Notification:
    stage: stage_a
    script:
        - discord_notification.sh

Test Code:
    stage: stage_a
    script:
        - test_code.sh

Compile Code:
    stage: stage_b
    artifacts: # store something to be downloaded by a future job
        name: binary
        paths:
            - ./my_binary
    script: compile_code.sh

Package Code:
    stage: stage_c
    dependencies:
        - Compile Code # download the artefact we created earlier
    artifacts: # store our own artefact
        name: package
        paths:
            - ./my_package.zip
    script: package_code.sh

Deploy Code:
    stage: stage_d
    dependencies:
        - Package Code # download the package artefact
    script: deploy_code.sh

This is valid YAML and a valid pipeline configuration. It contains the stages we mentioned above and their associated jobs.

Because stages run one after another, the whole thing will run in a linear manner (minus Stage A, which has two jobs that will attempt to run in parallel). The artifacts: and dependencies: keywords then control what gets handed along that line: artifacts: stores the files a job produces, and dependencies: tells a later job which earlier jobs' artefacts to download.

Let's now review the contents of a real CI configuration file. It's the file we'll be writing to configure our Terraform pipeline.

Pipeline Configuration¶

We've covered the differences between a pipeline, a stage and a job. Let's now start looking at the Terraform .gitlab-ci.yml file and begin to understand the keywords used to construct the whole pipeline.

Here are the very first few lines of our Terraform's .gitlab-ci.yml file, which constitute the pipeline's global configuration as well as some default values:

image: registry.gitlab.com/gitlab-org/terraform-images/stable:latest
variables:
  TF_ROOT: ${CI_PROJECT_DIR}
  TF_ADDRESS: ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${CI_PROJECT_NAME}

cache:
  key: httpcats-beta
  paths:
    - ${TF_ROOT}/.terraform

before_script:
  - cd ${TF_ROOT}

All of this is configuring the pipeline to behave in a particular way and do some tasks for us ahead of each job. Let's review each of the configuration options above.

`image:`¶

This configures the entire pipeline to run all script: configurations (explained below) in a Docker container using a specific image: registry.gitlab.com/gitlab-org/terraform-images/stable:latest.

This particular image is perfect for our needs, not just because it provides Terraform but because it's suitable for use inside GitLab CI pipelines due to some bootstrapping that's being done around Terraform. This will become clearer later on.

`variables:`¶

This configuration keyword allows us to define variables that are available for use across the entire pipeline, in all stages, and can be used for all kinds of things.
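As a small sketch of how this plays out, assume a hypothetical GREETING variable and a hypothetical say_hello job (neither is part of our Terraform pipeline). The variable is defined once at the top level and is then available to the job's script as an ordinary environment variable:

```yaml
# Hypothetical example: GREETING and say_hello are made up for illustration.
variables:
  GREETING: "Hello, world"

say_hello:
  script:
    # Variables defined above are exposed to the job as environment variables.
    - echo "${GREETING}"
```

Any other job added to the same file could read GREETING in exactly the same way.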

`cache:`¶

Using the cache: keyword we can have the pipeline cache certain files and/or directories between stages and jobs, and even across pipelines. For us this is important because after we call terraform init we need the .terraform/ directory to be available to the other stages in the pipeline. If it weren't, we would have to call terraform init in every job.
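To illustrate the idea, here's a rough sketch of a top-level cache: shared by two jobs in different stages. The stage names, job names and cache key are placeholders rather than our real configuration; the image and the gitlab-terraform commands are the ones used elsewhere in this section.

```yaml
# Hypothetical sketch: stage names, job names and the cache key are placeholders.
image: registry.gitlab.com/gitlab-org/terraform-images/stable:latest

stages:
  - prepare
  - check

cache:
  key: example-cache
  paths:
    - .terraform

prepare:
  stage: prepare
  script:
    # Creates .terraform/, which is saved to the cache when the job ends.
    - gitlab-terraform init

check:
  stage: check
  script:
    # .terraform/ is restored from the cache before this script runs.
    - gitlab-terraform validate
```

Because the cache: block sits at the top level, both jobs pick it up without any extra configuration of their own.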

`before_script:`¶

In our jobs we use the script: keyword to define what each job does and actually get our work done. The before_script: configuration defines commands that execute before the commands in each job's script: block. We're using the GitLab CI provided Terraform Docker image, so we use this feature to move into the TF_ROOT location.

So as an example, if we had the following before_script: configuration:

- mkdir -p my_directory/hello/world

We're defining a script that makes sure a directory we need in each job always exists. If we then used the following script: in a job inside of our pipeline:

- echo 'Hello, world' > my_directory/hello/world/message.txt

Before the job's script executes, our before_script: runs, which means we'd effectively be getting (as I'm sure you've guessed):

- mkdir -p my_directory/hello/world
- echo 'Hello, world' > my_directory/hello/world/message.txt

If you had two or more jobs that needed this directory, then of course it would get repetitive having to provide the same command every time. Plus, if the name of the directory changed, you'd only have to change the mkdir call in a single place (and you could go a step further and hold the name in a variable).
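Putting those pieces together, a complete file might look something like the sketch below. The WORK_DIR variable and both job names are invented for this example. Each job starts from a fresh checkout, so the second job only relies on the directory existing, not on the file the first job wrote.

```yaml
# Hypothetical example: WORK_DIR and the job names are made up for illustration.
variables:
  WORK_DIR: my_directory/hello/world

# Runs before the script: of every job in this file.
before_script:
  - mkdir -p "${WORK_DIR}"

write_message:
  script:
    - echo 'Hello, world' > "${WORK_DIR}/message.txt"

list_directory:
  script:
    # The directory exists here because before_script: created it for this job too.
    - ls -la "${WORK_DIR}"
```

Renaming the directory now means changing one line: the WORK_DIR value.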

Stages¶

Our Terraform pipeline has the following stages:

stages:
  - validate
  - plan
  - apply
  - destroy

These stages are stepped through, one by one, in the order shown. We have four stages:

validate
plan
apply
destroy

I believe if we explain the jobs behind the validate, plan and apply stages, and their respective job configurations, then we'll have enough information to successfully write the actual files themselves. The destroy stage will be understandable after you've studied the others.

Note

The GitLab CI documentation covers stages in more detail.

Validate¶

This is the configuration of a single job inside the validate stage:

validate:
  stage: validate
  rules:
    - if: $RUN_ANYWAY == "YES"
    - exists:
        - .destroy
      when: never
    - changes:
        - "*.tf"
  script:
    - gitlab-terraform init
    - gitlab-terraform validate

Let's break this down into its core components.

Rules¶

When we use the rules: keyword we're telling GitLab CI that our job (not the pipeline or even the stage as a whole) has a list of rules, one of which must evaluate to "true" (with a short-circuit effect in place) before this job will be included in its particular stage.

If none of the rules evaluate to "true", then this job does not execute, but the rest of the stage may very well if another job inside of said stage does evaluate to "true".

Let's look at this visually with a simple, contrived example (and assume all jobs are in a single stage):

graph LR
    a[Job 1] --> b
    b[Job 2] --> c
    c[Job 3]

If we have the following rules for each job, we can make adjustments to them to alter the above flow...

a => rules: A_VAR==1
b => rules: B_VAR==2
c => rules: C_VAR==3

If we execute the pipeline and we set A_VAR=1 and B_VAR=2, but we set C_VAR=99, then the pipeline will look like this:

graph LR
    a[Job 1] --> b
    b[Job 2]

If we flip that logic on its head entirely, setting A_VAR and B_VAR to 99, and C_VAR=3, then the pipeline will look like this:

graph LR
    a[Job 3]

Put another way: if a job's rules exclude it from the stage, then GitLab CI moves on to the next job looking for one that evaluates to "true" which is then included in the stage (which means the stage is included in the pipeline).
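Written out as real configuration, the three contrived jobs above might look something like this. The stage name, the job names and the A_VAR, B_VAR and C_VAR variables are all made up for the illustration, and the values are compared against quoted strings:

```yaml
# Hypothetical example: every name in this file is made up for illustration.
stages:
  - example

job_1:
  stage: example
  rules:
    # Included only when A_VAR is set to 1.
    - if: $A_VAR == "1"
  script:
    - echo "Job 1"

job_2:
  stage: example
  rules:
    # Included only when B_VAR is set to 2.
    - if: $B_VAR == "2"
  script:
    - echo "Job 2"

job_3:
  stage: example
  rules:
    # Included only when C_VAR is set to 3.
    - if: $C_VAR == "3"
  script:
    - echo "Job 3"
```

Running the pipeline with A_VAR=1, B_VAR=2 and C_VAR=99 would include job_1 and job_2 and leave job_3 out, matching the first of the two scenarios above.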

All of our stages only have a single job defined in them.

So what rules do we have in our validate job?

- exists:
    - .destroy
  when: never

We're using an exists: keyword to determine if a file (.destroy) exists or not. If it does then the when: keyword determines what should happen, and in this case never means this job (and, as it's the only job in its stage, the stage itself) should never be included in the pipeline.

- changes:
    - "*.tf"

Finally we're asking GitLab CI to check for changes to a list of pattern matches. In our case we're looking for changes to any files that match *.tf, or Terraform configuration files. In the event such changes do exist then this rule evaluates to true and the stage is included in the pipeline.

The final rule explains why I've opted to include the first rule, the if: keyword: what if there are no changes to the Terraform files? How do I run the pipeline? Including this if: check means I can "override" the other rules and have the stage included in all cases.

This raises another important point about rules: in GitLab CI: the first rule in the list to evaluate to true determines if the job (and so, for us, the stage) is included in the pipeline or not. No other rules are evaluated after this point. That's why the if: clause is included first - it means all other rules are ignored if RUN_ANYWAY is set to YES.
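Here is the validate job again, this time with comments describing the order in which GitLab CI works through its rules: block. Nothing has been changed apart from the comments:

```yaml
validate:
  stage: validate
  rules:
    # 1. Checked first: if RUN_ANYWAY is "YES" the job is included and
    #    the remaining rules are never looked at.
    - if: $RUN_ANYWAY == "YES"
    # 2. Checked second: if a .destroy file exists the job is excluded.
    - exists:
        - .destroy
      when: never
    # 3. Checked last: include the job if any *.tf file has changed.
    - changes:
        - "*.tf"
    # If nothing above matched, the job is left out of the pipeline.
  script:
    - gitlab-terraform init
    - gitlab-terraform validate
```

Swapping the first two rules would change the behaviour: a .destroy file would then exclude the job even when RUN_ANYWAY is YES.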

Script¶

The script: keyword is basically the backbone of most CI configurations. It's how we define the actual functionality of the job within the stage. There are other things we can do with a job, like triggering other remote pipelines, but what you'll see the most is a script: keyword being used to execute some shell code.

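In our script we initialise the Terraform working directory. Then we validate that the syntax of the code is valid. If not then the job will fail and the pipeline will come to a halt.

In the above script we're using the gitlab-terraform command. This is a wrapper around terraform that the Docker image provides, and it's the bootstrapping we mentioned earlier.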

Plan¶

Now let's review the same thing for the plan stage - its job configuration:

plan:
  stage: plan
  artifacts:
    name: plan
    paths:
      - ${TF_ROOT}/plan.cache

    # This is a piece of magic that pushes the plan into the Terraform
    # backend of GitLab (CI)
    reports:
      terraform: ${TF_ROOT}/plan.json
  rules:
    - if: $RUN_ANYWAY == "YES"
    - exists:
        - .destroy
      when: never
    - changes:
        - "*.tf"
  script:
    - gitlab-terraform plan
    - gitlab-terraform plan-json

We have a keyword here - artifacts: - that we haven't looked at properly yet. Let's go over what it does.

Artifacts¶

With the artifacts: keyword we're telling GitLab CI to create two artefacts: the plan file for Terraform to use at later stages, and the JSON version that gets pushed into the back end of the GitLab CI Terraform solution.

Note

We'll ignore the magic behind the latter part of this stage and instead just focus on the first part.

The artifacts: keyword is a bit like the cache: keyword we saw earlier: it stores items of interest for us. However, with artefacts we decide which stages get the artefacts themselves, whereas using cache: means whatever is cached is included in all stages. Not every stage needs whatever we store as artefacts, which is why we use them.

As we need to generate a Terraform plan so that our apply can do its job, we use the artifacts: keyword to store it for later recovery.

Apply¶

And finally we'll review the one job we've configured for the apply stage:

apply:
  stage: apply
  environment:
    name: production
  dependencies:
    - plan
  rules:
    - if: $RUN_ANYWAY == "YES"
    - exists:
        - .destroy
      when: never

    # This is a nested condition:
    # - Include the stage if the commit branch is the project's default branch (such as master)
    # - Include the stage if there are changes to .tf files in the commit
    # - (attribute) Include the stage but make sure it's a manual run
    # - (attribute) Allow the stage to fail
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      changes:

Dependencies¶

The apply job introduces a keyword we've not looked at yet: dependencies:.

In the plan stage we used the artifacts: keyword to create a downloadable artefact from our Terraform plan file (plan.cache). Now we're using the dependencies: keyword to tell the job which artefacts to download from which job. In this case it's the plan job, as defined in the code above.

This is how we move objects between jobs, stages and even pipelines: we use artifacts: and dependencies: (among other features available to us).
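To see the two keywords working together outside of Terraform, here's a minimal sketch. The stage names, job names and report.txt file are all invented for the example. One job produces an artefact, one job asks for it, and one job opts out of downloading anything by giving dependencies: an empty list:

```yaml
# Hypothetical example: every name in this file is made up for illustration.
stages:
  - produce
  - consume

build_report:
  stage: produce
  artifacts:
    paths:
      - report.txt
  script:
    - echo "some output" > report.txt

use_report:
  stage: consume
  dependencies:
    # Download the artefacts stored by build_report.
    - build_report
  script:
    - cat report.txt

skip_report:
  stage: consume
  # An empty list means this job downloads no artefacts at all.
  dependencies: []
  script:
    - echo "no artefacts were downloaded for this job"
```

The apply job in our Terraform pipeline follows the same pattern as use_report, with plan playing the part of build_report.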

Conclusion¶

We've gone over the basics of a simple GitLab CI configuration file. We've now got a feel for the formatting and some of the basic keywords being used. This is enough to work with for the time being, but if you want to know more or simply explore what's available (tinkering is a good idea!) then check out the GitLab CI configuration file reference.

In the next section we're going to discuss the Terraform pipeline configuration and then start writing out the actual files.