Running MPI applications


This page describes how to automatically set up a cluster of Virtual Machines (VMs) on Google Cloud Platform (GCP) to run MPI applications in a distributed manner.

The code implementing the different steps described in this page can be downloaded at cloud_mpi.tar.gz.

 

Before starting

As a very first step, we strongly suggest that you go through the FAQ.

To be able to execute the instructions presented in this tutorial, you first need to:

  1. Create an account on GCP and activate your Education credits.
  2. Install the gcloud command line tool.
  3. Learn how to connect to your VMs using SSH.

Reminder: As explained in the tutorials above, don’t forget to delete the resources you created before disconnecting from GCP.

Included in this page

This page documents a fully automatic procedure to create a cluster of VMs to run MPI applications in a distributed manner in the Cloud. It includes:

  1. The automatic allocation of VMs using Terraform
  2. The automatic configuration of the VMs using Ansible
  3. The deployment and execution of MPI applications using scripts
  4. Freeing all resources using Terraform

For a description of how the instantiation of VMs in the Cloud works, we recommend that you have a look at:

 

Allocation of VMs using Terraform

Installing Terraform

Terraform is a resource provisioning tool based on a declarative approach. It is an Infrastructure-as-Code approach that allows you to describe the resources you would like to allocate and that takes care of the allocation for you.

The first step is to install Terraform on your machine. To install Terraform on Ubuntu 18.04, run the following commands:

wget https://releases.hashicorp.com/terraform/0.12.10/terraform_0.12.10_linux_amd64.zip
unzip terraform_0.12.10_linux_amd64.zip
sudo mv terraform /usr/local/bin/

Information about the installation of Terraform on other systems is available here.

Allocating VMs

The provided archive (cloud_mpi.tar.gz) includes Terraform files for a basic deployment.

Initialization

After extracting the archive, move into the cloud_mpi directory. All variables to configure your deployment are defined in the file setup.sh.

To configure your deployment, you must define the following in this file:

If you want to select the region/zone where the VMs are deployed, you can also modify the corresponding variables in setup.sh.

To make these variable definitions available to all the tools we are going to use afterwards, run the following in your terminal (to be run again every time you modify the content of this file):

source ./setup.sh
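The reason for using source rather than executing the file is that a sourced file runs in the current shell, so the variables it exports remain visible to the commands you run next. A minimal sketch of this behavior (the variable name below is a hypothetical example, not one defined in setup.sh):

```shell
# Sourcing keeps exported variables in the current shell, so later
# commands (terraform, ansible-playbook, ...) can read them.
export TF_VAR_num_vms=2   # hypothetical variable name, for illustration
# Any child process started afterwards inherits the variable:
sh -c 'echo "VMs requested: $TF_VAR_num_vms"'
```

Running ./setup.sh instead would execute the file in a subshell, and the variables would be lost as soon as it exits.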

Configuring Terraform

Before using Terraform to allocate resources inside your project, you need to create a service account that grants Terraform the permissions to do so. To create the service account, run the following commands:

gcloud iam service-accounts create [SERVICE_NAME]
gcloud projects add-iam-policy-binding [PROJECT_NAME] --member serviceAccount:[SERVICE_NAME]@[PROJECT_NAME].iam.gserviceaccount.com --role roles/editor
gcloud iam service-accounts keys create ./[SERVICE_NAME].json --iam-account [SERVICE_NAME]@[PROJECT_NAME].iam.gserviceaccount.com
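The bracketed placeholders must be replaced consistently across all three commands. One way to avoid typos is to put the values in shell variables first; the sketch below shows (as a dry run, with hypothetical example values) the fully expanded service-account identity that appears in the last two commands:

```shell
SERVICE_NAME=terraform-sa       # hypothetical service-account name
PROJECT_NAME=my-mpi-project     # hypothetical GCP project ID
# Dry run: print the expanded account identity used above.
echo "$SERVICE_NAME@$PROJECT_NAME.iam.gserviceaccount.com"
```

With these example values, the printed identity is terraform-sa@my-mpi-project.iam.gserviceaccount.com.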

Note that:

After this, you need to enable the Compute Engine API on your project (if it is not enabled yet) to allow the creation of VMs to be controlled from the command line:

gcloud services enable compute.googleapis.com

The deployment we want to execute is defined in the file simple_deployment.tf.

The first step is to initialize Terraform for the target provider. To do so, simply run:

terraform init

This step needs to be run only once. In our case, Terraform will be initialized for GCP, because this is the provider declared in simple_deployment.tf.
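For reference, the provider declaration that makes terraform init download the GCP plugin looks roughly like the sketch below. The field values are hypothetical; the real configuration ships in simple_deployment.tf. It is printed here via a here-document so you can inspect the shape without touching the shipped file:

```shell
# Sketch of a Terraform provider block for GCP (values are examples):
cat <<'EOF'
provider "google" {
  credentials = file("terraform-sa.json")  # key file created earlier
  project     = "my-mpi-project"           # hypothetical project ID
  region      = "us-central1"
}
EOF
```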

Starting VMs

We are now ready to create our VMs. By default, the deployment defined in simple_deployment.tf will start 2 VMs of type f1-micro. You can change these values by modifying the corresponding variables in the file setup.sh.

Before launching the deployment, we can review the resources that are going to be created using:

terraform plan

Finally we can launch the deployment using:

terraform apply

After a few tens of seconds, the VMs will be created.

 

Configuration of the VMs using Ansible

Installing Ansible

Ansible is a configuration management tool that also follows the Infrastructure-as-Code approach. We are going to use Ansible to configure the VMs we started with Terraform.

Instructions to install Ansible on your machine can be found at https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html#

!!! Warning !!!: The Ansible playbooks we provide assume that version 2.8 or higher of Ansible is used.

Initialization

To be able to use Ansible, we first need to collect, in a file named hosts, the IP addresses of all the VMs that have been started using Terraform.

To this end, you are provided with the script parse-tf-state.py. To execute it, simply run:

./parse-tf-state.py
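To give an idea of what the script does, the sketch below simulates a fragment of a Terraform state file and extracts the public IP addresses from it with grep and sed. The nat_ip attribute is what the google provider stores for external addresses, but the exact state layout and the real script's logic may differ from this simplified version:

```shell
# Simulated fragment of terraform.tfstate (illustrative only):
cat > tfstate_example.json <<'EOF'
{"attributes": {"nat_ip": "35.200.0.1"}}
{"attributes": {"nat_ip": "35.200.0.2"}}
EOF
# Extract one IP per line, the format parse-tf-state.py writes to "hosts":
grep -o '"nat_ip": "[0-9.]*"' tfstate_example.json \
  | sed 's/.*"\([0-9.]*\)"$/\1/'
```

With the simulated input above, this prints 35.200.0.1 and 35.200.0.2, one per line.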

Configuring the VMs

We provide you with 3 Ansible playbooks:

  1. install_mpi.yml, which installs MPI on all the VMs
  2. config_ssh.yml, which configures SSH on all the VMs
  3. nfs.yml, which installs and configures NFS on all the VMs

To install MPI on all VMs, run:

ansible-playbook -i ./hosts install_mpi.yml

To configure SSH on all the VMs, run:

ansible-playbook -i ./hosts config_ssh.yml

To install and configure NFS on all the VMs, run:

ansible-playbook -i ./hosts nfs.yml
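For readers unfamiliar with Ansible, a playbook is a YAML description of the desired state of the hosts. A play like the one in install_mpi.yml might look roughly as follows; this is a sketch under assumptions (package names and module options are examples), not the shipped file, printed via a here-document:

```shell
# Rough sketch of what an MPI-installation play could contain
# (package names and module options are assumptions):
cat <<'EOF'
- hosts: all
  become: yes
  tasks:
    - name: Install OpenMPI packages
      apt:
        name: [openmpi-bin, libopenmpi-dev]
        state: present
EOF
```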

Your VMs are now ready to compile and run MPI applications.

Running MPI applications

Here are the steps to follow to run your MPI application on your virtualized cluster:

  1. Copy the source code to one VM
  2. Log into the VM and compile the code there
  3. Launch the MPI application
    • A file hostfile_mpi, containing the hostnames of all the VMs, is already created for you and available in the $HOME directory of each VM. Pass this file as an argument to the mpirun command.

The provided script exec_mpi_app.py runs all these steps automatically for you from your laptop. The arguments it takes are described below.
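For reference, steps 1 to 3 could also be carried out by hand roughly as follows. The commands are shown as a dry run (prefixed with echo) because the VM address, paths, and binary name below are all hypothetical examples:

```shell
HEAD_NODE=user@35.200.0.1   # hypothetical address of one of the VMs
SRC_DIR=/home/src           # local directory containing the sources
# 1. Copy the source code to one VM
echo scp -r "$SRC_DIR" "$HEAD_NODE:src"
# 2. Log into the VM and compile the code there (a Makefile is assumed)
echo ssh "$HEAD_NODE" "make -C src"
# 3. Launch the MPI application using the provided hostfile
echo ssh "$HEAD_NODE" "mpirun -np 4 --hostfile hostfile_mpi ./src/mpi_ring.run"
```

Dropping the echo prefixes would actually execute the three steps against the (hypothetical) head node.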

To compile and execute a program

In its simplest form, the script takes as arguments the path to the directory containing the source files to compile (the content of this directory will be copied to the cloud) and a command to execute on the target nodes:

./exec_mpi_app.py -d [PATH_TO_SOURCE_DIR] -c "[COMMAND_TO_EXECUTE]" 

For instance, assuming that the source code is in directory /home/src and that the program to execute is mpi_ring.run, you can execute it on your cloud VMs by running:

./exec_mpi_app.py -d /home/src -c "mpirun -np 4 --hostfile hostfile_mpi ./mpi_ring.run"

Note: The script assumes that a Makefile is provided in the source code directory to compile the code.

Note 2: As an alternative to specifying a command directly, the script exec_mpi_app.py can take as argument (using option -s) the name of a script to execute after the code has been compiled.

Cleaning

Do not forget to delete every resource you have created before disconnecting. Even after you disconnect, you will continue to pay for any resources that are not deleted.

To destroy all the VMs that you have created using Terraform, simply run:

terraform destroy

You can verify that no VMs associated with your account are still running using:

gcloud compute instances list