
TensorFlow

TensorFlow is a powerful open-source machine learning platform geared towards neural networks.

Currently, our support for TensorFlow is primarily through the command-line interface, although a custom PythoML workflow could be used to run TensorFlow through our web app (e.g. for hyperparameter tuning).

A note on Python / TensorFlow compatibility: TensorFlow develops rapidly, and its dependencies can change suddenly between versions. We recommend checking the official TensorFlow documentation to verify that the selected Python and TensorFlow versions are compatible with one another.
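As a quick sanity check, the running interpreter's version can be compared against the Python range a given TensorFlow release supports. This is only a sketch: the 3.7-3.10 range below is an illustrative example, and the real range for your TensorFlow version should be taken from the official release notes.

```python
import sys

# Example supported range -- look up the actual range for your
# TensorFlow version in the official release notes before relying on it.
MIN_PY, MAX_PY = (3, 7), (3, 10)

current = sys.version_info[:2]
ok = MIN_PY <= current <= MAX_PY
print(f"Python {current[0]}.{current[1]} supported: {ok}")
```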


Running TensorFlow

TensorFlow can be run either interactively or via a job submission script.

Job Submission Script

To run TensorFlow through a job submission script, first connect to Cluster-001 or Cluster-007 and make a folder for the job.

cd ~/cluster-007
mkdir my_tensorflow_job
cd my_tensorflow_job 

Then, add the Python script that we wish to run, saving it as my_python_script.py (the name the job script below expects). For example, the following script can be used to test whether TensorFlow can see the GPU when it runs.

# Test whether TensorFlow can be imported
print("Importing TensorFlow", flush=True)
import tensorflow as tf

# Test whether TensorFlow sees the GPU
print("\nlisting physical devices:", flush=True)
print(tf.config.list_physical_devices())

Now that we have a Python script, we can create the job submission script. To access a GPU, we'll use the GOF queue.

#!/bin/bash

#PBS -N TensorFlow-Test
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:00:10:00
#PBS -q GOF

# ===================
# CONFIG FOR THIS JOB
# ===================
# Name of the python script we want to run
pythonScriptFile="my_python_script.py"

# =======
# RUN JOB
# =======
module load cuda/11.5
module load python/3.9.1

# Install TensorFlow in a virtual environment
virtualenv .env
source .env/bin/activate
pip3 install tensorflow-gpu

# Run TensorFlow
python3 "$pythonScriptFile" &> python_log.txt

Finally, the job can be submitted with qsub:

qsub my_job_script.sh
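Once submitted, the job's progress can be checked with qstat. As an illustration, the single-letter state column can be pulled out of a captured line of qstat output (a sketch: the job ID, user, and timing values are made-up examples; the column layout assumed is the standard Torque/PBS short format).

```python
# Parse the single-letter job state (Q = queued, R = running, C = complete)
# from one captured line of `qstat` output. All values are examples.
qstat_line = "12345.cluster-007  TensorFlow-Test  alice  00:00:05  R  GOF"

state = qstat_line.split()[4]
print(f"Job state: {state}")
```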

Interactive Use

Oftentimes, it is useful to run TensorFlow interactively, for example to debug a script. To run a TensorFlow job interactively, first create a job that will spin up a GPU instance. For example, the following script will create a GPU instance that lasts one hour.

#!/bin/bash

#PBS -N TensorFlow-Test
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:01:00:00
#PBS -q GOF

sleep 1h

Then, connect to the cluster that ran the job. In our example, we're running the GPU node on Cluster-007:

ssh cluster-007

From there, once the sleeper job has begun running, the name of the node it is running on can be found by running qstat -f. Then connect to that node via SSH.
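As a convenience, the node name can also be extracted from qstat -f output programmatically. The sketch below parses a captured output fragment (the job ID and node name are made-up examples; exec_host is the standard PBS field that names the assigned node).

```python
import re

# Fragment of `qstat -f` output for a running job. The job ID and node
# name are hypothetical; exec_host is the standard PBS field name.
qstat_f_output = """\
Job Id: 12345.cluster-007
    Job_Name = TensorFlow-Test
    job_state = R
    exec_host = gpu-node-03/0
"""

# exec_host has the form <node>/<cpu-index>; keep only the node part
match = re.search(r"exec_host = (\S+)", qstat_f_output)
node = match.group(1).split("/")[0]
print(node)
```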

The node's version of CUDA and its GPU drivers can be verified by running nvidia-smi while connected. Note that this command only exists on nodes that contain GPUs.
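The driver and CUDA versions appear in the banner line of nvidia-smi output, so they can also be extracted with a short script. The sketch below parses a captured banner line (the version numbers shown are examples, not a statement about any particular node).

```python
import re

# First banner line of `nvidia-smi` output (version values are examples).
header = "| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |"

driver = re.search(r"Driver Version: (\S+)", header).group(1)
cuda = re.search(r"CUDA Version: (\d+\.\d+)", header).group(1)
print(f"driver={driver} cuda={cuda}")
```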

The CUDA module can be loaded, and TensorFlow's GPU version can be installed as follows:

module load cuda/11.5
module load python/3.9.1

virtualenv .env
source .env/bin/activate
pip3 install tensorflow-gpu