How to run a TensorFlow job on the Grid

Summary

Instructions for running TensorFlow jobs on the Grid using GPU nodes. Includes copying tutorial files, editing the job script, submitting with sbatch, and locating output and error files.

Body

Follow these steps to run a TensorFlow job. Note: Make sure you have access to nodes with GPUs.

Step 1

Log in to the Grid.

Step 2

Copy the required files to your home directory using the following commands:

cp -R /wsu/el7/scripts/tutorial/addition.py ~/

cp -R /wsu/el7/scripts/tutorial/tensorflow_job ~/
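Before moving on, it can help to confirm that both files actually arrived. The following is a minimal sketch, assuming the files were copied to your home directory; the function name check_tutorial_files is hypothetical and not part of the Grid tooling.

```shell
# Hypothetical helper (not provided by the Grid): report whether the two
# tutorial files exist in a directory. Defaults to the home directory.
check_tutorial_files() {
    dir="${1:-$HOME}"
    missing=0
    for f in addition.py tensorflow_job; do
        if [ -f "$dir/$f" ]; then
            echo "found: $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    # Return 0 only when both files are present.
    return "$missing"
}
```

After defining the function, run check_tutorial_files with no arguments to inspect your home directory, or pass another directory as the first argument.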

Step 3

The job script is in the file tensorflow_job. It contains the following:

#!/bin/bash

# Job name

#SBATCH --job-name Tensorflow

# Submit to the GPU QoS

#SBATCH -q gpu

# Request one node

#SBATCH -N 1

# Total number of cores; in this example, 1 node with 1 core.

#SBATCH -n 1

# Request memory

#SBATCH --mem=5G

# Request the GPU type

#SBATCH --constraint="k40"

# Mail when the job begins, ends, fails, requeues

#SBATCH --mail-type=ALL

# Where to send email alerts

#SBATCH --mail-user=xxyyyy@wayne.edu

# Create an output file that will be output_<jobid>.out

#SBATCH -o output_%j.out

# Create an error file that will be errors_<jobid>.err

#SBATCH -e errors_%j.err

# Set maximum time limit

#SBATCH -t 1:0:0

ml python/3.7

source /wsu/el7/pre-compiled/python/3.7/etc/profile.d/conda.sh

conda activate tensorflow_env

python addition.py

Note: Make sure that addition.py is in your home directory.
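If you edit the job script, reviewing the scheduler directives before submitting can catch a mistyped option early. The sketch below lists every line that starts with #SBATCH; the function name list_sbatch_directives is hypothetical, and the commented-out sbatch --test-only call assumes Slurm is available on the node where you run it.

```shell
# Hypothetical helper: print the options from every "#SBATCH" line in a
# job script, with the "#SBATCH" prefix removed.
list_sbatch_directives() {
    grep '^#SBATCH' "$1" | sed 's/^#SBATCH[[:space:]]*//'
}

# Example use on the Grid (commented out because it needs the real file
# and the Slurm client tools):
# list_sbatch_directives tensorflow_job
# sbatch --test-only tensorflow_job   # validates the script without running it
```

For the script shown above, the listing would start with the job name and QoS lines and end with the time limit.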

Step 4

To submit the job, type: sbatch tensorflow_job

Once the job is submitted, you can check its status with the following command: qme
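When sbatch accepts a job, it prints a confirmation line of the form "Submitted batch job <jobid>". Keeping that job ID makes it easier to find the matching output and error files later. The following is a small sketch of capturing it; the function name job_id_from_sbatch is hypothetical, and the squeue call in the comments is the standard Slurm status command, shown here as an alternative to qme.

```shell
# Hypothetical helper: extract the job ID from sbatch's confirmation
# line, e.g. "Submitted batch job 123456".
job_id_from_sbatch() {
    # The job ID is the last whitespace-separated field on the line.
    echo "$1" | awk '{ print $NF }'
}

# Example use on the Grid (commented out because it needs Slurm):
# submit_line=$(sbatch tensorflow_job)
# job_id=$(job_id_from_sbatch "$submit_line")
# squeue -j "$job_id"
```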

Step 5

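Once the job has completed, you will find the output and error files (output_<jobid>.out and errors_<jobid>.err) in the directory from which you submitted the job, which is your home directory if you followed the steps above. To check the contents of your home directory, type: ls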
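In the job script, Slurm replaces %j in the -o and -e file names with the job ID, so each run gets its own pair of files. The sketch below prints both files for a given job ID; the function name show_job_results is hypothetical, and it assumes the files follow the output_%j.out and errors_%j.err patterns from the script.

```shell
# Hypothetical helper: print the output file for a job, then the error
# file if it has any content. Arguments: job ID, then an optional
# directory (defaults to the current directory).
show_job_results() {
    job_id="$1"
    dir="${2:-.}"
    out="$dir/output_${job_id}.out"
    err="$dir/errors_${job_id}.err"
    echo "== $out =="
    cat "$out"
    if [ -s "$err" ]; then
        echo "== $err =="
        cat "$err"
    else
        echo "$err is empty"
    fi
}
```

For example, show_job_results 123456 "$HOME" would print the results of job 123456 from your home directory. A non-empty error file does not always mean the job failed, since some programs write warnings and progress messages there as well.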

Details

Article ID: 20208
Created: Tue 7/15/25 3:20 PM
Modified: Thu 12/11/25 11:56 AM