I can use the following bash script to read an ID from each line (the first 1000 lines in this case) of the ids file and pass it as an argument to pythonscript.py.
#!/bin/bash
#$ -t 1-1000
#$ -N thename
#$ -j y
ids=NOBACK/ids.txt
echo "Starting on : $(date)"
echo "Running on node : $(hostname)"
echo "Current directory : $(pwd)"
echo "Current job ID : $JOB_ID"
echo "Current job name : $JOB_NAME"
echo "Task index number : $SGE_TASK_ID"
ID=$(awk "NR==$SGE_TASK_ID" "$ids")
echo "id is: $ID"
python pythonscript.py --idarg "$ID"
echo "Finished on : $(date)"
But if the file is a CSV file and I need to assign multiple variables, how can it be done?
Best How To :
Imagine you have the following csv file (named super.csv):
name,postcode,dob
alan,XXXAAA,11/11/55
bruji,AAAXXX,20/10/88
...
zorri,AXAXAX,01/01/01
and you want to use the first and third fields as arguments in your Sun Grid Engine array job. The following will extract those fields from the line whose number equals $SGE_TASK_ID:
NAME=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $1}' super.csv)
DOB=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $3}' super.csv)
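You can sanity-check the extraction locally without the scheduler by faking $SGE_TASK_ID; this sketch recreates the sample super.csv and picks task 2 purely for illustration:

```shell
#!/bin/bash
# Recreate the sample CSV from the answer
cat > super.csv <<'EOF'
name,postcode,dob
alan,XXXAAA,11/11/55
bruji,AAAXXX,20/10/88
EOF

SGE_TASK_ID=2   # normally set by the scheduler for each array task

# NR==line selects that physical line; note the header occupies line 1,
# so in a real job you may want NR==line+1 (or strip the header first)
NAME=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $1}' super.csv)
DOB=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $3}' super.csv)
echo "NAME=$NAME DOB=$DOB"
```

With task ID 2 this prints `NAME=alan DOB=11/11/55`, since line 2 of the file is alan's row.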
I was initially tripped up by the mix of double and single quotes. Bash leaves $variables untouched inside single quotes and expands them inside double quotes, so here -v injects the value of $SGE_TASK_ID into awk's scope, where the single-quoted script can use it as line.
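If you need many fields, one awk call per field rescans the file each time; a single awk call combined with read splits the whole row at once. A sketch, again assuming the super.csv layout above:

```shell
#!/bin/bash
# Sample data standing in for the real super.csv
cat > super.csv <<'EOF'
name,postcode,dob
alan,XXXAAA,11/11/55
bruji,AAAXXX,20/10/88
EOF

SGE_TASK_ID=3   # normally set by the scheduler

# Fetch the whole line once, then let read split it on commas
IFS=, read -r NAME POSTCODE DOB <<< "$(awk -v "line=$SGE_TASK_ID" 'NR==line' super.csv)"
echo "name=$NAME postcode=$POSTCODE dob=$DOB"
```

For task ID 3 this prints `name=bruji postcode=AAAXXX dob=20/10/88`. The here-string (`<<<`) is a bashism, which is fine since the job script already declares #!/bin/bash.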