Exercise 1
The goal of this exercise is to perform a simple computation over many input files.
Your task is to go through all files located in the files directory and count, for each file, how many lines contain a given needle (which will be provided as an input).
The inputs.txt file contains an "index" of all input files in the files directory (one relative path per line).
Take a look at the compute.sh script, and implement the missing command for counting the number of lines matching the provided needle.
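One common way to count matching lines is grep with the -c option, which prints the number of lines that contain a match instead of the lines themselves. The sketch below is a standalone illustration, not the contents of compute.sh: the function name count_matching_lines is hypothetical, and treating the needle as a literal string (-F) is an assumption, since the exercise does not say whether the needle is a pattern.

```shell
# Hypothetical helper: print how many lines of a file contain the needle.
# Usage: count_matching_lines <needle> <file>
count_matching_lines() {
    needle="$1"
    file="$2"
    # -c prints the number of matching lines instead of the lines themselves.
    # -F treats the needle as a literal string (an assumption, see above).
    # -e keeps a needle that starts with "-" from being parsed as an option.
    # grep exits with status 1 when nothing matches, so a status of 1 is
    # accepted as success here; a real error (status 2) is still reported.
    grep -c -F -e "$needle" -- "$file" || [ "$?" -eq 1 ]
}
```

For example, on a file containing the three lines abc, xyz, and cab, calling count_matching_lines ab on that file prints 2.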
Then start HQ (server + worker) and submit a single job that will execute the compute.sh script on all files from the files directory. Make sure to pass some needle value to the compute.sh script when submitting the job!
Hint: Take a look at task arrays to find out how to create a job with multiple tasks. Which way of creating a task array will be most useful to you for this task?
Checking the results
After the job is completed, a job-%d directory should be created on the disk. The directory should contain two files (stdout and stderr) for each input file from the files directory. You can use the following command to check the MD5 checksum of the directory, to make sure that your result corresponds to the expected outcome:
$ find <job-output-dir> -type f -exec md5sum {} \; | sort -k 2 | md5sum
Here are a few reference results:
- For needle ab, the MD5 checksum should be a1b7f784de574cf58248d7179a1d418d.
- For needle a, the MD5 checksum should be 05906bf0dc3ef89ae769395d26ef5391.
- For needle th, the MD5 checksum should be 7a16d51a9aa96240ccb6f782a6f1cd9a.
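The check can also be scripted so that the computed checksum is compared with a reference value automatically. The following is a minimal sketch that reuses the pipeline from the command above; the function names dir_checksum and check_result are hypothetical and are not part of the exercise files.

```shell
# Hypothetical helper: checksum of a job output directory, computed with
# the same pipeline as the command shown above.
# Usage: dir_checksum <job-output-dir>
dir_checksum() {
    # md5sum prints "<hash>  -" for standard input; cut keeps only the hash.
    find "$1" -type f -exec md5sum {} \; | sort -k 2 | md5sum | cut -d ' ' -f 1
}

# Hypothetical helper: compare a directory's checksum with an expected value.
# Usage: check_result <job-output-dir> <expected-md5>
check_result() {
    actual="$(dir_checksum "$1")"
    if [ "$actual" = "$2" ]; then
        echo "OK"
    else
        echo "MISMATCH: got $actual, expected $2"
        return 1
    fi
}
```

Note that md5sum prints each file's path next to its hash, so the final checksum depends on how the directory path is written when it is passed to find. If a result does not match, it may be worth checking that the path is given in the same form the reference values assume, which the exercise does not state explicitly.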