SCS
docs.it4i.cz

Exercise 1
The goal of this exercise is to perform a simple computation over many input files.
Your task is to go through all files in the files directory and, for each file, count how many of
its lines contain a given needle (a string that will be provided as an input).
The inputs.txt file contains an "index" of all input files in the files directory (one relative
path per line).
Take a look at the compute.sh script and implement the missing command that counts the number
of lines matching the provided needle.
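A minimal sketch of what the counting part of compute.sh could look like. It assumes (not stated in this section) that the needle arrives as the script's first argument and that the input path comes from the HQ_ENTRY environment variable, which HQ sets for each task of an array created with hq submit --each-line. The helper name count_matches is hypothetical, and the provided compute.sh may be structured differently.

```shell
#!/usr/bin/env bash
# Sketch of a compute.sh body. Assumptions: the needle is $1 and the input
# file path is in HQ_ENTRY (set by `hq submit --each-line`).
set -euo pipefail

# count_matches (hypothetical helper): print how many lines of FILE contain
# NEEDLE. -F treats the needle as a fixed string rather than a regex.
count_matches() {
    local needle="$1" file="$2"
    # grep -c still prints "0" when nothing matches, but exits with status 1;
    # accept that case while still failing on real errors (status 2).
    grep -c -F -- "$needle" "$file" || [ $? -eq 1 ]
}

# Only do the real work when running as an HQ task.
if [ -n "${HQ_ENTRY:-}" ]; then
    count_matches "${1:?usage: compute.sh <needle>}" "$HQ_ENTRY"
fi
```

Handling grep's exit status matters here because a task whose command exits non-zero is reported as failed, and "no matching lines" is a legitimate result. The reference checksums depend on exactly what each task prints, so if the provided script wraps the count in extra output, keep that format.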
Then start HQ (a server and a worker) and submit a single job that executes the compute.sh script
on all files from the files directory. Make sure to pass a needle value to the compute.sh
script when submitting the job!

Hint: Take a look at task arrays
to find out how to create a job with multiple tasks. Which way of creating a task array will be most
useful to you for this task?
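One possible way to start HQ and submit the job, sketched under a few assumptions: the hq binary is on PATH, the commands run from the exercise directory (so the relative paths in inputs.txt resolve), and compute.sh takes the needle as its first argument. The function names start_hq and submit_job are hypothetical wrappers; the sketch uses the --each-line form of task arrays, which creates one task per line of inputs.txt.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes `hq` is on PATH and this runs in the directory that
# contains compute.sh and inputs.txt. start_hq/submit_job are hypothetical.

start_hq() {
    # Server and worker normally run in the foreground; `&` backgrounds them
    # in this shell (running each in its own terminal works just as well).
    hq server start &
    sleep 2   # crude pause so the server is up before the worker connects
    hq worker start &
}

submit_job() {
    local needle="$1"
    # --each-line: one task per line of inputs.txt; each task receives its
    # line in the HQ_ENTRY environment variable. The needle is passed on to
    # compute.sh as its first argument. --wait blocks until the job ends.
    hq submit --each-line inputs.txt --wait ./compute.sh "$needle"
}
```

Usage: run start_hq once, then submit_job ab. By default each task's stdout and stderr are written under a job-<id> directory in the submission directory.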


Checking the results
After the job completes, a job-<id> directory (where <id> is the ID of the job) should be created on
disk. It should contain two files (stdout and stderr) for each input file from the files directory.
You can use the following command to compute an MD5 checksum of the whole directory and check that
your result matches the expected outcome:

$ find <job-output-dir> -type f -exec md5sum {} \; | sort -k 2 | md5sum


Here are a few reference results:

For needle ab, the MD5 checksum should be a1b7f784de574cf58248d7179a1d418d.
For needle a, the MD5 checksum should be 05906bf0dc3ef89ae769395d26ef5391.
For needle th, the MD5 checksum should be 7a16d51a9aa96240ccb6f782a6f1cd9a.
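To compare your output against one of the reference values without eyeballing hashes, the checksum pipeline above can be wrapped in a small function. check_result is a hypothetical helper name. Note that md5sum prints each file's path next to its hash, so the final checksum also depends on the exact path string you pass (for example job-1 and ./job-1 give different results); this section does not say which form the reference values were produced with.

```shell
#!/usr/bin/env bash
# check_result (hypothetical helper): compare the checksum of a job output
# directory against an expected reference value, using the same pipeline
# as in the exercise text.
check_result() {
    local dir="$1" expected="$2" actual
    # The last md5sum reads stdin and prints "<hash>  -"; cut keeps the hash.
    actual=$(find "$dir" -type f -exec md5sum {} \; | sort -k 2 | md5sum | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH: expected $expected, got $actual"
        return 1
    fi
}
```

For example, assuming the job ID is 1 and the needle was ab: check_result job-1 a1b7f784de574cf58248d7179a1d418d.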