Exercise 1

The goal of this exercise is to perform a simple computation over many input files.

Your task is to go through all files in the files directory and, for each file, count the lines that contain a given needle (which will be provided as an input).

The inputs.txt file contains an "index" of all input files in the files directory (one relative path per line).

Take a look at the compute.sh script, and implement the missing command for counting the number of lines matching the provided needle.
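As a rough illustration of the counting step, here is a minimal sketch. It assumes the needle should be matched as a literal string and that the file path and needle are available as plain shell values; compute.sh itself is not reproduced here, so the function name count_matching_lines and the demo file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not taken from compute.sh): print how many lines
# of FILE contain NEEDLE.
#   -c  count matching lines (a line with several hits still counts once)
#   -F  treat the needle as a literal string, not a regular expression
#   --  stop option parsing, so a needle starting with "-" is safe
count_matching_lines() {
    needle="$1"
    file="$2"
    grep -c -F -- "$needle" "$file"
}

# Small demonstration on a throwaway file with three lines.
demo_file="$(mktemp)"
printf 'abc\nxyz\ncab ab\n' > "$demo_file"
demo_count="$(count_matching_lines ab "$demo_file")"
echo "$demo_count"
rm -f "$demo_file"
```

Note that grep exits with status 1 when no line matches, even though it still prints 0. Depending on how compute.sh is written, that exit status may need to be handled so that a file without matches is not treated as a failure.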

Then start HQ (server + worker) and submit a single job that will execute the compute.sh script on all files from the files directory. Make sure to pass some needle value to the compute.sh script when submitting the job!

Hint: Take a look at task arrays to find out how to create a job with multiple tasks. Which way of creating a task array will be most useful to you for this task?
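To make the idea of "one task per input file" concrete, the following sketch is a local dry run that needs no HQ at all. It assumes that HQ here is HyperQueue, that a task array built from the lines of a file hands each task its line through the HQ_ENTRY environment variable, and that task IDs are exposed as HQ_TASK_ID starting from 0. The index contents and the stand-in command are made up for the demo.

```shell
#!/bin/sh
# Local dry run: emulate "one task per line of the index file".
# Assumptions (check the HQ documentation): each task receives its line
# in HQ_ENTRY and its numeric ID in HQ_TASK_ID.
workdir="$(mktemp -d)"
printf 'files/a.txt\nfiles/b.txt\n' > "$workdir/inputs.txt"   # stand-in index

task_id=0
while IFS= read -r entry; do
    # Stand-in for compute.sh: report which entry this "task" received.
    HQ_ENTRY="$entry" HQ_TASK_ID="$task_id" \
        sh -c 'echo "task $HQ_TASK_ID -> $HQ_ENTRY"' >> "$workdir/log.txt"
    task_id=$((task_id + 1))
done < "$workdir/inputs.txt"

cat "$workdir/log.txt"
task_count="$(wc -l < "$workdir/log.txt" | tr -d ' ')"
last_line="$(tail -n 1 "$workdir/log.txt")"
rm -rf "$workdir"
```

With a running server and worker (started with hq server start and hq worker start in separate terminals), the corresponding submission would probably look like hq submit --each-line inputs.txt ./compute.sh ab, where ab is just an example needle. Treat this as a starting point and confirm the exact options with hq submit --help.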

Checking the results

After the job completes, a job-%d directory (where %d is the job ID) should have been created on disk. The directory should contain two files (stdout and stderr) for each input file from the files directory. You can use the following command to compute the MD5 checksum of the directory, to make sure that your result corresponds to the expected outcome:

$ find <job-output-dir> -type f -exec md5sum {} \; | sort -k 2 | md5sum
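The pipeline above hashes every file, sorts the per-file lines by path so the result does not depend on the order in which find visits files, and then hashes that sorted listing. Because each line contains the file path as well as the hash, the final checksum also depends on the directory name you pass in. The sketch below illustrates both points on throwaway data; the helper name dir_checksum and the file names 0.stdout and 0.stderr are assumptions made for the demo only.

```shell
#!/bin/sh
# Hypothetical wrapper around the same pipeline; prints only the final hash.
dir_checksum() {
    find "$1" -type f -exec md5sum {} \; | sort -k 2 | md5sum | cut -d ' ' -f 1
}

# Two demo directories with identical file contents but different names.
demo_root="$(mktemp -d)"
mkdir "$demo_root/job-1" "$demo_root/job-2"
for d in job-1 job-2; do
    printf '3\n' > "$demo_root/$d/0.stdout"
    : > "$demo_root/$d/0.stderr"
done

sum_a="$(dir_checksum "$demo_root/job-1")"
sum_a_again="$(dir_checksum "$demo_root/job-1")"   # repeatable thanks to sort
sum_b="$(dir_checksum "$demo_root/job-2")"         # differs: paths are hashed too
echo "job-1: $sum_a"
echo "job-2: $sum_b"
rm -rf "$demo_root"
```

If your checksum does not match a reference value even though the individual counts look right, it may be worth checking whether the path you passed (for example the job number, or a leading ./ or trailing /) differs from the one used to produce the reference.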

Here are a few reference results:

  • For needle ab, the MD5 checksum should be a1b7f784de574cf58248d7179a1d418d.
  • For needle a, the MD5 checksum should be 05906bf0dc3ef89ae769395d26ef5391.
  • For needle th, the MD5 checksum should be 7a16d51a9aa96240ccb6f782a6f1cd9a.