# Exercise 1
The goal of this exercise is to perform a simple computation over many input files.
Your task is to go through all files in the `files` directory and, for each file, count the
lines that contain a given `needle` (which will be provided as an input).
The `inputs.txt` file contains an "index" of all input files in the `files` directory (one relative
path per line).
Take a look at the `compute.sh` script, and implement the missing command for counting the number
of lines matching the provided `needle`.
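
One possible way to count matching lines is sketched below. It is not taken from `compute.sh`: the function name `count_matching_lines` is hypothetical, and it assumes the needle should be matched as a literal substring. Whether its output reproduces the reference checksums depends on the exact output format `compute.sh` is expected to produce.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the exercise files): print the number of
# lines in FILE that contain NEEDLE as a literal substring.
count_matching_lines() {
    local needle="$1" file="$2"
    # -c: print the count of matching lines instead of the lines themselves
    # -F: treat the needle as a fixed string, not a regular expression
    # --: end of options, so a needle starting with '-' is not parsed as a flag
    # grep exits with status 1 when no line matches; treat that as success.
    grep -c -F -- "$needle" "$file" || [ "$?" -eq 1 ]
}

# Small demonstration on a temporary file.
sample="$(mktemp)"
printf 'abc\nxab\nzzz\n' > "$sample"
count_matching_lines ab "$sample"   # prints 2
count_matching_lines q "$sample"    # prints 0
rm -f "$sample"
```

Note that `grep -c` counts matching lines, not individual occurrences, so a line containing the needle twice is still counted once.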
Then start HQ (server + worker) and submit a single job that will execute the `compute.sh` script
on all files from the `files` directory. Make sure to pass some `needle` value to the `compute.sh`
script when submitting the job!
> Hint: Take a look at [task arrays](https://it4innovations.github.io/hyperqueue/stable/jobs/arrays/#lines-of-a-file)
> to find out how to create a job with multiple tasks. Which way of creating a task array will be most
> useful to you for this task?
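
As a rough sketch of one way to wire this together (not necessarily the intended solution): the function name `submit_needle_job` is hypothetical, and the example assumes that `compute.sh` and `inputs.txt` are in the current directory, that the script takes the needle as its first argument, and that it reads the current input path from the `HQ_ENTRY` environment variable that HyperQueue sets for tasks created with `--each-line`.

```bash
#!/usr/bin/env bash
# In two other terminals (both commands keep running in the foreground):
#   hq server start
#   hq worker start

# Hypothetical wrapper: submit one job with one task per line of inputs.txt.
submit_needle_job() {
    local needle="$1"
    # --each-line creates a task for every line of the given file; each task
    # sees its line in the HQ_ENTRY environment variable.
    hq submit --each-line inputs.txt ./compute.sh "$needle"
}
```

With a server and a worker running, calling `submit_needle_job ab` would submit the job for the needle `ab`.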
## Checking the results
After the job completes, a `job-%d` directory (where `%d` is the job ID) should be created on disk. The directory should
contain two files (`stdout` and `stderr`) for each input file from the `files` directory. You can
use the following command to compute an `MD5` checksum over the directory and check that your result
matches the expected outcome:
```bash
$ find <job-output-dir> -type f -exec md5sum {} \; | sort -k 2 | md5sum
```
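
One detail worth keeping in mind: `md5sum` prints each file's path next to its hash, so the final checksum covers the paths exactly as `find` prints them. The sketch below illustrates this on a throwaway directory. The `dir_checksum` name and the file names are hypothetical, and which spelling of the directory path the reference checksums were computed with is not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the pipeline above; DIR is passed to find as-is.
dir_checksum() {
    find "$1" -type f -exec md5sum {} \; | sort -k 2 | md5sum
}

# Demonstration on a throwaway directory (names are illustrative only).
demo="$(mktemp -d)"
mkdir "$demo/out"
printf '3\n' > "$demo/out/a.stdout"
: > "$demo/out/a.stderr"

(
    cd "$demo" || exit 1
    dir_checksum out      # hashes lines such as "<md5>  out/a.stdout"
    dir_checksum ./out    # hashes lines such as "<md5>  ./out/a.stdout": different result
)
```

If your checksum does not match a reference value, try the other spelling of the directory path (for example with or without a leading `./`) before concluding that the job output is wrong.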
Here are a few reference results:
- For needle `ab`, the MD5 checksum should be `a1b7f784de574cf58248d7179a1d418d`.
- For needle `a`, the MD5 checksum should be `05906bf0dc3ef89ae769395d26ef5391`.
- For needle `th`, the MD5 checksum should be `7a16d51a9aa96240ccb6f782a6f1cd9a`.