    # Exercise 1
    The goal of this exercise is to perform a simple computation over many input files.
    
    Your task is to go through all files located in the `files` directory and, for each file, count
    the lines that contain a given `needle` (which will be provided as an input).
    
    The `inputs.txt` file contains an "index" of all input files in the `files` directory (one relative
    path per line).
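
Before submitting anything, it can help to confirm that the index and the `files` directory agree. The following is a minimal sketch, not part of the exercise itself: `check_index` is a hypothetical helper name, and it assumes that the paths in `inputs.txt` are relative to the directory you run it from.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports index entries that do not point to an existing file.
# Assumption: paths in the index are relative to the current working directory.
check_index() {
    local index="${1:-inputs.txt}" missing=0 path
    # Read the index line by line; also handle a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        # Skip empty lines.
        [ -z "$path" ] && continue
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            missing=$((missing + 1))
        fi
    done < "$index"
    echo "missing files: $missing"
    # Return success only if every listed file exists.
    [ "$missing" -eq 0 ]
}
```

Calling `check_index inputs.txt` from the exercise directory prints one `missing: ...` line per entry that cannot be found, followed by a summary line.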
    
    Take a look at the `compute.sh` script, and implement the missing command for counting the number
    of lines matching the provided `needle`.
    
    Then start HyperQueue (HQ), both a server and a worker, and submit a single job that will execute the `compute.sh` script
    on all files from the `files` directory. Make sure to pass some `needle` value to the `compute.sh`
    script when submitting the job!
    
    > Hint: Take a look at [task arrays](https://it4innovations.github.io/hyperqueue/stable/jobs/arrays/#lines-of-a-file)
    > to find out how to create a job with multiple tasks. Which way of creating a task array will be most
    > useful to you for this task?
    
    ## Checking the results
    After the job completes, a `job-%d` directory (where `%d` is the job ID) should have been created on disk. The directory should
    contain two files (`stdout` and `stderr`) for each input file from the `files` directory. You can
    use the following command to check the `MD5` checksum of the directory, to make sure that your result
    corresponds to the expected outcome:
    
    ```bash
    $ find <job-output-dir> -type f -exec md5sum {} \; | sort -k 2 | md5sum
    ```
    
    Here are a few reference results:
    - For needle `ab`, the MD5 checksum should be `a1b7f784de574cf58248d7179a1d418d`.
    - For needle `a`, the MD5 checksum should be `05906bf0dc3ef89ae769395d26ef5391`.
    - For needle `th`, the MD5 checksum should be `7a16d51a9aa96240ccb6f782a6f1cd9a`.
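
The comparison against a reference value can also be scripted. The sketch below wraps the pipeline shown above in two hypothetical helpers, `dir_checksum` and `check_result` (neither name comes from the exercise). Note that `md5sum` prints each file path next to its hash, so the final checksum depends on how the directory argument is spelled: `job-1` and `./job-1` give different results.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the MD5 checksum of a directory,
# using the same find | sort | md5sum pipeline as the README.
dir_checksum() {
    find "$1" -type f -exec md5sum {} \; | sort -k 2 | md5sum | cut -d ' ' -f 1
}

# Hypothetical helper: compares a directory checksum with an expected value.
check_result() {
    local dir="$1" expected="$2" actual
    actual="$(dir_checksum "$dir")"
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH: got $actual, expected $expected"
        return 1
    fi
}
```

For example, `check_result job-1 a1b7f784de574cf58248d7179a1d418d` would check a run with the needle `ab`, assuming the output directory is named `job-1`; that directory name is an assumption, so adjust it to match your job.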