Commit 1f5d531d authored by Sybren A. Stüvel

Updated README files; referred to manager/worker README in top-level README

Also merged the two Manager READMEs into one.

NOTE: the current top-level README.md is still heavily outdated and
describes the pre-pillar incarnation of Flamenco.
parent c4a7e959
@@ -3,7 +3,7 @@
Development repo for Flamenco 2.0 (originally known as brender). Flamenco is a
Free and Open Source Job distribution system for render farms.
Warning: currently Flamenco is in beta stage, testing welcome!
Warning: currently Flamenco is in beta stage, and can still change in major ways.
## Quick install with Docker
@@ -45,78 +45,15 @@ armadillica/flamenco_server_dev
```
### Manager
Setting up the manager is a very similar process, but needs some more interaction. Before we start let's make sure we know the path of the shared Blender binary.
```
$ docker run -ti -p 7777:7777 --name flamenco_manager --link flamenco_server:flamenco_server --link mysql:mysql \
-v /media/data/flamenco/flamenco/manager:/data/git/manager \
-v /media/data/flamenco_data/storage/shared:/data/storage/shared \
-v /media/data/flamenco_data/storage/manager:/data/storage/manager \
armadillica/flamenco_manager_dev
```
As soon as the container is up and running we will be prompted to provide the Blender path for Linux, OSX and Windows. Currently the manager is implemented assuming that all workers connecting to it have access to a shared location where the binary for each OS is located.
### Dashboard
The final component we will install is the dashboard, which allows us to track the progress and manage the various jobs.
```
$ docker run -ti -p 8888:8888 --name flamenco_dashboard --link flamenco_server:flamenco_server \
-v /media/data/flamenco/flamenco/dashboard:/data/git/dashboard \
-v /media/data/flamenco_data/storage/dashboard:/data/storage/dashboard \
armadillica/flamenco_dashboard_dev
```
You can now access the Dashboard at http://127.0.0.1:8888. To see the thumbnails correctly, you will need to add the following line to your `/etc/hosts` file:
```
xxx.xxx.xxx.xxx flamenco_server
```
Replace xxx.xxx.xxx.xxx with the IP address of the flamenco_server Docker container, which you can find by running:
```
$ sudo docker inspect flamenco_server
```
When running the dashboard for the first time, we build the html components using gulp. Future updates can only be done by hand on the host OS at the moment.
### Using docker-compose
Once all the containers have been set up, they can be managed using the `docker-compose` tool. Check out the `docker-compose-example.yml` as a base.
The Manager is written in [Go](https://golang.org/). The Manager is documented
in its own [README](./packages/flamenco-manager-go/README.md).
### Worker
The Flamenco worker is a very simple standalone component. The only requirements needed for it are:
* Python 2.7
* the requests library
* the Pillow library
Note that `Pillow` requires the following packages:
```
$ sudo apt-get install libjpeg-dev zlib1g-dev python-dev
```
It is recommended to create and activate a virtual environment:
```
$ cd /media/data
$ virtualenv venv
$ . venv/bin/activate
```
We can install the requests and Pillow libraries in the virtual environment with:
```
(venv)$ pip install requests Pillow
```
After that, we can run the worker with:
```
(venv)$ python /media/data/flamenco/flamenco/worker/run.py --manager 127.0.0.1:7777
```
The Flamenco worker is a very simple standalone component implemented in
[Python](https://www.python.org/). The Worker is documented
in its own [README](./packages/flamenco-worker-python/README.md).
## Developer installation
@@ -16,6 +16,56 @@ absolute path of this `flamenco-manager-go` directory.
6. Build your first Flamenco Manager with `go build`; this will create an executable
`flamenco-manager` in `$FM/src/flamenco-manager`
### Testing
To run all unit tests, run `go test ./flamenco -v`. To run a specific GoCheck test, run
`go test ./flamenco -v --run TestWithGocheck -check.f SchedulerTestSuite.TestVariableReplacement`
where the argument to `--run` determines which suite to run, and `-check.f` determines the
exact test function of that suite. Once all tests have been moved over to use GoCheck, the
`--run` parameter will probably not be needed any more.
## Communication between Server and Manager
Flamenco Manager is responsible for initiating all communication between Server and Manager,
since Manager should be able to run behind some firewall/router, without being reachable by Server.
In the text below, `some_fields` refer to configuration file settings.
### Fetching tasks
1. When a Worker asks for a task, it is served a task in state `queued` or `claimed-by-manager` in
the local task queue (MongoDB collection "flamenco_tasks"). In this case, Manager performs a
conditional GET (based on etag) to Server at /api/flamenco/tasks/{task-id} to see if the task
has been updated since queued. If this is so, the task is updated in the queue and the queue
is re-examined.
2. When the queue is empty, the manager fetches N new tasks from the Server, where N is the number
of registered workers.
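
As an illustration of the conditional GET in step 1, here is a minimal Go sketch. It assumes the Server honours the standard `If-None-Match` header and answers `304 Not Modified` when the etag still matches; the README does not say which header is used. The function `fetchIfChanged`, the fake server and the task ID `abc123` are hypothetical and only serve the example; this is not the Manager's actual code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchIfChanged performs a conditional GET. changed is false when the server
// answers 304 Not Modified, meaning the locally queued copy is still current.
// Assumption: the etag is sent in the If-None-Match request header.
func fetchIfChanged(client *http.Client, url, etag string) (body []byte, newEtag string, changed bool, err error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, "", false, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, "", false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		return nil, etag, false, nil
	case http.StatusOK:
		body, err = io.ReadAll(resp.Body)
		if err != nil {
			return nil, "", false, err
		}
		return body, resp.Header.Get("ETag"), true, nil
	default:
		return nil, "", false, fmt.Errorf("unexpected status %s", resp.Status)
	}
}

// newFakeServer is a stand-in for Flamenco Server (hypothetical): it serves
// one task document whose current etag is currentEtag.
func newFakeServer(currentEtag string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("If-None-Match") == currentEtag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("ETag", currentEtag)
		fmt.Fprint(w, `{"status": "queued"}`)
	}))
}

func main() {
	srv := newFakeServer("v2")
	defer srv.Close()
	url := srv.URL + "/api/flamenco/tasks/abc123"

	// The queue holds a stale etag, so the task is reported as changed.
	_, etag, changed, err := fetchIfChanged(srv.Client(), url, "v1")
	fmt.Println(etag, changed, err)

	// Asking again with the fresh etag reports no change.
	_, etag, changed, err = fetchIfChanged(srv.Client(), url, etag)
	fmt.Println(etag, changed, err)
}
```

When `changed` is true, the caller would update the task in the local queue and re-examine the queue, as described in step 1.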
### Task updates and canceling running tasks
0. Pushes happen as POST to "/api/flamenco/managers/{manager-id}/task-update-batch"
1. Task updates queued by workers are pushed every `task_update_push_max_interval_seconds`, or
when `task_update_push_max_count` updates are queued, whichever happens sooner.
2. An empty list of task updates is pushed every `cancel_task_fetch_max_interval_seconds`, unless an
actual push (as described above) already happened within that time.
3. The response to a push contains the database IDs of the accepted task updates, as well as
a list of task database IDs of tasks that should be canceled. If this list is non-empty, the
tasks' statuses are updated accordingly.
## Timeouts of active tasks
When a worker starts working on a task, that task moves to status "active". The worker then
regularly calls `/may-i-run/{task-id}` to verify that it is still allowed to run that task. If this
end-point is not called within `active_task_timeout_interval_seconds` seconds, the task will go to status
"failed". The default for this setting is 60 seconds, which is likely to be too short, so please
configure it for your environment.
This timeout check will start running 5 minutes after the Manager has started up. This allows
workers to let it know they are still alive, in case the manager was unreachable for longer than
the timeout period. For now this startup delay is hard-coded.
## Known issues & limitations
@@ -23,11 +73,14 @@ absolute path of this `flamenco-manager-go` directory.
waiting for tasks, when there are thousands of tasks and workers of type X and only a relatively
low number of workers and tasks of type Y.
## TO DO
## MISSING FEATURES / TO DO
In no particular order:
- Way for Flamenco Server to get an overview of Workers, and set their status.
- Update worker address upon communication (currently only stored when registering)
- the Task struct in documents.go should be synced with the Eve schema.
- Task queue cleanup. At the moment tasks are stored in the queue forever, since that makes
it possible to notice a task was canceled while a worker was running it. Eventually such
tasks should be cleaned up, though.
- GZip compression on the pushes to Server. This is especially important for task updates, since
they contain potentially very large log entries.
- A way for Flamenco Server to get an overview of Workers, and set their status.
- the Task struct in `documents.go` should be synced with the Eve schema.