diff --git a/README.md b/README.md index 3d7d5d869a33630d92742b915bf79ae79971f58f..5ec33a18419469f3045eea7158e216f4efd91abd 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,7 @@ Development repo for Flamenco 2.0 (originally known as brender). Flamenco is a Free and Open Source Job distribution system for render farms. -Warning: currently Flamenco is in beta stage, testing welcome! +Warning: currently Flamenco is in beta stage, and can still change in major ways. ## Quick install with Docker @@ -45,78 +45,15 @@ armadillica/flamenco_server_dev ``` ### Manager -Setting up the manager is a very similar process, but needs some more interaction. Before we start let's make sure we know the path of the shared Blender binary. -``` -$ docker run -ti -p 7777:7777 --name flamenco_manager --link flamenco_server:flamenco_server --link mysql:mysql \ --v /media/data/flamenco/flamenco/manager:/data/git/manager \ --v /media/data/flamenco_data/storage/shared:/data/storage/shared \ --v /media/data/flamenco_data/storage/manager:/data/storage/manager \ -armadillica/flamenco_manager_dev -``` - -As soon as the container is up and running we will be prompted to provide the Blender path for Linux, OSX and Windows. Currently the worker is implemented assuming that all workers connecting to it have access to a shared location where the binary for each OS is located. - -### Dashboard -The final component we will install is the dashboard, which allows us to track the progress and manage the various jobs. 
- -``` -$ docker run -ti -p 8888:8888 --name flamenco_dashboard --link flamenco_server:flamenco_server \ --v /media/data/flamenco/flamenco/dashboard:/data/git/dashboard \ --v /media/data/flamenco_data/storage/dashboard:/data/storage/dashboard \ -armadillica/flamenco_dashboard_dev -``` - -Now you can access Dashboard using the URL: http://127.0.0.1:8888, in order to see the humbnails corectly you will need to add the following line to your `/etc/hosts` file: - -``` -xxx.xxx.xxx.xxx flamenco_server -``` - -Replacing xxx.xxx.xxx.xxx by the flamenco_server docker IP, you can find it running: - -``` -$ sudo docker inspect flamenco_server -``` - -When running the dashboard for the first time, we build the html components using gulp. Future updates can only be done by hand on the host OS at the moment. - -### Using docker-compose -Once all the containers have been set up, they can be managed using the `docker-compose` tool. Check out the `docker-compose-example.yml` as a base. +The Manager is written in [Go](https://golang.org/). The Manager is documented +in its own [README](./packages/flamenco-manager-go/README.md). ### Worker -The Flamenco worker is a very simple standalone component. The only requirements needed for it are: -* Python 2.7 -* the requests library -* the Pillow library - -Notice that `Pillow` requires the folowing packages: - -``` -$ sudo apt-get install libjpeg-dev zlib1g-dev python-dev -``` - -Is recommended to create and activate a virtual environment: - -``` -$ cd /media/data -$ virtualenv venv -$ . venv/bin/activate -``` - -We can make sure that we have the requests library installed on the virtual environment with: - -``` -(venv)$ pip install requests Pillow -``` - - -After that, we can run the worker with: - -``` -(venv)$ python /media/data/flamenco/flamenco/worker/run.py --manager 127.0.0.1:7777 -``` +The Flamenco worker is a very simple standalone component implemented in +[Python](https://www.python.org/). 
The Worker is documented +in its own [README](./packages/flamenco-worker-python/README.md). ## Developer installation diff --git a/packages/flamenco-manager-go/README.md b/packages/flamenco-manager-go/README.md index e3bb27e7c074f7cc9ff42e3ddafc35bffa9ca9fb..9d7b7f38366783d7c8fb54f7d0c96b37cc723c71 100644 --- a/packages/flamenco-manager-go/README.md +++ b/packages/flamenco-manager-go/README.md @@ -16,6 +16,56 @@ absolute path of this `flamenco-manager-go` directory. 6. Build your first Flamenco Manager with `go build`; this will create an executable `flamenco-manager` in `$FM/src/flamenco-manager` +### Testing + +To run all unit tests, run `go test ./flamenco -v`. To run a specific GoCheck test, run +`go test ./flamenco -v --run TestWithGocheck -check.f SchedulerTestSuite.TestVariableReplacement` +where the argument to `--run` determines which suite to run, and `-check.f` determines the +exact test function of that suite. Once all tests have been moved over to use GoCheck, the +`--run` parameter will probably not be needed any more. + + +## Communication between Server and Manager + +Flamenco Manager is responsible for initiating all communication between Server and Manager, +since Manager should be able to run behind some firewall/router, without being reachable by Server. + +In the text below, `some_fields` refer to configuration file settings. + +### Fetching tasks + +1. When a Worker asks for a task, it is served a task in state `queued` or `claimed-by-manager` in + the local task queue (MongoDB collection "flamenco_tasks"). In this case, Manager performs a + conditional GET (based on etag) to Server at /api/flamenco/tasks/{task-id} to see if the task + has been updated since queued. If this is so, the task is updated in the queue and the queue + is re-examined. +2. When the queue is empty, the manager fetches N new tasks from the Server, where N is the number + of registered workers. + +### Task updates and canceling running tasks + +0. 
Pushes happen as POST to "/api/flamenco/managers/{manager-id}/task-update-batch" +1. Task updates queued by workers are pushed every `task_update_push_max_interval_seconds`, or + when `task_update_push_max_count` updates are queued, whichever happens sooner. +2. An empty list of task updates is pushed every `cancel_task_fetch_max_interval_seconds`, unless an + actual push (as described above) already happened within that time. +3. The response to a push contains the database IDs of the accepted task updates, as well as + a list of task database IDs of tasks that should be canceled. If this list is non-empty, the + tasks' statuses are updated accordingly. + + +## Timeouts of active tasks + +When a worker starts working on a task, that task moves to status "active". The worker then +regularly calls `/may-i-run/{task-id}` to verify that it is still allowed to run that task. If this +end-point is not called within `active_task_timeout_interval_seconds` seconds, it will go to status +"failed". The default for this setting is 60 seconds, which is likely to be too short, so please +configure it for your environment. + +This timeout check will start running 5 minutes after the Manager has started up. This allows +workers to let it know they are still alive, in case the manager was unreachable for longer than +the timeout period. For now this startup delay is hard-coded. + ## Known issues & limitations @@ -23,11 +73,14 @@ absolute path of this `flamenco-manager-go` directory. waiting for tasks, when there are 1000nds of tasks and workers of type X and only a relatively low number of workers and tasks of type Y. - -## TO DO +## MISSING FEATURES / TO DO In no particular order: -- Way for Flamenco Server to get an overview of Workers, and set their status. -- Update worker address upon communication (currently only stored when registering) -- the Task struct in documents.go should be synced with the Eve schema. +- Task queue cleanup. 
At the moment tasks are stored in the queue forever, since that makes + it possible to notice a task was canceled while a worker was running it. Eventually such + tasks should be cleaned up, though. +- GZip compression on the pushes to Server. This is especially important for task updates, since + they contain potentially very large log entries. +- A way for Flamenco Server to get an overview of Workers, and set their status. +- the Task struct in `documents.go` should be synced with the Eve schema. diff --git a/packages/flamenco-manager-go/src/flamenco-manager/README.md b/packages/flamenco-manager-go/src/flamenco-manager/README.md deleted file mode 100644 index ff3ca5f245ceb2d4c3171053cbf1b36942ab0b6e..0000000000000000000000000000000000000000 --- a/packages/flamenco-manager-go/src/flamenco-manager/README.md +++ /dev/null @@ -1,57 +0,0 @@ - - -## Testing - -To run all unit tests, run `go test ./flamenco -v`. To run a specific GoCheck test, run -`go test ./flamenco -v --run TestWithGocheck -check.f SchedulerTestSuite.TestVariableReplacement` -where the argument to `--run` determines which suite to run, and `-check.f` determines the -exact test function of that suite. Once all tests have been moved over to use GoCheck, the -`--run` parameter will probably not be needed any more. - -## MISSING FEATURES - -- Task queue cleanup. At the moment tasks are stored in the queue forever, since that makes - it possible to notice a task was canceled while a worker was running it. Eventually such - tasks should be cleaned up, though. -- GZip compression on the pushes to Server. This is especially important for task updates, since - they contain potentially very large log entries. - -## Communication between Server and Manager - -Flamenco Manager is responsible for initiating all communication between Server and Manager, -since Manager should be able to run behind some firewall/router, without being reachable by Server. 
- -In the text below, `some_fields` refer to configuration file settings. - -### Fetching tasks - -1. When a Worker ask for a task, it is served a task in state `queued` or `claimed-by-manager` in - the local task queue (MongoDB collection "flamenco_tasks"). In this case, Manager performs a - conditional GET (based on etag) to Server at /api/flamenco/tasks/{task-id} to see if the task - has been updated since queued. If this is so, the task is updated in the queue and the queue - is re-examined. -2. When the queue is empty, the manager fetches N new tasks from the Server, where N is the number - of registered workers. - -### Task updates and canceling running tasks - -0. Pushes happen as POST to "/api/flamenco/managers/{manager-id}/task-update-batch" -1. Task updates queued by workers are pushed every `task_update_push_max_interval_seconds`, or - when `task_update_push_max_count` updates are queued, whichever happens sooner. -2. An empty list of task updates is pushed every `cancel_task_fetch_max_interval_seconds`, unless an - actual push (as described above) already happened within that time. -3. The response to a push contains the database IDs of the accepted task updates, as well as - a list of task database IDs of tasks that should be canceled. If this list is non-empty, the - tasks' statuses are updated accordingly. - -## Timeouts of active tasks - -When a worker starts working on a task, that task moves to status "active". The worker then -regularly calls `/may-i-run/{task-id}` to verify that it is still allowed to run that task. If this -end-point is not called within `active_task_timeout_interval_seconds` seconds, it will go to status -"failed". The default for this setting is 60 seconds, which is likely to be too short, so please -configure it for your environment. - -This timeout check will start running 5 minutes after the Manager has started up. 
This allows -workers to let it know they are still alive, in case the manager was unreachable for longer than -the timeout period. For now this startup delay is hard-coded.