Commit 9e44952e authored by Vojtech Cima

DOC: fixed terminology

parent 7a9bf30e
# Loom
# HyperLoom
Loom is a framework for distributed computation, mainly focused on scientific
pipelines.
HyperLoom is a platform for defining and executing workflow pipelines in a distributed environment. HyperLoom aims to be a highly scalable framework that is able to efficiently execute millions of interconnected tasks on hundreds of computational nodes.
* BSD license
* High-level Python interface, back-end written in C++
* Peer-to-peer data sharing between workers
* Low latency & low task overhead to process hundred thousands of tasks
HyperLoom features:
* Optimized dynamic scheduling with low overhead.
* In-memory data storage with a direct access over the network with a low I/O footprint.
* Direct worker-to-worker data transfer for low server overhead.
* Third party application support.
* Data-location aware scheduling reducing inter-node network traffic.
* C++ core with a Python client enabling high performance available through a simple API.
* High scalability and native HPC support.
* BSD license.
## Quickstart
Execute a Loom pipeline in 4 easy steps:
Execute your first HyperLoom pipeline in 4 easy steps using [Docker](https://docs.docker.com/):
### 1. Deploy virtualized Loom infrastructure
### 1. Deploy virtualized HyperLoom infrastructure
```
docker-compose up
......@@ -20,7 +25,7 @@ docker-compose up
Note that before re-running `docker-compose up` you need to run `docker-compose down` to delete the containers' state.
### 2. Install Loom client (virtualenv)
### 2. Install HyperLoom client (virtualenv)
```
virtualenv -p python3 loom_client_env
......@@ -55,3 +60,15 @@ print(result) # Prints b"Hello world!"
```
python3 pipeline.py
```
## Documentation
You can build the full documentation from the [doc](./doc) subdirectory by running `make html`.
## Acknowledgements
This project has received funding from the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 671555. This work was also supported by The Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project „IT4Innovations excellence in science - LQ1602“ and by the IT4Innovations infrastructure which is supported from the Large Infrastructures for Research, Experimental Development and Innovations project „IT4Innovations National Supercomputing Center – LM2015070“.
## License
See the [LICENSE](./LICENSE) file.
......@@ -5,7 +5,7 @@ Python client
Basic usage
-----------
The following code contains a simple example of Loom usage. It creates two
The following code contains a simple example of HyperLoom usage. It creates two
constants and a task that merges them. Next, it creates a client, connects to
the server, submits the plan, and waits for the results. It assumes that the
server is running at address *localhost* on TCP port 9010.
......@@ -24,7 +24,7 @@ server is running at address *localhost* on TCP port 9010.
The full list of built-in tasks can be found in :ref:`PyClient_API_Tasks`.
Method ``submit_one`` is non-blocking and returns an instance of
``loom.client.Future`` that represents a remote computation in Loom
``loom.client.Future`` that represents a remote computation in HyperLoom
infrastructure. There are four basic operations provided by
``loom.client.Future``:
......@@ -52,13 +52,13 @@ more tasks/futures at once:
from loom.client import Client, tasks
task1 = tasks.const("Hello ") # Create a plain object
task2 = tasks.const(" ") # Create a plain object
task3 = tasks.const("world!") # Merge two data objects together
task1 = tasks.const("Hello ") # Create a plain object
task2 = tasks.const(" ") # Create a plain object
task3 = tasks.const("world!") # Merge two data objects together
client = Client("localhost", 9010) # Create a client
results = client.submit(task3) # Submit tasks; returns list of futures
print(client.gather(results)) # prints [b"Hello world!", b" ", b"world!"]
client = Client("localhost", 9010) # Create a client
results = client.submit((task1, task2, task3)) # Submit tasks; returns list of futures
print(client.gather(results)) # prints [b"Hello world!", b" ", b"world!"]
In this case, we have replaced ``submit_one`` by method ``submit`` that takes a
collection of tasks and we have called the method ``gather`` not on the future
......@@ -74,7 +74,7 @@ Reusing futures as tasks inputs
+++++++++++++++++++++++++++++++
Futures can also be used as inputs for tasks. This allows gradual submission,
i.e. loom may already compute some part of the computation while the remaining plan
i.e. HyperLoom may already compute some part of the computation while the remaining plan
is still being composed.
.. code-block:: python
......@@ -228,7 +228,7 @@ In previous examples, we have always used a constant arguments for programs;
however, program arguments can also be parametrized by data objects. When an
input data object is mapped to a file name that starts with character `$` then
no file is mapped, but the variable with the same name can be used in
arguments. Loom expands the variable before the execution of the task.
arguments. HyperLoom expands the variable before the execution of the task.
The following example executes the program `ls`, where the first argument is
obtained from a data object.
......@@ -273,7 +273,7 @@ This program prints the following:
Python functions in plans
-------------------------
Loom allows Python functions to be executed directly as tasks. The easiest way is to
HyperLoom allows Python functions to be executed directly as tasks. The easiest way is to
use the decorator ``py_task()``. This is demonstrated by the following code: ::
from loom.client import tasks
......@@ -350,7 +350,7 @@ Task context
------------
A Python task can be configured to obtain a ``Context`` object as the first argument.
It provides an interface for interacting with the Loom worker.
It provides an interface for interacting with the HyperLoom worker.
The following example demonstrates logging through the context object::
from loom.client import tasks
......@@ -373,13 +373,13 @@ Direct arguments
----------------
Direct arguments serve to configure a Python task without the need to
create loom tasks. From the user's perspective, they work in a similar way to
create HyperLoom tasks. From the user's perspective, they work in a similar way to
the context: they introduce extra parameters. The values for these parameters are set
when the task is called. They can be arbitrary serializable objects and they are
passed to the function when the py_task is called. Direct arguments are always
passed as the first n arguments of the function. They are specified only by a
number, i.e. how many of the first arguments are direct (the remaining arguments are
considered normal loom tasks).
considered normal HyperLoom tasks).
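The positional rule described above can be pictured with a plain-Python analogy. This sketch does not use the HyperLoom client; the helper ``split_arguments`` and the argument values are hypothetical and only illustrate how the first n arguments are separated from the remaining task inputs.

```python
def split_arguments(args, n_direct):
    """Split positional arguments into (direct values, task inputs).

    The first n_direct arguments are treated as direct values; everything
    after them is treated as an ordinary task input.
    """
    if not 0 <= n_direct <= len(args):
        raise ValueError("n_direct must be between 0 and len(args)")
    return tuple(args[:n_direct]), tuple(args[n_direct:])


# Hypothetical call: two direct values followed by two task inputs.
direct, inputs = split_arguments((100, "label", "task_a", "task_b"), n_direct=2)
print(direct)
print(inputs)
```

Because the split is purely positional, only a single number is needed to describe which parameters are direct.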
Let us consider the following example::
......@@ -418,7 +418,7 @@ arguments via ``py_call``::
Python objects
--------------
Data objects in loom can directly be Python objects. A constant value can be created
Data objects in HyperLoom can directly be Python objects. A constant value can be created
by ``tasks.py_value``::
from loom.client import tasks
......@@ -453,9 +453,9 @@ Data objects::
return [ctx.wrap({"A", (1,2,3)}), "Hello"]
The first example returns a plain object. The second example returns PyObj. The third one returns
Loom array with PyObj and plain object.
HyperLoom array with PyObj and plain object.
.. Important:: Loom always assumes that all data objects are immutable.
.. Important:: HyperLoom always assumes that all data objects are immutable.
Therefore, modifying unwrapped objects from PyObj leads to highly
undefined behavior. It is recommended to store only immutable
objects (strings, tuples, frozensets, ...) in PyObj to prevent
......@@ -478,7 +478,7 @@ Loom array with PyObj and plain object.
Reports
-------
Reporting system serves for debugging and profiling the Loom programs.
Reporting system serves for debugging and profiling the HyperLoom programs.
Reports can be enabled by ``set_trace`` method as follows::
task = ...
......@@ -560,7 +560,7 @@ simultenously more light weight tasks than cores available for the worker.
Dynamic slice & get
-------------------
Loom scheduler recognizes two special tasks that dynamically modify the plan --
HyperLoom scheduler recognizes two special tasks that dynamically modify the plan --
**dynamic slice** and **dynamic get**. They dynamically create new tasks
according to the length of a data object and the current number of workers and
their resources. The goal is to obtain an optimal number of tasks to utilize the
......@@ -608,7 +608,7 @@ the data object produced by ``x``::
Own tasks
---------
Module ``tasks`` contains tasks provided by the worker distributed with Loom. If
Module ``tasks`` contains tasks provided by the worker distributed with HyperLoom. If
we extend a worker by our own special tasks, we also need a way to call them
from the client.
......
# -*- coding: utf-8 -*-
#
# Loom documentation build configuration file, created by
# HyperLoom documentation build configuration file, created by
# sphinx-quickstart on Sun Nov 13 23:32:34 2016.
#
# This file is execfile()d with the current directory set to its
......@@ -55,9 +55,9 @@ master_doc = 'index'
numfig = True
# General information about the project.
project = u'Loom'
copyright = u'2016, Loom Team'
author = u'Loom Team'
project = u'HyperLoom'
copyright = u'2016, HyperLoom Team'
author = u'HyperLoom Team'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
......@@ -141,7 +141,7 @@ html_theme = 'sphinx_rtd_theme'
# The name for this set of Sphinx documents.
# "<project> v<release> documentation" by default.
#
# html_title = u'Loom v0.2'
# html_title = u'HyperLoom v0.6'
# A shorter title for the navigation bar. Default is the same as html_title.
#
......
......@@ -6,7 +6,7 @@ Extending worker
The API in the following section is not yet fully stable.
It may be changed in the near future.
Loom infrastructure offers by default a set of operations for basic manipulation
HyperLoom infrastructure offers by default a set of operations for basic manipulation
with data objects and for running external programs. One of these tasks is
`loom/py_call` (it can be used via ``tasks.py_call`` or ``tasks.py_task``
in the Python client). This task allows executing arbitrary Python code and the
......@@ -17,7 +17,7 @@ efficiency, since worker extensions can be written in C++. Moreover, this
approach is more powerful than py_call, since not only tasks but also new data
objects may be introduced.
On the implementation level, Loom contains a C++ library **libloom** that
On the implementation level, HyperLoom contains a C++ library **libloom** that
implements the worker in an extensible way.
.. _Extending_new_tasks:
......@@ -74,7 +74,7 @@ thread. The subclass has to implement ``run()`` method that is executed when the
task is fired. It should return data object or ``nullptr`` when an error occurs.
The following code defines ``main`` function for the modified worker. It is
actually the same code as for the worker distributed with Loom except the
actually the same code as for the worker distributed with HyperLoom except the
registration of our new task. Each task has to be registered under a symbol.
Symbols for built-in tasks, data objects and resource requests start with the prefix
`loom/`. To avoid name clashes, it is good practice to introduce a new prefix, in
......
.. Loom documentation master file, created by
.. HyperLoom documentation master file, created by
sphinx-quickstart on Sun Nov 13 23:32:34 2016.
Loom
HyperLoom
=========
User guide
......
......@@ -2,29 +2,29 @@
Installation
============
Loom has two components from the installation perspective:
HyperLoom has two components from the installation perspective:
* Runtime (Server and Worker)
* Runtime - the HyperLoom infrastructure (Server and Worker)
* Python client
Both components reside in the same Git repository,
but their installations are independent.
The main repository is: https://code.it4i.cz/boh126/loom
The main repository is: https://code.it4i.cz/ADAS/loom
Runtime
-------
Runtime depends on the following libraries that are not included into the Loom
source codes:
The HyperLoom infrastructural components depend on the following libraries that are not included in the HyperLoom
source code:
* **libuv** -- Asynchronous event notification
* **Protocol buffers** -- Serialization library
* **Python >=3.4** (optional)
* **Cloudpickle** (optional)
(Loom also depends on **spdlog** and **Catch** that are distributed together
with Loom)
(HyperLoom also depends on **spdlog** and **Catch** that are distributed together
with HyperLoom)
In **Debian** based distributions, dependencies can be installed by the
following commands: ::
......@@ -38,7 +38,7 @@ following commands: ::
feature.
When dependencies are installed, Loom itself can be installed by the following
When dependencies are installed, HyperLoom itself can be installed by the following
commands: ::
cd loom
......
......@@ -2,42 +2,32 @@
Introduction
============
**Loom** is a platform for defining and executing workflow pipelines in a
distributed environment. **Loom** is designed to provide a scalable framework
that is able to efficiently execute many small interconnected tasks on HPC (High
Performance Computing) infrastructure.
A user provides a description of the computation in a form of DAG (Directed
Acyclic Graph) that captures dependencies between tasks. The infrastructure
automatically schedules tasks on available nodes while managing all necessary
data transfers.
HyperLoom is a platform for defining and executing workflow pipelines in a
distributed environment. HyperLoom aims to be a highly scalable framework
that is able to efficiently execute millions of interconnected tasks on hundreds of computational nodes.
A user defines and submits a plan, i.e. a computational graph (Directed Acyclic Graph) that captures dependencies between computational tasks. The HyperLoom infrastructure then automatically schedules the tasks on available nodes while managing all necessary data transfers.
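As a minimal sketch of what "a plan is a DAG of tasks" means, the following plain-Python example derives one valid execution order from task dependencies. It does not use the HyperLoom client or scheduler; the task names and the helper ``execution_order`` are hypothetical, and the ordering comes from the standard-library ``graphlib`` module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: each task name maps to the set of tasks it depends on.
plan = {
    "load_a": set(),
    "load_b": set(),
    "merge": {"load_a", "load_b"},
    "report": {"merge"},
}


def execution_order(plan):
    """Return one valid order in which the tasks of a DAG can run."""
    return list(TopologicalSorter(plan).static_order())


order = execution_order(plan)
print(order)
```

A real scheduler does considerably more than this (for example, it takes data location and worker resources into account); the sketch only shows that every task runs after the tasks it depends on.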
Architecture
------------
*Loom* architecture is depicted in :numref:`architecture`.
HyperLoom architecture is depicted in :numref:`architecture`. HyperLoom consists of a server process that manages worker processes running on computational nodes and a client component that provides a user interface to HyperLoom.
The main components are:
* **client** -- A gateway for user -- it serves for submitting a plan (in form
of a DAG) to the server and waits for the results of the computation. Loom is distributed with
Python client.
* **client** -- The Python gateway to HyperLoom -- it allows users to programmatically chain computational tasks into a plan and submit the plan to the server. It also provides a functionality to gather results of the submitted tasks after the computation finishes.
* **server** -- The main process that orchestrates the execution of computation.
It processes a plan received from the client and instructs the worker
processes.
* **server** -- receives and decomposes a HyperLoom plan and reactively schedules tasks to run on available computational resources provided by workers.
* **worker** -- A process that is controlled by the server and performs the
actual computation. Loom provides ways to extend worker by user codes to
provide new kind of tasks or data types. (Server and worker are written in
C++)
* **worker** -- executes tasks as scheduled by the server and informs the server about the task states. HyperLoom provides options to extend worker functionality by defining custom task or data types. (Server and worker are written in C++.)
.. figure:: arch.png
:width: 400
:alt: Architecture scheme
:name: architecture
:align: center
Architecture of Loom
Architecture of HyperLoom
Basic terms
......@@ -45,47 +35,8 @@ Basic terms
The basic elements of Loom's programming model are: **data object**, **task**,
and **plan**. A **data object** is an arbitrary data structure that can be
serialized/deserialized. A **task** represents a computation that produces data
objects and **plan** is a set of computations.
Data objects
++++++++++++
Data objects are fundamental entities in Loom. They represent values that serve
as arguments and results of tasks. The following built-in basic types
of data objects exist:
* **Plain object** -- An anonymous sequence of bytes without any additional
interpretation by Loom.
* **File** -- A handler to an external file on shared file system. From the
user's perspective, it behaves like a plain object; except when a data
transfer between nodes occurs, only a path to the file is transferred.
* **Array** -- A sequence of arbitrary data objects
* **Index** -- A logical view over a D-Object data object with a list of positions.
It is used to slice data according to some positions (e.g. positions of the
new-line character to extract lines). It behaves like an array without
explicit storing of each entry.
* **PyObj** -- Contains an arbitrary Python object
Objects that are able to provide their content as a contiguous
chunk of memory are called **D-Objects**. Plain object and File object are D-Objects;
Array, Index, and PyObj are *not* D-Objects.
Each data object has the following properties:
* **size** -- the number of bytes needed to store the object
* **length** -- the number of 'inner pieces'. Length is zero when an object has no
inner structure. Plain objects and files always have zero length; an array has length
equal to the number of elements in the array.
.. Note:: **size** is an approximation. For a plain object, it is the length of
data itself without any metadata. The size of an array is the sum of the sizes
of elements. The size of PyObj is obtained by ``sys.getsizeof``.
serialized/deserialized. A **task** represents a computational unit that produces data
objects. A **plan** is a set of interconnected tasks.
Tasks
+++++
......@@ -128,12 +79,52 @@ defined in the graph.
Symbols
+++++++
Customization and extendability are important concepts of Loom. Loom is designed
Customization and extendability are important concepts of HyperLoom. HyperLoom is designed
to enable creating customized workers that provide new task types, data
objects and resources. Loom uses the concept of name spaces to avoid potential
objects and resources. HyperLoom uses the concept of name spaces to avoid potential
name clashes between different workers. Each type of data object, task type and
resource type is identified by a symbol. Symbols are hierarchically organized
and the slash character `/` is used as the separator of each level (e.g.
`loom/data/const`). All built-in task types, data object types, and resource
types always start with the `loom/` prefix. Other objects introduced in a
specialized worker should introduce their own prefix.
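The naming scheme above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not HyperLoom code: the helpers ``split_symbol`` and ``is_builtin`` and the custom prefix ``mylab`` are hypothetical; only the `/` separator, the `loom/` prefix and the symbol `loom/data/const` come from the text.

```python
BUILTIN_PREFIX = "loom"  # first level reserved for built-in symbols


def split_symbol(symbol):
    """Split a symbol such as 'loom/data/const' into its levels."""
    levels = symbol.split("/")
    if not all(levels):
        raise ValueError(f"empty level in symbol: {symbol!r}")
    return levels


def is_builtin(symbol):
    """True when the first level of the symbol is the built-in prefix."""
    return split_symbol(symbol)[0] == BUILTIN_PREFIX


levels = split_symbol("loom/data/const")
custom_symbol = "mylab/tasks/align"  # hypothetical symbol of a custom worker
print(levels, is_builtin("loom/data/const"), is_builtin(custom_symbol))
```

Comparing whole levels rather than using a string prefix test keeps a symbol such as ``loomx/task`` from being mistaken for a built-in one.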
Data objects
++++++++++++
Data objects are fundamental entities in HyperLoom. They represent values that serve
as arguments and results of tasks. The following built-in basic types
of data objects exist:
* **Plain object** -- An anonymous sequence of bytes without any additional
interpretation by HyperLoom.
* **File** -- A handler to an external file on shared file system. From the
user's perspective, it behaves like a plain object; except when a data
transfer between nodes occurs, only a path to the file is transferred.
* **Array** -- A sequence of arbitrary data objects
* **Index** -- A logical view over a D-Object data object with a list of positions.
It is used to slice data according to some positions (e.g. positions of the
new-line character to extract lines). It behaves like an array without
explicit storing of each entry.
* **PyObj** -- Contains an arbitrary Python object
Objects that are able to provide their content as a contiguous
chunk of memory are called **D-Objects**. Plain object and File object are D-Objects;
Array, Index, and PyObj are *not* D-Objects.
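To make the idea of an Index concrete, here is a minimal plain-Python sketch of an array-like view over a contiguous byte buffer. It is not the HyperLoom implementation: the class ``IndexView``, the helper ``line_positions`` and the convention that positions are slice boundaries are all assumptions made for illustration.

```python
class IndexView:
    """Array-like view over a contiguous byte buffer (illustration only)."""

    def __init__(self, data, positions):
        self.data = data
        self.positions = positions  # sorted slice boundaries

    def __len__(self):
        return max(len(self.positions) - 1, 0)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Entries are sliced on demand; nothing is stored per entry.
        return self.data[self.positions[i]:self.positions[i + 1]]


def line_positions(data):
    """Boundaries of lines: 0, the offset after each newline, and the end."""
    positions = [0]
    for offset, byte in enumerate(data):
        if byte == 0x0A:  # newline character
            positions.append(offset + 1)
    if positions[-1] != len(data):
        positions.append(len(data))
    return positions


data = b"alpha\nbeta\ngamma"
view = IndexView(data, line_positions(data))
lines = [view[i] for i in range(len(view))]
print(lines)
```

The view keeps only the buffer and the list of positions, which mirrors the description of an Index as behaving like an array without explicitly storing each entry.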
Each data object has the following properties:
* **size** -- the number of bytes needed to store the object
* **length** -- the number of 'inner pieces'. Length is zero when an object has no
inner structure. Plain objects and files always have zero length; an array has length
equal to the number of elements in the array.
.. Note:: **size** is an approximation. For a plain object, it is the length of
data itself without any metadata. The size of an array is the sum of the sizes
of elements. The size of PyObj is obtained by ``sys.getsizeof``.
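The size and length rules can be summarized with a toy model in plain Python. This is a sketch, not HyperLoom code: ``bytes`` stands in for a plain object, ``list`` for an array, and any other value for a PyObj. The helper ``describe`` is hypothetical, and treating the length of a PyObj as zero is an assumption, since the text does not state it.

```python
import sys


def describe(obj):
    """Return (size, length) for a toy data object.

    bytes -> plain object: size is the byte count, length is 0
    list  -> array: size is the sum of element sizes, length is the element count
    other -> PyObj: size from sys.getsizeof, length assumed to be 0
    """
    if isinstance(obj, bytes):
        return len(obj), 0
    if isinstance(obj, list):
        return sum(describe(item)[0] for item in obj), len(obj)
    return sys.getsizeof(obj), 0


plain = describe(b"Hello")
array = describe([b"Hello ", b"world!"])
pyobj = describe((1, 2, 3))
print(plain, array, pyobj)
```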