Commit 0d805e4d authored by Tomas Martinovic

Documentation update and doe fix

parent 40954433
# ALL
## ALL - Agora Learning Library
# ALL - Agora Learning Library
This repository consists of two plugins for the autotuner [mARGOt](https://gitlab.com/margot_project).
These plugins are written in [R]() and use CSV or CASSANDRA DB for data storage.
These plugins are written in [R](https://cran.r-project.org/) and use CSV or [Apache Cassandra](http://cassandra.apache.org/) for data storage.
The plugins are made to work with mARGOt, but the R scripts can also be run on their own if the data are prepared according to their documentation.
The first plugin, [DOE](doe/README.md), samples the application design space and returns a set of configurations that mARGOt should run to maximize the information gained by exploring them.
The second plugin, [HTH](hth/README.md), is a learning plugin that uses the results of the explored configurations to create a model for the whole design space.
The first plugin, [DOE](doe/), samples the application design space and returns a set of configurations that mARGOt should run to maximize the information gained by exploring them.
The second plugin, [HTH](hth/), is a learning plugin that uses the results of the explored configurations to create a model for the whole design space.
For this it uses several models:
- linear
- MARS
@@ -18,6 +17,6 @@ User specifies minimum R2 and maximum MAE for the models.
If there are models which have acceptable R2 and MAE, the one with the minimum MAE is chosen as the best model.
Finally, the HTH plugin writes the predicted values for the whole design space into a CSV file or a Cassandra database.
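The selection rule described above can be sketched in a few lines of base R. This is only an illustration, not the plugin's code: the function name `select_best_model`, the third model name, all numbers, and the choice of inclusive thresholds are assumptions.

```r
# Hypothetical validation results; the values are made up for illustration.
results <- data.frame(
  model = c("linear", "MARS", "other"),
  r2    = c(0.71, 0.93, 0.90),
  mae   = c(0.40, 0.12, 0.09),
  stringsAsFactors = FALSE
)

# Keep the models that meet both thresholds, then pick the smallest MAE.
# Inclusive comparisons are an assumption; the README does not specify them.
select_best_model <- function(results, min_r2, max_mae) {
  ok <- results[results$r2 >= min_r2 & results$mae <= max_mae, , drop = FALSE]
  if (nrow(ok) == 0) {
    # The README does not say what happens when no model is acceptable.
    return(NA_character_)
  }
  ok$model[which.min(ok$mae)]
}

best <- select_best_model(results, min_r2 = 0.85, max_mae = 0.2)
print(best)
```

With these made-up numbers, two models pass both thresholds and the one with the lower MAE is returned.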
## Acknowledgement
# Acknowledgement
This work was supported by the ESF in “Science without borders” project, reg. nr. CZ.02.2.69/0.0/0.0/16_027/0008463 within the Operational Programme Research, Development and Education.
@@ -149,3 +149,203 @@ x,y,z,counter
1.1,1.8,0.7,1
0,0,0.4,1
1.9,1.3,0,1
0.2,1.1,0.4,1
0.8,0.6,1,1
1.7,1,0.2,1
0.5,0.3,0.8,1
1.8,0.5,1.8,1
1.1,0.3,0.7,1
1.5,0.4,1.3,1
0.5,0.8,0.4,1
0.1,1.7,0.4,1
1.7,1,0.9,1
1.1,1.7,0.5,1
0.7,1.5,1.2,1
0.5,1.2,0.8,1
1.1,1.4,1,1
1.6,1.4,0.6,1
1.8,1.9,0.5,1
0.9,0,0.1,1
1.8,0.1,0.1,1
0.9,0.2,1.3,1
1,1.5,1.4,1
0.1,0.4,1.2,1
1.8,1.1,1.8,1
0.8,1.2,1,1
0.3,0,1.2,1
0,1.2,0.1,1
1.9,1.6,0,1
1.4,0.5,1.8,1
1.5,1.4,1.3,1
1.9,0.1,0.5,1
0.2,0.2,0.6,1
1.6,0.8,1.4,1
0.3,1.8,1,1
0.8,1.6,0.8,1
0.9,1.8,1.6,1
1,0.4,1.9,1
0.3,1.6,1.7,1
0.3,0.4,1.7,1
1.7,0.5,1,1
1,1.1,0.6,1
0.1,0.1,1.8,1
0.9,1.8,0,1
0.5,1.7,0.1,1
0.4,0.9,1.1,1
1.6,1.4,1.6,1
1,1.4,0.3,1
0.9,0.6,1.5,1
0.8,0.9,0,1
0.9,0.6,0.6,1
0.6,1.2,1.6,1
0.3,0.6,0.1,1
1,1.1,1.7,1
0.2,1.6,1.2,1
0.1,0.1,0.5,1
0,0.7,1.3,1
0.5,0.1,0.9,1
1.5,0.4,1.5,1
1.6,0,0.6,1
0.9,1.7,0.9,1
0.5,1.6,1.8,1
0.1,0.3,1.5,1
0.7,0.6,0,1
0.5,0.4,1.4,1
0.2,0.7,0.3,1
1.9,0.9,0.6,1
1.1,1.5,0.4,1
0,0,1.8,1
0.6,0.8,0.5,1
0.4,0.2,0.2,1
0.1,1.6,1.8,1
1.8,1.6,0.4,1
0.9,0.5,1.1,1
0.4,0.3,1.8,1
1.4,1,0.5,1
0.8,0.1,1.7,1
1.2,1.5,1,1
1.6,0.9,1.6,1
1,1,0.4,1
1.3,0.2,1.2,1
0.4,0.9,1.3,1
1.1,0.6,1.5,1
1.6,1.4,0,1
0,0.1,0.9,1
0.3,1.7,0.3,1
0.4,1,1.9,1
1.7,0.5,0.1,1
1.2,0,0.3,1
1.1,1.6,1.4,1
0,1.3,0.5,1
1.3,0.6,0.4,1
0.8,0.1,0,1
0.5,0.2,1.4,1
1,0.5,0.2,1
1.5,1.8,1.2,1
1.2,0.4,0.9,1
0.7,0.5,1.7,1
1.8,1.2,1.1,1
1.9,1.2,1.8,1
0.8,1.9,0.2,1
1.9,1.9,1.4,1
0.8,0.2,0.4,1
0.4,0.1,0.4,1
1.4,1.2,0.3,1
0.1,0.8,1.9,1
1.2,0,1,1
0.7,0.8,0.2,1
0.6,0.3,1.6,1
0.4,0.7,0.6,1
0.1,0.1,1.4,1
1,1.9,1.1,1
0.7,1.8,1.3,1
1.2,0.8,1.3,1
1.8,1.4,0.7,1
0.1,0.7,0,1
0.4,0.7,1.4,1
0.5,0.4,0,1
1.5,1.8,1.4,1
0.9,1.5,0.4,1
1.3,0.6,0,1
0.3,1,0.8,1
0.6,1.7,1.9,1
1.2,0.8,0.6,1
1.6,1.5,1.6,1
0.9,0,1.7,1
1.4,1.6,0,1
1.6,1.8,0.4,1
1.7,0.8,1.7,1
1.2,1.1,1.5,1
0.1,1.4,0.1,1
1.8,0.5,1.7,1
1.9,0.1,1.8,1
0.2,1.6,1.6,1
1.1,0.4,1.4,1
1.2,0.3,0.8,1
0.1,0.1,0.2,1
0.8,0.8,1.2,1
0.2,1.9,1.1,1
0.3,1.4,0.5,1
0.2,1.9,1.8,1
1.5,0.3,1,1
0.8,1.3,1.5,1
0.2,0.4,0.7,1
1.6,0.2,0.3,1
1.3,1,1.8,1
1.2,0.2,0.1,1
0.4,0.3,0.7,1
0.6,1.9,0.1,1
0.2,0.1,1.8,1
0.7,1.3,0.7,1
1.9,0.8,0.9,1
1.4,1.5,1,1
0.6,0.5,1.1,1
1.7,0.5,1.2,1
0.8,0.5,0.1,1
1.2,0.5,0.5,1
0.6,0.5,0.2,1
1.1,0.5,1.2,1
0.6,0.5,1.2,1
0.7,0.4,0.4,1
0.6,0.5,1.6,1
0.6,0.5,0.8,1
1.4,0.5,1.7,1
1.2,0.5,0,1
1,0.2,0.6,1
1.2,0.5,0.7,1
0.7,0.4,0.1,1
1.2,0.4,0.9,1
0.6,0.5,0.4,1
0.6,0.5,1,1
1.4,0.5,1.9,1
0.6,0.5,0.1,1
0.8,0.3,1.2,1
0.9,0.5,1.8,1
1.3,0.4,0.2,1
1.4,0.5,1.4,1
1.7,0.5,0.2,1
0.6,0.5,1.4,1
1.9,0.5,0.7,1
0.6,0.5,0.7,1
1.5,0,0.9,1
0.9,0.5,0.2,1
1.6,0.5,1,1
1.3,0.5,1.7,1
0.6,0.5,1.5,1
0.7,0.5,1.1,1
1,0.1,0.6,1
0.9,0.2,1.3,1
0.9,0.5,1.5,1
1.7,0.5,0.3,1
1.2,0.1,1.6,1
0.7,0.5,1,1
0.9,0.5,0.6,1
1.8,0.5,1.8,1
1.8,0.5,1.7,1
1.3,0.5,0.8,1
0.8,0.5,0.6,1
0.6,0.5,0.3,1
0.6,0.5,0.5,1
0.7,0.5,0.8,1
1.6,0,1.8,1
1.2,0.5,0.1,1
# Plugin hth
# Plugin DOE (Design of Experiments)
## Description
This plugin reads the description of the application's design space.
The user can define the following parameters:
- sampling algorithm
- distance parameter for the sampling algorithms
- number of points to be chosen
- how many times should mARGOt run each configuration
- restrictions on the design space.
At the moment it is possible to choose from the following sampling algorithms provided by the [DiceDesign](https://cran.r-project.org/web/packages/DiceDesign) package:
- full factorial designs
- Strauss Design
- Maximum Entropy Design (Dmax)
- Latin Hypercube Design
- WSP design.
The output of the algorithm is a table of knob configurations with a counter column.
For example, if the knobs are named *x,y,z*, the output will be:
```
x,y,z,counter
0,0,0,1
0.5,0,0,1
```
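To make the shape of this table concrete, here is a minimal base-R sketch that draws a Latin hypercube sample in the unit cube, snaps it to the feasible knob values and appends the counter column. It is not the DiceDesign implementation used by the plugin. The helper `lhs_unit`, the shortened knob grid, and the reading of `counter` as the number of runs per configuration are assumptions made for illustration.

```r
set.seed(1)  # reproducible illustration only

# Base-R Latin hypercube sample in [0,1]^d: one point per stratum in each dimension.
lhs_unit <- function(n, d) {
  matrix(vapply(seq_len(d), function(j) (sample(n) - runif(n)) / n, numeric(n)),
         nrow = n)
}

# Shortened, hypothetical knob grid (the example files use 20 values per knob).
knob_values <- list(x = c(0, 0.5, 1, 1.5), y = c(0, 0.5, 1, 1.5), z = c(0, 1))
n_points <- 4
observations_per_configuration <- 1

u <- lhs_unit(n_points, length(knob_values))

# Map each unit-interval coordinate to the feasible value of its knob.
doe <- data.frame(row.names = seq_len(n_points))
for (j in seq_along(knob_values)) {
  values <- knob_values[[j]]
  doe[[names(knob_values)[j]]] <- values[floor(u[, j] * length(values)) + 1]
}
doe$counter <- observations_per_configuration

print(doe)
```

Snapping to a coarse grid can produce duplicate rows for knobs with few values, so a real implementation would need to handle that case.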
## Requirements
The main script that performs the computation is written in R.
@@ -22,6 +45,83 @@ install.packages('rlang')
install.packages('RJDBC')
```
## Acknowledgement
## Running the plugin without mARGOt
To run the plugin without mARGOt, run the `main.R` script with a single argument: the path to the plugin folder.
```
main.R "~/git/all/doe"
```
### Config file
The plugin folder also contains the configuration file `agora_config.env`.
This file should contain the following settings:
- STORAGE_TYPE, the storage type, `CSV` or `CASSANDRA`
- STORAGE_ADDRESS, relative or absolute path to the folder/keyspace with the data
- APPLICATION_NAME, name of the application
- DOE_NAME, name of the sampling algorithm (`dmax`, `lhs`, `strauss`, `wsp`, `full_factorial`)
- NUMBER_CONFIGURATIONS_PER_ITERATION, number of points to sample from the design space
- NUMBER_OBSERVATIONS_PER_CONFIGURATION, number of runs mARGOt should perform for each configuration
- MAX_NUMBER_ITERATION, number of iterations in mARGOt \[only meaningful if you sample the design space and learn the model iteratively\]
- MINIMUM_DISTANCE, a number between 0 and 1 that determines the distance between points sampled in the [0,1] space
- LIMITS, optional string of restrictions on the knobs, delimited by ";", for example `"x + y < 1; x > -2"`
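As a rough illustration of how such a file could be read from R, the sketch below parses `KEY="value"` lines into a named list and converts the numeric settings. The helper `parse_env` is hypothetical and is not the plugin's own loader. The sample values are taken from the example `agora_config.env` in this commit, which names the restrictions key `DOE_LIMITS`.

```r
# Parse KEY="value" lines into a named list (sketch, not the plugin's parser).
parse_env <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # Split on the first "=" only, so values may themselves contain "=".
  pos <- as.integer(regexpr("=", lines, fixed = TRUE))
  keep <- pos > 0
  keys <- trimws(substr(lines[keep], 1, pos[keep] - 1))
  vals <- trimws(substring(lines[keep], pos[keep] + 1))
  vals <- gsub('^"|"$', "", vals)  # strip one pair of surrounding double quotes
  as.list(setNames(vals, keys))
}

example_lines <- c(
  'STORAGE_TYPE="CSV"',
  'STORAGE_ADDRESS="../data/"',
  'APPLICATION_NAME="kursawe/v1/test"',
  'DOE_NAME="dmax"',
  'NUMBER_CONFIGURATIONS_PER_ITERATION="50"',
  'MINIMUM_DISTANCE="0.2"',
  'DOE_LIMITS="x + y > 1; y <= 0.5"'
)

config <- parse_env(example_lines)
n_points <- as.integer(config$NUMBER_CONFIGURATIONS_PER_ITERATION)
min_dist <- as.numeric(config$MINIMUM_DISTANCE)

# Basic sanity checks on the documented value ranges.
stopifnot(config$STORAGE_TYPE %in% c("CSV", "CASSANDRA"))
stopifnot(min_dist >= 0, min_dist <= 1)
```

For a real file, `parse_env(readLines(file.path(plugin_folder, "agora_config.env")))` would be the natural call, where `plugin_folder` is whatever path is passed to `main.R`.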
### Input data files
The data folder given by $STORAGE_ADDRESS should contain two CSV files named $APPLICATION_NAME_knobs.csv and $APPLICATION_NAME_metrics.csv.
If the application name includes "/", each occurrence is changed to "\_" due to the mARGOt configurations.
For example, if the application name is "kursawe/v1/test", the resulting file names will be
- kursawe_v1_test_knobs.csv
- kursawe_v1_test_metrics.csv.
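The naming rule can be reproduced with a single `gsub` call. The helper name `data_file_names` and the `../data` folder are illustrative only.

```r
# Derive the input file names from the application name (sketch).
data_file_names <- function(application_name) {
  base <- gsub("/", "_", application_name, fixed = TRUE)
  c(knobs = paste0(base, "_knobs.csv"), metrics = paste0(base, "_metrics.csv"))
}

files <- data_file_names("kursawe/v1/test")
knobs_path <- file.path("../data", files[["knobs"]])
print(files)
```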
$APPLICATION_NAME_knobs.csv should have two comma-separated columns *name,values*, where *values* lists all the feasible values of the given knob, separated by ";".
In the example files, this file also has a *type* column, which is used by mARGOt.
Example
```
name,type,values
x,double,0;0.1;0.2;0.3;0.4;0.5;0.6;0.7;0.8;0.9;1.0;1.1;1.2;1.3;1.4;1.5;1.6;1.7;1.8;1.9
y,double,0;0.1;0.2;0.3;0.4;0.5;0.6;0.7;0.8;0.9;1.0;1.1;1.2;1.3;1.4;1.5;1.6;1.7;1.8;1.9
z,double,0;0.1;0.2;0.3;0.4;0.5;0.6;0.7;0.8;0.9;1.0;1.1;1.2;1.3;1.4;1.5;1.6;1.7;1.8;1.9
```
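A minimal sketch of how a knobs file in this layout can be expanded into the full design space and then filtered by a restriction string follows. The knob grid is shortened for readability, `apply_limits` is a hypothetical helper rather than the plugin's function, and the restriction string is the one from the example config in this commit.

```r
# Shortened knobs file in the documented name,type,values layout.
knobs_csv <- c(
  "name,type,values",
  "x,double,0;0.5;1.0;1.5",
  "y,double,0;0.5;1.0;1.5",
  "z,double,0;1.0"
)
knobs <- read.csv(text = paste(knobs_csv, collapse = "\n"), stringsAsFactors = FALSE)

# Split the ";"-separated values and build every combination of knob values.
knob_values <- lapply(strsplit(knobs$values, ";", fixed = TRUE), as.numeric)
names(knob_values) <- knobs$name
full_design <- expand.grid(knob_values, KEEP.OUT.ATTRS = FALSE)

# Keep only the rows that satisfy every ";"-delimited condition.
# eval(parse()) runs arbitrary code, so use it only with trusted config files.
apply_limits <- function(design, limits) {
  conditions <- trimws(strsplit(limits, ";", fixed = TRUE)[[1]])
  conditions <- conditions[nzchar(conditions)]
  keep <- rep(TRUE, nrow(design))
  for (cond in conditions) {
    keep <- keep & eval(parse(text = cond), envir = design)
  }
  design[keep, , drop = FALSE]
}

restricted <- apply_limits(full_design, "x + y > 1; y <= 0.5")
print(nrow(full_design))
print(restricted)
```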
$APPLICATION_NAME_metrics.csv should have one column, *name*, listing all the possible metrics.
In practice, mARGOt uses this file to determine the metrics and the plugin that should be used to learn the model of each metric.
Therefore, in the example file there are two more columns, *type,prediction*.
Example
```
name,type,prediction
m,double,hth
n,double,hth
```
### Output files
The script creates two output files:
- $APPLICATION_NAME_doe.csv
- $APPLICATION_NAME_model.csv.
$APPLICATION_NAME_doe.csv contains all the configurations that mARGOt should explore and the number of runs for each of them.
Example for knobs named *x,y,z*
```
x,y,z,counter
0.5,1.3,0.8,1
1.3,1,1.3,1
```
$APPLICATION_NAME_model.csv is normally the output of the learning module; the DOE plugin only creates a dummy file with all the possible configurations and no results.
Example for knobs named *x,y,z* and metrics *m* and *n*
```
"x","y","z","m_avg","m_std","n_avg","n_std"
2,0,1,NA,NA,NA,NA
3,0,1,NA,NA,NA,NA
4,0,1,NA,NA,NA,NA
```
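The dummy model file can be approximated in base R as follows. This is a sketch under the assumption that each metric gets an `_avg` and an `_std` column filled with `NA`, as in the example above. The knob grid is hypothetical and the file is written to a temporary path.

```r
# Hypothetical, shortened design space and the metric names from the example.
full_design <- expand.grid(x = c(0, 0.5), y = c(0, 0.5), z = c(0, 1))
metric_names <- c("m", "n")

# Append empty <metric>_avg and <metric>_std columns for every metric.
model_table <- full_design
for (metric_name in metric_names) {
  model_table[[paste0(metric_name, "_avg")]] <- NA
  model_table[[paste0(metric_name, "_std")]] <- NA
}

# write.csv quotes the header names and writes missing values as NA.
out_file <- tempfile(fileext = ".csv")
write.csv(model_table, out_file, row.names = FALSE)
header <- readLines(out_file, n = 1)
print(header)
```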
# Acknowledgement
This work was supported by the ESF in “Science without borders” project, reg. nr. CZ.02.2.69/0.0/0.0/16_027/0008463 within the Operational Programme Research, Development and Education.
STORAGE_TYPE="CSV"
STORAGE_ADDRESS="../data/"
APPLICATION_NAME="kursawe/v1/test"
ITERATION_COUNTER="0"
DOE_NAME="dmax"
NUMBER_CONFIGURATIONS_PER_ITERATION="50"
NUMBER_OBSERVATIONS_PER_CONFIGURATION="1"
MAX_NUMBER_ITERATION="10"
MINIMUM_DISTANCE="0.2"
DOE_LIMITS="x + y > 1; y <= 0.5"
@@ -17,18 +17,20 @@ create_doe <- function(knobs_config_list, doe_options, knobs_names, model_contai
discarded_designs <- discarded_designs %>% setdiff(full_design)
}
if(doe_options$nobs >= nrow(full_design))
if(doe_options$nobs >= nrow(full_design)){
algorithm <- "full_factorial"
}
# Write model table -------------------------------------------------------
if (storage_type == "CASSANDRA"){
if(storage_type == "CASSANDRA"){
for (row.ind in 1:nrow(full_design))
{
set_columns <- str_c(c(knobs_names, str_c(metric_names, "_avg"), str_c(metric_names, "_std")), collapse = ", ")
set_values <- str_c(c(full_design[row.ind, ], rep(NA, length(metric_names)*2)), collapse = ", ")
dbSendUpdate(conn, str_c("INSERT INTO ", model_container_name, "(", set_columns, ") VALUES (", set_values, ")"))
}
} else if (storage_type == "CSV"){
} else if(storage_type == "CSV"){
temp_df <- full_design
for(metric_name in metric_names){
metric_avg <- str_c(metric_name, "_avg")
......
# Plugin hth
# Plugin HTH (Highway to Heel)
## Description
@@ -22,6 +22,6 @@ install.packages('rlang')
install.packages('RJDBC')
```
## Acknowledgement
# Acknowledgement
This work was supported by the ESF in “Science without borders” project, reg. nr. CZ.02.2.69/0.0/0.0/16_027/0008463 within the Operational Programme Research, Development and Education.