Merge pull request #238 from automl/development

SMAC3 v0.5.0
automl · May 8, 2017 · 855cfa3
2 parents 94e83c9 + bad1ecf

Showing 54 changed files with 753 additions and 418 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -4,11 +4,11 @@ matrix:
 
   include:
   - os: linux
-    env: PYTHON_VERSION="3.4" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh"
+    env: PYTHON_VERSION="3.4" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
   - os: linux
-    env: PYTHON_VERSION="3.5" COVERAGE="true" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh"
+    env: PYTHON_VERSION="3.5" COVERAGE="true" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
   - os: linux
-    env: PYTHON_VERSION="3.6" COVERAGE="true" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh"
+    env: PYTHON_VERSION="3.6" COVERAGE="true" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
 
   # Disable OSX building because it takes too long and hinders progress
   # Set language to generic to not break travis-ci
@@ -46,7 +46,7 @@ before_install:
   - conda update --yes conda
   - conda create -n testenv --yes python=$PYTHON_VERSION pip wheel nose
   - source activate testenv
-  - conda install --yes gcc
+  - conda install --yes gcc swig
   - echo "Using GCC at "`which gcc`
   - export CC=`which gcc`
 

diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # SMAC v3 Project
 
-Copyright (C) 2016  [ML4AAD Group](http://www.ml4aad.org/)
+Copyright (C) 2017  [ML4AAD Group](http://www.ml4aad.org/)
 
 __Attention__: This package is under heavy development and subject to change. 
 A stable release of SMAC (v2) in Java can be found [here](http://www.cs.ubc.ca/labs/beta/Projects/SMAC/).
@@ -11,15 +11,15 @@ Status for master branch:
 
 [![Build Status](https://travis-ci.org/automl/SMAC3.svg?branch=master)](https://travis-ci.org/automl/SMAC3)
 [![Code Health](https://landscape.io/github/automl/SMAC3/master/landscape.svg?style=flat)](https://landscape.io/github/automl/SMAC3/master)
-[![Coverage Status](https://coveralls.io/repos/automl/auto-sklearn/badge.svg?branch=master&service=github)](https://coveralls.io/github/automl/SMAC3?branch=master)
+[![codecov Status](https://codecov.io/gh/automl/SMAC3/branch/master/graph/badge.svg)](https://codecov.io/gh/automl/SMAC3)
 
 Status for development branch
 
 [![Build Status](https://travis-ci.org/automl/SMAC3.svg?branch=development)](https://travis-ci.org/automl/SMAC3)
 [![Code Health](https://landscape.io/github/automl/SMAC3/development/landscape.svg?style=flat)](https://landscape.io/github/automl/SMAC3/development)
-[![Coverage Status](https://coveralls.io/repos/automl/SMAC3/badge.svg?branch=development&service=github)](https://coveralls.io/github/automl/SMAC3?branch=development)
+[![codecov](https://codecov.io/gh/automl/SMAC3/branch/development/graph/badge.svg)](https://codecov.io/gh/automl/SMAC3)
 
-#OVERVIEW
+# OVERVIEW
 
 SMAC is a tool for algorithm configuration 
 to optimize the parameters of arbitrary algorithms across a set of instances.
@@ -38,7 +38,11 @@ we refer to
 SMAC v3 is written in python3 and continuously tested with python3.4 and python3.5. 
 Its [Random Forest](https://bitbucket.org/aadfreiburg/random_forest_run) is written in C++.
 
-#Installation:
+# Installation
+
+Besides the listed requirements (see `requirements.txt`), the random forest used in SMAC3 requires SWIG.
+
+    apt-get install swig
 
     cat requirements.txt | xargs -n 1 -L 1 pip install
 

diff --git a/changelog.md b/changelog.md
@@ -1,3 +1,38 @@
+# 0.5
+
+## Major changes
+
+* MAINT #192: SMAC uses version 0.4 of the random forest library pyrfr. As a
+  side-effect, the library [swig](http://www.swig.org/) is necessary to build
+  the random forest.
+* MAINT: random samples which are interleaved in the list of challengers are now
+  obtained from a generator. This reduces the overhead of sampling random
+  configurations.
+* FIX #117: only round the cutoff when running a python function as the target
+  algorithm.
+* MAINT #231: Rename the submodule `smac.smbo` to `smac.optimizer`.
+* MAINT #213: Use log(EI) as default acquisition function when optimizing
+  running time of an algorithm.
+* MAINT #223: updated example of optimizing a random forest with SMAC.
+* MAINT #221: refactored the EPM module. The PCA on instance features is now
+  part of fitting the EPM instead of reading the scenario. Because of this
+  restructuring, the PCA can now take into account instance features that are
+  provided as external data.
+
+## Minor changes
+
+* SMAC now outputs scenario options if the log level is `DEBUG` (2f0ceee).
+* SMAC logs the command line call if invoked from the command line (3accfc2).
+* SMAC explicitly checks that it runs on `python>=3.4`.
+* MAINT #226: improve efficiency when loading the runhistory from a json file.
+* FIX #217: adds milliseconds to the output directory names to avoid race
+  conditions when starting multiple runs on a cluster.
+* MAINT #209: adds the seed or a pseudo-seed to the output directory name for
+  better identifiability of the output directories.
+* FIX #216: replace a broken call in the EIPS acquisition function.
+* MAINT: use codecov.io instead of coveralls.io.
+* MAINT: increase minimal required version of the ConfigSpace package to 0.3.2.
+
 # 0.4
 
 * ADD #204: SMAC now always saves runhistory files as `runhistory.json`.
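The 0.5 entry on interleaved random samples describes configurations that are sampled lazily from a generator. The following is a minimal sketch of that idea. It assumes that challengers arrive as a list and that random configurations come from a zero-argument sampler. `interleave_random` and its parameters are hypothetical names, not SMAC's API:

```python
from typing import Callable, Iterator, List, TypeVar

Config = TypeVar("Config")


def interleave_random(
    challengers: List[Config],
    sample_random: Callable[[], Config],
) -> Iterator[Config]:
    """Yield each challenger followed by one freshly sampled random configuration.

    sample_random() is only called when the consumer asks for the next item,
    so random configurations that are never requested are never sampled.
    """
    for challenger in challengers:
        yield challenger
        yield sample_random()
```

Because `sample_random` runs only on demand, a consumer that stops early never pays for random configurations it does not evaluate. One example is a consumer that has exhausted its time budget.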

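MAINT #213 makes log(EI) the default acquisition function when optimizing running time. The formula below is one common closed form. It assumes that the EPM predicts log runtime as a Gaussian `Y ~ N(mu, sigma**2)` and that the incumbent's runtime `f_min` is given on the original scale:

`E[max(f_min - exp(Y), 0)] = f_min * Phi(v) - exp(mu + sigma**2 / 2) * Phi(v - sigma)`, with `v = (ln(f_min) - mu) / sigma`

This is an illustrative implementation of that formula, not SMAC3's code:

```python
import numpy as np
from scipy.stats import norm


def log_ei(mu, sigma, f_min: float) -> np.ndarray:
    """Expected improvement E[max(f_min - exp(Y), 0)] for Y ~ N(mu, sigma**2).

    mu, sigma: predicted mean and standard deviation of *log* runtime (sigma > 0).
    f_min: best observed runtime on the original (non-log) scale (f_min > 0).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    v = (np.log(f_min) - mu) / sigma
    # First term: f_min weighted by P(runtime < f_min).
    # Second term: the expected runtime restricted to runtimes below f_min.
    return f_min * norm.cdf(v) - np.exp(mu + 0.5 * sigma ** 2) * norm.cdf(v - sigma)
```

As a quick sanity check, compare the result with the mean of `max(f_min - exp(y), 0)` over many samples `y` drawn from `N(mu, sigma**2)`.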
diff --git a/ci_scripts/circle_install.sh b/ci_scripts/circle_install.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+# On CircleCI, each command runs in its own execution context, so we have to
+# activate the conda testenv on a per command basis. That's why we put calls to
+# python (conda) in a dedicated bash script and we activate the conda testenv
+# here.
+source activate testenv
+
+# install documentation building dependencies
+pip install --upgrade numpy
+pip install --upgrade matplotlib setuptools nose coverage sphinx pillow sphinx-gallery sphinx_bootstrap_theme cython numpydoc
+# And finally, all other dependencies
+cat requirements.txt | xargs -n 1 -L 1 pip install
+
+python setup.py clean
+python setup.py develop
+
+# pipefail is necessary to propagate exit codes
+set -o pipefail && cd doc && make html 2>&1 | tee ~/log.txt
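The last line of `circle_install.sh` prefixes the docs build with `set -o pipefail`. By default, a pipeline's exit status is that of its last command, so a successful `tee` would hide a failing `make html`. The sketch below demonstrates the difference from Python, assuming `bash` is on `PATH`. The pipeline `false | cat` stands in for the real one:

```python
import subprocess


def pipeline_status(use_pipefail: bool) -> int:
    """Run a pipeline whose first command fails and return bash's exit status.

    `false | cat` stands in for `make html 2>&1 | tee ~/log.txt`.
    """
    script = "false | cat"
    if use_pipefail:
        # `&&` binds more loosely than `|`, so this is: set ...; then (false | cat)
        script = "set -o pipefail && " + script
    return subprocess.run(["bash", "-c", script]).returncode
```

Without pipefail the status is 0, because `cat` succeeds. With pipefail it is 1, because `false` fails.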
diff --git a/circle.yml b/circle.yml
@@ -1,44 +1,28 @@
 machine:
   environment:
-    # The github organization or username of the repository which hosts the
-    # project and documentation.
-    USERNAME: "automl"
-
-    # The repository where the documentation will be hosted
-    DOC_REPO: "SMAC3"
-
-    # The base URL for the Github page where the documentation will be hosted
-    DOC_URL: ""
-
-    # The email is to be used for commits in the Github Page
-    EMAIL: "[email protected]"
+    PATH: /home/ubuntu/miniconda/bin:$PATH
 
 dependencies:
 
   # Various dependencies
   pre:
+    # Get rid of existing virtualenvs on circle ci as they conflict with conda.
+    # From nilearn: https://github.com/nilearn/nilearn/blob/master/circle.yml
+    - cd && rm -rf ~/.pyenv && rm -rf ~/virtualenvs
     # from scikit-learn contrib
     - sudo -E apt-get -yq remove texlive-binaries --purge
-    - sudo apt-get update
-    - sudo apt-get install libatlas-dev libatlas3gf-base
-    - sudo apt-get install build-essential python-dev python-setuptools
-    # install numpy first as it is a compile time dependency for other packages
-    - pip install --upgrade numpy
-    # install documentation building dependencies
-    - pip install --upgrade matplotlib setuptools nose coverage sphinx pillow sphinx-gallery sphinx_bootstrap_theme cython numpydoc
-    # Installing required packages for `make -C doc check command` to work.
     - sudo -E apt-get -yq update
     - sudo -E apt-get -yq --no-install-suggests --no-install-recommends --force-yes install dvipng texlive-latex-base texlive-latex-extra
-    # Installing packages to build the random forest
-    # finally install the requirements of the package to allow autodoc
-    - pip install -r requirements.txt
+    # Conda installation
+    - wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
+    - bash ~/miniconda.sh -b -p $HOME/miniconda
+    - conda update --yes conda
+    - conda create -n testenv --yes python=3.6 pip wheel nose gcc swig
 
   # The --user is needed to let sphinx see the source and the binaries
   # The pipefail is required to propagate the exit code
   override:
-    - python setup.py clean
-    - python setup.py develop
-    - set -o pipefail && cd doc && make html 2>&1 | tee ~/log.txt
+    - source ci_scripts/circle_install.sh
 test:
   # Grep error on the documentation
   override:
@@ -58,7 +42,7 @@ general:
     - "doc/_build/html"
     - "~/log.txt"
  # Restrict the build to the development and master branches
-  branches:
-    only:
-       - development
-       - master
+  #branches:
+  #  only:
+  #     - development
+  #     - master
diff --git a/examples/rf.py b/examples/rf.py
@@ -3,8 +3,10 @@
 import inspect
 
 import numpy as np
-from sklearn.model_selection import KFold
+from sklearn.metrics import make_scorer
+from sklearn.model_selection import cross_val_score
 from sklearn.ensemble import RandomForestRegressor
+from sklearn.datasets import load_boston
 
 from smac.configspace import ConfigurationSpace
 from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
@@ -14,7 +16,9 @@
 from smac.scenario.scenario import Scenario
 from smac.facade.smac_facade import SMAC
 
-def rfr(cfg, seed):
+boston = load_boston()
+
+def rf_from_cfg(cfg, seed):
     """
         Creates a random forest regressor from sklearn and fits the given data on it.
         This is the function-call we try to optimize. Chosen values are stored in
@@ -36,7 +40,6 @@ def rfr(cfg, seed):
     rfr = RandomForestRegressor(
         n_estimators=cfg["num_trees"],
         criterion=cfg["criterion"],
-        max_depth=cfg["max_depth"],
         min_samples_split=cfg["min_samples_to_split"],
         min_samples_leaf=cfg["min_samples_in_leaf"],
         min_weight_fraction_leaf=cfg["min_weight_frac_leaf"],
@@ -45,36 +48,19 @@ def rfr(cfg, seed):
         bootstrap=cfg["do_bootstrapping"],
         random_state=seed)
 
-    rmses = []
-    for train, test in kf:
-        # We iterate over cv-folds
-        X_train, X_test = X[train], X[test]
-        y_train, y_test = y[train], y[test]
-
-        rfr.fit(X_train, y_train)
-
-        y_pred = rfr.predict(X_test)
-
-        # We use root mean square error as performance measure
-        rmse = np.sqrt(np.mean((y_pred - y_test)**2))
-        rmses.append(rmse)
-    return np.mean(rmses)
+    def rmse(y, y_pred):
+        return np.sqrt(np.mean((y_pred - y)**2))
+    # Create a root-mean-squared-error scorer for sklearn's cross-validation
+    rmse_scorer = make_scorer(rmse, greater_is_better=False)
+    score = cross_val_score(rfr, boston.data, boston.target, cv=11, scoring=rmse_scorer)
+    return -1 * np.mean(score)  # make_scorer negated the RMSE (greater_is_better=False); flip it back
 
 
 logger = logging.getLogger("RF-example")
 logging.basicConfig(level=logging.INFO)
 #logging.basicConfig(level=logging.DEBUG)  # Enable to show debug-output
-
-folder = os.path.realpath(
-    os.path.abspath(os.path.split(inspect.getfile(inspect.currentframe()))[0]))
-
-# Load data
-X = np.array(np.loadtxt(os.path.join(folder, "data/X.csv")), dtype=np.float32)
-y = np.array(np.loadtxt(os.path.join(folder, "data/y.csv")), dtype=np.float32)
-
-# Create cross-validation folds
-kf = KFold(n_splits=4, shuffle=True, random_state=42)
-kf = kf.split(X, y)
+logger.info("Running random forest example for SMAC. If you experience "
+            "difficulties, try to decrease the memory-limit.")
 
 # Build Configuration Space which defines all parameters and their ranges.
 # To illustrate different parameter types,
@@ -88,28 +74,27 @@ def rfr(cfg, seed):
 
 # Or we can add multiple hyperparameters at once:
 num_trees = UniformIntegerHyperparameter("num_trees", 10, 50, default=10)
-max_depth = UniformIntegerHyperparameter("max_depth", 20, 30, default=20)
-max_features = UniformIntegerHyperparameter("max_features", 1, X.shape[1], default=1)
+max_features = UniformIntegerHyperparameter("max_features", 1, boston.data.shape[1], default=1)
 min_weight_frac_leaf = UniformFloatHyperparameter("min_weight_frac_leaf", 0.0, 0.5, default=0.0)
 criterion = CategoricalHyperparameter("criterion", ["mse", "mae"], default="mse")
 min_samples_to_split = UniformIntegerHyperparameter("min_samples_to_split", 2, 20, default=2)
 min_samples_in_leaf = UniformIntegerHyperparameter("min_samples_in_leaf", 1, 20, default=1)
 max_leaf_nodes = UniformIntegerHyperparameter("max_leaf_nodes", 10, 1000, default=100)
 
-cs.add_hyperparameters([num_trees, max_depth, min_weight_frac_leaf, criterion,
+cs.add_hyperparameters([num_trees, min_weight_frac_leaf, criterion,
         max_features, min_samples_to_split, min_samples_in_leaf, max_leaf_nodes])
 
 # SMAC scenario object
-scenario = Scenario({"run_obj": "quality",  # we optimize quality (alternative runtime)
-                     "runcount-limit": 20,  # maximum number of function evaluations
-                     "cs": cs,              # configuration space
+scenario = Scenario({"run_obj": "quality",   # we optimize quality (alternatively, runtime)
+                     "runcount-limit": 50,   # maximum number of function evaluations
+                     "cs": cs,               # configuration space
                      "deterministic": "true",
-                     "memory_limit": 1024,
+                     "memory_limit": 3072,   # adapt this to a reasonable value for your hardware
                      })
 
 # To optimize, we pass the function to the SMAC-object
 smac = SMAC(scenario=scenario, rng=np.random.RandomState(42),
-            tae_runner=rfr)
+            tae_runner=rf_from_cfg)
 
 # Example call of the function with default values
 # It returns: Status, Cost, Runtime, Additional Infos
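The new `rf_from_cfg` relies on `make_scorer(..., greater_is_better=False)`. With that setting, scikit-learn negates the metric so that "greater is better" still holds internally. This is why the function multiplies the mean by -1 before returning it to SMAC as a cost.

The sketch below checks that sign convention against a manual K-fold loop. It makes three assumptions:

* It uses a synthetic dataset from `make_regression` instead of `load_boston`, which was removed in scikit-learn 1.2.
* It uses a small fixed forest.
* It relies on scikit-learn's default unshuffled `KFold` for an integer `cv` with regressors.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import KFold, cross_val_score


def rmse(y, y_pred):
    return np.sqrt(np.mean((y_pred - y) ** 2))


X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)
model = RandomForestRegressor(n_estimators=10, random_state=0)

# greater_is_better=False: scikit-learn reports -rmse for every fold.
rmse_scorer = make_scorer(rmse, greater_is_better=False)
scores = cross_val_score(model, X, y, cv=5, scoring=rmse_scorer)
cv_cost = -1 * np.mean(scores)  # flip back to a positive cost, as rf_from_cfg does

# Manual loop over the same (unshuffled) folds for comparison.
manual = []
for train, test in KFold(n_splits=5).split(X):
    fitted = clone(model).fit(X[train], y[train])
    manual.append(rmse(y[test], fitted.predict(X[test])))
manual_cost = np.mean(manual)
```

In newer scikit-learn versions, the built-in scoring string `"neg_root_mean_squared_error"` follows the same negated convention.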

diff --git a/requirements.txt b/requirements.txt
@@ -2,10 +2,9 @@ setuptools
 numpy>=1.7.1
 scipy>=0.18.1
 six
-Cython
 psutil
 pynisher>=0.4.1
-ConfigSpace>=0.3.1
-pyrfr==0.2.0
+ConfigSpace>=0.3.2
 scikit-learn
-typing
+typing
+pyrfr>=0.4.0
diff --git a/smac/__init__.py b/smac/__init__.py
@@ -1,3 +1,8 @@
+import sys
+
+if sys.version_info < (3, 4):
+    raise ValueError("SMAC requires Python 3.4 or newer.")
+
 from smac.__version__ import __version__
 AUTHORS = "Marius Lindauer, Matthias Feurer, Katharina Eggensperger, " \
           "Aaron Klein, Stefan Falkner and Frank Hutter"
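The new guard in `smac/__init__.py` relies on tuple comparison. `sys.version_info` is compared element by element against `(3, 4)`, so 3.10 correctly counts as newer than 3.4. A string comparison would get this wrong, because `"3.10" < "3.4"` is true. The following is a testable sketch of the same check. The hypothetical helper `python_is_supported` takes the version tuple as an argument instead of reading `sys.version_info` directly:

```python
from typing import Tuple


def python_is_supported(version_info: Tuple, minimum: Tuple[int, int] = (3, 4)) -> bool:
    """Mirror of the smac/__init__.py guard: tuples compare lexicographically."""
    return tuple(version_info) >= minimum
```

For example, `python_is_supported(sys.version_info)` on the running interpreter performs the same check as the guard, with the opposite boolean sense.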
diff --git a/smac/__version__.py b/smac/__version__.py
@@ -1,4 +1,4 @@
 """Version information."""
 
 # The following line *must* be the last in the module, exactly as formatted:
-__version__ = "0.4.0"
+__version__ = "0.5.0"