Release v0.13.0 (#688)
mllg authored Nov 16, 2021
1 parent 680636c commit 96008d0
Showing 74 changed files with 307 additions and 213 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: mlr3
Title: Machine Learning in R - Next Generation
-Version: 0.12.0-9000
+Version: 0.13.0
Authors@R:
c(person(given = "Michel",
family = "Lang",
18 changes: 16 additions & 2 deletions NEWS.md
@@ -1,9 +1,23 @@
# mlr3 0.13.0

+* Learners which are capable of resuming/continuing (e.g.,
+learner `(classif|regr|surv).xgboost` with hyperparameter `nrounds` updated)
+can now optionally store a stack of trained learners to be used to hotstart
+their training. Note that this feature is still somewhat experimental.
+See `HotstartStack` and #719.
+* New measures to score similarity of selected feature sets:
+`sim.jaccard` (Jaccard Index) and `sim.phi` (Phi coefficient) (#690).
+* `predict_newdata()` now also supports `DataBackend` as input.
* New function `install_pkgs()` to install required packages. This generic works
for all objects with a `packages` field as well as `ResampleResult` and
-`BenchmarkResult`.
-
+`BenchmarkResult` (#728).
+* New learner `regr.debug` for debugging.
+* New `Task` method `$set_levels()` to control how data with factor columns
+is returned, independent of the used `DataBackend`.
+* Measures now return `NA` if prerequisite are not met (#699).
+This allows to conveniently score your experiments with multiple measures
+having different requirements.
+* Feature names may no longer contain the special character `%`.
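
The `predict_newdata()` entry in this changelog can be illustrated with a minimal sketch. It assumes the built-in `iris` task and the `classif.rpart` learner, and it wraps a plain `data.frame` with `as_data_backend()`; the task, learner, and variable names are chosen for illustration and are not taken from the commit.

```r
library(mlr3)

# Train a learner on a built-in task (illustrative choice).
task = tsk("iris")
learner = lrn("classif.rpart")
learner$train(task)

# Wrap new observations in a DataBackend instead of passing a data.frame.
# For a data.frame, as_data_backend() creates a DataBackendDataTable.
newdata = iris[1:10, ]
backend = as_data_backend(newdata)

# predict_newdata() accepts the backend directly.
prediction = learner$predict_newdata(backend)
print(prediction)
```

Passing a backend should be mostly useful when the new data lives outside of R's memory, e.g. in a database backend from an extension package.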

# mlr3 0.12.0

2 changes: 1 addition & 1 deletion R/DataBackend.R
@@ -52,7 +52,7 @@ DataBackend = R6Class("DataBackend", cloneable = FALSE,
#' [DataBackendDataTable] or [DataBackendMatrix], or via the S3 method
#' [as_data_backend()].
#'
-#' @param data (`any`)\cr
+#' @param data (any)\cr
#' The format of the input data depends on the specialization. E.g.,
#' [DataBackendDataTable] expects a [data.table::data.table()] and
#' [DataBackendMatrix] expects a [Matrix::Matrix()] from \CRANpkg{Matrix}.
29 changes: 19 additions & 10 deletions R/HotstartStack.R
@@ -1,21 +1,30 @@
#' @title Stack for Hot Start Learners
#'
#' @description
-#' This class stores learners for hot starting. When fitting a learner
-#' repeatedly on the same task but with a different fidelity, hot starting
-#' accelerates model fitting by reusing previously fitted models. For example,
-#' add more trees to a fitted random forest model.
+#' This class stores learners for hot starting training, i.e. resuming or
+#' continuing from an already fitted model.
+#' We assume that hot starting is only possible if a single hyperparameter
+#' (also called the fidelity parameter, usually controlling the complexity or
+#' expensiveness) is altered and all other hyperparameters are identical.
#'
#' The `HotstartStack` stores trained learners which can be potentially used to
#' hot start a learner. Learner automatically hot start while training if a
#' stack is attached to the `$hotstart_stack` field and the stack contains a
-#' suitable learner (see examples).
+#' suitable learner.
#'
+#' For example, if you want to train a random forest learner with 1000 trees but
+#' already have a random forest learner with 500 trees (hot start learner),
+#' you can add the hot start learner to the `HotstartStack` of the expensive learner
+#' with 1000 trees. If you now call the `train()` method (or [resample()] or
+#' [benchmark()]), a random forest with 500 trees will be fitted and combined
+#' with the 500 trees of the hotstart learner, effectively saving you to
+#' fit 500 trees.
+#'
#' Hot starting is only supported by learners which have the property
-#' `"hotstart_forward"` or `"hotstart_backward"`. For example, an xgboost model
-#' can hot start forward by adding more boosting iterations and a random forest
-#' can go backwards by removing trees. The fidelity parameters are tagged with
-#' `"hotstart"` in learner's parameter set.
+#' `"hotstart_forward"` or `"hotstart_backward"`. For example, an `xgboost` model
+#' (in \CRANpkg{mlr3learners}) can hot start forward by adding more boosting
+#' iterations, and a random forest can go backwards by removing trees.
+#' The fidelity parameters are tagged with `"hotstart"` in learner's parameter set.
#'
#' @export
#' @examples
@@ -118,7 +127,7 @@ HotstartStack = R6Class("HotstartStack",
#' Printer.
#'
#' @param ... (ignored).
-print = function() {
+print = function(...) {
catf(format(self))
print(self$stack, digits = 2)
}
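
The workflow described in the `HotstartStack` documentation above can be sketched as follows. This is a minimal example under stated assumptions: it uses the `classif.debug` learner, assuming that its `iter` hyperparameter is the fidelity parameter tagged with `"hotstart"`, and the `iris` task; the variable names are illustrative.

```r
library(mlr3)

task = tsk("iris")

# Fit a cheap model first (low fidelity, assumed fidelity parameter: iter).
learner_cheap = lrn("classif.debug", iter = 1)
learner_cheap$train(task)

# Store the trained learner in a stack.
stack = HotstartStack$new(list(learner_cheap))

# A learner with higher fidelity and otherwise identical hyperparameters.
learner_expensive = lrn("classif.debug", iter = 2)

# Attach the stack; training may now resume from the stored model
# if the stack contains a suitable learner for this task.
learner_expensive$hotstart_stack = stack
learner_expensive$train(task)
```

Whether a stored learner is actually reused depends on the task and on all hyperparameters other than the fidelity parameter being identical, as stated in the documentation.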
4 changes: 2 additions & 2 deletions R/Learner.R
@@ -182,7 +182,7 @@ Learner = R6Class("Learner",
#' @description
#' Printer.
#' @param ... (ignored).
-print = function() {
+print = function(...) {
catn(format(self))
catn(str_indent("* Model:", if (is.null(self$model)) "-" else class(self$model)[1L]))
catn(str_indent("* Parameters:", as_short_string(self$param_set$values, 1000L)))
@@ -371,7 +371,7 @@ Learner = R6Class("Learner",
),

active = list(
-#' @field model (`any`)\cr
+#' @field model (any)\cr
#' The fitted model. Only available after `$train()` has been called.
model = function(rhs) {
assert_ro_binding(rhs)
9 changes: 6 additions & 3 deletions R/Measure.R
@@ -104,13 +104,16 @@ Measure = R6Class("Measure",

if (!is_scalar_na(task_type)) {
assert_choice(task_type, mlr_reflections$task_types$type)
assert_subset(properties, mlr_reflections$measure_properties[[task_type]])
assert_choice(predict_type, names(mlr_reflections$learner_predict_types[[task_type]]))
+assert_subset(properties, mlr_reflections$measure_properties[[task_type]])
+assert_subset(task_properties, mlr_reflections$task_properties[[task_type]])
}
-self$properties = properties
+
+self$properties = unique(properties)
self$predict_type = predict_type
self$predict_sets = assert_subset(predict_sets, mlr_reflections$predict_sets, empty.ok = FALSE)
-self$task_properties = assert_subset(task_properties, mlr_reflections$task_properties[[task_type]])
+self$task_properties = task_properties
self$packages = union("mlr3", assert_character(packages, any.missing = FALSE, min.chars = 1L))
self$man = assert_string(man, na.ok = TRUE)

@@ -126,7 +129,7 @@ Measure = R6Class("Measure",
#' @description
#' Printer.
#' @param ... (ignored).
-print = function() {
+print = function(...) {
catn(format(self))
catn(str_indent("* Packages:", self$packages))
catn(str_indent("* Range:", sprintf("[%g, %g]", self$range[1L], self$range[2L])))
18 changes: 15 additions & 3 deletions R/MeasureSimilarity.R
@@ -5,12 +5,14 @@
#' @description
#' This measure specializes [Measure] for measures quantifying the similarity of
#' sets of selected features.
+#' To calculate similarity measures, the [Learner] must have the property
+#' `"selected_features"`.
#'
#' * `task_type` is set to `NA_character_`.
#' * `average` is set to `"custom"`.
#'
-#' Predefined measures can be found in the [dictionary][mlr3misc::Dictionary] [mlr_measures].
-#' The default measure for regression is [`regr.mse`][mlr_measures_regr.mse].
+#' Predefined measures can be found in the [dictionary][mlr3misc::Dictionary]
+#' [mlr_measures], prefixed with `"sim."`.
#'
#' @template param_id
#' @template param_param_set
@@ -27,14 +29,24 @@
#'
#' @template seealso_measure
#' @export
+#' @examples
+#' task = tsk("penguins")
+#' learners = list(
+#' lrn("classif.rpart", maxdepth = 1, id = "r1"),
+#' lrn("classif.rpart", maxdepth = 2, id = "r2")
+#' )
+#' resampling = rsmp("cv", folds = 3)
+#' grid = benchmark_grid(task, learners, resampling)
+#' bmr = benchmark(grid, store_models = TRUE)
+#' bmr$aggregate(msrs(c("classif.ce", "sim.jaccard")))
MeasureSimilarity = R6Class("MeasureSimilarity", inherit = Measure, cloneable = FALSE,
public = list(
#' @description
#' Creates a new instance of this [R6][R6::R6Class] class.
initialize = function(id, param_set = ps(), range, minimize = NA, average = "macro", aggregator = NULL, properties = character(), predict_type = "response",
predict_sets = "test", task_properties = character(), packages = character(), man = NA_character_) {
super$initialize(id, task_type = NA_character_, param_set = param_set, range = range, minimize = minimize, average = "custom", aggregator = aggregator,
-properties = properties, predict_type = predict_type, predict_sets = predict_sets,
+properties = c("requires_model", properties), predict_type = predict_type, predict_sets = predict_sets,
task_properties = task_properties, packages = packages, man = man)
}
),
2 changes: 1 addition & 1 deletion R/Prediction.R
@@ -116,7 +116,7 @@ Prediction = R6Class("Prediction",
self$data$row_ids
},

-#' @field truth (`any`)\cr
+#' @field truth (any)\cr
#' True (observed) outcome.
truth = function(rhs) {
assert_ro_binding(rhs)
2 changes: 1 addition & 1 deletion R/ResampleResult.R
@@ -60,7 +60,7 @@ ResampleResult = R6Class("ResampleResult",
#' @description
#' Printer.
#' @param ... (ignored).
-print = function() {
+print = function(...) {
catf("%s of %i iterations", format(self), self$iters)
catn(str_indent("* Task:", self$task$id))
catn(str_indent("* Learner:", self$learner$id))
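
The commit does not state why the `print()` methods of several classes gained a `...` argument. One plausible reason, sketched here with two hypothetical toy classes that are not part of mlr3, is that the S3 `print()` method for R6 objects forwards extra arguments to the object's `$print()` method, so a method without `...` fails as soon as the caller passes any.

```r
library(R6)

# Hypothetical toy classes for illustration, not part of mlr3.
Strict = R6Class("Strict", public = list(
  print = function() cat("<Strict>\n")
))
Lenient = R6Class("Lenient", public = list(
  print = function(...) cat("<Lenient>\n")
))

# Extra arguments are passed on to $print(); without `...` this raises
# an "unused argument" error, which is caught here.
strict_result = tryCatch(
  {
    print(Strict$new(), digits = 2)
    "ok"
  },
  error = function(e) "error"
)

# With `...` in the signature, extra arguments are silently ignored.
lenient_result = tryCatch(
  {
    print(Lenient$new(), digits = 2)
    "ok"
  },
  error = function(e) "error"
)
```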
2 changes: 1 addition & 1 deletion R/Resampling.R
@@ -87,7 +87,7 @@ Resampling = R6Class("Resampling",
#' @template field_param_set
param_set = NULL,

-#' @field instance (`any`)\cr
+#' @field instance (any)\cr
#' During `instantiate()`, the instance is stored in this slot in an arbitrary format.
#' Note that if a grouping variable is present in the [Task], a [Resampling] may operate on the
#' group ids internally instead of the row ids (which may lead to confusion).
12 changes: 7 additions & 5 deletions R/Task.R
@@ -466,13 +466,15 @@ Task = R6Class("Task",
assert_set_equal(self$row_ids, data$rownames)
}

+# update col_info for existing columns
ci = col_info(data)
-ci$label = NA_character_
-ci$fix_factor_levels = FALSE
-
-# update col info
self$col_info = ujoin(self$col_info, ci, key = "id")
-self$col_info = rbindlist(list(self$col_info, ci[!list(self$col_info), on = "id"]), use.names = TRUE, fill = TRUE)
+
+# add rows to col_info for new columns
+self$col_info = rbindlist(list(
+self$col_info,
+insert_named(ci[!list(self$col_info), on = "id"], list(label = NA_character_, fix_factor_levels = FALSE))
+), use.names = TRUE)
setkeyv(self$col_info, "id")

# add new features
4 changes: 2 additions & 2 deletions R/as_data_backend.R
@@ -8,7 +8,7 @@
#' Additional methods are implemented in the package \CRANpkg{mlr3db}, e.g. to connect
#' to real DBMS like PostgreSQL (via \CRANpkg{dbplyr}) or DuckDB (via \CRANpkg{DBI}/\CRANpkg{duckdb}).
#'
-#' @param data `any`\cr
+#' @param data (any)\cr
#' Data to create a [DataBackend] from.
#' For a `data.frame()` (this includes `tibble()` from \CRANpkg{tibble} and [data.table::data.table()]),
#' a [DataBackendDataTable] is created.
@@ -17,7 +17,7 @@
#'
#' @template param_primary_key
#'
-#' @param ... (`any`)\cr
+#' @param ... (any)\cr
#' Additional arguments passed to the respective [DataBackend] method.
#'
#' @return [DataBackend].
4 changes: 2 additions & 2 deletions R/as_resample_result.R
@@ -3,9 +3,9 @@
#' @description
#' Convert object to a [ResampleResult].
#'
-#' @param x (`any`)\cr
+#' @param x (any)\cr
#' Object to convert.
-#' @param ... (`any`)\cr
+#' @param ... (any)\cr
#' Currently not used.
#'
#' @return ([ResampleResult]).
16 changes: 8 additions & 8 deletions R/as_task.R
@@ -3,37 +3,37 @@
#' @description
#' Convert object to a [Task] or a list of [Task].
#'
-#' @param x (`any`)\cr
+#' @param x (any)\cr
#' Object to convert.
-#' @param ... (`any`)\cr
+#' @param ... (any)\cr
#' Additional arguments.
#' @param clone (`logical(1)`)\cr
#' If `TRUE`, ensures that the returned object is not the same as the input `x`.
#' @export
as_task = function(x, ...) {
UseMethod("as_task")
}

#' @export
#' @rdname as_task
#' @export
as_task.Task = function(x, clone = FALSE, ...) { # nolint
if (clone) x$clone() else x
}

#' @export
#' @rdname as_task
#' @export
as_tasks = function(x, ...) {
UseMethod("as_tasks")
}

#' @export
#' @rdname as_task
#' @param clone (`logical(1)`)\cr
#' If `TRUE`, ensures that the returned object is not the same as the input `x`.
#' @export
as_tasks.list = function(x, clone = FALSE, ...) { # nolint
lapply(x, as_task, clone = clone, ...)
}

#' @export
#' @rdname as_task
#' @export
as_tasks.Task = function(x, clone = FALSE, ...) { # nolint
list(if (clone) x$clone() else x)
}
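
The effect of the `clone` argument in the converters above can be checked with a short sketch. It assumes the built-in `iris` task; the variable names are illustrative.

```r
library(mlr3)

task = tsk("iris")

# Without cloning, the very same object is returned.
same = as_task(task)

# With clone = TRUE, a copy is returned instead.
copy = as_task(task, clone = TRUE)

# R6 objects are compared by reference here.
identical(same, task)
identical(copy, task)

# as_tasks() always returns a list of tasks, even for a single Task.
tasks = as_tasks(task)
length(tasks)
```

Cloning matters because tasks are R6 objects with reference semantics: modifying an uncloned result in place also modifies the original task.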
2 changes: 1 addition & 1 deletion R/auto_convert.R
@@ -122,7 +122,7 @@ rm(ee)
#'
#' All rules are stored as functions in [mlr_reflections$auto_converters][mlr_reflections].
#'
-#' @param value (`any`)\cr
+#' @param value (any)\cr
#' New values to convert in order to match `type`.
#' @param id (`character(1)`)\cr
#' Name of the column, used in error messages.
2 changes: 1 addition & 1 deletion R/install_pkgs.R
@@ -17,7 +17,7 @@
#'
#' @param x (any)\cr
#' Object with package information (or a list of such objects).
-#' @param ... \cr
+#' @param ... (any)\cr
#' Additional arguments passed down to [remotes::install_cran()] or
#' [remotes::install_github()].
#' Arguments `force` and `upgrade` are often important in this context.
2 changes: 1 addition & 1 deletion R/mlr_reflections.R
@@ -126,7 +126,7 @@ local({


### Measures
-tmp = c("na_score", "requires_task", "requires_learner", "requires_train_set")
+tmp = c("na_score", "requires_task", "requires_learner", "requires_model", "requires_train_set")
mlr_reflections$measure_properties = list(
classif = tmp,
regr = tmp
2 changes: 1 addition & 1 deletion R/predict.R
@@ -20,7 +20,7 @@
#' Set to `<Prediction>` to retrieve the complete [Prediction] object.
#' If set to `NULL` (default), the first predict type for the respective class of the [Learner]
#' as stored in [mlr_reflections] is used.
-#' @param ... (`any`)\cr
+#' @param ... (any)\cr
#' Hyperparameters to pass down to the [Learner].
#'
#' @export
2 changes: 1 addition & 1 deletion R/set_threads.R
@@ -16,7 +16,7 @@
#' via the [future::plan] [future::multicore]. For this reason all learners connected to \CRANpkg{mlr3}
#' have threading disabled in their defaults.
#'
-#' @param x (`any`)\cr
+#' @param x (any)\cr
#' Object to set threads for, e.g. a [Learner].
#' This object is modified in-place.
#' @param n (`integer(1)`)\cr
2 changes: 1 addition & 1 deletion README.Rmd
@@ -47,7 +47,7 @@ Successor of [mlr](https://github.com/mlr-org/mlr).
- [useR2019 talk on mlr3pipelines and mlr3tuning](https://www.youtube.com/watch?v=gEW5RxkbQuQ)
- [useR2020 tutorial on mlr3, mlr3tuning and mlr3pipelines](https://www.youtube.com/watch?v=T43hO2o_nZw)
* **Courses/Lectures**
-- The course [Introduction to Machine learning (I2ML)](https://compstat-lmu.github.io/lecture_i2ml/) is a free and open flipped classroom course on the basics of machine learning. `mlr3` is used in the [demos](https://github.com/compstat-lmu/lecture_i2ml/tree/master/code-demos-pdf) and [exercises](https://github.com/compstat-lmu/lecture_i2ml/tree/master/exercises).
+- The course [Introduction to Machine learning (I2ML)](https://introduction-to-machine-learning.netlify.app/) is a free and open flipped classroom course on the basics of machine learning. `mlr3` is used in the [demos](https://github.com/slds-lmu/lecture_i2ml/tree/master/code-demos-pdf) and [exercises](https://github.com/slds-lmu/lecture_i2ml/tree/master/exercises).
* **Templates/Tutorials**
- [mlr3-learndrake](https://github.com/mlr-org/mlr3-learndrake): Shows how to use mlr3 with [drake](https://docs.ropensci.org/drake/) for reproducible ML workflow automation.
* [List of extension packages](https://github.com/mlr-org/mlr3/wiki/Extension-Packages)