| Function | Works |
|---|---|
| tidypredict_fit(), tidypredict_sql(), parse_model() | ✔ |
| tidypredict_to_column() | ✗ |
| tidypredict_test() | ✗ |
| tidypredict_interval(), tidypredict_sql_interval() | ✗ |
| parsnip | ✔ |
How it works
Here is a simple randomForest() model using the
iris dataset:
library(dplyr)
library(tidypredict)
library(randomForest)
model <- randomForest(Species ~ ., data = iris, ntree = 100, proximity = TRUE)
Under the hood
The parser is based on the output from the
randomForest::getTree() function. It will return as many
decision paths as there are non-NA rows in the prediction
field.
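The claim about decision paths can be checked on a single tree. The block below is a minimal sketch, not tidypredict's own code: it refits a small forest (the seed and the variable names `tree`, `n_paths`, and `n_terminal` are illustrative assumptions) and counts the non-NA rows of the prediction column, which should match the number of terminal nodes (`status == -1`).

```r
library(randomForest)

set.seed(100)  # arbitrary seed, only so the sketch is repeatable
model <- randomForest(Species ~ ., data = iris, ntree = 100)

# First tree of the forest, with variable names and class labels attached
tree <- getTree(model, k = 1, labelVar = TRUE)

# Rows with a non-NA prediction are the terminal nodes, one per decision path
n_paths <- sum(!is.na(tree$prediction))

# Terminal nodes are also flagged with status == -1
n_terminal <- sum(tree$status == -1)

n_paths == n_terminal
```

Every other row of the `getTree()` output is an internal split, so it contributes a condition to one or more paths but never a prediction of its own.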
getTree(model, labelVar = TRUE) %>%
head()
#> left daughter right daughter split var split point status
#> 1 2 3 Petal.Length 2.50 1
#> 2 0 0 <NA> 0.00 -1
#> 3 4 5 Petal.Length 5.05 1
#> 4 6 7 Petal.Width 1.90 1
#> 5 0 0 <NA> 0.00 -1
#> 6 8 9 Sepal.Length 4.95 1
#> prediction
#> 1 <NA>
#> 2 setosa
#> 3 <NA>
#> 4 <NA>
#> 5 virginica
#> 6 <NA>
The output from parse_model() is transformed into a dplyr
(a.k.a. Tidy Eval) formula. The entire decision tree
becomes one dplyr::case_when() statement.
tidypredict_fit(model)[1]
#> `/`()
From there, the Tidy Eval formula can be used anywhere it can be
evaluated. tidypredict provides three paths:
- Use directly inside dplyr: mutate(iris, !! tidypredict_fit(model))
- Use tidypredict_to_column(model) in a piped command set
- Use tidypredict_sql(model, con), where con is a database connection,
  to retrieve the SQL statement
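To make the tree-to-`case_when()` idea concrete, here is a hand-rolled sketch that builds such an expression from the `getTree()` output of one tree and evaluates it inside `mutate()`. It is an illustration under stated assumptions, not tidypredict's implementation: `tree_paths()` is a hypothetical helper, the seed and `ntree = 10` are arbitrary, and the code assumes numeric predictors only (rows with a value less than or equal to the split point go to the left daughter).

```r
library(randomForest)
library(dplyr)
library(rlang)

set.seed(100)  # arbitrary seed, only so the sketch is repeatable
model <- randomForest(Species ~ ., data = iris, ntree = 10)

# First tree of the forest, with variable names and class labels attached
tree <- getTree(model, k = 1, labelVar = TRUE)

# Hypothetical helper (not part of tidypredict): walk from `node` down to
# every terminal node, collecting the split conditions along the way.
tree_paths <- function(tree, node = 1, conds = list()) {
  row <- tree[node, ]
  if (row[["status"]] == -1) {
    # Terminal node: AND the collected conditions, then map to the class label
    cond <- if (length(conds) == 0) {
      TRUE
    } else {
      Reduce(function(a, b) call2("&", a, b), conds)
    }
    return(list(call2("~", cond, as.character(row[["prediction"]]))))
  }
  var <- sym(as.character(row[["split var"]]))
  point <- row[["split point"]]
  c(
    tree_paths(tree, row[["left daughter"]],
               c(conds, list(call2("<=", var, point)))),
    tree_paths(tree, row[["right daughter"]],
               c(conds, list(call2(">", var, point))))
  )
}

paths <- tree_paths(tree)

# One case_when() call for the tree, with one branch per decision path
tree_expr <- expr(case_when(!!!paths))

# Evaluate the expression inside mutate() by unquoting it with `!!`
scored <- mutate(iris, tree_1 = !!tree_expr)

# Reference: the same tree's predictions as reported by randomForest itself
reference <- predict(model, iris, predict.all = TRUE)$individual[, 1]
agreement <- mean(scored$tree_1 == reference)
```

If the sketch is faithful, `agreement` should be 1, since both routes apply the same splits of the same tree. Because the result is an ordinary expression rather than a model object, the same logic can in principle be translated to SQL by a dplyr database backend.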
parsnip
tidypredict also supports randomForest
model objects fitted via the parsnip package.
library(parsnip)
parsnip_model <- rand_forest(mode = "classification") %>%
set_engine("randomForest") %>%
fit(Species ~ ., data = iris)
tidypredict_fit(parsnip_model)[[1]]
#> `/`
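A parsnip fit wraps the engine's own model object, which is presumably what makes this support possible. The sketch below is an assumption-labeled illustration rather than a description of tidypredict internals: it uses parsnip's `extract_fit_engine()` to pull out the underlying fit and confirms that `getTree()` accepts it. The names `spec`, `engine_fit`, and `first_tree` are illustrative.

```r
library(parsnip)
library(randomForest)

set.seed(100)  # arbitrary seed, only so the sketch is repeatable

# Same specification as above, written without the pipe
spec <- set_engine(rand_forest(mode = "classification"), "randomForest")
parsnip_model <- fit(spec, Species ~ ., data = iris)

# Pull the underlying randomForest object out of the parsnip wrapper
engine_fit <- extract_fit_engine(parsnip_model)
is_rf <- inherits(engine_fit, "randomForest")

# The engine fit exposes the same tree structure used in "Under the hood"
first_tree <- getTree(engine_fit, k = 1, labelVar = TRUE)
head(first_tree)
```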