Skip to content

A model that is trained in any language are able to integrate with tidypredict, and thus with broom. The requirement is that the model in that language is exported using the parse model spec. The easiest file format would be YAML.

python example

A model that was fitted using sklearn’s linear_model. The model is based on diabetes data. Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline. The model’s results were converted to YAML by the same python script, I copied and pasted the top part here:

general:
  is_glm: 0
  model: lm
  residual: 0
  sigma2: 0
  type: regression
  version: 2.0
terms:
- coef: 152.76430691633442
  fields:
  - col: (Intercept)
    type: ordinary
  is_intercept: 1
  label: (Intercept)

Read in R

The YAML data can be read in R by using the yaml package. In this example, we have copy-pasted most of the models inside a variable called sklearn_model. Because yaml requires local YAML variables to be split by line, we use strsplit().

library(yaml)

sklearn_model <- strsplit("general:
  is_glm: 0
  model: lm
  residual: 0
  sigma2: 0
  type: regression
  version: 2.0
terms:
- coef: 152.76430691633442
  fields:
  - col: (Intercept)
    type: ordinary
  is_intercept: 1
  label: (Intercept)
- coef: 0.3034995490660432
  fields:
  - col: age
    type: ordinary
  is_intercept: 0
  label: age
- coef: -237.63931533353403
  fields:
  - col: sex
    type: ordinary
  is_intercept: 0
  label: sex
- coef: 510.5306054362253
  fields:
  - col: bmi
    type: ordinary
  is_intercept: 0
  label: bmi
- coef: 327.7369804093466
  fields:
  - col: bp
    type: ordinary
  is_intercept: 0
  label: bp
- coef: -814.1317093725387
  fields:
  - col: s1
    type: ordinary
  is_intercept: 0
  label: s1
", split = "\n")[[1]]

Now the model is converted to an R list using yaml.load.

sklearn_model <- yaml.load(sklearn_model)

str(sklearn_model, 2)
#> List of 2
#>  $ general:List of 6
#>   ..$ is_glm  : int 0
#>   ..$ model   : chr "lm"
#>   ..$ residual: int 0
#>   ..$ sigma2  : int 0
#>   ..$ type    : chr "regression"
#>   ..$ version : num 2
#>  $ terms  :List of 6
#>   ..$ :List of 4
#>   ..$ :List of 4
#>   ..$ :List of 4
#>   ..$ :List of 4
#>   ..$ :List of 4
#>   ..$ :List of 4

tidypredict

The list object needs to be recognized as a tidypredict parsed model. To do that, we use as_parsed_model()

library(tidypredict)

spm <- as_parsed_model(sklearn_model)

class(spm)
#> [1] "parsed_model"  "pm_regression" "list"

The spm variable now works just as any parsed model inside R. Use tidypredict_fit() to view the resulting formula.

tidypredict_fit(spm)
#> 152.764306916334 + (age * 0.303499549066043) + (sex * -237.639315333534) + 
#>     (bmi * 510.530605436225) + (bp * 327.736980409347) + (s1 * 
#>     -814.131709372539)

Now, the model can run inside a database

tidypredict_sql(spm, dbplyr::simulate_mssql())
#> <SQL> ((((152.764306916334 + (`age` * 0.303499549066043)) + (`sex` * -237.639315333534)) + (`bmi` * 510.530605436225)) + (`bp` * 327.736980409347)) + (`s1` * -814.131709372539)

broom

Now that we have a parsed_model object, it is possible to use broom’s tidy() function. This means that we are able to integrate a totally external model, with broom.

tidy(spm)
#> # A tibble: 6 × 2
#>   term        estimate
#>   <chr>          <dbl>
#> 1 (Intercept)  153.   
#> 2 age            0.303
#> 3 sex         -238.   
#> 4 bmi          511.   
#> 5 bp           328.   
#> 6 s1          -814.