Fitting linear mixed-effects models
The lmm
function is similar to the lmer
function in the
lme4 package for
R. The first two arguments for in the R
version are formula
and data
. The principle method for the
Julia
version takes these arguments.
A simple example
The simplest example of a mixed-effects model that we use in the
lme4 package for R is a model fit to
the Dyestuff
data.
> str(Dyestuff)
'data.frame': 30 obs. of 2 variables:
$ Batch: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ...
$ Yield: num 1545 1440 1440 1520 1580 ...
> (fm1 <- lmer(Yield ~ 1|Batch, Dyestuff, REML=FALSE))
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Yield ~ 1 | Batch
Data: Dyestuff
AIC BIC logLik deviance
333.3271 337.5307 -163.6635 327.3271
Random effects:
Groups Name Variance Std.Dev.
Batch (Intercept) 1388 37.26
Residual 2451 49.51
Number of obs: 30, groups: Batch, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 1527.50 17.69 86.33
```@meta DocTestSetup = quote using DataFrames, MixedModels include(Pkg.dir("MixedModels", "test", "data.jl")) end
These `Dyestuff` data are available through `RCall` but to run the doctests we use a stored copy of the dataframe.
```julia
julia> using DataFrames, MixedModels
julia> ds
30×2 DataFrames.DataFrame
│ Row │ Yield │ Batch │
├─────┼────────┼───────┤
│ 1 │ 1545.0 │ 'A' │
│ 2 │ 1440.0 │ 'A' │
│ 3 │ 1440.0 │ 'A' │
│ 4 │ 1520.0 │ 'A' │
│ 5 │ 1580.0 │ 'A' │
│ 6 │ 1540.0 │ 'B' │
│ 7 │ 1555.0 │ 'B' │
│ 8 │ 1490.0 │ 'B' │
⋮
│ 22 │ 1630.0 │ 'E' │
│ 23 │ 1515.0 │ 'E' │
│ 24 │ 1635.0 │ 'E' │
│ 25 │ 1625.0 │ 'E' │
│ 26 │ 1520.0 │ 'F' │
│ 27 │ 1455.0 │ 'F' │
│ 28 │ 1450.0 │ 'F' │
│ 29 │ 1480.0 │ 'F' │
│ 30 │ 1445.0 │ 'F' │
lmm
defaults to maximum likelihood estimation whereas lmer
in R
defaults to REML estimation.
Linear mixed model fit by maximum likelihood
Formula: Yield ~ 1 + (1 | Batch)
logLik -2 logLik AIC BIC
-163.66353 327.32706 333.32706 337.53065
Variance components:
Column Variance Std.Dev.
Batch (Intercept) 1388.3332 37.260344
Residual 2451.2500 49.510100
Number of obs: 30; levels of grouping factors: 6
Fixed-effects parameters:
Estimate Std.Error z value
(Intercept) 1527.5 17.6946 86.326
In general the model should be fit through an explicit call to the fit!
function, which may take a second argument indicating a verbose fit.
julia> fit!(lmm(@formula(Yield ~ 1 + (1 | Batch)), ds), true);
f_1: 327.76702, [1.0]
f_2: 331.03619, [1.75]
f_3: 330.64583, [0.25]
f_4: 327.69511, [0.97619]
f_5: 327.56631, [0.928569]
f_6: 327.3826, [0.833327]
f_7: 327.35315, [0.807188]
f_8: 327.34663, [0.799688]
f_9: 327.341, [0.792188]
f_10: 327.33253, [0.777188]
f_11: 327.32733, [0.747188]
f_12: 327.32862, [0.739688]
f_13: 327.32706, [0.752777]
f_14: 327.32707, [0.753527]
f_15: 327.32706, [0.752584]
f_16: 327.32706, [0.752509]
f_17: 327.32706, [0.752591]
f_18: 327.32706, [0.752581]
The numeric representation of the model has type
julia> typeof(fit!(lmm(@formula(Yield ~ 1 + (1 | Batch)), ds)))
MixedModels.LinearMixedModel{Float64}
Those familiar with the lme4
package for R
will see the usual
suspects.
julia> m = fit!(lmm(@formula(Yield ~ 1 + (1 | Batch)), ds));
julia> fixef(m) # estimates of the fixed-effects parameters
1-element Array{Float64,1}:
1527.5
julia> coef(m) # another name for fixef
1-element Array{Float64,1}:
1527.5
julia> ranef(m)
1-element Array{Array{Float64,2},1}:
[-16.6282 0.369516 … 53.5798 -42.4943]
julia> ranef(m, uscale = true) # on the u scale
1-element Array{Array{Float64,2},1}:
[-22.0949 0.490999 … 71.1948 -56.4648]
julia> deviance(m)
327.3270598811394
julia> objective(m)
327.3270598811394
We prefer objective
to deviance
because the value returned is
-2loglikelihood(m)
, without the correction for the null deviance.
It is not clear how the null deviance should be defined for these models.
More substantial examples
Fitting a model to the Dyestuff
data is trivial. The InstEval
data in the lme4
package is more of a challenge in that there are
nearly 75,000 evaluations by 2972 students on a total of 1128
instructors.
julia> head(inst)
6x7 DataFrames.DataFrame
│ Row │ s │ d │ studage │ lectage │ service │ dept │ y │
┝━━━━━┿━━━━━┿━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━┿━━━━━━┿━━━┥
│ 1 │ "1" │ "1002" │ "2" │ "2" │ "0" │ "2" │ 5 │
│ 2 │ "1" │ "1050" │ "2" │ "1" │ "1" │ "6" │ 2 │
│ 3 │ "1" │ "1582" │ "2" │ "2" │ "0" │ "2" │ 5 │
│ 4 │ "1" │ "2050" │ "2" │ "2" │ "1" │ "3" │ 3 │
│ 5 │ "2" │ "115" │ "2" │ "1" │ "0" │ "5" │ 2 │
│ 6 │ "2" │ "756" │ "2" │ "1" │ "0" │ "5" │ 4 │
julia> m2 = fit!(lmm(@formula(y ~ 1 + dept*service + (1|s) + (1|d)), inst))
Linear mixed model fit by maximum likelihood
Formula: y ~ 1 + dept * service + (1 | s) + (1 | d)
logLik -2 logLik AIC BIC
-1.18792777×10⁵ 2.37585553×10⁵ 2.37647553×10⁵ 2.37932876×10⁵
Variance components:
Column Variance Std.Dev.
s (Intercept) 0.10541792 0.32468126
d (Intercept) 0.25841626 0.50834659
Residual 1.38472780 1.17674458
Number of obs: 73421; levels of grouping factors: 2972, 1128
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 3.22961 0.064053 50.4209 <1e-99
dept: 5 0.129536 0.101294 1.27882 0.2010
dept: 10 -0.176751 0.0881352 -2.00545 0.0449
dept: 12 0.0517102 0.0817523 0.632523 0.5270
dept: 6 0.0347319 0.0856209 0.405647 0.6850
dept: 7 0.14594 0.0997984 1.46235 0.1436
dept: 4 0.151689 0.0816897 1.85689 0.0633
dept: 8 0.104206 0.118751 0.877517 0.3802
dept: 9 0.0440401 0.0962984 0.45733 0.6474
dept: 14 0.0517546 0.0986029 0.524879 0.5997
dept: 1 0.0466719 0.101942 0.457828 0.6471
dept: 3 0.0563461 0.0977925 0.57618 0.5645
dept: 11 0.0596536 0.100233 0.595151 0.5517
dept: 2 0.00556283 0.110867 0.0501759 0.9600
service: 1 0.252025 0.0686507 3.67112 0.0002
dept: 5 & service: 1 -0.180757 0.123179 -1.46744 0.1423
dept: 10 & service: 1 0.0186492 0.110017 0.169512 0.8654
dept: 12 & service: 1 -0.282269 0.0792937 -3.5598 0.0004
dept: 6 & service: 1 -0.494464 0.0790278 -6.25684 <1e-9
dept: 7 & service: 1 -0.392054 0.110313 -3.55403 0.0004
dept: 4 & service: 1 -0.278547 0.0823727 -3.38154 0.0007
dept: 8 & service: 1 -0.189526 0.111449 -1.70056 0.0890
dept: 9 & service: 1 -0.499868 0.0885423 -5.64553 <1e-7
dept: 14 & service: 1 -0.497162 0.0917162 -5.42065 <1e-7
dept: 1 & service: 1 -0.24042 0.0982071 -2.4481 0.0144
dept: 3 & service: 1 -0.223013 0.0890548 -2.50422 0.0123
dept: 11 & service: 1 -0.516997 0.0809077 -6.38997 <1e-9
dept: 2 & service: 1 -0.384773 0.091843 -4.18946 <1e-4
Models with vector-valued random effects can be fit
julia> slp
180×3 DataFrames.DataFrame
│ Row │ Reaction │ Days │ Subject │
├─────┼──────────┼──────┼─────────┤
│ 1 │ 249.56 │ 0 │ 1 │
│ 2 │ 258.705 │ 1 │ 1 │
│ 3 │ 250.801 │ 2 │ 1 │
│ 4 │ 321.44 │ 3 │ 1 │
│ 5 │ 356.852 │ 4 │ 1 │
│ 6 │ 414.69 │ 5 │ 1 │
│ 7 │ 382.204 │ 6 │ 1 │
│ 8 │ 290.149 │ 7 │ 1 │
⋮
│ 172 │ 273.474 │ 1 │ 18 │
│ 173 │ 297.597 │ 2 │ 18 │
│ 174 │ 310.632 │ 3 │ 18 │
│ 175 │ 287.173 │ 4 │ 18 │
│ 176 │ 329.608 │ 5 │ 18 │
│ 177 │ 334.482 │ 6 │ 18 │
│ 178 │ 343.22 │ 7 │ 18 │
│ 179 │ 369.142 │ 8 │ 18 │
│ 180 │ 364.124 │ 9 │ 18 │
julia> fm3 = fit!(lmm(@formula(Reaction ~ 1 + Days + (1+Days|Subject)), slp))
Linear mixed model fit by maximum likelihood
Formula: Reaction ~ 1 + Days + ((1 + Days) | Subject)
logLik -2 logLik AIC BIC
-875.96967 1751.93934 1763.93934 1783.09709
Variance components:
Column Variance Std.Dev. Corr.
Subject (Intercept) 565.51067 23.780468
Days 32.68212 5.716828 0.08
Residual 654.94145 25.591824
Number of obs: 180; levels of grouping factors: 18
Fixed-effects parameters:
Estimate Std.Error z value P(>|z|)
(Intercept) 251.405 6.63226 37.9064 <1e-99
Days 10.4673 1.50224 6.96781 <1e-11