- Why hierarchical models?
- Data are nested within groups, so observations are not truly independent
- Pool information across groups with small sample sizes
- Handle repeated observations across groups or individuals
- Estimate a separate intercept for each class/group
```r
# Change classid to be a factor
school_3_data <- school_3_data %>%
  mutate(classid = factor(classid))

# Calculate the mean of mathgain for each class
school_3_data %>%
  group_by(classid) %>%
  summarize(n = n(), mathgain_class = mean(mathgain))

# Estimate an intercept for each class
# (the - 1 drops the global intercept)
lm(mathgain ~ classid - 1, data = school_3_data)
```
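A quick base-R check (toy data, standing in for `school_3_data`) confirms why the `- 1` matters: dropping the global intercept makes each coefficient equal its class mean.

```r
# Toy data: two classes with known means (hypothetical values)
toy <- data.frame(
  classid  = factor(rep(c("a", "b"), each = 3)),
  mathgain = c(10, 20, 30, 40, 50, 60)
)

# With - 1, lm() estimates one intercept per class ...
fit <- lm(mathgain ~ classid - 1, data = toy)

# ... and each coefficient equals that class's mean
coef(fit)                                # classida = 20, classidb = 50
tapply(toy$mathgain, toy$classid, mean)  # same values
```

Without `- 1`, `lm()` would instead report one class as a baseline intercept and the other as a difference from that baseline.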
- Slopes and multiple regression
```r
# Build a multiple regression with an interaction,
# dropping the global intercept with - 1
lm(mathgain ~ classid * mathkind - 1, data = school_3_data)
```
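The interaction term lets each class have its own slope as well as its own intercept. A small sketch with toy data (hypothetical values, not the course's `school_3_data`) makes the coefficients concrete:

```r
# Toy data: gain depends on prior score with a different
# slope in each class (slope 2 in class a, slope 5 in class b)
toy <- data.frame(
  classid  = factor(rep(c("a", "b"), each = 4)),
  mathkind = rep(1:4, 2)
)
toy$mathgain <- ifelse(toy$classid == "a",
                       2 * toy$mathkind,
                       5 * toy$mathkind)

# classid * mathkind - 1 fits a per-class intercept,
# a baseline slope, and a per-class slope offset
fit <- lm(mathgain ~ classid * mathkind - 1, data = toy)
coef(fit)
# mathkind recovers class a's slope (2); the interaction
# classidb:mathkind is class b's offset from it (5 - 2 = 3)
```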
- Random effects
- Linear models in R estimate parameters that are treated as fixed (non-random); these are called fixed effects. In contrast, random-effect parameters are assumed to share a common distribution, which can produce different estimates than a fixed-effect model when a group has little data or outliers. Models with both fixed and random effects are called mixed-effect models, or linear mixed-effect regressions.
```r
library(lme4)
library(broom)
library(broom.mixed)

# Build a linear model including classid as a fixed effect
lm_out <- lm(mathgain ~ classid + mathkind, data = student_data)

# Build a mixed-effect model including classid as a random effect
lmer_out <- lmer(mathgain ~ mathkind + (1 | classid), data = student_data)

# Extract the slope estimate for mathkind from each model
tidy(lm_out) %>% filter(term == "mathkind")
tidy(lmer_out) %>% filter(term == "mathkind")
```
- Random-effect intercepts (one shared slope)
- Figure: random-effect intercept model (solid lines: fixed-effect model; dashed lines: mixed-effect model)
- Random-effect models express more variability in the data
- Random-effect slopes
- With lme4 syntax, lmer() uses (continuous_predictor | random_effect_group) for a random-effect slope. When lme4 estimates a random-effect slope, it also estimates a random-effect intercept. scale() rescales the predictor mathkind to make the model more numerically stable; without this change, lmer() cannot fit the model.

```r
# Rescale the predictor for numerical stability
student_data <- student_data %>%
  mutate(mathkind_scaled = scale(mathkind))

# Random-effect intercept only
lmer_intercept <- lmer(mathgain ~ mathkind_scaled + (1 | classid),
                       data = student_data)

# Random-effect slope (an intercept is estimated as well)
lmer_slope <- lmer(mathgain ~ (mathkind_scaled | classid),
                   data = student_data)
```
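The numerical-stability point rests on what scale() does: it centers a variable at 0 and rescales it to standard deviation 1. A minimal base-R sketch (toy scores, not the real mathkind values):

```r
# scale() centers a vector and divides by its standard deviation
x <- c(400, 450, 500, 550, 600)
x_scaled <- scale(x)

mean(x_scaled)  # ~0 after centering
sd(x_scaled)    # 1 after rescaling
```

Keeping predictors on a scale near the response avoids the very large or very small variance components that can make lmer()'s optimizer fail to converge.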
The model with fixed-effect slopes has parallel (solid) lines because every class gets the same slope estimate. The model with random-effect slopes (dashed lines) does not have parallel lines because a slope is estimated for each class. The dashed lines are also shallower than the solid ones: the per-class slopes come from a shared distribution that pools information across all classrooms (including those not shown on the plot), pulling extreme estimates toward the overall mean.
- Interpreting model coefficients
- Coefficient estimates carry uncertainty, which a 95% confidence interval (CI) captures. If a parameter's 95% CI does not include zero, the parameter is likely statistically different from zero.
```r
library(dplyr)
library(ggplot2)
library(broom.mixed)

# Extract coefficients with confidence intervals
lmer_coef <- tidy(lmer_classroom, conf.int = TRUE)

# Plot fixed-effect estimates and their 95% CIs
lmer_coef %>%
  filter(effect == "fixed" & term != "(Intercept)") %>%
  ggplot(aes(x = term, y = estimate,
             ymin = conf.low, ymax = conf.high)) +
  geom_hline(yintercept = 0, color = "red") +
  geom_point() +
  geom_linerange() +
  coord_flip() +
  theme_bw() +
  ylab("Coefficient estimate and 95% CI") +
  xlab("Regression coefficient")
```
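The zero-inclusion check can be tried directly with base R's confint() on a toy lm fit (hypothetical simulated data, separate from lmer_classroom):

```r
# Simulate a regression with a true slope of 3
set.seed(42)
x <- 1:50
y <- 3 * x + rnorm(50, sd = 5)
fit <- lm(y ~ x)

# 95% CI for the slope
ci <- confint(fit, "x", level = 0.95)
ci

# If the interval excludes zero, the slope is likely
# statistically different from zero
ci[1] > 0 | ci[2] < 0
```

confint() also works on lmer models, which is where the conf.low and conf.high columns produced by tidy(..., conf.int = TRUE) come from.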