Overview and Introduction to Hierarchical and Mixed Models

  • Why hierarchical models?
    • Data are nested within groups (e.g., students within classrooms), so observations are not truly independent
    • Pool information across groups with small sample sizes
    • Handle repeated observations across groups or individuals
  • Estimating intercepts for each of the different classes/groups
      • # Load dplyr for the data manipulation below
        library(dplyr)

        # Change classid to be a factor
        school_3_data <- school_3_data %>%
          mutate(classid = factor(classid))

        # Calculate the mean of mathgain for each class
        school_3_data %>%
          group_by(classid) %>%
          summarize(n = n(), mathgain_class = mean(mathgain))

        # Estimate an intercept for each class (no global intercept)
        lm(mathgain ~ classid - 1, data = school_3_data)
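      • As a quick sanity check (a sketch; the object name lm_class is a hypothetical placeholder), the per-class intercepts from the no-intercept model should equal the per-class means computed above:

        # Save the fit and inspect the class intercepts
        lm_class <- lm(mathgain ~ classid - 1, data = school_3_data)
        coef(lm_class)  # one intercept per classid, matching each class's mean mathgain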
         
  • Slope and Multiple regression
    • # Build a multiple regression with an interaction
      lm(mathgain ~ classid * mathkind - 1, data = school_3_data)
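    • To inspect what that interaction model estimates, a sketch using broom (the object name lm_interaction is a hypothetical placeholder):

      library(broom)

      # One row per estimated term: class intercepts plus mathkind slope and interaction terms
      lm_interaction <- lm(mathgain ~ classid * mathkind - 1, data = school_3_data)
      tidy(lm_interaction)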
       
  • Random Effect
    • Linear models in R estimate parameters that are treated as fixed (non-random); these are called fixed effects. In contrast, random-effect parameters are assumed to share a common distribution, and they can produce different estimates when a group has small amounts of data or outliers. Models with both fixed and random effects are called mixed-effect models, or linear mixed-effect regressions.
    • library(lme4)         # lmer()
      library(broom)        # tidy() for lm models
      library(broom.mixed)  # tidy() for lmer models

      # Build a linear model including class as a fixed effect
      lm_out <- lm(mathgain ~ classid + mathkind, data = student_data)

      # Build a mixed-effect model including classid as a random effect
      lmer_out <- lmer(mathgain ~ mathkind + (1 | classid), data = student_data)

      # Extract the slope estimate for mathkind from each model
      tidy(lm_out) %>% filter(term == "mathkind")
      tidy(lmer_out) %>% filter(term == "mathkind")
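    • A sketch of how to pull the two kinds of estimates apart after fitting, using lme4's accessor functions on the lmer_out model above:

      # Fixed-effect estimates: overall intercept and mathkind slope
      fixef(lmer_out)

      # Random-effect deviations: one intercept offset per classid
      ranef(lmer_out)$classid

      # Combined (fixed + random) intercept for each classid
      coef(lmer_out)$classid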

      Random-effect intercepts (one slope)

      Figure: random-effect model intercepts (image omitted)
      solid lines: fixed-effect model; dashed lines: mixed-effect model
      Random-effect models express more variability in the data.
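      One way to recreate the comparison the figure describes is to plot each model's fitted lines by class; a sketch, assuming the lm_out and lmer_out fits from above:

      library(ggplot2)

      student_data %>%
        mutate(pred_fixed = predict(lm_out),      # fixed-effect (solid) lines
               pred_mixed = predict(lmer_out)) %>%  # mixed-effect (dashed) lines
        ggplot(aes(x = mathkind, group = classid)) +
        geom_line(aes(y = pred_fixed), linetype = "solid") +
        geom_line(aes(y = pred_mixed), linetype = "dashed") +
        theme_bw()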
       

      Random-effect slopes

      With lme4 syntax, lmer() uses (continuous_predictor | random_effect_group) to specify a random-effect slope. When lme4 estimates a random-effect slope, it also estimates a random-effect intercept. scale() rescales the predictor variable mathkind (here as mathkind_scaled) to make the model more numerically stable; without this rescaling, lmer() cannot fit the model.
      # mathkind_scaled is mathkind rescaled with scale()
      lmer_intercept <- lmer(mathgain ~ mathkind_scaled + (1 | classid),
                             data = student_data)
      lmer_slope <- lmer(mathgain ~ (mathkind_scaled | classid),
                         data = student_data)
      Figure: fixed-effect vs random-effect slopes (image omitted)
      The model with fixed-effect slopes has parallel lines (solid) because it estimates a single slope. The model with random-effect slopes (dashed) does not have parallel lines because a slope is estimated for each classroom, and its lines are shallower than the fixed-slope model's. They are shallower because the classroom-level slopes come from a shared distribution, which pools information from all classrooms (including those not shown on the plot).
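      The pooling can be inspected directly from the per-class estimates; a sketch using coef() and, for contrast, completely unpooled per-class fits via lme4's lmList():

      # Per-class intercepts and mathkind_scaled slopes from the random-slope model;
      # classes with little data are shrunk toward the shared (pooled) estimate
      coef(lmer_slope)$classid

      # For comparison: separate, unpooled lm() fits for each class
      coef(lmList(mathgain ~ mathkind_scaled | classid, data = student_data))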
       
  • Interpreting model coefs
    • Coefficient estimates include uncertainty, and a 95% confidence interval (CI) captures this uncertainty. If a parameter's 95% CI does not include zero, the parameter is likely statistically different from zero.
    • library(ggplot2)

      # Extract coefficients with 95% confidence intervals
      lmer_coef <- tidy(lmer_classroom, conf.int = TRUE)

      # Plot the fixed-effect estimates and their CIs
      lmer_coef %>%
        filter(effect == "fixed" & term != "(Intercept)") %>%
        ggplot(aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high)) +
        geom_hline(yintercept = 0, color = "red") +
        geom_point() +
        geom_linerange() +
        coord_flip() +
        theme_bw() +
        ylab("Coefficient estimate and 95% CI") +
        xlab("Regression coefficient")
      Figure: fixed-effect coefficient estimates with 95% CIs (image omitted)
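      lme4 can also compute confidence intervals directly with confint(); a sketch, assuming lmer_classroom is the model tidied above:

      # Profile-likelihood CIs (the default, slower) or Wald CIs (a fast approximation)
      confint(lmer_classroom, method = "profile")
      confint(lmer_classroom, method = "Wald")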