Day 23 Randomized complete block designs

July 14th, 2025

23.1 Announcements

HW 4 due Friday
Topics for the week of July 21 - survey

23.2 Randomized complete block designs

Blocks are groups of approximately similar experimental units.
Data generated by blocked designs may be analyzed as \(y_{ij} = \mu +\tau_i + b_j + \varepsilon_{ij}\), where \(y_{ij}\) is the observed response for the \(i\)th treatment in the \(j\)th block, \(\tau_i\) is the effect of the \(i\)th treatment (\(i = 1, 2, ..., t\)), \(b_j\) is the effect of the \(j\)th block, \(\varepsilon_{ij}\) is the residual of the \(i\)th treatment in the \(j\)th block.
Most RCBDs have \(k = t\), where \(k\) is the number of experimental units per block, and \(t\) is the number of treatments.

23.3 Blocks - fixed or random?

The assumptions behind \(b_j\) vary:
Fundamentals of mixed models
- Assumption behind blocks as fixed effects: there is a ‘true’ block effect out there.
- Assumption behind blocks as random effects:
Confidence intervals of the means differ depending on the model:
- Blocks as fixed: \(\hat\mu \pm t \cdot \sqrt{\frac{\sigma_{\varepsilon}^2 }{b}}\)
- Blocks as random: \(\hat\mu \pm t \cdot \sqrt{\frac{\sigma_{\varepsilon}^2 + \sigma^2_{b}}{b}}\)
Confidence intervals of the means differences don’t differ depending on the model:
- Blocks as fixed/as random: \(\hat\mu \pm t \cdot \sqrt{\frac{2 \sigma_{\varepsilon}^2 }{b}}\)
More on the differences between blocks as fixed vs random: Dixon 2016.

23.3.1 Applied design and analysis of an RCBD

An experiment was run in South Australia to compare for 107 different wheat varieties, in a randomized complete block design with 3 blocks.

Get R code here.

Table 23.1: ANOVA table for the RCBD
Source	df
Blocks	b-1 = 2
Genotype	g-1 = 106
Error	N-b-g+1 = 221
Total	N-1 = 329

library(tidyverse)
library(agridat)
library(lme4)
library(emmeans)

dat_wheat_rcbd <- agridat::gilmour.serpentine

dat_wheat_rcbd %>% 
  ggplot(aes(col, row))+
  geom_tile(aes(fill = rep))

m_block.as.fixed <- lm(yield ~ gen + rep, data = dat_wheat_rcbd)

m_block.as.random <- lme4::lmer(yield ~ gen + (1|rep), data = dat_wheat_rcbd)

emmeans(m_block.as.fixed, ~gen)[1:10,]

##  gen          emmean   SE  df lower.CL upper.CL
##  (WWH*MM)*WR*    709 66.7 221      577      841
##  (WqKPWmH*3Ag    733 66.7 221      602      865
##  AMERY           616 66.7 221      484      747
##  ANGAS           576 66.7 221      445      708
##  AROONA          555 66.7 221      424      687
##  BATAVIA         534 66.7 221      402      665
##  BD231           639 66.7 221      507      770
##  BEULAH          535 66.7 221      404      667
##  BLADE           439 66.7 221      307      571
##  BT_SCHOMBURG    660 66.7 221      528      792
## 
## Results are averaged over the levels of: rep 
## Confidence level used: 0.95

emmeans(m_block.as.random, ~gen)[1:10,]

##  gen          emmean   SE   df lower.CL upper.CL
##  (WWH*MM)*WR*    709 93.3 8.16      495      923
##  (WqKPWmH*3Ag    733 93.3 8.16      519      948
##  AMERY           616 93.3 8.16      401      830
##  ANGAS           576 93.3 8.16      362      791
##  AROONA          555 93.3 8.16      341      770
##  BATAVIA         534 93.3 8.16      319      748
##  BD231           639 93.3 8.16      424      853
##  BEULAH          535 93.3 8.16      321      750
##  BLADE           439 93.3 8.16      225      653
##  BT_SCHOMBURG    660 93.3 8.16      446      874
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95

emmeans(m_block.as.fixed, ~gen, contr = list(c(1, -1, rep(0, n_distinct(dat_wheat_rcbd$gen)-2))))$contr

##  contrast                                                        estimate   SE  df t.ratio p.value
##  c(1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,     -24.3 94.4 221  -0.258  0.7968
## 
## Results are averaged over the levels of: rep

emmeans(m_block.as.random, ~gen, contr = list(c(1, -1, rep(0, n_distinct(dat_wheat_rcbd$gen)-2))))$contr

##  contrast                                                        estimate   SE  df t.ratio p.value
##  c(1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,     -24.3 94.4 221  -0.258  0.7968
## 
## Degrees-of-freedom method: kenward-roger

bind_rows(as.data.frame(emmeans(m_block.as.fixed, ~gen)[1:10,]) %>% transmute(gen, emmean, SE, blocks = "fixed"),
          as.data.frame(emmeans(m_block.as.random, ~gen)[1:10,]) %>% transmute(gen, emmean, SE, blocks = "random")) %>%
  ggplot(aes(gen, emmean))+
  geom_errorbar(aes(ymin = emmean - SE, ymax = emmean + SE, color = blocks), width = .1, position = position_dodge(.1))+
  geom_point()+
  scale_color_manual(values = c("tomato", "#1282A2"))+
  theme(axis.text = element_text(angle = 30))+
  labs(title = "Difference in Mean SE between fixed blocks and random blocks analyses")

bind_rows(as.data.frame(emmeans(m_block.as.fixed, ~gen, contr = list(c(1, -1, rep(0, n_distinct(dat_wheat_rcbd$gen)-2)), 
                                                                     c(1, 0, -1, rep(0, n_distinct(dat_wheat_rcbd$gen)-3))))$contr) %>% 
            transmute(estimate, SE, blocks = "fixed", contrast = c("t1 vs. t2", "t1 vs. t3")),
          as.data.frame(emmeans(m_block.as.random, ~gen, contr = list(c(1, -1, rep(0, n_distinct(dat_wheat_rcbd$gen)-2)), 
                                                                     c(1, 0, -1, rep(0, n_distinct(dat_wheat_rcbd$gen)-3))))$contr) %>% 
            transmute(estimate, SE, blocks = "random", contrast = c("t1 vs. t2", "t1 vs. t3"))) %>%
  ggplot(aes(contrast, estimate))+
  geom_errorbar(aes(ymin = estimate - SE, ymax = estimate + SE, color = blocks), width = 0.05, position = position_dodge(.05))+
  geom_point()+
  scale_color_manual(values = c("tomato", "#1282A2"))+
  theme(axis.text = element_text(angle = 30))+
  labs(title = "Difference in Contrast SE between fixed blocks and random blocks analyses")

Mean standard error, blocks as fixed: \(\sqrt{\frac{\sigma_{\varepsilon}^2 }{b}}\)

n_blocks <- n_distinct(dat_wheat_rcbd$rep)
sigma2_blocks.as.fixed <- sigma(m_block.as.fixed)^2

sqrt(sigma2_blocks.as.fixed / n_blocks)

## [1] 66.73095

Blocks as random: \(\hat\mu \pm t \cdot \sqrt{\frac{\sigma_{\varepsilon}^2 + \sigma^2_{b}}{b}}\)

sigma2_blocks.as.random <- sigma(m_block.as.random)^2
sigma2b_blocks.as.random <- as.data.frame(VarCorr(m_block.as.random))[1, 'sdcor']^2

sqrt((sigma2b_blocks.as.random + sigma2_blocks.as.random) / n_blocks)

## [1] 93.26549

23.4 Incomplete block designs

Same model as before: \(y_{ij} = \mu +\tau_i + b_j + \varepsilon_{ij}\), where \(y_{ij}\) is the observed response for the \(i\)th treatment in the \(j\)th block, \(\tau_i\) is the effect of the \(i\)th treatment (\(i = 1, 2, ..., t\)), \(b_j\) is the effect of the \(j\)th block, \(\varepsilon_{ij}\) is the residual of the \(i\)th treatment in the \(j\)th block.
In IBDs, \(k < t\), where \(k\) is the number of experimental units per block, and \(t\) is the number of treatments.
Better interblock information recovery with random \(b_j\).

23.4.1 Applied design and analysis of a Balanced IBD

The data below correspond to a study comparing soybean genotypes in a balanced incomplete block designed experiment.

Table 23.2: ANOVA table for the soybean BIBD
Source	df
Blocks	b-1 = 30
Genotype	g-1 = 30
Error	N-b-g+1 = 125
Total	N-1 = 185

dat_soy_ibd <- agridat::weiss.incblock

dat_soy_ibd %>% 
  ggplot(aes(col, row))+
  geom_tile(aes(fill = block))

m_block.as.fixed <- lm(yield ~ gen + block, data = dat_soy_ibd)

m_block.as.random <- lme4::lmer(yield ~ gen + (1|block), data = dat_soy_ibd)

emmeans(m_block.as.fixed, ~gen)[1:10,]

##  gen emmean    SE  df lower.CL upper.CL
##  G01   24.6 0.831 125     22.9     26.2
##  G02   26.9 0.831 125     25.3     28.6
##  G03   32.6 0.831 125     31.0     34.3
##  G04   27.0 0.831 125     25.3     28.6
##  G05   26.0 0.831 125     24.4     27.7
##  G06   32.0 0.831 125     30.3     33.6
##  G07   24.2 0.831 125     22.5     25.8
##  G08   27.6 0.831 125     26.0     29.3
##  G09   29.3 0.831 125     27.6     30.9
##  G10   24.4 0.831 125     22.8     26.1
## 
## Results are averaged over the levels of: block 
## Confidence level used: 0.95

emmeans(m_block.as.random, ~gen)[1:10,]

##  gen emmean    SE  df lower.CL upper.CL
##  G01   24.6 0.923 153     22.7     26.4
##  G02   27.0 0.923 153     25.2     28.8
##  G03   32.6 0.923 153     30.8     34.4
##  G04   26.9 0.923 153     25.1     28.8
##  G05   26.2 0.923 153     24.4     28.0
##  G06   32.0 0.923 153     30.1     33.8
##  G07   24.2 0.923 153     22.3     26.0
##  G08   27.6 0.923 153     25.8     29.4
##  G09   29.4 0.923 153     27.6     31.2
##  G10   24.5 0.923 153     22.7     26.4
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95

emmeans(m_block.as.fixed, ~gen, contr = list(c(1, -1, rep(0, n_distinct(dat_soy_ibd$gen)-2))))$contr

##  contrast                                                        estimate   SE  df t.ratio p.value
##  c(1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,     -2.34 1.18 125  -1.982  0.0496
## 
## Results are averaged over the levels of: block

emmeans(m_block.as.random, ~gen, contr = list(c(1, -1, rep(0, n_distinct(dat_soy_ibd$gen)-2))))$contr

##  contrast                                                        estimate   SE  df t.ratio p.value
##  c(1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,      -2.4 1.17 129  -2.054  0.0420
## 
## Degrees-of-freedom method: kenward-roger