ARL by Monte Carlo simulation • shewhartr

library(shewhartr)
library(ggplot2)
library(dplyr)

The Average Run Length (ARL) of a control chart is the expected number of samples until the chart signals. It is the single most important operating characteristic — it tells you, on average, how quickly the chart catches a real shift, and how often it cries wolf.

Two flavours:

In-control ARL ( $\mathrm{ARL}_0$ ): the expected number of in-control samples until a false alarm. Larger is better. A high $\mathrm{ARL}_0$ means low false-alarm rate.
Out-of-control ARL ( $\mathrm{ARL}_1$ ): the expected number of samples after a real shift before the chart signals. Smaller is better. A low $\mathrm{ARL}_1$ means fast detection.

These two are in tension. Adding more rules to a chart sharpens detection (lower $\mathrm{ARL}_1$ ) but increases false alarms (lower $\mathrm{ARL}_0$ ). shewhart_arl() quantifies the trade-off.

Closed-form benchmarks

For the simplest setup — Nelson 1 only, normal data, 3-sigma limits — the ARL is known in closed form. The probability of falling outside 3-sigma is $2 \cdot \Phi(-3) \approx 0.0027$ , so:

$\mathrm{ARL}_0 = \frac{1}{0.0027} \approx 370.4.$

Let’s verify by simulation:

set.seed(2025)
shewhart_arl(
  shift = 0,
  rules = "nelson_1_beyond_3s",
  n_sim = 5000,
  max_run = 2000
)

The estimate should be close to 370 (Monte Carlo error around the closed form).

Adding rules sharpens detection

What happens if we add Nelson 2 (9 points same side) on top?

set.seed(2025)
shewhart_arl(
  shift = 0,
  rules = c("nelson_1_beyond_3s", "nelson_2_nine_same"),
  n_sim = 5000
)

The ARL_0 drops from ~370 to ~250, meaning false alarms become more common. Whether that’s worth it depends on what we get in detection power. Let’s plot ARL across a grid of shifts:

shifts <- seq(0, 3, by = 0.25)

set.seed(2025)
arl_n1 <- shewhart_arl(shifts, "nelson_1_beyond_3s", n_sim = 2000)
arl_n12 <- shewhart_arl(shifts,
                        c("nelson_1_beyond_3s", "nelson_2_nine_same"),
                        n_sim = 2000)

bind_rows(
  arl_n1  |> mutate(rules = "Nelson 1 only"),
  arl_n12 |> mutate(rules = "Nelson 1 + 2")
) |>
  ggplot(aes(shift, arl, colour = rules)) +
  geom_line(linewidth = 0.7) +
  geom_ribbon(aes(ymin = arl_lower, ymax = arl_upper, fill = rules),
              alpha = 0.12, colour = NA) +
  scale_y_log10() +
  scale_colour_manual(
    values = c(`Nelson 1 only` = unname(shewhart_palette("signal")["in_control"]),
               `Nelson 1 + 2`  = unname(shewhart_palette("family")["memory_based"]))) +
  scale_fill_manual(
    values = c(`Nelson 1 only` = unname(shewhart_palette("signal")["in_control"]),
               `Nelson 1 + 2`  = unname(shewhart_palette("family")["memory_based"]))) +
  labs(x = "Shift (sigma)", y = "ARL (log scale)",
       title = "Operating characteristics") +
  shewhart_theme()

The gain is largest for small shifts (around 0.5-1 sigma), where Nelson 1 alone takes a long time to fire. For shifts of 2 sigma or more, both rule sets detect within a couple of samples.

A quantitative comparison: WE-7 vs Nelson 2

The original Shewhart package (now shewhartr) (v0.1.x) used a Western Electric “7 in a row” rule for phase detection. The default in v1.0 is Nelson 2 (9 in a row). The trade-off:

set.seed(2025)
arl_we  <- shewhart_arl(0, "we_seven_same",       n_sim = 5000, max_run = 2000)
arl_n2  <- shewhart_arl(0, "nelson_2_nine_same",  n_sim = 5000, max_run = 2000)
arl_we
arl_n2

The WE rule’s ARL_0 is around 64 (a false phase change every 64 in-control samples on average); Nelson 2’s is around 256 (one every ~256 samples). For most monitoring contexts that is a meaningful difference. The WE rule is easier to teach, but the false-alarm cost is high.

shewhart_regression() accepts either via the phase_rule argument:

shewhart_regression(temperature_drift, value = temp_c, index = minute,
                    phase_rule = "we_seven_same")        # legacy default
shewhart_regression(temperature_drift, value = temp_c, index = minute,
                    phase_rule = "nelson_2_nine_same")    # new default

Beyond normal residuals

shewhart_arl() simulates from a normal distribution by default. Real residuals are often non-normal — heavy-tailed, skewed, or autocorrelated. The closed-form ARL_0 of 370 for Nelson 1 then overstates how rarely false alarms occur. A bootstrap variant (simulating from your actual residuals) is on the roadmap; until then, a quick check is to call shewhart_diagnostics(fit) and inspect the Q-Q and ACF panels.

References

Champ, C. W., & Woodall, W. H. (1987). Exact Results for Shewhart Control Charts with Supplementary Runs Rules. Technometrics, 29(4), 393-399.
Wald, A. (1947). Sequential Analysis. Wiley.
Page, E. S. (1954). Continuous Inspection Schemes. Biometrika, 41(1-2), 100-115.
Lucas, J. M., & Saccucci, M. S. (1990). Exponentially Weighted Moving Average Control Schemes: Properties and Enhancements. Technometrics, 32(1), 1-12.