Wald-Wolfowitz Runs Test

A single-variable independence test.

In a nutshell

The Wald-Wolfowitz Runs Test checks the randomness of a sequence of elements where those elements can take one of two mutually exclusive values. It is a non-parametric test.

Applications

Test the randomness of a distribution: Given a sequence of values sampled from a distribution, label each value '+' if it lies above the median, '-' if it lies below the median, and omit it if it equals the median. Keep the labels in the same order in which their values were sampled, then use them as input to the Runs Test to check whether the values were sampled randomly.
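The median labelling above can be sketched in a few lines of Python. This sketch assumes the samples arrive as a list in sampling order, and the function name `median_signs` is hypothetical:

```python
from statistics import median


def median_signs(values):
    """Label each value '+' (above the median) or '-' (below the median).

    Values equal to the median are dropped; the remaining labels keep the
    order in which the values were sampled.
    """
    m = median(values)
    return ["+" if v > m else "-" for v in values if v != m]


# Usage: the median of these five values is 3, so the 3 is omitted.
print(median_signs([5, 1, 3, 4, 2]))  # ['+', '-', '+', '-']
```

The resulting label list is what the Runs Test counts runs over.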

Sources: NIST/SEMATECH e-Handbook of Statistical Methods

Test for the independence of two groups: Take one sample from each of two groups. If the values can be ordered in some meaningful way (e.g. lowest to highest), we can check whether the groups represent different populations. To do this, label each value with the group from which it was taken, then order the combined data. The labels, read in the order of their attached values, are then inspected for 'runs'.

Sources: Complete Dissertation

Do specific data and a function fit? Evaluate the function at each data point's independent variable. Label the data points that exceed the function value with $+$ and the other data points with $-$, then use these labels, ordered by the independent variable, as input to the Runs Test.
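A minimal sketch of this labelling, assuming the data are given as paired lists `xs` (independent variable) and `ys` (measured values). The helper name `residual_signs` and the sample points are hypothetical. Points lying exactly on the curve are dropped here, which is one possible convention:

```python
def residual_signs(xs, ys, f):
    """Return '+'/'-' labels for points above/below the curve y = f(x).

    Points are ordered by the independent variable x before labelling, so
    the resulting sequence can be passed to a runs count. Points lying
    exactly on the curve are dropped.
    """
    labels = []
    for x, y in sorted(zip(xs, ys)):
        fx = f(x)
        if y > fx:
            labels.append("+")
        elif y < fx:
            labels.append("-")
    return labels


# Usage: compare hypothetical noisy points with the line y = 2x.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.3, 1.8, 4.1, 5.7, 8.2, 9.9]
print("".join(residual_signs(xs, ys, lambda x: 2 * x)))  # +-+-+-
```

A function that fits systematically badly tends to leave long stretches of points on the same side of the curve, which shows up as few runs.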

Problems:

Sources: Wikipedia | NIST/SEMATECH e-Handbook of Statistical Methods

Example

Question: Based on a single variable that we've measured, does group 1 come from a different population to group 2?

We have two groups of people, X and Y, and for each group we have measured height.

Test steps:

  1. Collect the data
  2. Combine measurements from each group, but label each measurement to keep track of which group it came from.
  3. Sort the data.
  4. Along the sorted data, count the 'runs' of consecutive measurements with the same group label.

    For example, the following sequence has 5 such runs:

$$\begin{matrix} \text{sequence:} & y, & x, & x, & x, & y, & y, & x, & y \\ \text{run number:} & 1 & 2 & - & - & 3 & - & 4 & 5 \end{matrix}$$
  5. Compare the number of runs $R$ with the lower and upper critical values of a reference distribution at the desired significance level $\alpha$.

    • For 'large' samples ($n_1>10$ and $n_2>10$, where $n_1$ and $n_2$ are the numbers of elements with each label), standardize $R$ and compare it with the standard normal distribution.

    • Otherwise, use tables of critical values (Mendenhall, 1982).

      In the following example we use a 'small' sample ($n_1 = n_2 = 7$) but still use the normal distribution as reference, for simplicity's sake.
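For reference, the large-sample comparison standardizes $R$ using its mean and variance under the null hypothesis that the labels are randomly ordered. These are the formulas used in the NIST/SEMATECH e-Handbook cited above, with $n_1$ and $n_2$ the numbers of elements with each label:

$$\mu_R = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \sigma_R^2 = \frac{2 n_1 n_2 \, (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \, (n_1 + n_2 - 1)}, \qquad Z = \frac{R - \mu_R}{\sigma_R}$$

At significance level $\alpha$, the two-sided test rejects the null hypothesis when $|Z| > z_{1-\alpha/2}$, e.g. $|Z| > 1.96$ for $\alpha = 0.05$. Too few runs ($Z$ strongly negative) means that like labels cluster together, which is what we expect when two groups come from different populations.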

1. Collect the data

$$A=\{\begin{matrix}166.0 & 161.0 & 161.8 & 147.4 & 177.4 & 172.8 & 162.4\end{matrix}\}$$

$$B=\{\begin{matrix}184.3 & 180.5 & 174.2 & 185.8 & 176.0 & 175.9 & 172.4\end{matrix}\}$$

2. Label

$$\begin{matrix}A & A & A & A & A & A & A \\ 166.0 & 161.0 & 161.8 & 147.4 & 177.4 & 172.8 & 162.4\end{matrix}$$

$$\begin{matrix}B & B & B & B & B & B & B \\ 184.3 & 180.5 & 174.2 & 185.8 & 176.0 & 175.9 & 172.4\end{matrix}$$

3. Combine and Sort

$$\begin{matrix}A & A & A & A & A & B & A & B & B & B & A & B & B & B \\ 147.4 & 161.0 & 161.8 & 162.4 & 166.0 & 172.4 & 172.8 & 174.2 & 175.9 & 176.0 & 177.4 & 180.5 & 184.3 & 185.8 \\ \end{matrix} $$
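In Python, steps 2 and 3 amount to pairing each measurement with its group label and sorting the pairs by value. The variable names below are illustrative. The sketch assumes that no value occurs in both groups, since ties across groups would need a rule for ordering their labels:

```python
# Measurements from step 1 (heights).
A = [166.0, 161.0, 161.8, 147.4, 177.4, 172.8, 162.4]
B = [184.3, 180.5, 174.2, 185.8, 176.0, 175.9, 172.4]

# Step 2: pair every measurement with the label of its group.
labelled = [(v, "A") for v in A] + [(v, "B") for v in B]

# Step 3: sort by the measured value; each label travels with its value.
labelled.sort(key=lambda pair: pair[0])
labels = "".join(label for _, label in labelled)
print(labels)  # AAAAABABBBABBB
```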

4. Count
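Counting runs maps directly onto `itertools.groupby`, which yields one group per block of consecutive equal items. The number of groups is therefore the number of runs. A minimal sketch (`count_runs` is a hypothetical helper name):

```python
from itertools import groupby


def count_runs(labels):
    """Number of maximal blocks of consecutive identical labels."""
    return sum(1 for _ in groupby(labels))


# The 8-element example sequence from step 4 of the test steps has 5 runs.
print(count_runs("yxxxyyxy"))        # 5
# Sorted labels from step 3: AAAAA | B | A | BBB | A | BBB
print(count_runs("AAAAABABBBABBB"))  # 6
```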

5. Test
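A sketch of the comparison using the normal approximation and only the Python standard library. Here `math.erfc(|z| / sqrt(2))` gives the two-sided tail probability of a standard normal variable, and the function name `runs_z_test` is hypothetical. Counting the sorted labels from step 3 (AAAAA, B, A, BBB, A, BBB) gives $R = 6$ runs with $n_1 = n_2 = 7$:

```python
import math


def runs_z_test(r, n1, n2):
    """Normal approximation to the Wald-Wolfowitz runs test.

    Returns the standardized statistic Z and its two-sided p-value.
    """
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (r - mean) / math.sqrt(var)
    # P(|Z| > |z|) for a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# R = 6 runs in the sorted example, with 7 measurements per group.
z, p = runs_z_test(6, 7, 7)
print(f"Z = {z:.3f}, p = {p:.3f}")
```

With these numbers $\mu_R = 8$, $Z \approx -1.11$ and $p \approx 0.27$. Since $|Z| < 1.96$, at $\alpha = 0.05$ this sketch does not reject the hypothesis that both groups come from the same population. The normal approximation is rough for samples this small.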

Addendum: Runs Test algorithm where $F(x) = G(x)$

Let's do the same thing, but with samples taken from the same population.
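One way to sketch this check is a seeded simulation that repeatedly draws two samples from one population and records how often the test rejects at $\alpha = 0.05$. Everything here is an illustrative assumption rather than the notebook's own setup: the function names, the normal population with mean 170 and standard deviation 10, the sample size of 20 per group, and the 2000 trials:

```python
import math
import random
from itertools import groupby


def wald_wolfowitz(a, b):
    """Two-sample runs test via the normal approximation.

    Returns (R, Z, two-sided p-value). Assumes no value occurs in both samples.
    """
    # Label values 0 (sample a) or 1 (sample b), then sort by value.
    labelled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    r = sum(1 for _ in groupby(label for _, label in labelled))
    n1, n2 = len(a), len(b)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (r - mean) / math.sqrt(var)
    return r, z, math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(draw_a, draw_b, n=20, trials=2000, alpha=0.05, seed=0):
    """Fraction of simulated sample pairs on which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [draw_a(rng) for _ in range(n)]
        b = [draw_b(rng) for _ in range(n)]
        if wald_wolfowitz(a, b)[2] < alpha:
            rejections += 1
    return rejections / trials


# Hypothetical population: both samples come from a normal distribution with
# mean 170 and standard deviation 10, so F(x) = G(x) holds by construction.
same = rejection_rate(lambda g: g.gauss(170, 10), lambda g: g.gauss(170, 10))
print(f"rejection rate, same population: {same:.3f}")
```

Because the null hypothesis is true by construction, the printed rejection rate should be small, of the order of $\alpha$. Its exact value also depends on the discreteness of $R$ and on the normal approximation.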

...and from different populations again, just to check that we get the same results as at the start of the notebook.
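As a consistency check, here is a sketch that runs a compact two-sample runs test on the measurements A and B from step 1. The function is repeated so the block runs on its own, and its name is hypothetical. It should give $R = 6$, matching the six runs in the sorted labels of step 3 (AAAAA, B, A, BBB, A, BBB):

```python
import math
from itertools import groupby


def wald_wolfowitz(a, b):
    """Two-sample runs test via the normal approximation: (R, Z, two-sided p)."""
    # Tag each value with its group, sort by value, then count label runs.
    labelled = sorted([(v, "A") for v in a] + [(v, "B") for v in b])
    r = sum(1 for _ in groupby(label for _, label in labelled))
    n1, n2 = len(a), len(b)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (r - mean) / math.sqrt(var)
    return r, z, math.erfc(abs(z) / math.sqrt(2))


# Height measurements from step 1 of the example.
A = [166.0, 161.0, 161.8, 147.4, 177.4, 172.8, 162.4]
B = [184.3, 180.5, 174.2, 185.8, 176.0, 175.9, 172.4]
r, z, p = wald_wolfowitz(A, B)
print(f"R = {r}, Z = {z:.3f}, p = {p:.3f}")
```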