【R言語】コクラン=マンテル=ヘンツェル検定

R言語でコクラン=マンテル=ヘンツェル検定を実行する関数やその実行例を紹介します。

各層のオッズ比が1であるかの検定であるコクラン=マンテル=ヘンツェル検定をRで行っていきます。

また、カイ二乗検定ではなくコクラン=マンテル=ヘンツェル検定を行ったほうがよいケースについてみていきます。シンプソンのパラドックスが発生するようなデータセットに対してどのようにアプローチするか解説します。

今回扱うRのコードは次のzipファイルに保存しています。

R言語　コクラン=マンテル=ヘンツェル検定

1 ファイル 1.89 KB

ダウンロード

カイ二乗検定については以下の記事を参照。

: 【R言語】カイ二乗検定　適合度検定・独立性の検定
R言語でカイ2乗検定を実行する方法を解説していきます。データセットの用意から検定までの一連の流れを紹介していきます。カイ2乗検定の種類として適合度検定や独立性の検定がありますが、R言語に標準で実装さ ...
続きを見る

コクラン=マンテル=ヘンツェル検定

コクラン=マンテル=ヘンツェル検定は2×2の多層からなる分割表のオッズ比がすべて1である、すなわち、分割表のどちらかの属性を群、もう一方を二値からなる事象とすると、各層で事象の起こりやすさ（確率）が2群間で同じであるという仮説を検定するのがコクラン=マンテル=ヘンツェル検定です。層が1つであるとき独立性の検定と一致します。

以下、コクラン=マンテル=ヘンツェル検定の概要です。

コクラン=マンテル=ヘンツェル検定

2つの属性$A$、$B$が与えられているとする。$A$と$B$はそれぞれ2個のクラス$A_1, A_2$、$B_1, B_2$をもつ。このとき、$n$個の標本を次のように$k$層の2×2分割表に分類する

\begin{array}{c|ccc} & B_1 & B_2 & 計\\\hline A_1 & x_{k11} & x_{k12} & n_{k\cdot} \\ A_2 & x_{k21} & x_{k22} & n_{k2\cdot} \\ 計 & n_{k\cdot1} & n_{k\cdot 2} & n_k \end{array}

また、それぞれ生起数の確率変数を$X_{k11}, X_{k12}, X_{k21}, X_{k22}$とする。このとき、次の仮説を考える。

\begin{align} &H0:\ OR_1 = OR_2 = \cdots = OR_k= 1\\ & H_1:\ \mathrm{not}\ H_0\end{align}

検定統計量として統計量を用いる。

\begin{align} X^2 = \cfrac{\left[\sum_{i=1}^k (X_{i11} - \frac{n_{i1\cdot}n_{i\cdot1}}{n_i}) \right]^2}{\sum_{i=1}^k \frac{n_{i1\cdot}n_{i2\cdot}n_{i\cdot1}n_{i\cdot2}}{n_i^2 (n_i-1)}}\sim \chi_1^2.\end{align}

また、有意水準$\alpha$の棄却域は次で与えられる。

\begin{align}[0, \chi_{1, 1- \alpha/2}^2) \cup (\chi_{1, \alpha/2}^2, \infty),\end{align}

ここに、$\chi_{1, \alpha}^2$は自由度$1$のカイ二乗分布の上側$\alpha$点である。

関数mantelhaen.test

関数mantelhaen.testを用いることでコクラン=マンテル=ヘンツェル検定を実行することができます。以下、関数とその引数です。

mantelhaen.test(x, y = NULL, z = NULL,
alternative = c("two.sided", "less", "greater"),
correct = TRUE, exact = FALSE, conf.level = 0.95)

関数mantelhaen.testの引数
x	3元分割表。すべての変数のカテゴリーの数は少なくとも2以上であり、3番目の変数はstrataに対応する必要がある。
y	2つ以上のlevelから成るfactor型のオブジェクト。xがtableのとき無視される。
z	strataに対応する2つ以上のlevelから成るfactor型のオブジェクト。xがtableのとき無視される。
alternative	character型。"two.sided"で両側検定、"greater"で右片側検定、"less"で左片側検定を指定できる。2×2×Kの3元分分割表に対して適用される。
correct	logical型。TRUEのとき、検定統計量を計算する際に連続性の補正を適用する。2×2×Kの3元分分割表に対して適用される。
exact	logical型。FALSEのときカイ二乗分布を用いた通常のコクラン=マンテル=ヘンツェル検定、TRUEのとき正確率検定を実行する。2×2×Kの3元分分割表に対して適用される。
conf.level	信頼区間を計算する際の信頼度数。2×2×Kの3元分分割表に対して適用される。

実行例

関数mantelhaen.testの実行例を紹介します。データセットとして次のTitanicを用います。

dataset <- Titanic

1	dataset <- Titanic

Titanicは次のように4元分割表となっており、大括弧の各要素に次元名（dimname）を指定することで各分割表を取得することができます。

下の例では、それぞれAge="Adult"かつSurvived="No"とAge="Child"かつSurvived="No"の分割表です。

> dataset[, , "Adult", "No"]
      Sex
Class  Male Female
  1st   118      4
  2nd   154     13
  3rd   387     89
  Crew  670      3
> dataset[, , "Child", "No"]
      Sex
Class  Male Female
  1st     0      0
  2nd     0      0
  3rd    35     17
  Crew    0      0

> dataset[, , "Adult", "No"]

Sex

Class Male Female

1st 118 4

2nd 154 13

3rd 387 89

Crew 670 3

> dataset[, , "Child", "No"]

Sex

Class Male Female

1st 0 0

2nd 0 0

3rd 35 17

Crew 0 0

コクラン=マンテル=ヘンツェル検定によって、各ClassについてSexとSurvivedのオッズ比が1であるかどうか、すなわちSexとSurvivedに関連性があるかどうかの仮説検定を行いたいと思います。

この検定を行うには、まず4元分割表であるdatasetをSex、Survived、Classからなる3元分割表にする必要があります。関数margin.tableを用いることで指定した次元名から成るクロス集計表を作ることができます。

> marginTable <- margin.table(dataset, c("Sex", "Survived", "Class"))　#3元分割表:1元Sex, 2元Survived, 3元Class
> marginTable[, , "1st"]
        Survived
Sex       No Yes
  Male   118  62
  Female   4 141
> marginTable[, , "2nd"]
        Survived
Sex       No Yes
  Male   154  25
  Female  13  93

> marginTable <- margin.table(dataset, c("Sex", "Survived", "Class"))　#3元分割表:1元Sex, 2元Survived, 3元Class

> marginTable[, , "1st"]

Survived

Sex No Yes

Male 118 62

Female 4 141

> marginTable[, , "2nd"]

Survived

Sex No Yes

Male 154 25

Female 13 93

3元分割表が作成出来たら、関数mantelhaen.testでコクラン=マンテル=ヘンツェル検定が実行できます。

> testResult <- mantelhaen.test(marginTable)
> testResult

	Mantel-Haenszel chi-squared test with continuity correction

data:  marginTable
Mantel-Haenszel X-squared = 360.33, df = 1, p-value < 2.2e-16
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
  8.232629 14.185153
sample estimates:
common odds ratio 
         10.80653

> testResult <- mantelhaen.test(marginTable)

> testResult

Mantel-Haenszel chi-squared test with continuity correction

data: marginTable

Mantel-Haenszel X-squared = 360.33, df = 1, p-value < 2.2e-16

alternative hypothesis: true common odds ratio is not equal to 1

95 percent confidence interval:

8.232629 14.185153

sample estimates:

common odds ratio

10.80653

ポイント

「p-value < 2.2e-16」と書いてあることから、各ClassについてSexとSurvivedのオッズ比は1であるとはいえないことが分かりました。SexとSurvivedには関連性がないとはいえない、つまり関連性があることが言えました。

上記の検定の結果を抽出するには、他の検定と同様にtestResultの後ろに$を付ければおｋです。

stat <- testResult$statistic       #検定統計量
df <- testResult$parameter         #自由度
pValue <- testResult$p.value       #p値
ci <- testResult$conf.int          #オッズ比の信頼区間
estimate <- testResult$estimate    #オッズ比の推定値
nullValue <- testResult$null.value #帰無仮説の下でのオッズ比の値

stat <- testResult$statistic #検定統計量

df <- testResult$parameter #自由度

pValue <- testResult$p.value #p値

ci <- testResult$conf.int #オッズ比の信頼区間

estimate <- testResult$estimate #オッズ比の推定値

nullValue <- testResult$null.value #帰無仮説の下でのオッズ比の値

検定結果をcsvファイルや他のファイルに出力したい場合は、次のように関数write.csv（またはwrite.table）を用います。

resultTable <- data.frame(stat = stat, d.f. = df, p.value = pValue, 
                          c.i.lower = ci[1], c.i.upper = ci[2],
                          estimate = estimate, null.value = nullValue, row.names = NULL)
write.csv(resultTable, "コクラン=マンテル=ヘンツェル検定.csv", row.names = FALSE)

resultTable <- data.frame(stat = stat, d.f. = df, p.value = pValue,

c.i.lower = ci[1], c.i.upper = ci[2],

estimate = estimate, null.value = nullValue, row.names = NULL)

write.csv(resultTable, "コクラン=マンテル=ヘンツェル検定.csv", row.names = FALSE)

上記を実行することで作業ディレクトリに次の画像のようなcsvファイルを作成することができます。

コクラン=マンテル=ヘンツェル検定結果

シンプソンのパラドックス

余談ですが、なぜカイ二乗検定（独立性の検定）の代わりにとコクラン=マンテル=ヘンツェル検定を用いたのかみていきます。

ここでは、プロットのために次のパッケージを使います。

library(questionr)
library(ggplot2)

1 2	library(questionr) library(ggplot2)

また、データセットに次のUCBAdmissionsを使うことにします。

> dataset <- UCBAdmissions
> dataset[, , Dept = "A"]
          Gender
Admit      Male Female
  Admitted  512     89
  Rejected  313     19
> dataset[, , Dept = "B"]
          Gender
Admit      Male Female
  Admitted  353     17
  Rejected  207      8

> dataset <- UCBAdmissions

> dataset[, , Dept = "A"]

Gender

Admit Male Female

Admitted 512 89

Rejected 313 19

> dataset[, , Dept = "B"]

Gender

Admit Male Female

Admitted 353 17

Rejected 207 8

UCBAdmissionsはGender（性別）、Dept（学部）別のAdmit（合格）の3元分割表のデータセットです。

GenderとAdmitは独立であるかどうかの検定を行いたいと思います。すなわちGender別でAdmitnの確率に差があるかどうかの検定を行います。Dept別ではないカイ二乗検定（独立性の検定）と各Deptの層を考慮したコクラン=マンテル=ヘンツェル検定を次のように実行します。

threeWayTable <- dataset
twoWayTable <- margin.table(dataset, c("Admit", "Gender"))

mantelhaenResult <- mantelhaen.test(threeWayTable)
chisqResult <- chisq.test(twoWayTable)

threeWayTable <- dataset

twoWayTable <- margin.table(dataset, c("Admit", "Gender"))

mantelhaenResult <- mantelhaen.test(threeWayTable)

chisqResult <- chisq.test(twoWayTable)

mantelhaenResultにはコクラン=マンテル=ヘンツェル検定、chisqResultにはカイ二乗検定の結果が入っています。chisqResultにはオッズ比の推定値が入っていないため、以下のようにして検定結果にオッズ比を追加します。

oddsRatio <- (twoWayTable[1, 1] / twoWayTable[1, 2]) / (twoWayTable[2, 1] / twoWayTable[2, 2])
names(oddsRatio) = "odds ratio"
chisqResult$estimate <- oddsRatio

oddsRatio <- (twoWayTable[1, 1] / twoWayTable[1, 2]) / (twoWayTable[2, 1] / twoWayTable[2, 2])

names(oddsRatio) = "odds ratio"

chisqResult$estimate <- oddsRatio

コクラン=マンテル=ヘンツェル検定の共通オッズ比とカイ二乗検定の際のオッズ比はそれぞれ次のようになります。

共通オッズ比の方は1に近いのに対して、Deptを考慮しないときのオッズ比は2に近いことが分かります。また、それぞれの検定結果も以下のように、異なる結論を出すことが確認できます。

> mantelhaenResult

	Mantel-Haenszel chi-squared test with continuity correction

data:  threeWayTable
Mantel-Haenszel X-squared = 1.4269, df = 1, p-value = 0.2323
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.7719074 1.0603298
sample estimates:
common odds ratio 
        0.9046968 

> chisqResult

	Pearson's Chi-squared test with Yates' continuity correction

data:  twoWayTable
X-squared = 91.61, df = 1, p-value < 2.2e-16
sample estimates:
odds ratio 
   1.84108

> mantelhaenResult

Mantel-Haenszel chi-squared test with continuity correction

data: threeWayTable

Mantel-Haenszel X-squared = 1.4269, df = 1, p-value = 0.2323

alternative hypothesis: true common odds ratio is not equal to 1

95 percent confidence interval:

0.7719074 1.0603298

sample estimates:

common odds ratio

0.9046968

> chisqResult

Pearson's Chi-squared test with Yates' continuity correction

data: twoWayTable

X-squared = 91.61, df = 1, p-value < 2.2e-16

sample estimates:

odds ratio

1.84108

コクラン=マンテル=ヘンツェル検定では帰無仮説を棄却されないのに対し、カイ二乗検定では棄却しています。

各Deptごとのオッズ比を計算し、フォレストプロットでDeptを考慮したときとしないときのオッズ比を比較してみます。

dept <- NULL
admit <- data.frame(matrix(NA, ncol = dim(threeWayTable)[1]))[0, ]
odds <- NULL
lower <- NULL
upper <- NULL
for(i in seq_len(length(dimnames(threeWayTable)$Dept))) {
  ithTable <- threeWayTable[, , i]
  oddsRatio <- odds.ratio(ithTable)
  admit <- rbind(admit, rowSums(ithTable))
  odds <- append(odds, oddsRatio$OR)
  lower <- append(lower, oddsRatio$`2.5 %`)
  upper <- append(upper, oddsRatio $`97.5 %`)
}
oddsRatio <- odds.ratio(twoWayTable)
admit <- rbind(admit, rowSums(twoWayTable))
odds <- append(odds, oddsRatio$OR)
lower <- append(lower, oddsRatio$`2.5 %`)
upper <- append(upper, oddsRatio $`97.5 %`)
OR <- data.frame(odds.ratio = odds, lower = lower, upper = upper)
row.names(OR) <- c(dimnames(threeWayTable)$Dept, "Mariginal")

#フォレストプロット
ggplot(OR, aes(odds.ratio, row.names(OR))) +
  geom_point() +
  geom_vline(xintercept = 1, linetype = "dashed") +
  theme(panel.background = element_rect(fill = "white", color = "grey"))+
  labs(title = "Forestplot", x = "odds ratio", y = "") +
  geom_errorbar(aes(xmin = lower, xmax = upper), width = 0.5) +
  coord_fixed(ratio = 1 / 2 ) +
  scale_x_continuous(limits = c(0, ceiling(max(OR$upper)))) +
  scale_y_discrete(limits = rev(row.names(OR)))

dept <- NULL

admit <- data.frame(matrix(NA, ncol = dim(threeWayTable)[1]))[0, ]

odds <- NULL

lower <- NULL

upper <- NULL

for(i in seq_len(length(dimnames(threeWayTable)$Dept))) {

ithTable <- threeWayTable[, , i]

oddsRatio <- odds.ratio(ithTable)

admit <- rbind(admit, rowSums(ithTable))

odds <- append(odds, oddsRatio$OR)

lower <- append(lower, oddsRatio$`2.5 %`)

upper <- append(upper, oddsRatio $`97.5 %`)

}

oddsRatio <- odds.ratio(twoWayTable)

admit <- rbind(admit, rowSums(twoWayTable))

odds <- append(odds, oddsRatio$OR)

lower <- append(lower, oddsRatio$`2.5 %`)

upper <- append(upper, oddsRatio $`97.5 %`)

OR <- data.frame(odds.ratio = odds, lower = lower, upper = upper)

row.names(OR) <- c(dimnames(threeWayTable)$Dept, "Mariginal")

#フォレストプロット

ggplot(OR, aes(odds.ratio, row.names(OR))) +

geom_point() +

geom_vline(xintercept = 1, linetype = "dashed") +

theme(panel.background = element_rect(fill = "white", color = "grey"))+

labs(title = "Forestplot", x = "odds ratio", y = "") +

geom_errorbar(aes(xmin = lower, xmax = upper), width = 0.5) +

coord_fixed(ratio = 1 / 2 ) +

scale_x_continuous(limits = c(0, ceiling(max(OR$upper)))) +

scale_y_discrete(limits = rev(row.names(OR)))

上記を実行すると次の各Deptごとのオッズ比のフォレストプロットが出力されます。

フォレストプロット

図の一番下のMarginalはDeptに関して周辺和をとったときののオッズ比です。

A以外のDeptのオッズ比は1に近いため、Deptごとで見た時だとGenderでAdmitに差があるとはいいがたいということがプロットから見て取れます。

注意ポイント

このように、層別でみたときとみないときとで全く異なる検定結果を出してしまうことがあるため、注意が必要です。

ブレスローデイの検定

また、任意のオッズ比へ拡張したブレスローデイの検定も以下にまとめます。

オッズ比が1ではなく任意で与えることが可能であり、コクラン=マンテル=ヘンツェル検定の一般化となっています。

ブレスローデイ検定

\begin{array}{c|ccc} & B_1 & B_2 & 計\\\hline A_1 & x_{k11} & x_{k12} & n_{k\cdot} \\ A_2 & x_{k21} & x_{k22} & n_{k2\cdot} \\ 計 & n_{k\cdot1} & n_{k\cdot 2} & n_k \end{array}

また、それぞれ生起数の確率変数を$X_{k11}, X_{k12}, X_{k21}, X_{k22}$とする。このとき、次の仮説を考える。

\begin{align} &H0:\ OR_1 = OR_2 = \cdots =OR_k = OR_0\\ & H_1:\ \mathrm{not}\ H_0\end{align}

検定統計量として統計量を用いる。

\begin{align} X^2 = \sum_{i=1}^k \cfrac{\left[X_{i11} - \mathrm{E}[X_{i11}]\right]^2}{ \hat{\mathrm{V}}[X_{i11}]}\sim \chi_1^2,\end{align}

ここに$\hat{\mathrm{V}}[X_{i11}]$は$X_{i11}$の分散の推定量。また、有意水準$\alpha$の棄却域は次で与えられる。

\begin{align}[0, \chi_{1, 1- \alpha/2}^2) \cup (\chi_{1, \alpha/2}^2, \infty),\end{align}

ここに、$\chi_{1, \alpha}^2$は自由度$1$のカイ二乗分布の上側$\alpha$点である。

関数BreslowDayTest

関数BreslowDayTestはパッケージDescToolsに実装されており、ブレスローデイの検定を実行することができます。以下、関数とその引数となります。

BreslowDayTest(x, OR = NA, correct = FALSE)

関数BreslowDayTestの引数
x	3元分割表。
OR	帰無仮説のときのオッズ比。
correct	logical型。TRUEのときTaroneの修正が適用される。

実行例

まず、パッケージDescToolsをインストールしましょう。

install.packages("DescTools")
library(DescTools)ぱ

1 2	install.packages("DescTools") library(DescTools)ぱ

インストールしたら、関数BreslowDayTestが使えるようになります。

OR0 <- 0.5
testResult <- BreslowDayTest(marginTable, OR = OR0)

OR0 <- 1 #帰無仮説の下でのオッズ比の値を1とするとmantelhaen.testと一致
BreslowDayTest(partial_tables, OR = OR0)

OR0 <- 0.5

testResult <- BreslowDayTest(marginTable, OR = OR0)

OR0 <- 1 #帰無仮説の下でのオッズ比の値を1とするとmantelhaen.testと一致

BreslowDayTest(partial_tables, OR = OR0)

1行目は各層のオッズ比が0.5であるかどうかの仮説検定を行っており、3行目は1であるかどうかの検定を行っています。

2つ目の結果はmantelhaen.testを用いた時と同じになります。

ポイント

testResultを参照すると、「p-value < 2.2e-16」であり、各層のオッズ比は0.5であるとは言えないことが分かりました。

> testResult

	Breslow-Day test on Homogeneity of Odds Ratios

data:  marginTable
X-squared = 714.96, df = 3, p-value < 2.2e-16

> testResult

Breslow-Day test on Homogeneity of Odds Ratios

data: marginTable

X-squared = 714.96, df = 3, p-value < 2.2e-16

また、mantelhaen.testと同様に、検定結果の詳細を取得することができます。

stat <- testResult$statistic       #検定統計量
df <- testResult$parameter         #自由度
pValue <- testResult$p.value       #p値

resultTable <- data.frame(stat = stat, d.f. = df, p.value = pValue)
write.csv(resultTable, "ブレスローデイ検定.csv")