【R言語】母分散の検定・一標本カイ二乗検定

R言語で母分散の検定を行う関数やその実行例を紹介していきます。

この記事では、一標本カイ二乗検定（one-sample chi squared test）と呼ばれる母分散の検定をRで行う方法を解説します。

二標本の等分散性の検定であるF検定については以下の記事を参照。

: 【R言語】分散の比の検定・F検定　関数var.testの使い方
R言語でF検定を実行する方法を解説していきます。 2群間の等分散性の検定であるF検定を実行する関数とその実行例について見ていきます。 F検定の詳細については、下の記事で解説しています。また、この記事 ...
続きを見る

また、母分散の検定の詳細については以下の記事を参照。

: 【統計学】母分散の検定　一標本カイ二乗検定
パラメトリックな母分散の検定である一標本カイ二乗検定について解説する。母分散の検定の検定統計量や棄却域や尤度比に基づいた導出法についてもみていく。等分散性の検定であるF検定については次の記事を参照 ...
続きを見る

また、この記事で紹介する実行例は次のzipファイルに保存しています。

R言語　母分散の検定・一標本カイ二乗検定

1 ファイル 1.02 KB

ダウンロード

母分散の検定

母分散の検定の概要やその関数を以下にまとめます。

R言語での実行例が見たい方は、この下の実行例を参照してください。

母分散の検定

母分散の検定の概要は以下の通りです。

母分散の検定

正規分布$N(\mu, \sigma^2)$から大きさ$n$の無作為標本$x_1, x_2, \ldots, x_n$の標本平均と不偏標本分散をそれぞれ$\bar{x} =n^{-1} \sum_{i=1}^nx_i$、$s^2 = (n-1)^{-1}\sum_{i=1}^n(x_i - \bar{x})^2$とする。この標本に対応する確率変数をそれぞれ$X_1, X_2, \ldots, X_n$としたとき、次の仮説検定を考える。

\begin{align}&H_0:\ \sigma^2 = \sigma_0^2\\ &H_1:\ \sigma^2 \neq \sigma_0^2\end{align}

この仮説検定の検定統計量は次で与えられる。

\begin{align}V = \cfrac{(n - 1)S^2}{\sigma_0^2} \sim \chi_{n-1}^2,\end{align}

ここに$S^2$は次で定義される不偏標本分散の確率変数である。

\begin{align} S^2 &= \cfrac{1}{n-1}\sum_{i= 1}^n (X_i - \bar{X})^2,\\ \bar{X} &= \cfrac{1}{n} \sum_{i=1}^n X_i. \end{align}

また、有意水準$\alpha$の棄却域は次で与えられる。

\begin{align} [0, \chi_{n- 1, 1 - \alpha /2}) \cup (\chi_{n- 1, \alpha/2}), \end{align}

ここに、$\chi_{n, \alpha}$は自由度$n$のカイ二乗分布の上側$\alpha$点である。

1標本のt検定と同様に、帰無仮説で母分散の値を与える必要があります。2標本のF検定とは異なるので注意が必要です。

関数varTest

パッケージEnvStatsの関数varTestを用いることで母分散の検定を行うことができます。

以下varTestとその引数です。

varTest(x, alternative = "two.sided", conf.level = 0.95, sigma.squared = 1, data.name = NULL)

関数varTestの引数
引数	説明
x	numeric型のベクトル（観測値）。NAやNaN、infにも適用できるが除外される。
alternative	character型。"two.sided"で両側検定、"greater"で右片側検定、"less"左片側検定を指定できる。
conf.level	信頼区間の信頼度数。
sigma.squared	帰無仮説のときの母分散の値。
data.name	character型。検定に用いるデータの名前。

実行例

関数varTestをの実行例を以下に載せます。まず初めにパッケージEnvStatsをインストールしましょう。

install.packages("EnvStats")
library(EnvStats)

1 2	install.packages("EnvStats") library(EnvStats)

データセットとして次のirisを用います。irisのSpeciesがsetosaであるSepal.Lengthの分散について検定を行いたいと思います。

> dataset <- iris
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> dataset <- iris

> head(iris)

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3.5 1.4 0.2 setosa

2 4.9 3.0 1.4 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

4 4.6 3.1 1.5 0.2 setosa

5 5.0 3.6 1.4 0.2 setosa

6 5.4 3.9 1.7 0.4 setosa

SpeciesがsetosaであるSepal.Lengthのヒストグラムと箱ひげ図は次の画像のようになります。ヒストグラムを作ったのは、データの母集団分布に正規性の仮定が必要であるためです。また、箱ひげ図を描くことで標本標準偏差がだいたい0.1から0.2なのが分かります。

library(ggplot2)
require(gridExtra)

boxPlot <- ggplot(dataset[dataset$Species == "setosa", ], aes(x = 1, y = Sepal.Length)) +
  geom_boxplot() +
  scale_y_continuous() +
  coord_flip() +
  geom_point(position = position_dodge(width = 0.9), alpha = 0.3, size = 1)

histPlot <- ggplot(dataset[dataset$Species == "setosa", ], aes(x = Sepal.Length)) +
  geom_histogram(bins = 15) +
  scale_y_continuous() 

grid.arrange(histPlot, boxPlot)

library(ggplot2)

require(gridExtra)

boxPlot <- ggplot(dataset[dataset$Species == "setosa", ], aes(x = 1, y = Sepal.Length)) +

geom_boxplot() +

scale_y_continuous() +

coord_flip() +

geom_point(position = position_dodge(width = 0.9), alpha = 0.3, size = 1)

histPlot <- ggplot(dataset[dataset$Species == "setosa", ], aes(x = Sepal.Length)) +

geom_histogram(bins = 15) +

scale_y_continuous()

grid.arrange(histPlot, boxPlot)

SpeciesがsetosaであるSepal.Lengthのヒストグラムと箱ひげ図

Sepal.Lengthの分散が経験的に0.1であると仮定し、「Sepal.Lengthの母分散が0.1であるか」という仮説検定を行いたいと思います。この検定を実行するには、関数varTestの引数sigma.squaredを0.1に指定します。testResultを参照すると、検定結果を見ることができます。

> sigma_squared0 <- 0.1
> testResult <- varTest(dataset[dataset$Species == "setosa", "Sepal.Length"], sigma.squared = sigma_squared0)
> testResult

	Chi-Squared Test on Variance

data:  dataset[dataset$Species == "setosa", "Sepal.Length"]
Chi-Squared = 60.882, df = 49, p-value = 0.2375
alternative hypothesis: true variance is not equal to 0.1
95 percent confidence interval:
 0.08669881 0.19293982
sample estimates:
variance 
0.124249

> sigma_squared0 <- 0.1

> testResult <- varTest(dataset[dataset$Species == "setosa", "Sepal.Length"], sigma.squared = sigma_squared0)

> testResult

Chi-Squared Test on Variance

data: dataset[dataset$Species == "setosa", "Sepal.Length"]

Chi-Squared = 60.882, df = 49, p-value = 0.2375

alternative hypothesis: true variance is not equal to 0.1

95 percent confidence interval:

0.08669881 0.19293982

sample estimates:

variance

0.124249

ポイント

「p-value = 0.2375」よりSepal.Lengthの分散は0.1であるという結論が得られました（分散は0.1ではないとは言えないことが言えた）。

この検定の結果を取得したいときは次のように、testResultの後ろに「$」を付ければ大丈夫です。検定統計量やp値、信頼区間を取得することができます。

stat <- testResult$statistic       #検定統計量
df <- testResult$parameters        #自由度
pValue <- testResult$p.value       #p値
estimate <- testResult$estimate    #推定値
nullValue <- testResult$null.value #帰無仮説の下での分散の値
ci <- testResult$conf.int          #信頼区間

stat <- testResult$statistic #検定統計量

df <- testResult$parameters #自由度

pValue <- testResult$p.value #p値

estimate <- testResult$estimate #推定値

nullValue <- testResult$null.value #帰無仮説の下での分散の値

ci <- testResult$conf.int #信頼区間

検定結果をcsvファイル等に保存したいときは、次のようにdata.frameに検定結果をベクトルとして保存し、write.csvでcsvファイルに出力すればおｋです。

resultTable <- data.frame(stat = stat, d.f. = df, p.value = pValue, estimate = estimate,
                          null.value = nullValue, c.i.lower = ci["LCL"], c.i.upper = ci["UCL"], row.names = NULL)

write.csv(resultTable, "分散の検定.csv")

resultTable <- data.frame(stat = stat, d.f. = df, p.value = pValue, estimate = estimate,

null.value = nullValue, c.i.lower = ci["LCL"], c.i.upper = ci["UCL"], row.names = NULL)

write.csv(resultTable, "分散の検定.csv")

母分散の検定結果

また、片側検定の例も紹介します。

分散2の正規分布から20個の乱数を発生させて、「母分散が1であるか、1より大きいか」という仮説検定を行ってみます。

> samples <- rnorm(20, sd = 2)
> samples
 [1] -0.3237746  0.8795209 -2.1397558 -1.4099974  3.4621502  0.5996884  1.7599754
 [8]  3.3630584  1.2360710 -3.2394897  2.6562894  1.6186240  1.2127874 -1.7915184
[15]  3.4022360  3.3201297 -1.4742185  2.7530723 -0.5298656  1.1003739

> samples <- rnorm(20, sd = 2)

> samples

[1] -0.3237746 0.8795209 -2.1397558 -1.4099974 3.4621502 0.5996884 1.7599754

[8] 3.3630584 1.2360710 -3.2394897 2.6562894 1.6186240 1.2127874 -1.7915184

[15] 3.4022360 3.3201297 -1.4742185 2.7530723 -0.5298656 1.1003739

右片側仮説検定を行うには、varTestの引数alternativeを"greater"に設定します。varTestを実行すると次の結果が得られます。

> sigma_squared0 <- 1
> testResult <- varTest(samples, sigma.squared = sigma_squared0, alternative = "greater")
> testResult

	Chi-Squared Test on Variance

data:  samples
Chi-Squared = 80.881, df = 19, p-value = 1.31e-09
alternative hypothesis: true variance is greater than 1
95 percent confidence interval:
 2.683197      Inf
sample estimates:
variance 
4.256896

> sigma_squared0 <- 1

> testResult <- varTest(samples, sigma.squared = sigma_squared0, alternative = "greater")

> testResult

Chi-Squared Test on Variance

data: samples

Chi-Squared = 80.881, df = 19, p-value = 1.31e-09

alternative hypothesis: true variance is greater than 1

95 percent confidence interval:

2.683197 Inf

sample estimates:

variance

4.256896

ポイント

「p-value = 1.31e-09」と書いてあるように母分散は1であるとは言えないことが分かりました。

また、この検定の結果の詳細は以下の通りです。

> stat <- testResult$statistic       #検定統計量
> df <- testResult$parameters        #自由度
> pValue <- testResult$p.value       #p値
> estimate <- testResult$estimate    #推定値
> nullValue <- testResult$null.value #帰無仮説の下での分散の値
> ci <- testResult$conf.int          #信頼区間
> resultTable <- data.frame(stat = stat, d.f. = df, p.value = pValue, estimate = estimate,
+                           null.value = nullValue, c.i.lower = ci["LCL"], c.i.upper = ci["UCL"], row.names = NULL)
> resultTable
      stat d.f.      p.value estimate null.value c.i.lower c.i.upper
1 80.88103   19 1.310265e-09 4.256896          1  2.683197       Inf

> stat <- testResult$statistic #検定統計量

> df <- testResult$parameters #自由度

> pValue <- testResult$p.value #p値

> estimate <- testResult$estimate #推定値

> nullValue <- testResult$null.value #帰無仮説の下での分散の値

> ci <- testResult$conf.int #信頼区間

> resultTable <- data.frame(stat = stat, d.f. = df, p.value = pValue, estimate = estimate,

+ null.value = nullValue, c.i.lower = ci["LCL"], c.i.upper = ci["UCL"], row.names = NULL)

> resultTable

stat d.f. p.value estimate null.value c.i.lower c.i.upper

1 80.88103 19 1.310265e-09 4.256896 1 2.683197 Inf

まとめ

R言語で母分散の検定を実行するパッケージや関数を紹介しました。

二標本の母分散の検定であるF検定はRのbaseに標準でインストールされていますが、一標本のカイ二乗検定はパッケージをインストールする必要があります。

パッケージEnvStatsのvarTestで母分散の検定が可能です。

usagi-san

統計学とゲームとかをメインに解説していくよ。数式とかプログラミングコードにミスがあったり質問があったりする場合はコメントで受け付けます。すぐに対応します。

2024/04/28

【R言語】F検定テンプレートスクリプト

2024/04/28

【R言語】経験密度関数・経験分布関数のプロット　関数densityとecdfの使い方

2024/04/10

【R言語】ベータ関数とガンマ関数　関数beta, gammaの使い方

usagi-sanの記事をもっと見る

-R言語
-R言語, プログラミング, 統計学, 統計解析

comment コメントをキャンセル

: R言語
【R言語】関数lmによる線形回帰　単回帰分析・重回帰分析　回帰分析で予測する
R言語の回帰分析の実行方法を紹介していきます。回帰分析の実行方法だけでなく、回帰分析により作ったモデルでテストデータの予測も行っていきます。実行方法では、回帰分析の結果の見方やその抽出方法を詳しく解 ...

: R言語
【R言語】母比率の検定・母比率の差の検定
R言語で母比率の検定を実行する方法を紹介します。 R言語ではt検定などと同様に、簡単に比率に関する検定を実行することが可能です。今回は、母比率の検定に関する関数の説明やその実行例をまとめました。こ ...

: R言語
【R言語】コクラン=マンテル=ヘンツェル検定
R言語でコクラン=マンテル=ヘンツェル検定を実行する関数やその実行例を紹介します。各層のオッズ比が1であるかの検定であるコクラン=マンテル=ヘンツェル検定をRで行っていきます。また、カイ二乗検定で ...

: R言語
R言語　データ整形　データセットの分割・併合【初心者向け】
こんにちは、usagi-sanです。今回は教育用標準データセットSSDSE-2020B.csvを用いて、複数のデータセットを用いたデータ整形の練習をしていきます。データセットを複数に分割したり、分割 ...

: R言語
【R言語】条件分岐　if文・switch文
R言語の条件分岐に用いる文について解説していきます。この記事では条件分岐に用いるif文の使い方や、多条件分岐で用いられるswitch文の使い方について見ていきます。ここで扱うプログラミングコードは ...

【計算フォーム】指数分布の分布関数　下側確率の計算

【統計学】自由度が無限大のときのt分布　正規分布への収束

【R言語】母分散の検定・一標本カイ二乗検定

【R言語】分散の比の検定・F検定 関数var.testの使い方

【統計学】母分散の検定 一標本カイ二乗検定

母分散の検定

母分散の検定

関数varTest

実行例

まとめ

【R言語】分散の比の検定・F検定　関数var.testの使い方

【統計学】母分散の検定　一標本カイ二乗検定