CC BY-NC-ND 3.0
state
## Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama 3615 3624 2.1 69.05 15.1 41.3 20
## Alaska 365 6315 1.5 69.31 11.3 66.7 152
## Arizona 2212 4530 1.8 70.55 7.8 58.1 15
## Arkansas 2110 3378 1.9 70.66 10.1 39.9 65
## California 21198 5114 1.1 71.71 10.3 62.6 20
## Colorado 2541 4884 0.7 72.06 6.8 63.9 166
## Connecticut 3100 5348 1.1 72.48 3.1 56.0 139
## Delaware 579 4809 0.9 70.06 6.2 54.6 103
## Florida 8277 4815 1.3 70.66 10.7 52.6 11
## Georgia 4931 4091 2.0 68.54 13.9 40.6 60
## Hawaii 868 4963 1.9 73.60 6.2 61.9 0
## Idaho 813 4119 0.6 71.87 5.3 59.5 126
## Illinois 11197 5107 0.9 70.14 10.3 52.6 127
## Indiana 5313 4458 0.7 70.88 7.1 52.9 122
## Iowa 2861 4628 0.5 72.56 2.3 59.0 140
## Kansas 2280 4669 0.6 72.58 4.5 59.9 114
## Kentucky 3387 3712 1.6 70.10 10.6 38.5 95
## Louisiana 3806 3545 2.8 68.76 13.2 42.2 12
## Maine 1058 3694 0.7 70.39 2.7 54.7 161
## Maryland 4122 5299 0.9 70.22 8.5 52.3 101
## Massachusetts 5814 4755 1.1 71.83 3.3 58.5 103
## Michigan 9111 4751 0.9 70.63 11.1 52.8 125
## Minnesota 3921 4675 0.6 72.96 2.3 57.6 160
## Mississippi 2341 3098 2.4 68.09 12.5 41.0 50
## Missouri 4767 4254 0.8 70.69 9.3 48.8 108
## Montana 746 4347 0.6 70.56 5.0 59.2 155
## Nebraska 1544 4508 0.6 72.60 2.9 59.3 139
## Nevada 590 5149 0.5 69.03 11.5 65.2 188
## New Hampshire 812 4281 0.7 71.23 3.3 57.6 174
## New Jersey 7333 5237 1.1 70.93 5.2 52.5 115
## New Mexico 1144 3601 2.2 70.32 9.7 55.2 120
## New York 18076 4903 1.4 70.55 10.9 52.7 82
## North Carolina 5441 3875 1.8 69.21 11.1 38.5 80
## North Dakota 637 5087 0.8 72.78 1.4 50.3 186
## Ohio 10735 4561 0.8 70.82 7.4 53.2 124
## Oklahoma 2715 3983 1.1 71.42 6.4 51.6 82
## Oregon 2284 4660 0.6 72.13 4.2 60.0 44
## Pennsylvania 11860 4449 1.0 70.43 6.1 50.2 126
## Rhode Island 931 4558 1.3 71.90 2.4 46.4 127
## South Carolina 2816 3635 2.3 67.96 11.6 37.8 65
## South Dakota 681 4167 0.5 72.08 1.7 53.3 172
## Tennessee 4173 3821 1.7 70.11 11.0 41.8 70
## Texas 12237 4188 2.2 70.90 12.2 47.4 35
## Utah 1203 4022 0.6 72.90 4.5 67.3 137
## Vermont 472 3907 0.6 71.64 5.5 57.1 168
## Virginia 4981 4701 1.4 70.08 9.5 47.8 85
## Washington 3559 4864 0.6 71.72 4.3 63.5 32
## West Virginia 1799 3617 1.4 69.48 6.7 41.6 100
## Wisconsin 4589 4468 0.7 72.48 3.0 54.5 149
## Wyoming 376 4566 0.6 70.29 6.9 62.9 173
## Area
## Alabama 50708
## Alaska 566432
## Arizona 113417
## Arkansas 51945
## California 156361
## Colorado 103766
## Connecticut 4862
## Delaware 1982
## Florida 54090
## Georgia 58073
## Hawaii 6425
## Idaho 82677
## Illinois 55748
## Indiana 36097
## Iowa 55941
## Kansas 81787
## Kentucky 39650
## Louisiana 44930
## Maine 30920
## Maryland 9891
## Massachusetts 7826
## Michigan 56817
## Minnesota 79289
## Mississippi 47296
## Missouri 68995
## Montana 145587
## Nebraska 76483
## Nevada 109889
## New Hampshire 9027
## New Jersey 7521
## New Mexico 121412
## New York 47831
## North Carolina 48798
## North Dakota 69273
## Ohio 40975
## Oklahoma 68782
## Oregon 96184
## Pennsylvania 44966
## Rhode Island 1049
## South Carolina 30225
## South Dakota 75955
## Tennessee 41328
## Texas 262134
## Utah 82096
## Vermont 9267
## Virginia 39780
## Washington 66570
## West Virginia 24070
## Wisconsin 54464
## Wyoming 97203
state
state.x77: matrix with 50 rows and 8 columns giving the following statistics in the respective columns.
Population: population estimate as of July 1, 1975
Income: per capita income (1974)
Illiteracy: illiteracy (1970, percent of population)
Life Exp: life expectancy in years (1969–71)
Murder: murder and non-negligent manslaughter rate per 100,000 population (1976)
HS Grad: percent high-school graduates (1970)
Frost: mean number of days with minimum temperature below freezing (1931–1960) in capital or large city
Area: land area in square miles
## Population Income Illiteracy Life Exp
## Min. : 365 Min. :3098 Min. :0.500 Min. :67.96
## 1st Qu.: 1080 1st Qu.:3993 1st Qu.:0.625 1st Qu.:70.12
## Median : 2838 Median :4519 Median :0.950 Median :70.67
## Mean : 4246 Mean :4436 Mean :1.170 Mean :70.88
## 3rd Qu.: 4968 3rd Qu.:4814 3rd Qu.:1.575 3rd Qu.:71.89
## Max. :21198 Max. :6315 Max. :2.800 Max. :73.60
## Murder HS Grad Frost Area
## Min. : 1.400 Min. :37.80 Min. : 0.00 Min. : 1049
## 1st Qu.: 4.350 1st Qu.:48.05 1st Qu.: 66.25 1st Qu.: 36985
## Median : 6.850 Median :53.25 Median :114.50 Median : 54277
## Mean : 7.378 Mean :53.11 Mean :104.46 Mean : 70736
## 3rd Qu.:10.675 3rd Qu.:59.15 3rd Qu.:139.75 3rd Qu.: 81163
## Max. :15.100 Max. :67.30 Max. :188.00 Max. :566432
## [1] "matrix"
## [1] "data.frame"
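The two class outputs above are consistent with converting the built-in `state.x77` matrix to a data frame. A minimal sketch, assuming the data frame is named `usa` (the name used in the model calls later in this document) and that `data.frame()` was used so that `Life Exp` and `HS Grad` become the syntactic names `Life.Exp` and `HS.Grad`:

```r
# state.x77 ships with base R (package "datasets")
class(state.x77)   # "matrix" in R 3.x; R >= 4.0 reports "matrix" "array"

# Assumption: data.frame() (check.names = TRUE) produced the dotted column names
usa <- data.frame(state.x77)
class(usa)         # "data.frame"
names(usa)
```

Working with a data frame lets the later models use the `data =` argument of `lm()`.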
Life.Exp
##
## Call:
## lm(formula = usa$Life.Exp ~ usa$Population + usa$Income + usa$Illiteracy +
## usa$Murder + usa$HS.Grad + usa$Frost + usa$Area)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.48895 -0.51232 -0.02747 0.57002 1.49447
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.094e+01 1.748e+00 40.586 < 2e-16 ***
## usa$Population 5.180e-05 2.919e-05 1.775 0.0832 .
## usa$Income -2.180e-05 2.444e-04 -0.089 0.9293
## usa$Illiteracy 3.382e-02 3.663e-01 0.092 0.9269
## usa$Murder -3.011e-01 4.662e-02 -6.459 8.68e-08 ***
## usa$HS.Grad 4.893e-02 2.332e-02 2.098 0.0420 *
## usa$Frost -5.735e-03 3.143e-03 -1.825 0.0752 .
## usa$Area -7.383e-08 1.668e-06 -0.044 0.9649
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7448 on 42 degrees of freedom
## Multiple R-squared: 0.7362, Adjusted R-squared: 0.6922
## F-statistic: 16.74 on 7 and 42 DF, p-value: 2.534e-10
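The full model summarised above can be refitted as follows. This is a sketch rather than the original call: it assumes `usa <- data.frame(state.x77)` and uses the `data =` argument instead of the `usa$...` notation shown in the output, which gives the same estimates but shorter coefficient names.

```r
# Assumption: usa is the data-frame version of state.x77
usa <- data.frame(state.x77)

# Life expectancy explained by all seven other variables
mod_full <- lm(Life.Exp ~ ., data = usa)
summary(mod_full)
```

Fitting with `data =` also matters for `predict()`: a model whose formula refers to `usa$Murder` and similar columns cannot be applied cleanly to a `newdata` data frame.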
"The Akaike information criterion (AIC) is a measure of the quality of a statistical model, proposed by Hirotugu Akaike in 1973.
When a statistical model is estimated, its likelihood can be increased by adding a parameter. The Akaike information criterion, like the Bayesian information criterion, penalizes models according to their number of parameters in order to satisfy the parsimony criterion. The model with the lowest Akaike information criterion is then chosen." (Wikipedia)
## [1] 121.7092
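The value above matches `AIC()` applied to the full model. As a hedged illustration of the definition, AIC = 2k − 2·log(L), the sketch below recomputes it from the log-likelihood; the object names `usa` and `mod_full` are assumptions.

```r
usa <- data.frame(state.x77)
mod_full <- lm(Life.Exp ~ ., data = usa)

ll <- logLik(mod_full)
# k counts the 8 regression coefficients plus the residual variance
k <- attr(ll, "df")
aic_by_hand <- 2 * k - 2 * as.numeric(ll)

c(by_hand = aic_by_hand, builtin = AIC(mod_full))
```

Both entries should agree, since `AIC()` uses the same log-likelihood and parameter count.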
##
## Call:
## lm(formula = usa$Life.Exp ~ usa$Population + usa$Murder + usa$HS.Grad +
## usa$Frost)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.47095 -0.53464 -0.03701 0.57621 1.50683
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.103e+01 9.529e-01 74.542 < 2e-16 ***
## usa$Population 5.014e-05 2.512e-05 1.996 0.05201 .
## usa$Murder -3.001e-01 3.661e-02 -8.199 1.77e-10 ***
## usa$HS.Grad 4.658e-02 1.483e-02 3.142 0.00297 **
## usa$Frost -5.943e-03 2.421e-03 -2.455 0.01802 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7197 on 45 degrees of freedom
## Multiple R-squared: 0.736, Adjusted R-squared: 0.7126
## F-statistic: 31.37 on 4 and 45 DF, p-value: 1.696e-12
## [1] 115.7326
##
## Call:
## lm(formula = usa$Life.Exp ~ usa$Murder + usa$HS.Grad + usa$Frost)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5015 -0.5391 0.1014 0.5921 1.2268
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.036379 0.983262 72.246 < 2e-16 ***
## usa$Murder -0.283065 0.036731 -7.706 8.04e-10 ***
## usa$HS.Grad 0.049949 0.015201 3.286 0.00195 **
## usa$Frost -0.006912 0.002447 -2.824 0.00699 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7427 on 46 degrees of freedom
## Multiple R-squared: 0.7127, Adjusted R-squared: 0.6939
## F-statistic: 38.03 on 3 and 46 DF, p-value: 1.634e-12
## [1] 117.9743
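The three AIC values printed so far (121.7092, 115.7326 and 117.9743) can be put side by side in one call. A minimal sketch, with the model names `mod_full`, `mod_four` and `mod_three` chosen here for illustration:

```r
usa <- data.frame(state.x77)

mod_full  <- lm(Life.Exp ~ ., data = usa)
mod_four  <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = usa)
mod_three <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = usa)

# One row per model: number of estimated parameters (df) and AIC
aic_table <- AIC(mod_full, mod_four, mod_three)
aic_table

# Model with the lowest AIC
best <- rownames(aic_table)[which.min(aic_table$AIC)]
best
```

Dropping `Population` from the four-predictor model raises the AIC again, so by this criterion the four-predictor model is preferred.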
stepwise
## Start: AIC=-22.18
## Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad +
## Frost + Area
##
## Df Sum of Sq RSS AIC
## - Area 1 0.0011 23.298 -24.182
## - Income 1 0.0044 23.302 -24.175
## - Illiteracy 1 0.0047 23.302 -24.174
## <none> 23.297 -22.185
## - Population 1 1.7472 25.044 -20.569
## - Frost 1 1.8466 25.144 -20.371
## - HS.Grad 1 2.4413 25.738 -19.202
## - Murder 1 23.1411 46.438 10.305
##
## Step: AIC=-24.18
## Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad +
## Frost
##
## Df Sum of Sq RSS AIC
## - Illiteracy 1 0.0038 23.302 -26.174
## - Income 1 0.0059 23.304 -26.170
## <none> 23.298 -24.182
## - Population 1 1.7599 25.058 -22.541
## - Frost 1 2.0488 25.347 -21.968
## - HS.Grad 1 2.9804 26.279 -20.163
## - Murder 1 26.2721 49.570 11.569
##
## Step: AIC=-26.17
## Life.Exp ~ Population + Income + Murder + HS.Grad + Frost
##
## Df Sum of Sq RSS AIC
## - Income 1 0.006 23.308 -28.161
## <none> 23.302 -26.174
## - Population 1 1.887 25.189 -24.280
## - Frost 1 3.037 26.339 -22.048
## - HS.Grad 1 3.495 26.797 -21.187
## - Murder 1 34.739 58.041 17.456
##
## Step: AIC=-28.16
## Life.Exp ~ Population + Murder + HS.Grad + Frost
##
## Df Sum of Sq RSS AIC
## <none> 23.308 -28.161
## - Population 1 2.064 25.372 -25.920
## - Frost 1 3.122 26.430 -23.877
## - HS.Grad 1 5.112 28.420 -20.246
## - Murder 1 34.816 58.124 15.528
##
## Call:
## lm(formula = Life.Exp ~ Population + Murder + HS.Grad + Frost,
## data = usa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.47095 -0.53464 -0.03701 0.57621 1.50683
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.103e+01 9.529e-01 74.542 < 2e-16 ***
## Population 5.014e-05 2.512e-05 1.996 0.05201 .
## Murder -3.001e-01 3.661e-02 -8.199 1.77e-10 ***
## HS.Grad 4.658e-02 1.483e-02 3.142 0.00297 **
## Frost -5.943e-03 2.421e-03 -2.455 0.01802 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7197 on 45 degrees of freedom
## Multiple R-squared: 0.736, Adjusted R-squared: 0.7126
## F-statistic: 31.37 on 4 and 45 DF, p-value: 1.696e-12
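The trace above has the shape produced by `step()`. The sketch below assumes a backward search started from the full model (the exact call is not shown in the source). It also illustrates why the AIC values in the trace (for example −22.18) differ from those returned by `AIC()` (121.71): for linear models `step()` relies on `extractAIC()`, which computes n·log(RSS/n) + 2p and omits a constant that is identical for all models fitted to the same data.

```r
usa <- data.frame(state.x77)
mod_full <- lm(Life.Exp ~ ., data = usa)

# trace = 0 suppresses the step-by-step tables; the default prints them
mod_step <- step(mod_full, direction = "backward", trace = 0)
formula(mod_step)

# AIC on the scale used in the trace, recomputed by hand for the full model
n   <- nrow(usa)
rss <- sum(residuals(mod_full)^2)
p   <- length(coef(mod_full))
aic_step_scale <- n * log(rss / n) + 2 * p
c(by_hand = aic_step_scale, extractAIC = extractAIC(mod_full)[2])
```

Because the omitted constant is the same for every candidate model, both scales rank the models identically.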
rbind(
min = sapply(usa[c('Population', 'Murder', 'HS.Grad', 'Frost')], min),
max = sapply(usa[c('Population', 'Murder', 'HS.Grad', 'Frost')], max))
## Population Murder HS.Grad Frost
## min 365 1.4 37.8 0
## max 21198 15.1 67.3 188
predict(mod0X,
data.frame(
Murder = 8,
HS.Grad = 55,
Frost = 80,
Population = 4250),
interval = "prediction",
level = 0.95)
## fit lwr upr
## 1 70.92559 69.45497 72.39621
## Warning: package 'car' was built under R version 3.6.1
## Loading required package: carData
## Population Murder HS.Grad Frost
## 1.189835 1.727844 1.356791 1.498077
All variance inflation factors (VIF) are below 10, so collinearity is not a concern for this model.
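The VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The values above were presumably obtained with `vif()` from the `car` package (an assumption based on the package-loading messages). The sketch below recomputes them with base R only, assuming `usa <- data.frame(state.x77)`.

```r
usa <- data.frame(state.x77)
preds <- c("Population", "Murder", "HS.Grad", "Frost")

vif_by_hand <- sapply(preds, function(p) {
  others <- setdiff(preds, p)
  # Regress predictor p on the remaining predictors
  r2 <- summary(lm(reformulate(others, response = p), data = usa))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 6)
```

A VIF close to 1 means the predictor is almost uncorrelated with the others; the threshold of 10 is the rule of thumb used in this document.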
\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_px_p + \epsilon\)
\(y = \beta_0 + \sum_{j=1}^{p}\beta_jx_j+\epsilon\)
\(y\): continuous quantitative response variable.
\(x_j\): continuous quantitative explanatory variables.
\(\epsilon\): random error, normally distributed with mean zero and standard deviation \(\sigma\).
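To connect the model equation with what `lm()` estimates, the sketch below solves the least-squares normal equations directly and compares the result with `lm()`. It assumes `usa <- data.frame(state.x77)` and uses the three-predictor model from earlier in this document; the names `X`, `beta_hat` and `sigma_hat` are illustrative.

```r
usa <- data.frame(state.x77)

# Design matrix: a column of ones for beta_0, then the predictors
X <- cbind(Intercept = 1, as.matrix(usa[, c("Murder", "HS.Grad", "Frost")]))
y <- usa$Life.Exp

# Least-squares estimate: solve (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Estimate of sigma: residual standard error on n - (p + 1) degrees of freedom
sigma_hat <- sqrt(sum((y - X %*% beta_hat)^2) / (nrow(X) - ncol(X)))

fit <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = usa)
cbind(normal_eq = drop(beta_hat), lm = coef(fit))
sigma_hat
```

`lm()` uses a QR decomposition rather than the normal equations, which is numerically more stable, but on well-conditioned data both give the same estimates.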
the lm() function
results with summary()
diagnostic plots to check the assumptions with plot(lm())
statistical tests such as shapiro.test() for the normality of the residuals
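The steps listed above can be sketched as follows, assuming `usa <- data.frame(state.x77)` and the three-predictor model as an example. The plots are drawn only in an interactive session so that the script also runs in batch mode.

```r
usa <- data.frame(state.x77)

# 1. Fit the model
fit <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = usa)

# 2. Coefficients, standard errors, R-squared
summary(fit)

# 3. Diagnostic plots (residuals vs fitted, Q-Q plot, scale-location, leverage)
if (interactive()) {
  op <- par(mfrow = c(2, 2))
  plot(fit)
  par(op)
}

# 4. Shapiro-Wilk test; the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(fit))
sw
```

A small p-value from the Shapiro-Wilk test would cast doubt on the normality assumption for \(\epsilon\); it is best read together with the Q-Q plot.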
If one (or more) explanatory variable is a linear combination of one (or more) other variables, this is called collinearity. In that case, the individual coefficients associated with each variable cannot be interpreted reliably.
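As a small illustration of the extreme case (exact collinearity), the sketch below builds a hypothetical data frame `d` in which `x3` is exactly `x1 + x2`. `lm()` cannot separate the three effects and reports `NA` for the redundant coefficient.

```r
set.seed(1)
# Hypothetical toy data, not part of the original example
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$x3 <- d$x1 + d$x2                     # exact linear combination
d$y  <- 1 + 2 * d$x1 - d$x2 + rnorm(50)

fit <- lm(y ~ x1 + x2 + x3, data = d)
coef(fit)   # the coefficient of x3 is NA
```

With strong but inexact collinearity, `lm()` still returns estimates for every coefficient, but their standard errors are inflated.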
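set.seed(12345678)
# Common signal shared by all ten predictors
xx <- sample(1:100, size = 100, replace = TRUE)
df <- data.frame(
  sapply(
    1:10,
    function(i){
      # Each predictor is xx plus noise; the noise level (sd = 10 or 100) is drawn at random
      if(sample(c(TRUE, FALSE), size = 1)){
        xx + rnorm(100, sd = 10)
      }else{
        xx + rnorm(100, sd = 100)
      }
    })
)
colnames(df) <- paste0("x", 1:10)
# Response: linear in x1 to x4. Each rnorm(mean = df$x1) term is centred on x1,
# so y also depends on x1 through these four noise terms.
df$y <- 0.5 +
  0.5*df$x1 + rnorm(100, mean = df$x1, sd = 10) +
  0.8*df$x2 + rnorm(100, mean = df$x1, sd = 10) +
  0.3*df$x3 + rnorm(100, mean = df$x1, sd = 10) +
  0.5*df$x4 + rnorm(100, mean = df$x1, sd = 10) +
  rnorm(100, mean = 0, sd = 150)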
## x1 x2 x3 x4 x5 x6 x7
## 1 83.72785 144.56185 96.55340 98.715658 106.62437 74.93125 250.73830
## 2 73.51153 -72.77192 59.94101 56.605437 56.29103 73.70063 34.74389
## 3 14.58394 132.80571 46.19169 1.696772 25.08175 23.46682 -16.02037
## 4 19.90037 -172.43941 17.39241 4.257799 15.54725 26.53419 -61.08011
## 5 80.21417 61.82470 81.74531 73.002784 93.81761 75.59769 42.15923
## 6 98.18177 82.38916 88.62505 99.777754 102.91077 90.94779 223.50261
## x8 x9 x10 y
## 1 89.470036 96.53135 110.98130 516.62259
## 2 71.042496 55.74935 71.26404 573.29268
## 3 29.774042 29.20769 23.43577 51.15482
## 4 2.489837 10.24249 37.63406 62.85970
## 5 66.895511 59.13655 65.04702 517.36852
## 6 109.079371 89.16625 81.85138 348.68400
One would expect a significant effect of x1, x2, x3 and x4.
##
## Call:
## lm(formula = y ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -272.74 -79.81 8.44 91.27 322.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.7007 31.2962 -0.246 0.806
## x1 6.1036 1.3631 4.478 2.23e-05 ***
## x2 0.8654 0.1534 5.641 1.97e-07 ***
## x3 -0.5174 1.4008 -0.369 0.713
## x4 -0.2623 1.3167 -0.199 0.843
## x5 1.2176 1.4404 0.845 0.400
## x6 0.3952 1.2640 0.313 0.755
## x7 -0.2172 0.1586 -1.369 0.174
## x8 -0.8464 1.2975 -0.652 0.516
## x9 -1.2123 1.3064 -0.928 0.356
## x10 1.1036 1.3043 0.846 0.400
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 138.3 on 89 degrees of freedom
## Multiple R-squared: 0.7085, Adjusted R-squared: 0.6757
## F-statistic: 21.63 on 10 and 89 DF, p-value: < 2.2e-16
Using the correlation between explanatory variables:
## corrplot 0.84 loaded
Comparing the squared correlation with R² (Klein's rule):
## x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
## x1 TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## x2 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## x3 TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## x4 TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## x5 TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## x6 TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## x7 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## x8 TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## x9 TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## x10 TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
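A table like the one above flags every pair of predictors whose squared correlation exceeds the R² of the full model. The sketch below shows one way to compute it. It uses a small stand-in data set `sim` (hypothetical, three predictors only) rather than the ten-predictor data of this document, so its output is not the table above.

```r
set.seed(42)
base <- runif(100, 1, 100)
# Stand-in data: x1 and x3 track the shared signal closely, x2 is very noisy
sim <- data.frame(
  x1 = base + rnorm(100, sd = 5),
  x2 = base + rnorm(100, sd = 100),
  x3 = base + rnorm(100, sd = 5)
)
sim$y <- 0.5 * sim$x1 + 0.8 * sim$x2 + rnorm(100, sd = 100)

# R-squared of the model with all predictors
r2 <- summary(lm(y ~ ., data = sim))$r.squared

# Klein's rule: TRUE where the squared pairwise correlation exceeds R-squared
preds <- setdiff(names(sim), "y")
klein <- cor(sim[preds])^2 > r2
klein
```

The diagonal is always `TRUE` (each variable correlates perfectly with itself); only the off-diagonal `TRUE` entries point to problematic pairs.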
The correlation and the estimated coefficient should have the same sign:
## coef corr
## x1 6.1035976 0.7563715
## x2 0.8653900 0.4587661
## x3 -0.5173635 0.6405377
## x4 -0.2622878 0.6915929
## x5 1.2175900 0.7170081
## x6 0.3951513 0.6545490
## x7 -0.2171809 0.1973219
## x8 -0.8463590 0.6470706
## x9 -1.2122646 0.6401045
## x10 1.1036109 0.6688523
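The sign check can be automated. The sketch below copies the `coef` and `corr` columns printed above into a data frame (named `cmp` here for illustration) and lists the predictors whose coefficient and correlation with `y` disagree in sign.

```r
# Values copied from the table above
cmp <- data.frame(
  coef = c(6.1035976, 0.8653900, -0.5173635, -0.2622878, 1.2175900,
           0.3951513, -0.2171809, -0.8463590, -1.2122646, 1.1036109),
  corr = c(0.7563715, 0.4587661, 0.6405377, 0.6915929, 0.7170081,
           0.6545490, 0.1973219, 0.6470706, 0.6401045, 0.6688523),
  row.names = paste0("x", 1:10)
)

cmp$same_sign <- sign(cmp$coef) == sign(cmp$corr)
rownames(cmp)[!cmp$same_sign]   # "x3" "x4" "x7" "x8" "x9"
```

With the fitted model and data at hand, the two columns would typically come from `coef(model)[-1]` and from the correlation of each predictor with the response.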
The recommended approach: use variance inflation factors (VIF):
## x1 x2 x3 x4 x5 x6 x7
## 8.199499 1.141505 8.941736 7.834481 10.060477 7.069245 1.268323
## x8 x9 x10
## 7.982893 7.522818 8.680919
## [1] "x5"
##
## Call:
## lm(formula = y ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -272.74 -79.81 8.44 91.27 322.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.7007 31.2962 -0.246 0.806
## x1 6.1036 1.3631 4.478 2.23e-05 ***
## x2 0.8654 0.1534 5.641 1.97e-07 ***
## x3 -0.5174 1.4008 -0.369 0.713
## x4 -0.2623 1.3167 -0.199 0.843
## x5 1.2176 1.4404 0.845 0.400
## x6 0.3952 1.2640 0.313 0.755
## x7 -0.2172 0.1586 -1.369 0.174
## x8 -0.8464 1.2975 -0.652 0.516
## x9 -1.2123 1.3064 -0.928 0.356
## x10 1.1036 1.3043 0.846 0.400
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 138.3 on 89 degrees of freedom
## Multiple R-squared: 0.7085, Adjusted R-squared: 0.6757
## F-statistic: 21.63 on 10 and 89 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = eval(parse(text = myForm)), data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -282.66 -85.24 2.58 96.18 328.93
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.2173 31.1952 -0.295 0.768
## x1 6.4607 1.2939 4.993 2.89e-06 ***
## x2 0.8863 0.1511 5.864 7.36e-08 ***
## x3 -0.3094 1.3768 -0.225 0.823
## x4 -0.2101 1.3132 -0.160 0.873
## x6 0.4844 1.2575 0.385 0.701
## x7 -0.2230 0.1582 -1.410 0.162
## x8 -0.6990 1.2837 -0.545 0.587
## x9 -1.1133 1.2990 -0.857 0.394
## x10 1.3888 1.2579 1.104 0.273
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 138.1 on 90 degrees of freedom
## Multiple R-squared: 0.7061, Adjusted R-squared: 0.6767
## F-statistic: 24.03 on 9 and 90 DF, p-value: < 2.2e-16
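The refit above drops the predictor with the largest VIF. The call shown in the output builds its formula with `eval(parse(text = myForm))`; the sketch below swaps in `reformulate()` as an alternative way to build it without parsing text. The VIF values are copied from the output earlier in this section, and the names `vifs`, `worst` and `reduced_formula` are illustrative.

```r
# VIF values copied from the output above
vifs <- c(x1 = 8.199499, x2 = 1.141505, x3 = 8.941736, x4 = 7.834481,
          x5 = 10.060477, x6 = 7.069245, x7 = 1.268323, x8 = 7.982893,
          x9 = 7.522818, x10 = 8.680919)

# Predictor with the largest VIF
worst <- names(which.max(vifs))
worst   # "x5"

# Formula with that predictor removed
reduced_formula <- reformulate(setdiff(names(vifs), worst), response = "y")
reduced_formula
```

The reduced model would then be fitted with `lm(reduced_formula, data = df)`, and the VIFs recomputed until all of them fall below the chosen threshold.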
## Start: AIC=996.19
## y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10
##
## Df Sum of Sq RSS AIC
## - x4 1 759 1702332 994.23
## - x6 1 1869 1703442 994.30
## - x3 1 2608 1704181 994.34
## - x8 1 8135 1709708 994.67
## - x5 1 13662 1715235 994.99
## - x10 1 13687 1715261 994.99
## - x9 1 16464 1718037 995.15
## <none> 1701573 996.19
## - x7 1 35848 1737421 996.27
## - x1 1 383342 2084915 1014.51
## - x2 1 608462 2310035 1024.76
##
## Step: AIC=994.23
## y ~ x1 + x2 + x3 + x5 + x6 + x7 + x8 + x9 + x10
##
## Df Sum of Sq RSS AIC
## - x6 1 1503 1703835 992.32
## - x3 1 2229 1704561 992.36
## - x8 1 10108 1712440 992.83
## - x10 1 12940 1715271 992.99
## - x5 1 13391 1715722 993.02
## - x9 1 18322 1720654 993.30
## <none> 1702332 994.23
## - x7 1 35486 1737818 994.30
## - x1 1 391910 2094242 1012.95
## - x2 1 632154 2334486 1023.81
##
## Step: AIC=992.32
## y ~ x1 + x2 + x3 + x5 + x7 + x8 + x9 + x10
##
## Df Sum of Sq RSS AIC
## - x3 1 1466 1705301 990.41
## - x8 1 9534 1713369 990.88
## - x5 1 14375 1718210 991.16
## - x10 1 14718 1718553 991.18
## - x9 1 16902 1720737 991.31
## <none> 1703835 992.32
## - x7 1 37133 1740967 992.48
## - x1 1 398083 2101917 1011.32
## - x2 1 633046 2336881 1021.92
##
## Step: AIC=990.41
## y ~ x1 + x2 + x5 + x7 + x8 + x9 + x10
##
## Df Sum of Sq RSS AIC
## - x5 1 13110 1718411 989.17
## - x8 1 13244 1718544 989.18
## - x10 1 13446 1718746 989.19
## - x9 1 18990 1724290 989.52
## <none> 1705301 990.41
## - x7 1 38246 1743547 990.63
## - x1 1 397467 2102768 1009.36
## - x2 1 648828 2354128 1020.65
##
## Step: AIC=989.17
## y ~ x1 + x2 + x7 + x8 + x9 + x10
##
## Df Sum of Sq RSS AIC
## - x8 1 8144 1726554 987.65
## - x9 1 14615 1733025 988.02
## - x10 1 28196 1746607 988.80
## <none> 1718411 989.17
## - x7 1 40252 1758663 989.49
## - x1 1 522716 2241126 1013.73
## - x2 1 696575 2414985 1021.20
##
## Step: AIC=987.65
## y ~ x1 + x2 + x7 + x9 + x10
##
## Df Sum of Sq RSS AIC
## - x10 1 21977 1748531 986.91
## - x9 1 26034 1752588 987.14
## <none> 1726554 987.65
## - x7 1 39721 1766276 987.92
## - x1 1 533407 2259962 1012.57
## - x2 1 692286 2418841 1019.36
##
## Step: AIC=986.91
## y ~ x1 + x2 + x7 + x9
##
## Df Sum of Sq RSS AIC
## - x9 1 11623 1760154 985.57
## <none> 1748531 986.91
## - x7 1 35382 1783913 986.91
## - x2 1 675351 2423882 1017.57
## - x1 1 870633 2619164 1025.32
##
## Step: AIC=985.57
## y ~ x1 + x2 + x7
##
## Df Sum of Sq RSS AIC
## <none> 1760154 985.57
## - x7 1 46802 1806956 986.20
## - x2 1 671087 2431241 1015.87
## - x1 1 2637633 4397787 1075.14
##
## Call:
## lm(formula = y ~ x1 + x2 + x7, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -300.23 -80.57 -1.75 94.07 323.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -14.6462 28.4221 -0.515 0.608
## x1 6.1569 0.5133 11.994 < 2e-16 ***
## x2 0.8624 0.1426 6.050 2.79e-08 ***
## x7 -0.2396 0.1500 -1.598 0.113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 135.4 on 96 degrees of freedom
## Multiple R-squared: 0.6984, Adjusted R-squared: 0.689
## F-statistic: 74.11 on 3 and 96 DF, p-value: < 2.2e-16
When variables are correlated, a method for selecting the data needs to be considered (cf. the example of spatial data in ecology).