Linear regression with R

Jun 12, 2018 00:00 · 948 words · 5 minute read inferential statistics

Estimating linear regression

Before starting, I am going to load a number of utility libraries for this session.

library(tidyverse)
library(broom)
library(plotly)

A simple linear regression using mtcars dataset. I want to look at how weight of a car is associated with how many miles a car travels per gallon of gasoline–an indicator of fuel economy. I am going to look at their summary and scatter plot. For scatter plot, I use base-R’s plot() function.

summary(mtcars[, c("wt", "mpg")])
##        wt             mpg       
##  Min.   :1.513   Min.   :10.40  
##  1st Qu.:2.581   1st Qu.:15.43  
##  Median :3.325   Median :19.20  
##  Mean   :3.217   Mean   :20.09  
##  3rd Qu.:3.610   3rd Qu.:22.80  
##  Max.   :5.424   Max.   :33.90
plot(mtcars$wt, mtcars$mpg)

It seems like heavier cars have lower mileage Next, I regress mpg on weight using linear regression.

lm(mpg ~ wt, data = mtcars)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Coefficients:
## (Intercept)           wt  
##      37.285       -5.344

The output from lm() function quantifies a negative relationship between weight and mileage. The output however does not provide other useful information such as confidence interval and r-squared, which can be accessed if we feed the lm function into summary() function.

summary(lm(mpg ~ wt, data = mtcars))
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Furthermore, augment() from broom package allows you to access the fitted values and residuals of this regression model, in the form of a data.frame.

broom::augment(lm(mpg ~ wt, data = mtcars)) %>% head()
##           .rownames  mpg    wt  .fitted   .se.fit     .resid       .hat
## 1         Mazda RX4 21.0 2.620 23.28261 0.6335798 -2.2826106 0.04326896
## 2     Mazda RX4 Wag 21.0 2.875 21.91977 0.5714319 -0.9197704 0.03519677
## 3        Datsun 710 22.8 2.320 24.88595 0.7359177 -2.0859521 0.05837573
## 4    Hornet 4 Drive 21.4 3.215 20.10265 0.5384424  1.2973499 0.03125017
## 5 Hornet Sportabout 18.7 3.440 18.90014 0.5526562 -0.2001440 0.03292182
## 6           Valiant 18.1 3.460 18.79325 0.5552829 -0.6932545 0.03323551
##     .sigma      .cooksd  .std.resid
## 1 3.067494 1.327407e-02 -0.76616765
## 2 3.093068 1.723963e-03 -0.30743051
## 3 3.072127 1.543937e-02 -0.70575249
## 4 3.088268 3.020558e-03  0.43275114
## 5 3.097722 7.599578e-05 -0.06681879
## 6 3.095184 9.210650e-04 -0.23148309

There is also tidy() function from broom package that makes a “tidy” data.frame of the regression estimates.

broom::tidy(lm(mpg ~ wt, data = mtcars)) %>% head()
##          term  estimate std.error statistic      p.value
## 1 (Intercept) 37.285126  1.877627 19.857575 8.241799e-19
## 2          wt -5.344472  0.559101 -9.559044 1.293959e-10

Visualizing linear regression

The best way to visualize a simple linear regression is a scatter plot and a fitted regression line. For a simple linear regression, a 2-dimensional plot and a fitted line suffices. Here I use the ggplot2’s ggplot() function which is made available through the tidyverse package.

ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point() + geom_smooth(method = "lm", se = F) + theme_bw()

Visualizing multiple regression becomes more challenging, as each variable in the model has to have a dimension. Therefore, as it is not possible to visualize more than 3 dimensions, a model with two right-hand-side variables can be visualized at most. Here, I am plotting weight, mpg, and horsepower using an plotly package that creates interactive plots.

plotly::plot_ly(data = mtcars, x = ~wt, y = ~mpg, z = ~hp)
tweet Share