Creating Linear Regression

Today we’re gonna learn the fundamentals of regression models using lm(), predict(), and the iris data set.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

model <- lm(Sepal.Length ~ Sepal.Width, data = iris)
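To see what lm() actually fitted, it can help to pull out the intercept and slope and compute a prediction by hand. This is a small sketch using only base R; the width of 3 is an arbitrary value chosen for illustration, not something from the original analysis.

```r
# Fit the same model as above (iris ships with base R).
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# coef() returns a named vector with "(Intercept)" and "Sepal.Width".
b <- coef(model)
b

# The fitted line is: Sepal.Length = intercept + slope * Sepal.Width.
# A prediction computed by hand should agree with predict().
width <- 3  # arbitrary example value
by_hand <- unname(b["(Intercept)"] + b["Sepal.Width"] * width)
by_predict <- unname(predict(model, newdata = data.frame(Sepal.Width = width)))
all.equal(by_hand, by_predict)
```

predict() is doing nothing more than plugging the new width into that straight-line equation, which is worth keeping in mind for what follows.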

ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) #+
  #scale_x_continuous(limits = c(0, 10)) +
  #scale_y_continuous(limits = c(0, 10)) +
  #geom_line(data = prediction, aes(x = x, y = y))

## `geom_smooth()` using formula = 'y ~ x'
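One possible way to draw the fitted line beyond the data is to predict over a wider grid of sepal widths and add the result with geom_line(). The sketch below assumes a grid from 0 to 10 (matching the axis limits in the commented-out lines) and a data frame named prediction built here for illustration; it is not necessarily how the plots in this post were produced.

```r
library(ggplot2)

model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Assumed grid of sepal widths from 0 to 10, well beyond the observed data.
prediction <- data.frame(Sepal.Width = seq(0, 10, by = 0.5))
prediction$Sepal.Length <- predict(model, newdata = prediction)

extended_plot <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  # Line through the model's predictions across the whole grid;
  # the x and y aesthetics are inherited from ggplot() above.
  geom_line(data = prediction, colour = "blue") +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 10))

extended_plot
```

Widening the grid and the axis limits together pushes the line as far from the points as you like.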

If we extend the line…

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 38 rows containing missing values or values outside the scale range
## (`geom_line()`).

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 36 rows containing missing values or values outside the scale range
## (`geom_line()`).

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 30 rows containing missing values or values outside the scale range
## (`geom_line()`).

## `geom_smooth()` using formula = 'y ~ x'

…and run the model

outputs <- model %>% predict(newdata = tibble(Sepal.Width = c(200)))
outputs

##         1 
## -38.14599

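Now we know that if the sepal width were 200 cm, the sepal length would be about -38 cm. What an amazing scientific discovery! The sepal widths in iris only run from 2.0 to 4.4 cm, so the model is extrapolating far beyond anything it was fitted on, and a negative length is the result.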
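One way to catch this kind of extrapolation is to compare new inputs against the range of the data the model was fitted on. Below is a minimal sketch of that idea; predict_checked() is a hypothetical helper written for this example, not a function from base R or the tidyverse.

```r
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Range of sepal widths the model was actually fitted on.
observed <- range(iris$Sepal.Width)
observed

# Hypothetical helper: predict as usual, but warn when any requested
# width lies outside the observed range.
predict_checked <- function(model, widths, observed) {
  outside <- widths < observed[1] | widths > observed[2]
  if (any(outside)) {
    warning("Extrapolating for Sepal.Width: ",
            paste(widths[outside], collapse = ", "))
  }
  predict(model, newdata = data.frame(Sepal.Width = widths))
}

predict_checked(model, 3, observed)    # inside the observed range, no warning
predict_checked(model, 200, observed)  # same prediction as before, plus a warning
```

The prediction itself doesn't change; the warning just makes it harder to mistake an extrapolated number for a discovery.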