Today we’re gonna learn the fundamentals of regression models using lm(), predict(), and the iris data set.
library(tidyverse)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "lm") # +
  # scale_x_continuous(limits = c(0, 10)) +
  # scale_y_continuous(limits = c(0, 10)) +
  # geom_line(data = prediction, aes(x = x, y = y))
## `geom_smooth()` using formula = 'y ~ x'
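To see which line the plot is drawing, it can help to look at the fitted coefficients directly. This is a minimal sketch using only base R; the name `coefs` is just an illustrative choice, and the model is refitted here so the snippet runs on its own.

```r
# Refit the same model so this snippet is self-contained
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Named vector: "(Intercept)" is where the line crosses Sepal.Width = 0,
# "Sepal.Width" is the change in Sepal.Length per unit of Sepal.Width
coefs <- coef(model)
round(coefs, 3)
```

The intercept should come out at roughly 6.53 and the slope at roughly -0.22, so the fitted line slopes gently downward.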
If we extend the line…
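One way to extend the line, sketched here under a few assumptions: the data frame `grid` and the 0 to 10 axis limits are illustrative choices, not necessarily what was used for the original plots. The idea is to ask `predict()` for fitted values over a range much wider than the observed sepal widths (2.0 to 4.4) and draw them with `geom_line()`.

```r
library(ggplot2)

# Refit the same model so this snippet is self-contained
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Hypothetical grid of sepal widths, far wider than the observed range
grid <- data.frame(Sepal.Width = seq(0, 10, by = 0.5))

# Fitted sepal length at each grid point
grid$Sepal.Length <- predict(model, newdata = grid)

# Original points plus the extended line; geom_line() reuses the x/y
# aesthetics because `grid` has the same column names as `iris`
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  geom_line(data = grid, colour = "blue") +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 10))
p
```

Widening the limits further only works if the grid is widened too, since rows that fall outside the scale limits are dropped with a warning.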
…and run the model
outputs <- model %>% predict(newdata = tibble(Sepal.Width = c(200)))
outputs
## 1
## -38.14599
Now we know that if the sepal width were 200 cm, the sepal length would be about -38 cm. What an amazing scientific discovery!
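The number is nonsense because 200 is nowhere near the sepal widths the line was fitted to. The sketch below, using only base R, reproduces the prediction by hand (intercept plus slope times the new value) and checks whether the new value lies inside the observed range. The names `new_width`, `by_hand`, `via_predict`, `observed`, and `is_extrapolation` are illustrative, not part of any package.

```r
# Refit the same model so this snippet is self-contained
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

new_width <- 200

# Prediction by hand: intercept + slope * new value
b <- coef(model)
by_hand <- unname(b["(Intercept)"] + b["Sepal.Width"] * new_width)

# The same value via predict()
via_predict <- unname(
  predict(model, newdata = data.frame(Sepal.Width = new_width))
)

# Range of sepal widths the model actually saw
observed <- range(iris$Sepal.Width)

# TRUE when the new value lies outside that range
is_extrapolation <- new_width < observed[1] || new_width > observed[2]

by_hand
via_predict
observed
is_extrapolation
```

A check like `is_extrapolation` will not make the model any smarter, but it flags the cases where the straight line is being trusted far beyond the data.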