np_data <- read.csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/main/datasets/national-parks/US-National-Parks_RecreationVisits_1979-2024.csv", stringsAsFactors = FALSE)
August 1, 2024
These exercises use National Park visitation data from 1979–2024. For more context about the dataset, see the data essay.
Concepts covered:
group_by() and summarize()
What is the average number of visits for each state?
Save as avg_state_visits and then view the resulting dataframe.
| State | avg_visits |
|---|---|
| AK | 141981.41 |
| AR | 1432560.80 |
| AS | 15318.90 |
| AZ | 1862047.56 |
| CA | 997825.65 |
| CO | 1061889.29 |
| FL | 478091.75 |
| HI | 1295476.11 |
| IN | 1871785.65 |
| KY | 1281422.61 |
| ME | 3001627.15 |
| MI | 20314.13 |
| MN | 221786.91 |
| MO | 2419498.65 |
| MT | 2095725.43 |
| ND | 536510.04 |
| NM | 544280.96 |
| NV | 86826.70 |
| OH | 2211137.70 |
| OR | 486658.50 |
| SC | 103145.77 |
| SD | 817187.05 |
| TN | 9822658.33 |
| TX | 254589.66 |
| UT | 1185454.81 |
| VA | 1524167.30 |
| VI | 551687.93 |
| WA | 1502220.45 |
| WV | 1055088.56 |
| WY | 2821490.58 |
Discuss/consider: Which states have the highest and lowest average visits? What patterns or surprises do you notice?
What is the average number of visits for each National Park?
Save as avg_park_visits and then view the resulting dataframe.
`summarise()` has grouped output by 'ParkName'. You can override using the
`.groups` argument.
| ParkName | State | avg_visits |
|---|---|---|
| Acadia NP | ME | 3001627.152 |
| Arches NP | UT | 894382.891 |
| Badlands NP | SD | 1022875.587 |
| Big Bend NP | TX | 326903.348 |
| Biscayne NP | FL | 463649.000 |
| Black Canyon of the Gunnison NP | CO | 245845.543 |
| Bryce Canyon NP | UT | 1233760.565 |
| Canyonlands NP | UT | 419784.978 |
| Capitol Reef NP | UT | 682313.717 |
| Carlsbad Caverns NP | NM | 538724.478 |
| Channel Islands NP | CA | 312260.283 |
| Congaree NP | SC | 103145.775 |
| Crater Lake NP | OR | 486658.500 |
| Cuyahoga Valley NP | OH | 2211137.696 |
| Death Valley NP | CA | 948566.935 |
| Denali NP & PRES | AK | 431212.674 |
| Dry Tortugas NP | FL | 47996.500 |
| Everglades NP | FL | 922629.739 |
| Gates of the Arctic NP & PRES | AK | 6970.744 |
| Gateway Arch NP | MO | 2419498.652 |
| Glacier Bay NP & PRES | AK | 334073.783 |
| Glacier NP | MT | 2095725.435 |
| Grand Canyon NP | AZ | 4161919.826 |
| Grand Teton NP | WY | 2509523.935 |
| Great Basin NP | NV | 86826.696 |
| Great Sand Dunes NP & PRES | CO | 298985.522 |
| Great Smoky Mountains NP | TN | 9822658.326 |
| Guadalupe Mountains NP | TX | 182275.978 |
| Haleakala NP | HI | 1179257.522 |
| Hawaii Volcanoes NP | HI | 1411694.696 |
| Hot Springs NP | AR | 1432560.804 |
| Indiana Dunes NP | IN | 1871785.652 |
| Isle Royale NP | MI | 20314.130 |
| Joshua Tree NP | CA | 1482932.804 |
| Katmai NP & PRES | AK | 40159.844 |
| Kenai Fjords NP | AK | 227288.558 |
| Kings Canyon NP | CA | 685076.304 |
| Kobuk Valley NP | AK | 6450.163 |
| Lake Clark NP & PRES | AK | 11510.023 |
| Lassen Volcanic NP | CA | 418994.109 |
| Mammoth Cave NP | KY | 1281422.609 |
| Mesa Verde NP | CO | 565156.891 |
| Mount Rainier NP | WA | 1307281.891 |
| National Park of American Samoa | AS | 15318.905 |
| New River Gorge NP & PRES | WV | 1055088.561 |
| North Cascades NP | WA | 195927.848 |
| Olympic NP | WA | 3003451.609 |
| Petrified Forest NP | AZ | 694281.109 |
| Pinnacles NP | CA | 200763.761 |
| Redwood NP | CA | 448139.022 |
| Rocky Mountain NP | CO | 3137569.196 |
| Saguaro NP | AZ | 729941.739 |
| Sequoia NP | CA | 1009114.478 |
| Shenandoah NP | VA | 1524167.304 |
| Theodore Roosevelt NP | ND | 536510.043 |
| Virgin Islands NP | VI | 551687.935 |
| Voyageurs NP | MN | 221786.913 |
| White Sands NP | NM | 549837.435 |
| Wind Cave NP | SD | 611498.522 |
| Wrangell-St. Elias NP & PRES | AK | 49340.628 |
| Yellowstone NP | WY | 3133457.217 |
| Yosemite NP | CA | 3474583.130 |
| Zion NP | UT | 2697031.913 |
Discuss/consider: Which National Parks have the highest and lowest average visits? What patterns or surprises do you notice?
How many National Parks are there in each state?
Save your answer as distinct_parks.
| State | num_parks |
|---|---|
| AK | 8 |
| AR | 1 |
| AS | 1 |
| AZ | 3 |
| CA | 9 |
| CO | 4 |
| FL | 3 |
| HI | 2 |
| IN | 1 |
| KY | 1 |
| ME | 1 |
| MI | 1 |
| MN | 1 |
| MO | 1 |
| MT | 1 |
| ND | 1 |
| NM | 2 |
| NV | 1 |
| OH | 1 |
| OR | 1 |
| SC | 1 |
| SD | 2 |
| TN | 1 |
| TX | 2 |
| UT | 5 |
| VA | 1 |
| VI | 1 |
| WA | 3 |
| WV | 1 |
| WY | 2 |
Discuss/consider: Which state has the most National Parks, and which states have the fewest? What patterns or surprises do you notice?
---
title: "DPLYR Groupby with National Park Visitation Data (Solution)"
date: "2024-08-01"
categories: [dplyr, exercise, solution]
format:
  html:
    code-links:
      - text: R Script
        href: NP-Data-Groupby-Solutions.R
        icon: file-code
    code-overflow: wrap
    code-fold: show
editor: visual
df-print: kable
R.options:
  warn: false
code-tools: true
execute:
  eval: true
---
These exercises use National Park visitation data from 1979–2024. For more context about the dataset, see the [data essay](../index.qmd).
**Concepts covered:**
- Groupby with `group_by()` and `summarize()`
- Aggregation (mean, distinct count)
- Descriptive statistics by category
# Load the data
```{r}
np_data <- read.csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/main/datasets/national-parks/US-National-Parks_RecreationVisits_1979-2024.csv", stringsAsFactors = FALSE)
```
# Load dplyr library
```{r message=FALSE, warning=FALSE}
library("dplyr")
```
# Exercise 1
What is the average number of visits for *each state*?
Save as `avg_state_visits` and then view the resulting dataframe.
```{r}
avg_state_visits <- np_data %>%
  group_by(State) %>%
  summarize(avg_visits = mean(RecreationVisits))

avg_state_visits
```
Discuss/consider: Which states have the highest and lowest average visits? What patterns or surprises do you notice?
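
One way to approach the discussion question is to sort the summary table instead of scanning it by eye. The sketch below is a minimal illustration, not part of the original solution: `toy_avg` is a hypothetical stand-in for `avg_state_visits`, and its state codes and values are made up. The same calls should work on the real dataframe, assuming it has the `State` and `avg_visits` columns created above.

```r
library(dplyr)

# Hypothetical stand-in for avg_state_visits; codes and values are made up
toy_avg <- data.frame(
  State = c("AA", "BB", "CC", "DD"),
  avg_visits = c(1200, 350, 9800, 40)
)

# Sort the whole table from highest to lowest average
toy_sorted <- toy_avg %>%
  arrange(desc(avg_visits))

# Keep only the single highest and the single lowest row
toy_top <- toy_avg %>% slice_max(avg_visits, n = 1)
toy_bottom <- toy_avg %>% slice_min(avg_visits, n = 1)

toy_sorted
```

`arrange()` keeps every row, which helps when looking for patterns, while `slice_max()` and `slice_min()` return just the extremes.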
# Exercise 2
What is the average number of visits for *each National Park*?
Save as `avg_park_visits` and then view the resulting dataframe.
```{r}
avg_park_visits <- np_data %>%
  group_by(ParkName, State) %>%
  summarize(avg_visits = mean(RecreationVisits))

avg_park_visits
```
Discuss/consider: Which National Parks have the highest and lowest average visits? What patterns or surprises do you notice?
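
Grouping by two columns means `summarize()` drops only the last grouping level, so the result stays grouped by `ParkName`, and dplyr may print a message saying so. The sketch below shows one optional way to handle this with the `.groups` argument. It is an illustration under stated assumptions: `toy_np` is a hypothetical stand-in for `np_data`, and its park names and visit counts are made up.

```r
library(dplyr)

# Hypothetical stand-in for np_data; names and counts are made up
toy_np <- data.frame(
  ParkName = c("Park A", "Park A", "Park B", "Park B"),
  State = c("AA", "AA", "BB", "BB"),
  RecreationVisits = c(100, 300, 10, 30)
)

# Default behavior: the result is still grouped by ParkName
grouped_result <- toy_np %>%
  group_by(ParkName, State) %>%
  summarize(avg_visits = mean(RecreationVisits))

# .groups = "drop" returns an ungrouped result instead
ungrouped_result <- toy_np %>%
  group_by(ParkName, State) %>%
  summarize(avg_visits = mean(RecreationVisits), .groups = "drop")

# Inspect which grouping variables remain on each result
group_vars(grouped_result)
group_vars(ungrouped_result)
```

The averages are identical either way. The difference matters only for later steps, since verbs such as `mutate()` or `slice_max()` operate within groups when a dataframe is still grouped.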
# Exercise 3
How many National Parks are there in *each state*?
Save your answer as `distinct_parks`.
```{r}
distinct_parks <- np_data %>%
  group_by(State) %>%
  summarize(num_parks = n_distinct(ParkName))

distinct_parks
```
Discuss/consider: Which state has the most National Parks, and which states have the fewest? What patterns or surprises do you notice?
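
An alternative to `group_by()` with `n_distinct()` is to reduce the data to one row per park and then count rows per state. The sketch below is a possible variation, not the solution's method: `toy_np` is a hypothetical stand-in for `np_data` with made-up parks and values, where repeated park names represent multiple rows for the same park.

```r
library(dplyr)

# Hypothetical stand-in for np_data; names and counts are made up
toy_np <- data.frame(
  ParkName = c("Park A", "Park A", "Park B", "Park C", "Park C"),
  State = c("AA", "AA", "AA", "BB", "BB"),
  RecreationVisits = c(100, 300, 10, 30, 50)
)

toy_counts <- toy_np %>%
  distinct(State, ParkName) %>%                  # one row per park
  count(State, name = "num_parks", sort = TRUE)  # parks per state, largest first

toy_counts
```

With `sort = TRUE` the states with the most parks appear first, which makes the ranking easier to read than an alphabetical table.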