DPLYR Groupby with National Park Visitation Data (Solution)

dplyr, exercise, solution

Published August 1, 2024

These exercises use National Park visitation data from 1979–2024. For more context about the dataset, see the data essay.

Concepts covered: group_by(), summarize(), mean(), n_distinct()

Load the data

np_data <- read.csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/main/datasets/national-parks/US-National-Parks_RecreationVisits_1979-2024.csv", stringsAsFactors = FALSE)

Load dplyr library

library("dplyr")

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Exercise 1

What is the average number of visits for each state?

Save as avg_state_visits and then view the resulting dataframe.

avg_state_visits <- np_data %>%
                    group_by(State) %>%
                    summarize(avg_visits = mean(RecreationVisits))

avg_state_visits
State avg_visits
AK 141981.41
AR 1432560.80
AS 15318.90
AZ 1862047.56
CA 997825.65
CO 1061889.29
FL 478091.75
HI 1295476.11
IN 1871785.65
KY 1281422.61
ME 3001627.15
MI 20314.13
MN 221786.91
MO 2419498.65
MT 2095725.43
ND 536510.04
NM 544280.96
NV 86826.70
OH 2211137.70
OR 486658.50
SC 103145.77
SD 817187.05
TN 9822658.33
TX 254589.66
UT 1185454.81
VA 1524167.30
VI 551687.93
WA 1502220.45
WV 1055088.56
WY 2821490.58

Discuss/consider: Which states have the highest and lowest average visits? What patterns or surprises do you notice?
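
One way to approach the discussion question is to sort the summary table rather than scan it by eye. The following is a minimal sketch on a small invented data frame (`toy` and its values are made up for illustration and are not real visitation figures); it reuses the `State` and `RecreationVisits` column names from the dataset and assumes dplyr 1.0.0 or later for `slice_max()` and `slice_min()`.

```r
library(dplyr)

# Toy data invented for illustration; not real visitation figures
toy <- data.frame(
  State = c("AA", "AA", "BB", "BB", "CC"),
  RecreationVisits = c(100, 300, 50, 150, 1000),
  stringsAsFactors = FALSE
)

# Average visits per state, sorted from highest to lowest
toy_avg <- toy %>%
  group_by(State) %>%
  summarize(avg_visits = mean(RecreationVisits)) %>%
  arrange(desc(avg_visits))

toy_avg

# Pull out only the rows with the highest and lowest averages
top_state <- toy_avg %>% slice_max(avg_visits, n = 1)
bottom_state <- toy_avg %>% slice_min(avg_visits, n = 1)
```

The same `arrange(desc(avg_visits))` step could be appended to the `avg_state_visits` pipeline to rank the real states.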

Exercise 2

What is the average number of visits for each National Park?

Save as avg_park_visits and then view the resulting dataframe.

avg_park_visits <- np_data %>%
                    group_by(ParkName, State) %>%
                    summarize(avg_visits = mean(RecreationVisits))
`summarise()` has grouped output by 'ParkName'. You can override using the
`.groups` argument.
avg_park_visits
ParkName State avg_visits
Acadia NP ME 3001627.152
Arches NP UT 894382.891
Badlands NP SD 1022875.587
Big Bend NP TX 326903.348
Biscayne NP FL 463649.000
Black Canyon of the Gunnison NP CO 245845.543
Bryce Canyon NP UT 1233760.565
Canyonlands NP UT 419784.978
Capitol Reef NP UT 682313.717
Carlsbad Caverns NP NM 538724.478
Channel Islands NP CA 312260.283
Congaree NP SC 103145.775
Crater Lake NP OR 486658.500
Cuyahoga Valley NP OH 2211137.696
Death Valley NP CA 948566.935
Denali NP & PRES AK 431212.674
Dry Tortugas NP FL 47996.500
Everglades NP FL 922629.739
Gates of the Arctic NP & PRES AK 6970.744
Gateway Arch NP MO 2419498.652
Glacier Bay NP & PRES AK 334073.783
Glacier NP MT 2095725.435
Grand Canyon NP AZ 4161919.826
Grand Teton NP WY 2509523.935
Great Basin NP NV 86826.696
Great Sand Dunes NP & PRES CO 298985.522
Great Smoky Mountains NP TN 9822658.326
Guadalupe Mountains NP TX 182275.978
Haleakala NP HI 1179257.522
Hawaii Volcanoes NP HI 1411694.696
Hot Springs NP AR 1432560.804
Indiana Dunes NP IN 1871785.652
Isle Royale NP MI 20314.130
Joshua Tree NP CA 1482932.804
Katmai NP & PRES AK 40159.844
Kenai Fjords NP AK 227288.558
Kings Canyon NP CA 685076.304
Kobuk Valley NP AK 6450.163
Lake Clark NP & PRES AK 11510.023
Lassen Volcanic NP CA 418994.109
Mammoth Cave NP KY 1281422.609
Mesa Verde NP CO 565156.891
Mount Rainier NP WA 1307281.891
National Park of American Samoa AS 15318.905
New River Gorge NP & PRES WV 1055088.561
North Cascades NP WA 195927.848
Olympic NP WA 3003451.609
Petrified Forest NP AZ 694281.109
Pinnacles NP CA 200763.761
Redwood NP CA 448139.022
Rocky Mountain NP CO 3137569.196
Saguaro NP AZ 729941.739
Sequoia NP CA 1009114.478
Shenandoah NP VA 1524167.304
Theodore Roosevelt NP ND 536510.043
Virgin Islands NP VI 551687.935
Voyageurs NP MN 221786.913
White Sands NP NM 549837.435
Wind Cave NP SD 611498.522
Wrangell-St. Elias NP & PRES AK 49340.628
Yellowstone NP WY 3133457.217
Yosemite NP CA 3474583.130
Zion NP UT 2697031.913

Discuss/consider: Which National Parks have the highest and lowest average visits? What patterns or surprises do you notice?
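
The `summarise()` message in this exercise appears because the data is grouped by two columns. By default `summarize()` drops only the last grouping level, so the result is still grouped by `ParkName`. As a minimal sketch on invented data (the `toy` data frame and its park names are hypothetical), passing `.groups = "drop"` returns an ungrouped data frame and silences the message:

```r
library(dplyr)

# Toy data invented for illustration; park names and values are hypothetical
toy <- data.frame(
  ParkName = c("Park A", "Park A", "Park B", "Park B"),
  State = c("AA", "AA", "BB", "BB"),
  RecreationVisits = c(10, 30, 200, 400),
  stringsAsFactors = FALSE
)

# Grouping by State as well as ParkName keeps the State column in the output
toy_park_avg <- toy %>%
  group_by(ParkName, State) %>%
  summarize(avg_visits = mean(RecreationVisits), .groups = "drop")

toy_park_avg

# Check whether any grouping remains on the result
is_grouped_df(toy_park_avg)
```

Dropping the groups is mostly a convenience: later steps such as `arrange()` or `slice_max()` then operate on the whole table rather than within each park.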

Exercise 3

How many National Parks are there in each state?

Save your answer as distinct_parks.

distinct_parks <- np_data %>%
                    group_by(State) %>%
                    summarize(num_parks = n_distinct(ParkName))

distinct_parks
State num_parks
AK 8
AR 1
AS 1
AZ 3
CA 9
CO 4
FL 3
HI 2
IN 1
KY 1
ME 1
MI 1
MN 1
MO 1
MT 1
ND 1
NM 2
NV 1
OH 1
OR 1
SC 1
SD 2
TN 1
TX 2
UT 5
VA 1
VI 1
WA 3
WV 1
WY 2

Discuss/consider: Which states have the most and fewest National Parks? What patterns or surprises do you notice?
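
This exercise uses `n_distinct()` rather than `n()` because each park appears in many rows of the data (presumably one per recorded period), so counting rows would overcount parks. The sketch below illustrates the difference on a small invented data frame (`toy` and its park names are hypothetical) and sorts the result so the states with the most parks come first.

```r
library(dplyr)

# Toy data invented for illustration; each park appears in more than one row
toy <- data.frame(
  State = c("AA", "AA", "AA", "BB", "BB"),
  ParkName = c("Park A", "Park A", "Park C", "Park B", "Park B"),
  stringsAsFactors = FALSE
)

toy_counts <- toy %>%
  group_by(State) %>%
  summarize(
    num_rows = n(),                   # counts every row in the group
    num_parks = n_distinct(ParkName)  # counts each park name once
  ) %>%
  arrange(desc(num_parks))

toy_counts
```

On the real data, adding `arrange(desc(num_parks))` to the `distinct_parks` pipeline would rank the states by number of parks.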