Pandas Groupby with National Park Visitation Data (Solution)

pandas
exercise
solution
Published: August 1, 2024

These exercises use National Park visitation data from 1979–2024. For more context about the dataset, see the data essay.

Concepts covered: groupby(), mean(), nunique(), reset_index()

Load the data

Code
import pandas as pd

np_data = pd.read_csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/main/datasets/national-parks/US-National-Parks_RecreationVisits_1979-2024.csv")
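Before grouping, it can be worth confirming that the columns used in the exercises below (State, ParkName, RecreationVisits) are present. The following is a minimal sketch that reads a small in-memory CSV as a stand-in for the real file. The rows are made up, and the Year column name is an assumption for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV: made-up rows,
# and "Year" is an assumed column name.
sample_csv = io.StringIO(
    "ParkName,State,Year,RecreationVisits\n"
    "Park A,AA,2000,100\n"
    "Park A,AA,2001,300\n"
    "Park B,BB,2000,50\n"
)
sample = pd.read_csv(sample_csv)

# Columns the exercises rely on
required = ["State", "ParkName", "RecreationVisits"]
missing = [col for col in required if col not in sample.columns]

print(sample.shape)  # (rows, columns)
print(missing)       # an empty list means every required column is present
```

The same check should work on np_data by swapping it in for sample.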

Exercise 1

What is the average number of visits for each state?

Save as avg_state_visits and then view the resulting dataframe.

Code
avg_state_visits = np_data.groupby('State')['RecreationVisits'].mean().reset_index()
avg_state_visits
    State  RecreationVisits
0   AK     1.419814e+05
1   AR     1.432561e+06
2   AS     1.531890e+04
3   AZ     1.862048e+06
4   CA     9.978256e+05
5   CO     1.061889e+06
6   FL     4.780917e+05
7   HI     1.295476e+06
8   IN     1.871786e+06
9   KY     1.281423e+06
10  ME     3.001627e+06
11  MI     2.031413e+04
12  MN     2.217869e+05
13  MO     2.419499e+06
14  MT     2.095725e+06
15  ND     5.365100e+05
16  NM     5.442810e+05
17  NV     8.682670e+04
18  OH     2.211138e+06
19  OR     4.866585e+05
20  SC     1.031458e+05
21  SD     8.171871e+05
22  TN     9.822658e+06
23  TX     2.545897e+05
24  UT     1.185455e+06
25  VA     1.524167e+06
26  VI     5.516879e+05
27  WA     1.502220e+06
28  WV     1.055089e+06
29  WY     2.821491e+06

Discuss/consider: Which state has the highest average visits, and which has the lowest? What patterns or surprises do you notice?
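One way to find the highest and lowest averages is to sort the grouped result. The sketch below uses a tiny made-up dataframe (not the real visitation numbers) with the same column names as the exercise, and rounds the averages so they do not display in scientific notation.

```python
import pandas as pd

# Toy data for illustration only; these are not real visitation figures.
toy = pd.DataFrame({
    "State": ["AA", "AA", "BB", "BB", "CC"],
    "RecreationVisits": [100, 300, 50, 150, 1000],
})

# Same pattern as the exercise, with rounding added for readability
toy_avg = (
    toy.groupby("State")["RecreationVisits"]
    .mean()
    .round(0)
    .reset_index()
)

# Sort from highest to lowest average
toy_sorted = toy_avg.sort_values("RecreationVisits", ascending=False)

highest = toy_sorted.iloc[0]["State"]   # first row after sorting
lowest = toy_sorted.iloc[-1]["State"]   # last row after sorting
print(highest, lowest)  # CC BB
```

Applying the same sort_values call to avg_state_visits should put the most-visited state in the first row and the least-visited in the last.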

Exercise 2

What is the average number of visits for each National Park?

Save as avg_park_visits and then view the resulting dataframe.

Code
avg_park_visits = np_data.groupby('ParkName')['RecreationVisits'].mean().reset_index()
avg_park_visits
    ParkName                      RecreationVisits
0   Acadia NP                     3.001627e+06
1   Arches NP                     8.943829e+05
2   Badlands NP                   1.022876e+06
3   Big Bend NP                   3.269033e+05
4   Biscayne NP                   4.636490e+05
... ...                           ...
58  Wind Cave NP                  6.114985e+05
59  Wrangell-St. Elias NP & PRES  4.934063e+04
60  Yellowstone NP                3.133457e+06
61  Yosemite NP                   3.474583e+06
62  Zion NP                       2.697032e+06

63 rows × 2 columns

Discuss/consider: Which National Park has the highest average visits, and which has the lowest? What patterns or surprises do you notice?
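Because the displayed dataframe is truncated, the highest and lowest averages are not necessarily visible in the output. A minimal sketch of one way to look them up directly, using made-up park names and numbers rather than the real data:

```python
import pandas as pd

# Toy data for illustration only; park names and numbers are made up.
toy = pd.DataFrame({
    "ParkName": ["Park A", "Park A", "Park B", "Park B", "Park C"],
    "RecreationVisits": [100, 300, 50, 150, 1000],
})

# Keeping the result as a Series (no reset_index) leaves ParkName as the index
toy_avg = toy.groupby("ParkName")["RecreationVisits"].mean()

print(toy_avg.idxmax())     # park with the highest average: Park C
print(toy_avg.idxmin())     # park with the lowest average: Park B
print(toy_avg.nlargest(2))  # top two parks by average visits
```

Here idxmax() and idxmin() return index labels, which is why the sketch skips reset_index(); on avg_park_visits as saved above, sorting by RecreationVisits would be an alternative.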

Exercise 3

How many National Parks are there in each state?

Save your answer as distinct_parks.

Code
distinct_parks = np_data.groupby('State')['ParkName'].nunique().reset_index(name='NumParks')
distinct_parks
    State  NumParks
0   AK     8
1   AR     1
2   AS     1
3   AZ     3
4   CA     9
5   CO     4
6   FL     3
7   HI     2
8   IN     1
9   KY     1
10  ME     1
11  MI     1
12  MN     1
13  MO     1
14  MT     1
15  ND     1
16  NM     2
17  NV     1
18  OH     1
19  OR     1
20  SC     1
21  SD     2
22  TN     1
23  TX     2
24  UT     5
25  VA     1
26  VI     1
27  WA     3
28  WV     1
29  WY     2

Discuss/consider: Which states have the most and fewest National Parks? What patterns or surprises do you notice?
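To see which states have the most and fewest parks, the counts can be filtered against their maximum and minimum. The sketch below uses made-up rows and assumes, as the use of nunique() above suggests, that each park appears on more than one row, so counting rows would overstate the number of parks.

```python
import pandas as pd

# Toy data for illustration only; the repeated park names stand in
# for the same park appearing on several rows (an assumption).
toy = pd.DataFrame({
    "State": ["AA", "AA", "AA", "BB", "BB"],
    "ParkName": ["Park A", "Park A", "Park B", "Park C", "Park C"],
})

# nunique() counts each park once per state, however many rows it has
toy_parks = toy.groupby("State")["ParkName"].nunique().reset_index(name="NumParks")

# Several states can tie, so filter rather than take a single row
most = toy_parks[toy_parks["NumParks"] == toy_parks["NumParks"].max()]
fewest = toy_parks[toy_parks["NumParks"] == toy_parks["NumParks"].min()]

print(most["State"].tolist())    # ['AA']
print(fewest["State"].tolist())  # ['BB']
```

Filtering on the minimum matters for distinct_parks, where many states share a count of 1.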