Introduction to Pandas with National Park Visitation Data (Solution)

pandas
exercise
solution
Published: February 26, 2024

These exercises use National Park visitation data from 1979–2024. For more context about the dataset, see the data essay.

Concepts covered:

  • Selecting columns
  • Filtering rows by a condition
  • Aggregation (sum)
  • Comparing summary statistics across groups

Load National Park Visitation data

Code
import pandas as pd

np_data = pd.read_csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/main/datasets/national-parks/US-National-Parks_RecreationVisits_1979-2024.csv")
np_data.head()
    ParkName     Region  State  Year  RecreationVisits
0  Acadia NP  Northeast     ME  1979           2787366
1  Acadia NP  Northeast     ME  1980           2779666
2  Acadia NP  Northeast     ME  1981           2997972
3  Acadia NP  Northeast     ME  1982           3572114
4  Acadia NP  Northeast     ME  1983           4124639

Exercise 1

Select 2 columns from the data. Save this 2-column dataframe to the variable smaller_df.

Code
smaller_df = np_data[['Year', 'RecreationVisits']]

smaller_df.head()
   Year  RecreationVisits
0  1979           2787366
1  1980           2779666
2  1981           2997972
3  1982           3572114
4  1983           4124639
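
The double square brackets in Exercise 1 matter: a list of column names returns a dataframe, while a single column name returns a Series. The sketch below illustrates the difference on a tiny stand-in dataframe (named sample here, an illustrative name) built from the first two rows printed above, so it runs without downloading the full CSV.

```python
import pandas as pd

# Tiny stand-in for np_data, built from the first two rows shown above
sample = pd.DataFrame({
    "ParkName": ["Acadia NP", "Acadia NP"],
    "State": ["ME", "ME"],
    "Year": [1979, 1980],
    "RecreationVisits": [2787366, 2779666],
})

# A single column name in single brackets returns a Series
one_column = sample["Year"]

# A list of column names (double brackets) returns a DataFrame
two_columns = sample[["Year", "RecreationVisits"]]

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(two_columns.shape)
```

The same pattern works with any number of column names inside the inner list.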

Question: How does the number of visits to national parks in Washington compare to the number in another state?

Exercise 2

Filter the dataframe to keep only the rows where State is Washington ('WA') and save the result to the variable wa_parks.

Code
wa_parks = np_data[np_data['State'] == 'WA']

wa_parks.head()
              ParkName        Region  State  Year  RecreationVisits
1913  Mount Rainier NP  Pacific West     WA  1979           1516703
1914  Mount Rainier NP  Pacific West     WA  1980           1268256
1915  Mount Rainier NP  Pacific West     WA  1981           1233671
1916  Mount Rainier NP  Pacific West     WA  1982           1007300
1917  Mount Rainier NP  Pacific West     WA  1983           1106306
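
The filter in Exercise 2 works in two steps: the comparison produces a Series of True/False values (a boolean mask), and indexing the dataframe with that mask keeps only the True rows. Filtered rows keep their original row labels, which is why the output above starts at 1913 rather than 0. Here is a minimal sketch on a small stand-in dataframe built from rows printed on this page; the names sample, is_wa, wa_sample, and wa_1980 are illustrative.

```python
import pandas as pd

# Small stand-in for np_data, using rows printed on this page
sample = pd.DataFrame({
    "ParkName": ["Acadia NP", "Acadia NP", "Mount Rainier NP", "Mount Rainier NP"],
    "State": ["ME", "ME", "WA", "WA"],
    "Year": [1979, 1980, 1979, 1980],
    "RecreationVisits": [2787366, 2779666, 1516703, 1268256],
})

# Step 1: the comparison yields one True/False value per row
is_wa = sample["State"] == "WA"

# Step 2: indexing with the mask keeps only the True rows
wa_sample = sample[is_wa]

# Conditions can be combined with & (and) or | (or);
# each condition needs its own parentheses
wa_1980 = sample[(sample["State"] == "WA") & (sample["Year"] == 1980)]

print(is_wa.tolist())
print(list(wa_sample.index))
print(wa_1980)
```

If fresh row labels starting at 0 are preferred, calling .reset_index(drop=True) on the filtered dataframe is one option.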

Exercise 3

Calculate the total number of RecreationVisits to Washington by calling .sum() on the RecreationVisits column of the filtered dataframe wa_parks.

Code
wa_parks['RecreationVisits'].sum()
np.int64(207306422)
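
The np.int64(...) wrapper in the output is how recent NumPy versions display their integer type; the value itself is an ordinary whole number. The sketch below, which assumes a two-value Series taken from the Mount Rainier rows above, shows one way to convert the result to a plain Python int and print it with thousands separators.

```python
import pandas as pd

# Two Mount Rainier values from the output above (1979 and 1980)
visits = pd.Series([1516703, 1268256], name="RecreationVisits")

total = visits.sum()          # a NumPy integer
total_int = int(total)        # convert to a plain Python int
formatted = f"{total_int:,}"  # add thousands separators for readability

print(formatted)  # 2,784,959
```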

Exercise 4

Filter the dataframe for only values in another state (your choice) and save to a variable. Calculate the sum total of RecreationVisits to this state by using .sum().

Code
ca_parks = np_data[np_data['State'] == 'CA']
ca_parks['RecreationVisits'].sum()
np.int64(413099818)

Question: How does the number of visits to these 2 states compare?

Code
wa_parks['RecreationVisits'].sum() - ca_parks['RecreationVisits'].sum()
np.int64(-205793396)
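
The negative difference means Washington's parks recorded about 206 million fewer visits than California's over this period, roughly half as many. Filtering one state at a time works for two states, but comparing many groups is usually easier with groupby. The following is a sketch rather than part of the original solution: it uses a small stand-in dataframe built from rows printed on this page (so it compares ME and WA, not CA), and the names sample, totals, difference, and ratio are illustrative.

```python
import pandas as pd

# Small stand-in for np_data, using rows printed on this page
sample = pd.DataFrame({
    "ParkName": ["Acadia NP", "Acadia NP", "Mount Rainier NP", "Mount Rainier NP"],
    "State": ["ME", "ME", "WA", "WA"],
    "Year": [1979, 1980, 1979, 1980],
    "RecreationVisits": [2787366, 2779666, 1516703, 1268256],
})

# One total per state, computed in a single step
totals = sample.groupby("State")["RecreationVisits"].sum()

# The result is a Series indexed by state, so totals can be looked up by label
difference = totals["WA"] - totals["ME"]
ratio = totals["WA"] / totals["ME"]

print(totals)
print(difference)
```

On the full dataset, the same groupby line applied to np_data would return a total for every state at once, and .sort_values(ascending=False) would rank them.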