Solution

Matplotlib Customization with National Park Visitation Data

These exercises use National Park visitation data from 1979–2024. For more context about the dataset, see the data essay.

Concepts covered:

Filtering data for a specific category
Line plots with custom colors and titles
Customizing x-axis tick intervals
Abbreviating y-axis labels (millions, thousands)
Adjusting axis limits to zoom into a time period

Load National Park Visitation data

Code

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Read the visitation CSV directly from GitHub
np_data = pd.read_csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/main/datasets/national-parks/US-National-Parks_RecreationVisits_1979-2024.csv")
# Preview the first five rows
np_data.head()

	ParkName	Region	State	Year	RecreationVisits
0	Acadia NP	Northeast	ME	1979	2787366
1	Acadia NP	Northeast	ME	1980	2779666
2	Acadia NP	Northeast	ME	1981	2997972
3	Acadia NP	Northeast	ME	1982	3572114
4	Acadia NP	Northeast	ME	1983	4124639

How have visits to a particular National Park changed over time?
What is the most interesting period of change?

Exercise 1

First, pick a National Park that you haven’t worked with yet, and filter the dataframe so that it contains only that park’s rows.

Code

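# Keep only the rows whose ParkName matches the chosen park exactly
my_parks_df = np_data[np_data['ParkName'] == 'Mount Rainier NP']

# Preview the first five rows of the filtered data
my_parks_df.head()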

	ParkName	Region	State	Year	RecreationVisits
1913	Mount Rainier NP	Pacific West	WA	1979	1516703
1914	Mount Rainier NP	Pacific West	WA	1980	1268256
1915	Mount Rainier NP	Pacific West	WA	1981	1233671
1916	Mount Rainier NP	Pacific West	WA	1982	1007300
1917	Mount Rainier NP	Pacific West	WA	1983	1106306
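
A filter like this returns an empty dataframe, without raising an error, if the park name is spelled or abbreviated differently from the dataset. The sketch below is one way to check the available names first. It uses a small hand-built dataframe (rows copied from the outputs above) rather than the full CSV, and the names `sample`, `park_names`, `typo_df`, and `my_parks_sample` are illustrative.

```python
import pandas as pd

# A tiny stand-in for np_data, built from rows shown in the outputs above
sample = pd.DataFrame({
    "ParkName": ["Acadia NP", "Acadia NP", "Mount Rainier NP", "Mount Rainier NP"],
    "Region": ["Northeast", "Northeast", "Pacific West", "Pacific West"],
    "State": ["ME", "ME", "WA", "WA"],
    "Year": [1979, 1980, 1979, 1980],
    "RecreationVisits": [2787366, 2779666, 1516703, 1268256],
})

# List the distinct park names so the filter string can be copied exactly
park_names = sorted(sample["ParkName"].unique())
print(park_names)

# A misspelled or differently abbreviated name matches nothing,
# so the result is silently empty
typo_df = sample[sample["ParkName"] == "Mt. Rainier NP"]
print(typo_df.empty)

# The exact name returns the expected rows
my_parks_sample = sample[sample["ParkName"] == "Mount Rainier NP"]
print(len(my_parks_sample))
```

On the full dataset, the same `unique()` call on `np_data['ParkName']` should list every park that can be used in the filter.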

Exercise 2

Now, make a line plot that shows the number of visits per year to that park from 1979 to 2024.

2a.

Choose a color for the line.

2b.

Give the plot a title that also functions as a kind of “headline” for the most interesting story of the plot.

2c.

Change the x-axis ticks so that they increase 5 years at a time.

2d.

Change the y-axis tick labels so that they abbreviate millions to M and thousands to K.

Code

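# Format tick values as abbreviated labels, e.g. 1500000 -> 1.5M, 250000 -> 250K
def abbreviate_number(x, pos):
    if x >= 1_000_000:
        return f'{x/1_000_000:.1f}M'
    elif x >= 1_000:
        return f'{x/1_000:.0f}K'
    return str(int(x))

fig, ax = plt.subplots()

# Plot visits per year with a custom line color (2a)
ax.plot(my_parks_df['Year'], my_parks_df['RecreationVisits'], color='green')
# Place an x-axis tick every 5 years (2c)
ax.set_xticks(range(1980, 2025, 5))
# Abbreviate y-axis tick labels to M and K (2d)
ax.yaxis.set_major_formatter(ticker.FuncFormatter(abbreviate_number))
# Fix the y-axis range from 0 to 2 million
ax.set_ylim(0, 2_000_000)
# Headline-style title (2b)
ax.set_title('Visits to Mt. Rainier Are Surprisingly Stable')
ax.set_xlabel('Year')
ax.set_ylabel('Recreation Visits')

plt.tight_layout()
plt.show()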
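
Because `FuncFormatter` calls the function with each tick value and its position, `abbreviate_number` can be tried on its own before it is attached to an axis. The quick check below repeats the function from the solution and calls it with a few sample values. The values are illustrative, apart from 1,516,703, which is Mount Rainier's 1979 count from the table above.

```python
def abbreviate_number(x, pos):
    if x >= 1_000_000:
        return f'{x/1_000_000:.1f}M'
    elif x >= 1_000:
        return f'{x/1_000:.0f}K'
    return str(int(x))

# The position argument is unused by this formatter, so None is passed here
samples = [0, 500_000, 1_516_703, 2_000_000]
labels = [abbreviate_number(value, None) for value in samples]
print(labels)  # ['0', '500K', '1.5M', '2.0M']

# One decimal place can hide detail: a tick at 1,250,000 is labeled 1.2M
rounded_label = abbreviate_number(1_250_000, None)
print(rounded_label)  # 1.2M
```

If that rounding matters for a particular plot, using more decimal places in the format string (for example `.2f`) would be one option.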

Exercise 3

Now, create a plot that zooms in on the most interesting time period for this particular National Park.

3a.

Change the x-axis limits so that the plot shows only the most interesting years.

3b.

Come up with a new title that describes this time period.

Code

fig, ax = plt.subplots()

# Same plot as in Exercise 2, restricted to the years of interest
ax.plot(my_parks_df['Year'], my_parks_df['RecreationVisits'], color='green')
ax.set_xticks(range(1980, 2025, 5))
# Zoom in on 2005 through 2023 (3a)
ax.set_xlim(2005, 2023)
ax.yaxis.set_major_formatter(ticker.FuncFormatter(abbreviate_number))
ax.set_ylim(0, 2_000_000)
# New headline for this time period (3b)
ax.set_title('After a COVID Dip, Mt. Rainier Visits Are Higher Than Ever')
ax.set_xlabel('Year')
ax.set_ylabel('Recreation Visits')

plt.tight_layout()
plt.show()
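
Picking the most interesting years by eye works well, but a quick calculation can also point to candidate periods. The sketch below is one possible approach and is not part of the original solution: it computes the year-over-year percent change with pandas' `pct_change` and finds the year with the largest swing. It uses only the five Mount Rainier rows shown in the Exercise 1 output, so the result says nothing about the full 1979–2024 series. The names `rainier_sample`, `PctChange`, and `biggest_swing_year` are illustrative.

```python
import pandas as pd

# Five Mount Rainier rows copied from the Exercise 1 output
rainier_sample = pd.DataFrame({
    "Year": [1979, 1980, 1981, 1982, 1983],
    "RecreationVisits": [1516703, 1268256, 1233671, 1007300, 1106306],
})

# Percent change from the previous year
# (the first year has no predecessor, so its value is NaN)
rainier_sample["PctChange"] = rainier_sample["RecreationVisits"].pct_change() * 100

# Find the row with the largest change in either direction
biggest_idx = rainier_sample["PctChange"].abs().idxmax()
biggest_swing_year = int(rainier_sample.loc[biggest_idx, "Year"])

print(rainier_sample.round(1))
print(biggest_swing_year)
```

On the full data, the same steps could be applied to a copy of `my_parks_df` sorted by `Year`, and the resulting year used as a starting point for choosing the limits passed to `ax.set_xlim`.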
