These exercises use data about U.S. state governors. For more context about the dataset, see the data essay.

Concepts covered:

Value counts (counting categorical values)
Sorting and ranking
Binning continuous variables into categories
Filtering for outliers

Load the data

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/refs/heads/main/datasets/gubernatorial-bios/gubernatorial_bios_final.csv")

Exercise 1

What are the top 5 birth states for governors?

Save as top_birth_states and then view the resulting dataframe.
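As a hint for the counting step, here is a minimal sketch on made-up toy data rather than the governors dataset. The column name `birth_state` and the variable names are assumptions for illustration only; check `df.columns` for the real column name.

```python
import pandas as pd

# Made-up toy data; "birth_state" is a hypothetical column name.
toy = pd.DataFrame(
    {"birth_state": ["Ohio", "Texas", "Ohio", "Maine", "Ohio", "Texas"]}
)

# value_counts() tallies each distinct value, most frequent first;
# head(2) keeps only the two most common values.
top_states = toy["birth_state"].value_counts().head(2)

# Optional: convert the resulting Series into a two-column dataframe.
top_states_df = top_states.rename_axis("birth_state").reset_index(name="n_governors")
print(top_states_df)
```

The same pattern with `head(5)` on the real column should give the top five.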

Your code here

Discuss/consider: Which states are the most common birthplaces for governors? Do the top states share any traits?

Exercise 2

Now let’s look at the age at which governors first took office. We already visualized the average starting age in the main data essay; here we look at how starting ages are distributed across the entire dataframe.

First, let’s cut the age_at_start variable into ten-year bins, so that governors between ages 20 and 30 are mapped to 20, those between 30 and 40 to 30, and so on. Call the new column age_bucket. We will use the pandas function pd.cut.
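The binning step can be sketched on a few made-up ages. The bin edges, the choice of `right=False` (so that an age of exactly 30 falls in the 30 bucket), and the toy values are all assumptions; adjust them to the range of ages you find in the data.

```python
import pandas as pd

# Made-up ages, not taken from the governors dataset.
toy = pd.DataFrame({"age_at_start": [24, 30, 39, 45, 51, 58, 62, 78]})

# Edges at 20, 30, ..., 100; each bin is labeled by its lower edge.
bin_edges = list(range(20, 101, 10))
bin_labels = bin_edges[:-1]

# right=False makes each bin include its lower edge and exclude its upper edge,
# e.g. [30, 40), so age 30 is labeled 30 rather than 20.
toy["age_bucket"] = pd.cut(
    toy["age_at_start"], bins=bin_edges, right=False, labels=bin_labels
)
print(toy)
```

Note that `pd.cut` returns a categorical column, and any value outside the outermost edges becomes missing (NaN).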

Now display the value_counts of the age_bucket column, sorted by age bucket rather than by count.
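By default `value_counts` orders its result by frequency, so sorting by bucket takes one more step. A minimal sketch, using a made-up Series of bucket labels in place of the real age_bucket column:

```python
import pandas as pd

# Made-up bucket labels standing in for an age_bucket column.
buckets = pd.Series([50, 40, 50, 30, 60, 50, 40], name="age_bucket")

# value_counts() sorts by count; sort_index() reorders by the bucket label instead.
bucket_counts = buckets.value_counts().sort_index()
print(bucket_counts)

# idxmax() returns the label with the largest count.
most_common_bucket = bucket_counts.idxmax()
print(most_common_bucket)
```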

Your code here

Discuss/consider: Which age range has the most governors?

Exercise 3

Let’s look at the outlier governors who took office at or below the age of 30. Filter the dataset on the age_at_start column, save the result as young_governors, and display it.
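Filtering of this kind is usually done with a boolean mask. The sketch below uses made-up rows; the `governor` column and its placeholder names are hypothetical, and only `age_at_start` comes from the exercise text.

```python
import pandas as pd

# Made-up rows; "governor" is a hypothetical column with placeholder names.
toy = pd.DataFrame(
    {
        "governor": ["Governor A", "Governor B", "Governor C", "Governor D"],
        "age_at_start": [52, 29, 30, 41],
    }
)

# The comparison yields a True/False Series; indexing with it keeps the True rows.
is_young = toy["age_at_start"] <= 30
young = toy[is_young].sort_values("age_at_start")
print(young)
```

Sorting afterwards is optional, but it puts the youngest rows first.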

Your code here

Discuss/consider: Who are the youngest governors in American history? What era are they from? Take a look at some of their Wikipedia pages and NGA biographies online.
