DPLYR Value Counts with Gubernatorial Data (Exercise)

dplyr
exercise
Published: August 1, 2024

These exercises use data about U.S. state governors. For more context about the dataset, see the data essay.

Concepts covered:

  • Counting categorical values with count()
  • Sorting and ranking with arrange()
  • Binning continuous variables with cut()
  • Filtering for outliers with filter()

Load the data

Code
library(dplyr)

# Read the gubernatorial biographies dataset directly from GitHub
df <- read.csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/refs/heads/main/datasets/gubernatorial-bios/gubernatorial_bios_final.csv",
  stringsAsFactors = FALSE)
# Preview the first six rows
head(df)

Exercise 1

What are the top 5 birth states for governors?

Save the result as top_birth_states, then view the resulting dataframe.
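
As a hint, here is a minimal sketch of the count-then-rank pattern on a small made-up dataframe rather than the governors data. The pets dataframe and its species column are hypothetical and exist only for illustration; count() tallies each distinct value into a column named n, arrange(desc(n)) sorts from most to least frequent, and slice_head() keeps the first rows.

```r
library(dplyr)

# Hypothetical toy data, not part of the governors dataset
pets <- data.frame(
  species = c("dog", "cat", "dog", "fish", "cat", "dog", "bird"),
  stringsAsFactors = FALSE
)

# Tally each species, sort from most to least common, keep the top 2
top_species <- pets %>%
  count(species) %>%
  arrange(desc(n)) %>%
  slice_head(n = 2)

top_species
```

The same three steps should carry over to the governors data once you swap in the relevant column and the number of rows you want to keep.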

Code
# Your code here

Discuss/consider: Which states are the most common birthplaces for governors? Do the top states share any traits?

Exercise 2

Now let’s look at the age at which governors started office. We already visualized the average starting age in the main data essay, but here let’s look at how starting ages are distributed across the entire dataframe.

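First, let’s cut the age_at_start variable into 10-year bins, so governors older than 20 and up to 30 get mapped to the “(20,30]” bucket, those older than 30 and up to 40 to “(30,40]”, etc. (By default, cut() excludes the lower bound of each bin and includes the upper bound.) Let’s call this new column age_bucket. We will use the R function cut().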

Now let’s display the counts of the age_bucket column and sort by the age bucket.
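
As a hint, here is a minimal sketch of binning and counting on a small made-up dataframe. The runners dataframe, its age column, and the age_group column are hypothetical names chosen for illustration, and the breaks are an assumption that happens to suit the toy data; you may need a wider range for the governors.

```r
library(dplyr)

# Hypothetical toy data, not part of the governors dataset
runners <- data.frame(age = c(22, 30, 31, 38, 45, 59))

# cut() assigns each age to an interval. By default the intervals are
# open on the left and closed on the right, so 30 falls in "(20,30]".
runners <- runners %>%
  mutate(age_group = cut(age, breaks = seq(20, 60, by = 10)))

# Count rows per interval, then sort by the interval itself
group_counts <- runners %>%
  count(age_group) %>%
  arrange(age_group)

group_counts
```

Because cut() returns a factor, arrange() orders the buckets by their interval order rather than alphabetically. Values that fall outside the breaks become NA, so it is worth checking that your breaks cover the full range of ages.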

Code
# Your code here

Discuss/consider: Which age range has the most governors?

Exercise 3

Let’s look at the outlier governors who started office at age 30 or younger. Filter the dataset down to these governors using the age_at_start column, save the result as young_governors, and display it.
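
As a hint, here is a minimal sketch of filtering on a numeric threshold, again using a small made-up dataframe. The runners dataframe and its name and age columns are hypothetical and only illustrate the pattern.

```r
library(dplyr)

# Hypothetical toy data, not part of the governors dataset
runners <- data.frame(
  name = c("Ana", "Ben", "Cy", "Dee"),
  age = c(22, 30, 31, 45),
  stringsAsFactors = FALSE
)

# Keep only rows where age is 30 or below (<= includes the boundary)
young_runners <- runners %>%
  filter(age <= 30)

young_runners
```

Note that filter() drops rows where the condition evaluates to NA, so any rows with a missing age would not appear in the result.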

Code
# Your code here

Discuss/consider: Who are the youngest governors in American history? What era are they from? Take a look at some of their Wikipedia pages and NGA biographies online.