---
title: "DPLYR Value Counts with Gubernatorial Data (Exercise)"
date: "2024-08-01"
categories: [dplyr, exercise]
format:
  html: default
code-overflow: wrap
code-fold: show
editor: visual
df-print: kable
R.options:
  warn: false
code-tools: true
execute:
  eval: false
---
# <span style="color:green;"> Exercises </span>
## DPLYR Value Counts with Gubernatorial Data
<span style="color:red;"> [Solutions](Gubernatorial-Data-Value-Counts-Solutions.qmd) </span>
These exercises use data about U.S. state governors. For more context about the dataset, see the [data essay](../index.qmd).
**Concepts covered:**
- Counting categorical values with `count()`
- Sorting and ranking with `arrange()`
- Binning continuous variables with `cut()`
- Filtering for outliers with `filter()`
# Load the data
```{r}
#| message: false
library(dplyr)
df <- read.csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/refs/heads/main/datasets/gubernatorial-bios/gubernatorial_bios_final.csv",
               stringsAsFactors = FALSE)
head(df)
```
# Exercise 1
What are the top 5 birth states for governors?
Save the result as `top_birth_states`, then view the resulting data frame.
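As a hint, here is a minimal sketch of the count-then-rank pattern on toy data. The `toy` data frame and its `city` column are made up for illustration and are not part of the governors dataset, so you will need to adapt the column name and the number of rows kept.

```{r}
library(dplyr)

# Toy data, NOT the governors dataset; `city` is a hypothetical column
toy <- data.frame(
  city = c("Austin", "Boston", "Austin", "Denver", "Boston", "Austin"),
  stringsAsFactors = FALSE
)

# Count rows per city, sort by the count column `n` in descending order,
# and keep only the first 2 rows
top_cities <- toy %>%
  count(city) %>%
  arrange(desc(n)) %>%
  head(2)

top_cities
```

`count()` names its count column `n` by default, which is why `arrange(desc(n))` works here.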
```{r}
# Your code here
```
Discuss/consider: Which states are the most common birthplaces for governors? Do the top states share any traits?
# Exercise 2
Now let's look at the age at which governors started office. The main data essay already visualizes the average starting age, so here we will look at how starting ages are distributed across the entire data frame.
First, cut the `age_at_start` variable into ten-year bins with the R function `cut()`, and call the new column `age_bucket`. The bins are closed on the right, so a starting age above 20 and up to and including 30 maps to the "(20,30]" bucket, an age above 30 and up to and including 40 maps to "(30,40]", and so on.
Now let's display the counts of the `age_bucket` column and sort by the age bucket.
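As a hint, the sketch below shows how `cut()` assigns values to right-closed bins, using toy ages rather than the governors data. The `toy` data frame, its `age` column, and the break points are assumptions chosen for illustration; `mutate()` is used only to attach the new column.

```{r}
library(dplyr)

# Toy ages, NOT the governors dataset; `age` is a hypothetical column
toy <- data.frame(age = c(25, 30, 31, 40, 47, 52, 58))

# Breaks at 20, 30, ..., 60 produce right-closed intervals such as (20,30],
# so an age of exactly 30 falls in (20,30], not (30,40]
toy <- toy %>%
  mutate(age_bucket = cut(age, breaks = seq(20, 60, by = 10)))

# Count rows per bucket and sort by bucket; because cut() returns a factor,
# arrange() orders the buckets by factor level rather than alphabetically
bucket_counts <- toy %>%
  count(age_bucket) %>%
  arrange(age_bucket)

bucket_counts
```

Values that fall outside the range of the breaks become `NA`, so choose breaks that cover the full range of ages in your data.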
```{r}
# Your code here
```
Discuss/consider: Which age range has the most governors?
# Exercise 3
Let's look at the outlier governors who started office at or below the age of 30. Filter the dataset on the `age_at_start` column to keep only these governors, save the result as `young_governors`, and display it.
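As a hint, here is a minimal sketch of an "at or below" filter on toy data. The `toy` data frame and its `name` and `age` columns are hypothetical and stand in for the real dataset.

```{r}
library(dplyr)

# Toy data, NOT the governors dataset; `name` and `age` are hypothetical columns
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(29, 30, 31, 45),
  stringsAsFactors = FALSE
)

# `<=` keeps the boundary value (age 30); `<` would drop it
at_or_below_30 <- toy %>%
  filter(age <= 30)

at_or_below_30
```

Note that `filter()` also drops rows where the condition evaluates to `NA`, so rows with a missing age would not appear in the result.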
```{r}
# Your code here
```
Discuss/consider: Who are the youngest governors in American history? What era are they from? Take a look at some of their Wikipedia pages and NGA biographies online.