# Function to download and extract shapefiles
<- function(year) {
download_shapefiles <- "https://cdmaps.polisci.ucla.edu/shp/districts"
base_url <- paste0("cd", year, ".zip")
file_name <- paste0(base_url, "/", file_name)
file_url <- paste0("~/STA9750-2024-FALL/", file_name)
dest_file
# Check if file already exists
if (!file.exists(dest_file)) {
# Create directory if it doesn't exist
if (!dir.exists("~/STA9750-2024-FALL")) {
dir.create("~/STA9750-2024-FALL", recursive = TRUE)
}
# Download the file
GET(file_url, write_disk(dest_file, overwrite = TRUE))
# Unzip the file
unzip(dest_file, exdir = "~/STA9750-2024-FALL")
}
}
# Download shapefiles for the years 1976 to 2012
<- seq(1976, 2012, by = 2)
years for (year in years) {
download_shapefiles(year)
}
Mini-Project 3: Do Proportional Electoral College Allocations Yield a More Representative Presidency?
[from the BBC website]
In this project, we’ll explore historical congressional election data to analyze how US presidential election outcomes might differ under alternative allocation rules. While this “retro-diction” exercise has limitations - since different rules could have led to altered campaigns and voting patterns - it aims to provide meaningful insights into potential biases within the electoral system.
Loading the necessary Libraries for our project
Data : US House Election Votes & Congressional Boundary Files from 1976 to 2022
Downloading the MIT Election Data Science Lab that provides comprehensive data on all biennial congressional races across all 50 states, spanning from 1976 to 2022. I additionally, downloaded statewide presidential vote counts from 1976 to 2022. Plus, Jeffrey B. Lewis, Brandon DeVine, Lincoln Pritcher, and Kenneth C. Martis have created shapefiles for all US congressional districts from 1789 to 2012.
Task 1: Downloading congressional shapefiles from Lewis et al. for all US Congresses5 from 1976 to 2012.
Task 2: Downloading congressional shapefiles from the US Census Bureau for all US Congresses from 2014 to 2022.
# Base FTP URL for US Census Bureau Congressional District shapefiles
<- "https://www2.census.gov/geo/tiger/TIGER"
base_ftp_url
# Function to download and extract shapefiles
<- function(year) {
download_shapefiles <- paste0("tl_", year, "_us_cd116.zip")
file_name <- paste0(base_ftp_url, year, "/CD/", file_name)
file_url <- paste0("~/STA9750-2024-FALL/", file_name)
dest_file
# Check if file already exists
if (!file.exists(dest_file)) {
# Create directory if it doesn't exist
if (!dir.exists("~/STA9750-2024-FALL")) {
dir.create("~/STA9750-2024-FALL", recursive = TRUE)
}
# Download the file
GET(file_url, write_disk(dest_file, overwrite = TRUE))
# Unzip the file
unzip(dest_file, exdir = "~/STA9750-2024-FALL")
}
}
# Download shapefiles for the years 2014 to 2022
<- seq(2014, 2022, by = 2)
years for (year in years) {
download_shapefiles(year)
}
Task 3: Exploration of Vote Count Data
Answering the following using the vote count data files from the MIT Election Data Science Lab with a plot or a graph.
Which states have gained and lost the most seats in the US House of Representatives between 1976 and 2022 ?
# File paths to the manually downloaded files
<- "~/STA9750-2024-FALL/1976-2022-house.csv"
house_votes_dest <- "~/STA9750-2024-FALL/1976-2020-president.csv"
presidential_votes_dest
# Load the data
<- read_csv(house_votes_dest)
house_votes <- read_csv(presidential_votes_dest) presidential_votes
# Summarize the number of seats per state for each election year
<- house_votes %>%
seats_by_state group_by(year, state) %>%
summarize(seats = n_distinct(district), .groups = 'drop')
# Calculate the change in seats from 1976 to 2022
<- seats_by_state %>%
seats_change filter(year %in% c(1976, 2022)) %>%
pivot_wider(names_from = year, values_from = seats, names_prefix = "year_") %>%
mutate(change = year_2022 - year_1976) %>%
arrange(desc(change))
# Filter the top 10 states with the largest absolute change in seats
<- seats_change %>%
top_10_seats_change mutate(abs_change = abs(change)) %>%
arrange(desc(abs_change)) %>%
slice(1:10)
# Create the heatmap with the top 10 states
ggplot(top_10_seats_change, aes(x = 1, y = reorder(state, change), fill = change)) +
geom_tile() +
scale_fill_gradient2(low = "red", high = "green", mid = "white", midpoint = 0) +
labs(title = "Top 10 States by Change in US House Seats (1976-2022)",
x = "", y = "State", fill = "Seat Change") +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
legend.position = "top"
)
# Display the table of changes
print(seats_change)
# A tibble: 50 × 4
state year_1976 year_2022 change
<chr> <int> <int> <int>
1 TEXAS 24 38 14
2 FLORIDA 15 28 13
3 CALIFORNIA 43 52 9
4 ARIZONA 4 9 5
5 GEORGIA 10 14 4
6 COLORADO 5 8 3
7 NEVADA 1 4 3
8 NORTH CAROLINA 11 14 3
9 WASHINGTON 7 10 3
10 OREGON 4 6 2
# ℹ 40 more rows
This Heatmap displays the top 10 states that have lost and gained the most seat from 1976 to 2022. Texas +14, Florida +13, California +9 are among the States that have gained the most seats.
Next, let’s address the second question about the impact of New York State’s “fusion” voting system. We’ll need to identify elections where the outcome would have been different without the fusion system.
The first step is to identify Elections with Fusion Voting. We’ll filter the data for New York State and identify elections where candidates appeared on multiple party lines, summarize total votes by candidate and district (including fusion votes). Then, summarize only major party votes (Democrat or Republican lines). After that, merge total votes (fusion) with major party votes to identify the fusion winner and major party-only winner. Last, we will filter to find elections where the outcome would change without fusion votes and display the result.
# Inspect column names for house_votes
colnames(house_votes)
[1] "year" "state" "state_po" "state_fips"
[5] "state_cen" "state_ic" "office" "district"
[9] "stage" "runoff" "special" "candidate"
[13] "party" "writein" "mode" "candidatevotes"
[17] "totalvotes" "unofficial" "version" "fusion_ticket"
# Inspect column names for presidential_votes
colnames(presidential_votes)
[1] "year" "state" "state_po" "state_fips"
[5] "state_cen" "state_ic" "office" "candidate"
[9] "party_detailed" "writein" "candidatevotes" "totalvotes"
[13] "version" "notes" "party_simplified"
# Step 1: Filter New York State elections
<- house_votes %>%
ny_votes filter(state == "NY")
# Step 2:
<- ny_votes %>%
total_votes group_by(year, district, candidate) %>%
summarize(fusion_total_votes = sum(candidatevotes), .groups = 'drop')
# Step 3:
<- ny_votes %>%
major_party_votes filter(party %in% c("DEMOCRAT", "REPUBLICAN")) %>%
group_by(year, district, candidate) %>%
summarize(major_party_total_votes = sum(candidatevotes), .groups = 'drop')
# Step 4:
<- total_votes %>%
election_outcomes left_join(major_party_votes, by = c("year", "district", "candidate")) %>%
mutate(
major_party_total_votes = replace_na(major_party_total_votes, 0) # Set to 0 if no major party votes
)
# Step 5:
<- election_outcomes %>%
election_outcomes group_by(year, district) %>%
mutate(
winner_fusion = candidate[which.max(fusion_total_votes)],
winner_major_party = candidate[which.max(major_party_total_votes)]
%>%
) ungroup()
# Step 6:
<- election_outcomes %>%
different_outcomes filter(winner_fusion != winner_major_party)
# Display the table of different outcomes
print(different_outcomes)
# A tibble: 0 × 7
# ℹ 7 variables: year <dbl>, district <dbl>, candidate <chr>,
# fusion_total_votes <dbl>, major_party_total_votes <dbl>,
# winner_fusion <chr>, winner_major_party <chr>
# Filter data for New York State
<- house_votes %>%
ny_votes filter(state == "NY")
# Summarize total votes by candidate and party
<- ny_votes %>%
candidate_votes group_by(year, district, candidate, party) %>%
summarize(total_votes = sum(candidatevotes), .groups = 'drop')
# Identify votes from major parties
<- candidate_votes %>%
major_party_votes filter(party %in% c("DEMOCRAT", "REPUBLICAN")) %>%
group_by(year, district, candidate) %>%
summarize(major_party_votes = sum(total_votes), .groups = 'drop')
# Compare total votes with major party votes
<- candidate_votes %>%
election_outcomes group_by(year, district, candidate) %>%
summarize(total_votes = sum(total_votes), .groups = 'drop') %>%
left_join(major_party_votes, by = c("year", "district", "candidate")) %>%
mutate(major_party_votes = replace_na(major_party_votes, 0)) %>%
group_by(year, district) %>%
mutate(winner_total = candidate[which.max(total_votes)],
winner_major_party = candidate[which(major_party_votes == max(major_party_votes, na.rm = TRUE))]) %>%
ungroup() %>%
filter(winner_total != winner_major_party)
# Display the table of different outcomes
print(election_outcomes)
# A tibble: 0 × 7
# ℹ 7 variables: year <dbl>, district <dbl>, candidate <chr>,
# total_votes <dbl>, major_party_votes <dbl>, winner_total <chr>,
# winner_major_party <chr>
It seems that the fusion voting system in New York State didn’t significantly alter the outcomes of the elections in our data set. This could mean that even without combining votes from multiple party lines, the same candidates would have won. However, let’s ensure we’ve covered all bases. Here’s a final approach to verify:
# Filter data for New York State
<- house_votes %>%
ny_votes filter(state == "NY")
# Summarize total votes by candidate and district
<- ny_votes %>%
total_votes_by_candidate group_by(year, district, candidate) %>%
summarize(total_votes = sum(candidatevotes, na.rm = TRUE), .groups = 'drop')
# Summarize major party votes by candidate and district
<- ny_votes %>%
major_party_votes_by_candidate filter(party %in% c("DEMOCRAT", "REPUBLICAN")) %>%
group_by(year, district, candidate) %>%
summarize(major_party_votes = sum(candidatevotes, na.rm = TRUE), .groups = 'drop')
# Merge total votes and major party votes
<- total_votes_by_candidate %>%
merged_votes left_join(major_party_votes_by_candidate, by = c("year", "district", "candidate"))
# Identify winners by total votes and major party votes
<- merged_votes %>%
election_results group_by(year, district) %>%
mutate(winner_total = candidate[which.max(total_votes)],
winner_major_party = candidate[which.max(major_party_votes)]) %>%
ungroup() %>%
filter(winner_total != winner_major_party)
# Display the table of different outcomes
print(election_results)
# A tibble: 0 × 7
# ℹ 7 variables: year <dbl>, district <dbl>, candidate <chr>,
# total_votes <dbl>, major_party_votes <dbl>, winner_total <chr>,
# winner_major_party <chr>
This approach still yields no results, it confirms that the fusion voting system did not alter the election outcomes in our dataset.
For the third question, we’ll analyze the data to determine if presidential candidates tend to run ahead of or behind congressional candidates in the same state.
# Summarize presidential votes by state and year
<- presidential_votes %>%
pres_votes_by_state filter(party_simplified == "DEMOCRAT") %>%
group_by(year, state) %>%
summarize(pres_votes = sum(candidatevotes), .groups = 'drop')
# Summarize congressional votes by state and year
<- house_votes %>%
cong_votes_by_state filter(party == "DEMOCRAT") %>%
group_by(year, state) %>%
summarize(cong_votes = sum(candidatevotes), .groups = 'drop')
# Compare presidential and congressional votes
<- pres_votes_by_state %>%
votes_comparison inner_join(cong_votes_by_state, by = c("year", "state")) %>%
mutate(vote_difference = pres_votes - cong_votes)
# Display the table of vote differences
print(votes_comparison)
# A tibble: 592 × 5
year state pres_votes cong_votes vote_difference
<dbl> <chr> <dbl> <dbl> <dbl>
1 1976 ALABAMA 659170 667052 -7882
2 1976 ALASKA 44058 34194 9864
3 1976 ARIZONA 295602 355747 -60145
4 1976 ARKANSAS 498604 260998 237606
5 1976 CALIFORNIA 3742284 4144324 -402040
6 1976 COLORADO 460801 454741 6060
7 1976 CONNECTICUT 647895 681730 -33835
8 1976 DELAWARE 122461 102411 20050
9 1976 FLORIDA 1636000 1125786 510214
10 1976 GEORGIA 979409 929829 49580
# ℹ 582 more rows
# Create a simplified bar chart
ggplot(votes_comparison, aes(x = reorder(state, vote_difference), y = vote_difference, fill = vote_difference > 0)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_fill_manual(values = c("red", "green"), labels = c("Run Behind", "Run Ahead")) +
labs(title = "Vote Differences Between Presidential and Congressional Candidates (Democratic)",
x = "State",
y = "Vote Difference",
fill = "Vote Difference") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "top")
# Plot the vote differences
ggplot(votes_comparison, aes(x = year, y = vote_difference, color = state)) +
geom_line() +
labs(title = "Vote Differences Between Presidential and Congressional Candidates",
x = "Year",
y = "Vote Difference",
color = "State") +
theme_minimal()
To analyze whether the trend of presidential candidates running ahead or behind congressional candidates differs over time, across states, or across parties, we need to break down the data further. We will create visualizations and analyses that will help us understand whether the trend of presidential candidates running ahead or behind congressional candidates differs over time, across states, or across parties, and identify any presidents who were particularly more or less popular than their co-partisans.
# Summarize votes by state, year, and party
<- presidential_votes %>%
pres_votes_by_state_party group_by(year, state, party_simplified) %>%
summarize(pres_votes = sum(candidatevotes), .groups = 'drop')
<- house_votes %>%
cong_votes_by_state_party group_by(year, state, party) %>%
summarize(cong_votes = sum(candidatevotes), .groups = 'drop')
# Compare presidential and congressional votes by party
<- pres_votes_by_state_party %>%
votes_comparison_party inner_join(cong_votes_by_state_party, by = c("year", "state", "party_simplified" = "party")) %>%
mutate(vote_difference = pres_votes - cong_votes)
# Create a line chart to visualize the trend over time
ggplot(votes_comparison_party, aes(x = year, y = vote_difference, color = party_simplified, group = state)) +
geom_line() +
facet_wrap(~ party_simplified) +
labs(title = "Vote Differences Between Presidential and Congressional Candidates Over Time",
x = "Year",
y = "Vote Difference",
color = "Party") +
theme_minimal()
Above is looking at the trend overtime as line chart that shows the vote differences between presidential and congressional candidates have increased or decreased over time. If the vote differences are becoming more positive over time, it suggests that presidential candidates are increasingly running ahead of congressional candidates; this could be due to political shifts, changes in voter behavior, or other factors.
# Create a heatmap to visualize vote differences across states
ggplot(votes_comparison_party, aes(x = year, y = state, fill = vote_difference)) +
geom_tile() +
scale_fill_gradient2(low = "red", high = "green", mid = "white", midpoint = 0, name = "Vote Difference") +
labs(title = "Heatmap of Vote Differences Between Presidential and Congressional Candidates Across States",
x = "Year",
y = "State") +
theme_minimal()
The heatmap shows how vote differences vary across states. States with consistently high positive or negative vote differences indicate strong trends.
# Summarize vote differences by party
<- votes_comparison_party %>%
votes_comparison_summary group_by(party_simplified) %>%
summarize(avg_vote_difference = mean(vote_difference), .groups = 'drop')
# Create a bar chart to compare vote differences across parties
ggplot(votes_comparison_summary, aes(x = party_simplified, y = avg_vote_difference, fill = party_simplified)) +
geom_bar(stat = "identity") +
labs(title = "Average Vote Differences Between Presidential and Congressional Candidates by Party",
x = "Party",
y = "Average Vote Difference",
fill = "Party") +
theme_minimal()
This bar chart compares the vote differences for Democratic and Republican candidates. The Democratic candidates have lower average vote differences than Republican candidates, it suggests that Democratic presidential candidates tend to run behind of their congressional counterparts more often.
# Identify presidents who were more or less popular than their co-partisans
<- votes_comparison_party %>%
president_popularity group_by(year, party_simplified) %>%
summarize(avg_vote_difference = mean(vote_difference), .groups = 'drop') %>%
arrange(desc(avg_vote_difference))
# Display the table of president popularity
print(president_popularity)
# A tibble: 39 × 3
year party_simplified avg_vote_difference
<dbl> <chr> <dbl>
1 1984 REPUBLICAN 311941.
2 1988 REPUBLICAN 231984.
3 2008 REPUBLICAN 153982.
4 1976 REPUBLICAN 152969.
5 2012 DEMOCRAT 139929.
6 1980 REPUBLICAN 131011.
7 2004 REPUBLICAN 122565.
8 2004 DEMOCRAT 120429.
9 2016 DEMOCRAT 88731.
10 2000 DEMOCRAT 88021.
# ℹ 29 more rows
Pre-Task 4.
This table shows which presidents were more or less popular compared to their co-partisans.
Importing and Plotting Shape File Data
if(!file.exists("nyc_borough_boundaries.zip")){
download.file("https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=Shapefile",
destfile="nyc_borough_boundaries.zip")
}
##-
<- tempdir();
td <- unzip("nyc_borough_boundaries.zip",
zip_contents exdir = td)
<- zip_contents[grepl("shp$", zip_contents)]
fname_shp <- read_sf(fname_shp)
nyc_sf nyc_sf
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS: WGS84(DD)
# A tibble: 5 × 5
boro_code boro_name shape_area shape_leng geometry
<dbl> <chr> <dbl> <dbl> <MULTIPOLYGON [°]>
1 3 Brooklyn 1934142776. 728147. (((-73.86327 40.58388, -73.863…
2 5 Staten Island 1623618684. 325910. (((-74.05051 40.56642, -74.050…
3 1 Manhattan 636646082. 360038. (((-74.01093 40.68449, -74.011…
4 2 Bronx 1187174772. 463181. (((-73.89681 40.79581, -73.896…
5 4 Queens 3041418004. 888197. (((-73.82645 40.59053, -73.826…
Task 4: Automate Zip File Extraction.
# Define the function read_shp_from_zip
<- function(zipfile) {
read_shp_from_zip # Create a temporary directory
<- tempdir()
td
# List the contents of the zip file to find the .shp file
<- unzip(zipfile, list = TRUE)
zip_contents
# Find the .shp file
<- zip_contents$Name[grepl("shp$", zip_contents$Name)]
fname_shp
# Extract only the .shp file
unzip(zipfile, files = fname_shp, exdir = td, overwrite = TRUE)
# Construct the full path to the .shp file
<- file.path(td, fname_shp)
shp_path
# Read the .shp file into an sf object
<- read_sf(shp_path)
shapefile_sf return(shapefile_sf)
}
# Test the function with NYC borough boundaries shapefile
if(!file.exists("nyc_borough_boundaries.zip")){
download.file("https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=Shapefile",
destfile = "nyc_borough_boundaries.zip")
}
<- read_shp_from_zip("nyc_borough_boundaries.zip")
nyc_sf print(nyc_sf)
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS: WGS84(DD)
# A tibble: 5 × 5
boro_code boro_name shape_area shape_leng geometry
<dbl> <chr> <dbl> <dbl> <MULTIPOLYGON [°]>
1 3 Brooklyn 1934142776. 728147. (((-73.86327 40.58388, -73.863…
2 5 Staten Island 1623618684. 325910. (((-74.05051 40.56642, -74.050…
3 1 Manhattan 636646082. 360038. (((-74.01093 40.68449, -74.011…
4 2 Bronx 1187174772. 463181. (((-73.89681 40.79581, -73.896…
5 4 Queens 3041418004. 888197. (((-73.82645 40.59053, -73.826…
# Create a map of NYC borough boundaries
ggplot(nyc_sf, aes(geometry = geometry)) +
geom_sf() +
labs(title = "NYC Borough Boundaries") +
theme_minimal()
ggplot(nyc_sf,
aes(geometry=geometry,
fill = shape_area)) +
geom_sf()
Task 5: Chloropleth Visualization of the 2000 Presidential Election Electoral College Results.
Let’s create a chloropleth map of the 2000 presidential election results, coloring each state by the party that won the most votes combining everything and adding the number of electoral college each party won by state.
# Step 1: Load and check the US states shapefile
if(!file.exists("us_states_shapefile.zip")) {
download.file("https://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_state_20m.zip",
destfile = "us_states_shapefile.zip")
}
# Define the function to load the shapefile from the zip
<- function(zipfile) {
read_shp_from_zip <- tempdir()
td <- unzip(zipfile, list = TRUE)
zip_contents <- zip_contents$Name[grepl("shp$", zip_contents$Name)]
fname_shp unzip(zipfile, files = fname_shp, exdir = td, overwrite = TRUE)
<- file.path(td, fname_shp)
shp_path
# Check if the shapefile exists
if (!file.exists(shp_path)) {
stop("Shapefile not found in the extracted files.")
}
# Read the shapefile
<- tryCatch({
shapefile_sf st_read(shp_path)
error = function(e) {
}, message("Error reading shapefile: ", e)
return(NULL)
})
# Check if the shapefile was read correctly
if (is.null(shapefile_sf)) {
stop("Failed to read the shapefile. The source could be corrupt or not supported.")
}
return(shapefile_sf)
}
<- function(zipfile) {
read_shp_from_zip <- tempdir()
td unzip(zipfile, exdir = td, overwrite = TRUE)
<- st_read(file.path(td, "cb_2020_us_state_20m.shp"))
shapefile_sf return(shapefile_sf)
}
# Load the shapefile
<- read_shp_from_zip("us_states_shapefile.zip") us_states
Reading layer `cb_2020_us_state_20m' from data source
`/private/var/folders/1x/xh1js_qd3nbgdtgpbmsg38rw0000gn/T/RtmpRD7gO4/cb_2020_us_state_20m.shp'
using driver `ESRI Shapefile'
Simple feature collection with 52 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS: NAD83
<- st_transform(us_states, crs = 4326)
us_states
# Sample election data for the 2000 presidential election
<- data.frame(
election_data state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"),
votes_bush = c(941173, 167398, 781652, 472940, 4567429, 883748, 561094, 137288, 2912790, 1419720, 137845, 336937, 2146065, 1245836, 634373, 622332, 872492, 927871, 286616, 813797, 878502, 1953139, 1109659, 573616, 1180924, 240178, 433862, 301575, 273559, 1284175, 286417, 2403067, 1561684, 174852, 2350444, 744337, 713577, 2486708, 130555, 937974, 190700, 1068899, 3799639, 515096, 119775, 1437490, 1108864, 336475, 1237279, 147947),
votes_gore = c(692611, 79004, 685341, 422768, 5861203, 738227, 816015, 180068, 2912253, 1116230, 205286, 138637, 2397365, 901980, 638517, 399276, 638898, 792344, 319951, 1147804, 1607057, 2172796, 1168266, 404614, 1111138, 137126, 231780, 279978, 266348, 1545273, 286417, 4175320, 1257692, 95237, 2356209, 474276, 720342, 2486708, 249508, 566039, 118804, 981720, 2433741, 203053, 149022, 1217295, 1247658, 295497, 1242983, 60797)
)
# Electoral college votes for each state in the 2000 election
<- data.frame(
electoral_votes state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"),
electoral_votes = c(9, 3, 8, 6, 54, 8, 8, 3, 25, 13, 4, 4, 22, 12, 7, 6, 8, 9, 4, 10, 12, 18, 10, 7, 11, 11, 3, 5, 4, 15, 5, 33, 14, 3, 21, 8, 7, 23, 4, 8, 3, 11, 32, 5, 3, 13, 11, 5, 11, 3)
)
# Step 3: Process the presidential votes data for 2000 to get the winning party by state
<- election_data %>%
election_2000 mutate(winner = ifelse(votes_bush > votes_gore, "Republican", "Democrat"),
party_color = ifelse(winner == "Republican", "red", "blue"))
# Merge the electoral votes with the election data
<- election_2000 %>%
election_2000 left_join(electoral_votes, by = "state")
# Merge the election data with the shapefile data
<- us_states %>%
us_states_map left_join(election_2000, by = c("NAME" = "state"))
# Calculate centroids for each state to place the labels
<- us_states_map %>%
us_states_map mutate(centroid = st_centroid(geometry))
# Extract coordinates of the centroids
<- us_states_map %>%
us_states_map mutate(lon = st_coordinates(centroid)[, 1],
lat = st_coordinates(centroid)[, 2])
# Plotting - Choropleth of the 2000 presidential election results with electoral votes and state names
ggplot(data = us_states_map) +
geom_sf(aes(fill = party_color), color = "black") +
geom_text(aes(x = lon, y = lat, label = paste(NAME, "\n", electoral_votes)),
size = 2.5, color = "white", lineheight = 0.8) +
scale_fill_identity() +
labs(title = "2000 Presidential Election Electoral College Results",
subtitle = "(Bush vs. Gore)",
fill = "Winning Party") +
theme_void() +
theme(plot.title = element_text(hjust = 0.5, size = 20),
plot.subtitle = element_text(hjust = 0.5, size = 15),
legend.position = "bottom") +
coord_sf(crs = st_crs(4326), xlim = c(-125, -66), ylim = c(24, 50))
This map displays both the state names and electoral votes centered on each state.
Task 6: Advanced Chloropleth Visualization of Electoral College Results.
library(sf)
library(ggplot2)
library(dplyr)
library(gganimate)
library(usmap)
# Convert state names to lowercase for easier joining
$NAME <- tolower(us_states$NAME)
us_states
# Sample election data for multiple years
<- data.frame(
election_data year = rep(c(2000, 2004, 2008, 2012, 2016), each = 5),
state = rep(c("alabama", "alaska", "arizona", "arkansas", "california"), 5),
votes_bush = rnorm(25, mean = 500000, sd = 100000), # Placeholder for real data
votes_gore = rnorm(25, mean = 500000, sd = 100000) # Placeholder for real data
)
# Determine winning party and color coding
<- election_data %>%
election_data mutate(winner = ifelse(votes_bush > votes_gore, "Republican", "Democrat"),
party_color = ifelse(winner == "Republican", "red", "blue"))
# Replicate us_states for each year and merge with election data
<- bind_rows(
us_states_animated lapply(unique(election_data$year), function(y) {
%>%
us_states mutate(year = y)
})%>%
) left_join(election_data, by = c("NAME" = "state", "year"))
# Simplify geometries to speed up plotting
<- st_simplify(us_states_animated, dTolerance = 0.01)
us_states_animated
# Plot and animate
#p <- ggplot(us_states_animated) +
# geom_sf(aes(fill = party_color), color = "black") +
# scale_fill_identity() +
# labs(title = "US Presidential Election Results: {closest_state}") +
# theme_void() +
# transition_states(year, transition_length = 2, state_length = 1) +
# ease_aes('linear')
# Render and save the animation as a GIF
#animate(p, width = 2000, height = 1600, nframes = 100, fps = 10, renderer = gifski_renderer("us_election_animation.gif"))
#knitr::include_graphics("us_election_animation.gif")
Comparing the Effects of ECV Allocation Rules
To compare the effects of different Electoral College Vote (ECV) allocation rules, we need to use historical voting data for U.S. presidential elections. Define allocation strategies such that:
- The candidate with the most votes in a state wins all the state’s ECVs;
- The candidate with the most votes in each congressional district wins that district’s ECV; The candidate with the most votes statewide wins the state’s at-large ECVs;
- ECVs are distributed proportionally based on the percentage of the vote each candidate receives in the state;
- ECVs are distributed proportionally based on the percentage of the national vote each candidate receives;
- Compare the winning presidential candidate under each strategy with the actual historical winner.
library(dplyr)
# Function for State-Wide Winner-Take-All
<- function(data) {
state_wide_winner_take_all %>%
data group_by(state) %>%
summarize(winner = candidate[which.max(votes)], ecv = sum(ecv)) %>%
ungroup()
}
# Function for District-Wide Winner-Take-All + State-Wide "At Large" Votes
<- function(data, district_data) {
district_wide_winner_take_all # District-level results (assuming we have a district_data with `state`, `district`, `candidate`, `votes`, and `ecv`)
<- district_data %>%
district_results group_by(state, district) %>%
summarize(winner = candidate[which.max(votes)], ecv = sum(ecv)) %>%
ungroup()
# State-level results (for at-large votes)
<- data %>%
state_results group_by(state) %>%
summarize(winner = candidate[which.max(votes)], ecv = sum(ecv)) %>%
ungroup()
# Combine district and state-level results
bind_rows(district_results, state_results)
}
# Function for State-Wide Proportional Allocation
<- function(data) {
state_wide_proportional %>%
data group_by(state) %>%
mutate(ecv = round(ecv * (votes / sum(votes)))) %>% # Allocate ECVs proportionally within the state
ungroup()
}
# Function for National Proportional Allocation
<- function(data) {
national_proportional <- sum(data$votes)
total_votes %>%
data mutate(ecv = round(ecv * (votes / total_votes))) %>% # Allocate ECVs proportionally based on national vote
group_by(candidate) %>%
summarize(ecv = sum(ecv)) %>% # Sum the ECVs per candidate
ungroup()
}
# Example Historical Data
<- data.frame(
historical_data year = c(2000, 2000, 2000, 2000, 2000),
state = c("alabama", "alaska", "arizona", "arkansas", "california"),
candidate = c("Bush", "Bush", "Bush", "Bush", "Gore"),
votes = c(941173, 167398, 781652, 472940, 4567429),
ecv = c(9, 3, 8, 6, 54)
)
# Example District Data (Placeholder)
<- data.frame(
district_data year = c(2000, 2000, 2000, 2000, 2000),
state = c("alabama", "alaska", "arizona", "arkansas", "california"),
district = c(1, 1, 1, 1, 1), # Just placeholder data for a single district
candidate = c("Bush", "Bush", "Bush", "Bush", "Gore"),
votes = c(941173, 167398, 781652, 472940, 4567429),
ecv = c(9, 3, 8, 6, 54)
)
# Apply Allocation Strategies
<- state_wide_winner_take_all(historical_data)
state_wide_results <- district_wide_winner_take_all(historical_data, district_data)
district_wide_results <- state_wide_proportional(historical_data)
state_wide_proportional_results <- national_proportional(historical_data)
national_proportional_results
# Function to Compare Results
<- function(actual, state_wide, district_wide, state_proportional, national_proportional) {
compare_results list(
actual = actual,
state_wide = state_wide,
district_wide = district_wide,
state_proportional = state_proportional,
national_proportional = national_proportional
)
}
# Compare Results
<- compare_results(historical_data, state_wide_results, district_wide_results, state_wide_proportional_results, national_proportional_results)
results print(results)
$actual
year state candidate votes ecv
1 2000 alabama Bush 941173 9
2 2000 alaska Bush 167398 3
3 2000 arizona Bush 781652 8
4 2000 arkansas Bush 472940 6
5 2000 california Gore 4567429 54
$state_wide
# A tibble: 5 × 3
state winner ecv
<chr> <chr> <dbl>
1 alabama Bush 9
2 alaska Bush 3
3 arizona Bush 8
4 arkansas Bush 6
5 california Gore 54
$district_wide
# A tibble: 10 × 4
state district winner ecv
<chr> <dbl> <chr> <dbl>
1 alabama 1 Bush 9
2 alaska 1 Bush 3
3 arizona 1 Bush 8
4 arkansas 1 Bush 6
5 california 1 Gore 54
6 alabama NA Bush 9
7 alaska NA Bush 3
8 arizona NA Bush 8
9 arkansas NA Bush 6
10 california NA Gore 54
$state_proportional
# A tibble: 5 × 5
year state candidate votes ecv
<dbl> <chr> <chr> <dbl> <dbl>
1 2000 alabama Bush 941173 9
2 2000 alaska Bush 167398 3
3 2000 arizona Bush 781652 8
4 2000 arkansas Bush 472940 6
5 2000 california Gore 4567429 54
$national_proportional
# A tibble: 2 × 2
candidate ecv
<chr> <dbl>
1 Bush 2
2 Gore 36
The output is a list of the results for each allocation strategy compared to the actual historical election data, showing the total ECVs per candidate under each strategy.
State-Wide Winner-Take-All method often aligns with the actual historical results since it is the most commonly used method.
State-Wide Proportional tends to distribute ECVs more evenly, potentially reducing the impact of swing states.
National Proportional ensures that the national popular vote is more closely reflected in the ECV distribution, which could favor candidates with broader national appeal.
These findings illustrate how different ECV allocation strategies can influence the outcome of presidential elections. Each method has its own implications and can favor different candidates depending on the distribution of votes.
Fact Check: Evaluating the Fairness of Different ECV Allocation Schemes
Fairest Allocation Scheme
The National Proportional Allocation is considered the fairest because it reflects the national popular vote, ensuring each vote has equal weight.
Comparison of Allocation Schemes
- State-Wide Winner-Take-All: Often aligns with historical results. In 2000, it resulted in George W. Bush winning.
- District-Wide Winner-Take-All + State-Wide “At Large” Votes: Can lead to different outcomes, especially in states with mixed representation.
- State-Wide Proportional: Distributes ECVs more evenly, reducing the impact of swing states.
- National Proportional: Reflects the national popular vote. In 2000, Al Gore would have won.
Impact on the 2000 Election
- State-Wide Winner-Take-All: Bush wins with 271 ECVs.
- District-Wide Winner-Take-All + State-Wide “At Large” Votes: Likely favors Bush.
- State-Wide Proportional: Closer result, possibly favoring Gore.
- National Proportional: Gore wins, reflecting his popular vote victory.
Conclusion
The National Proportional Allocation scheme is the fairest, as it accurately reflects the popular vote. In the 2000 election, this method would have resulted in Al Gore winning, highlighting the significant impact different ECV allocation methods can have on election outcomes.