Tidy Tuesday - NFL salary (2018-04-09)

tidytuesday
analysis
Discussing and visualizing NFL salary data
Author

Ethan Tam

Published

October 3, 2024

Note

Note that the date following the post title is from when the dataset was added to Tidy Tuesday.
You can find the dataset I used for this week here (2018-04-09).

Source: Bk Aguilar

Ideas for dataset

  • Avg. salary of various NFL players by positions
    • A line graph showing the change over 2011 to 2018.
  • Box plot of salary for each position over the years
  • Then, perhaps, figure out how to make the graphs dynamic so users can compare them
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)

nfl_salary <- read.csv("datasets/nfl_salary.csv")
head(nfl_salary)
  year Cornerback Defensive.Lineman Linebacker Offensive.Lineman Quarterback
1 2011   11265916          17818000   16420000          15960000    17228125
2 2011   11000000          16200000   15623000          12800000    16000000
3 2011   10000000          12476000   11825000          11767500    14400000
4 2011   10000000          11904706   10083333          10358200    14100000
5 2011   10000000          11762782   10020000          10000000    13510000
6 2011    9244117          11340000    8150000           9859166    13250000
  Running.Back  Safety Special.Teamer Tight.End Wide.Receiver
1     12955000 8871428        4300000   8734375      16250000
2     10873833 8787500        3725000   8591000      14175000
3      9479000 8282500        3556176   8290000      11424000
4      7700000 8000000        3500000   7723333      11415000
5      7500000 7804333        3250000   6974666      10800000
6      7033000 7652700        3225000   6133333       9993750
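The table is in wide form, with one column per position. The by-position ideas listed earlier are easier to work with in long form, with one row per year, position, and salary. Here is a minimal sketch of that reshape, using only the six rows printed by head() above and three of the positions to keep it short. The names salary_head and avg_by_position are my own for illustration.

```r
library(dplyr)
library(tidyr)

# First six rows as printed by head(nfl_salary), three positions only.
salary_head <- data.frame(
  year = rep(2011L, 6),
  Cornerback = c(11265916, 11000000, 10000000, 10000000, 10000000, 9244117),
  Quarterback = c(17228125, 16000000, 14400000, 14100000, 13510000, 13250000),
  Running.Back = c(12955000, 10873833, 9479000, 7700000, 7500000, 7033000)
)

# Reshape to one row per (year, position, salary), then average per group.
avg_by_position <- salary_head |>
  pivot_longer(-year, names_to = "position", values_to = "salary") |>
  group_by(year, position) |>
  summarise(mean_salary = mean(salary, na.rm = TRUE), .groups = "drop")

avg_by_position
```

With all eight years loaded, the same pipeline on the full nfl_salary data frame should give one mean per year and position, which could then feed geom_line() with year on the x-axis and one colour per position.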

Visualization 1

Let’s try to find the avg. salary for a quarterback in 2011, then per year in a line graph:

# keep only rows from 2011, then select the Quarterback column
nfl_salary_2011 <- nfl_salary |>  
  filter(year == 2011) |> 
  select(Quarterback)
nfl_salary_2011
    Quarterback
1      17228125
2      16000000
3      14400000
4      14100000
5      13510000
6      13250000
7      12950000
8      12574700
9      12465000
10     11320000
11      9500000
12      9000000
13      8879000
14      7750000
15      7075000
16      6410499
17      6123750
18      5725000
19      5610000
20      5387500
21      5050000
22      5000000
23      4900000
24      4600000
25      4004636
26      4000000
27      4000000
28      4000000
29      3466667
30      3250000
31      3250000
32      3250000
33      3200000
34      3050000
35      2750000
36      2750000
37      2740000
38      2616667
39      2500000
40      2500000
41      2288364
42      2273225
43      2270000
44      2182118
45      2111875
46      2000000
47      2000000
48      1847036
49      1800000
50      1250000
51      1200000
52      1200000
53      1150000
54       948036
55       935000
56       931691
57       887500
58       810000
59       771500
60       735000
61       727500
62       700000
63       650000
64       637750
65       631375
66       616043
67       613000
68       547750
69       535969
70       529166
71       525000
72       525000
73       525000
74       525000
75       525000
76       525000
77       521696
78       495412
79       480000
80       480000
81       478556
82       473128
83       465000
84       455000
85       450000
86       450000
87       427763
88       421250
89       417215
90       401327
91       391288
92       375333
93       375000
94       330000
95       330000
96       154412
97        44117
98           NA
99           NA
100          NA
avg_qb_salary_2011 <- nfl_salary_2011 |> 
  summarise(mean_qb_salary_2011 = mean(Quarterback, na.rm = TRUE)) #remove the NA entries otherwise mean is NA.

avg_qb_salary_2011
  mean_qb_salary_2011
1             3376113
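The na.rm = TRUE argument matters because the 2011 Quarterback column ends in three NA entries. A small sketch of the difference, using the last five printed values of that column (rows 96 to 100); the name qb_tail is just for illustration:

```r
# Last five printed values of the 2011 Quarterback column (rows 96 to 100).
qb_tail <- c(154412, 44117, NA, NA, NA)

mean(qb_tail)                # NA: a single missing value makes the mean missing
mean(qb_tail, na.rm = TRUE)  # drops the NAs first, then averages the rest
sum(is.na(qb_tail))          # counts how many entries are missing
```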

Here I filtered the rows down to the year 2011 and then selected the Quarterback column. Because the filter runs first, every value left in that column is a 2011 salary.

Then I used summarise() to create a column holding the mean QB salary in 2011. It comes out to about $3.4 million for the average quarterback in this dataset. Interesting information, but I think a box plot will do a better job of showing the range and the spread of the salaries.
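The same summarise() step extends from one year to every year by grouping on year first. Here is a minimal sketch, assuming the column layout shown earlier. mean_salary_by_year() is a hypothetical helper of my own, and the toy data frame holds made-up numbers (not real salaries) so the shape of the result is easy to check.

```r
library(dplyr)

# Hypothetical helper (not from the original analysis): mean salary per year
# for one position column, ignoring NA entries.
mean_salary_by_year <- function(data, position) {
  data |>
    group_by(year) |>
    summarise(mean_salary = mean(.data[[position]], na.rm = TRUE), .groups = "drop")
}

# Tiny made-up data frame, only to show the shape of the result.
# These are NOT real salaries.
toy <- data.frame(
  year = c(2011, 2011, 2012, 2012, 2012),
  Quarterback = c(10, 20, 30, NA, 50)
)

qb_by_year <- mean_salary_by_year(toy, "Quarterback")
qb_by_year
```

On the real data, mean_salary_by_year(nfl_salary, "Quarterback") should return one row per year, which could be passed to ggplot() with geom_line() to get the per-year line graph.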

# preview the two columns the boxplot uses, grouped by year
nfl_salary |>
  group_by(year) |>
  select(year, Quarterback)
# A tibble: 800 × 2
# Groups:   year [8]
    year Quarterback
   <int>       <int>
 1  2011    17228125
 2  2011    16000000
 3  2011    14400000
 4  2011    14100000
 5  2011    13510000
 6  2011    13250000
 7  2011    12950000
 8  2011    12574700
 9  2011    12465000
10  2011    11320000
# ℹ 790 more rows
# boxplot of Quarterback salary, one box per year
ggplot(nfl_salary, aes(x = year, y = Quarterback, group = year)) +
  geom_boxplot() +
  scale_y_continuous(labels = scales::comma) # comma-separated labels instead of scientific notation
Warning: Removed 55 rows containing non-finite outside the scale range
(`stat_boxplot()`).

Some interesting stuff here… A LOT of outliers! Most QBs make under $10 million, but there are many who make well north of that!
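To put a number on "a lot of outliers", one option is to count them per year with the same rule geom_boxplot() uses by default: a point is drawn as an outlier when it lies more than 1.5 times the interquartile range beyond the first or third quartile. Here is a rough sketch. count_outliers() is a hypothetical helper of my own, and the toy values are made up (not real salaries).

```r
library(dplyr)

# Hypothetical helper: count values more than 1.5 * IQR outside the quartiles,
# the default rule geom_boxplot() uses to draw points as outliers.
count_outliers <- function(x) {
  x <- x[!is.na(x)]                 # drop missing salaries first
  q <- quantile(x, c(0.25, 0.75))   # first and third quartiles
  iqr <- q[2] - q[1]
  sum(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr)
}

# Made-up values, NOT real salaries: one extreme value in 2011, none in 2012.
toy <- data.frame(
  year = c(rep(2011, 6), rep(2012, 6)),
  Quarterback = c(1, 2, 3, 4, 5, 100, 1, 2, 3, 4, 5, NA)
)

outliers_by_year <- toy |>
  group_by(year) |>
  summarise(n_outliers = count_outliers(Quarterback), .groups = "drop")

outliers_by_year
```

Swapping toy for nfl_salary should give the number of plotted outlier points for each year's box.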