Tom Bliss, Connor Daly, Jacob Klein, Patrick Lewis December 2018
Data for this analysis was collected from the 2013-2014 NFL season through Week 9 of the 2018-2019 NFL season. It should be noted that this analysis was conducted during the 2018-2019 NFL season. Thus we have included all data available for the current season, but this season was excluded in any analyses in which penalties were compared over entire seasons, since all data for the entire season will not be available until February 2019.
We downloaded play-by-play data from NFL Savant. From this site, we downloaded CSV files from seasons beginning in 2013 through 2018. These six CSV files begin with "pbp" and are located within the "/data" folder of this repository. We then downloaded referee game data from Pro-Football-Reference.
Due to the fact that there are hundreds of referees, linesmen, side judges, and line judges employed by the NFL, for the purposes of this analysis we grouped officials by head referee. We also excluded any referee crews that have not been active in the NFL for a sufficient amount of time (and thus do not yield a sufficient number of data points); all referee crews in our dataset have reffed over 50 games between the 2013 NFL Season through week 9 of the 2018 NFL Season, inclusive.
A CSV file is available in the "/data" subfolder for each of the 17 referees we included in the analysis. Also included in this subfolder is a file called "abbreviations.csv" which was used when joining the play-by-play data to the referee data, and includes a mapping of team names to team abbreviations. For example, this file maps team name "Arizona Cardinals" to team abbreviation "ARI".
In additon to the data there is a file entitled preprocessing.R which iterates through the CSV Files and merges them into a single data set. Moreover, since many of the penalty types in our dataset are quite similar in nature (e.g. Offensive Offside
and Defensive Offside
), we grouped such instances using universal grouping terms, as shown in the below table. Moreover, certain penalty type groupings accounted for less than 1% of all observations even after the groupings were applied, and such groups were then grouped again into the Other
category. The table below highlights all groupings we used for our analysis.
Grouping Term | Penalty Types Included |
---|---|
Delay of Game | Defensive Delay of Game, Delay of Game, Delay of Kickoff |
Illegal Block | Chop Block, Clipping, Illegal Blindside Block, Illegal Block Above the Waste, Illegal Crackback, Illegal Peelback, Illegal Wedge, Low Block, Offensive Holding |
Illegal Formation | Illegal Formation, Illegal Motion, Illegal Shift |
Illegal Tackle | Face Mask (15 Yards), Horse Collar Tackle, Lowering the Head to Initiate Contact |
Illegal Use of Hands | Illegal Use of Hands |
Offside | Defensive Offside, Encroachment, False Start, Neutral Zone Infraction, Offensive Offside, Offside on Free Kick |
Pass Interference | Defensive Holding, Defensive Pass Interference, Illegal Contact, Offensive Pass Interference |
Roughing a Protected Player | Roughing the Kicker, Roughing the Passer, Running Into the Kicker |
Too Many Men on the Field | Defensive 12 On-Field, Defensive Too Many Men on the Field, Illegal Substitution, Offensive 12 On-Field, Offensive Too Many Men on Field |
Unsportsmanlike Conduct | Disqualification, Personal Foul, Taunting, Unnecessary Roughness, Unsportsmanlike Conduct |
Other | Fair Catch Interference: Fair Catch Interference, Interference with Opportunity to Catch, Kick Catch Interference Illegal Action to Block a Field Goal: Leaping, Leverage Illegal Bat: Illegal Bat Illegal Forward Pass: Illegal Forward Pass Illegal Kickoff: Kickoff Out of Bounds, Short Free Kick Illegal Player Out of Bounds: Illegal Touch Kick, Illegal Touch Pass, Player Out of Bounds on Kick, Player Out of Bounds on Punt Ineligible Player Downfield: Ineligible Downfield Kick, Ineligible Downfield Pass Intentional Grounding: Intentional Grounding Invalid Fair Catch Signal: Invalid Fair Catch Signal Tripping: Tripping |
Additionally, due to some teams moving cities as well as general inconsistency of abbreviations across the data, the following abbreviations were updated to ensure consistency in the dataset:
Team Name | Old Abbreviation | New Abbreviation |
---|---|---|
Chargers | SD | LAC |
Jaguars | JAC | JAX |
Rams | LA | LAR |
Rams | STL | LAR |
Finally, the FinalReport.rmd creates the html file of the Final Report located in the "/docs" folder. Also included in this folder are two files which contain code for our interactive component of the report entitled "index.html" and "index_report.html". The file "index.html" is a stand-alone version of the interactive component while "index_report.html" is designed to work within the "FinalReport.html" file.