Welcome to The Code Forest

A place for all things data science



Recent Posts

We’ll use 50 years of NFL kicking data to inform the least – or most – important decision of your fantasy season: Drafting a kicker.

CONTINUE READING

Take your time series forecasting game to the next level by working through two real-world scenarios in the Tidyverse!

CONTINUE READING

In this post, we’ll leverage 110 years of historical data – and everything from time-series forecasting to hypothesis testing – to understand how a person’s state of birth influences their name.

CONTINUE READING

In this post, we’ll analyze reviews and ratings from Beeradvocate.com to understand what drives satisfaction amongst beer drinkers worldwide. Prost!

CONTINUE READING

Survival Analysis is the go-to method for analyzing time-to-event data. In this post, we’ll go deep on some historical player data and then leverage machine learning to predict how long new draft picks will remain in the NFL.

CONTINUE READING

College rankings are a standard input for most students when choosing a school. But to what extent does a college’s rank relate to how much a graduate makes 10 years into their career? We’ll answer this question by web scraping data from a variety of online sources with R and Python, and then build a model to understand which factors matter most to post-college pay.

CONTINUE READING

Portland, Oregon is home to some of the best watering holes in America. With so many places to quaff a West Coast Style IPA or glass of Pinot Noir, choosing which to visit (and in which order) can be a daunting task. To address this question, we’ll leverage some classic optimization techniques to minimize the total distance travelled between the top bars in Portland for a truly “optimal” Pub Crawl.
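For a flavor of the optimization involved, here is a minimal sketch using a greedy nearest-neighbor heuristic as a simple stand-in for the classic techniques the post covers. The bar names and coordinates below are made up for illustration:

```python
import math

# Hypothetical bars with planar coordinates: purely illustrative stand-ins
bars = {
    "Bar A": (0.0, 0.0),
    "Bar B": (1.0, 0.5),
    "Bar C": (0.2, 1.0),
    "Bar D": (1.5, 1.2),
}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest_neighbor_route(start):
    """Greedy TSP heuristic: always walk to the closest unvisited bar."""
    route, remaining = [start], set(bars) - {start}
    while remaining:
        nxt = min(remaining, key=lambda b: dist(bars[route[-1]], bars[b]))
        route.append(nxt)
        remaining.remove(nxt)
    return route

route = nearest_neighbor_route("Bar A")
total = sum(dist(bars[a], bars[b]) for a, b in zip(route, route[1:]))
print(route, round(total, 3))
```

A greedy route like this is fast but not guaranteed optimal; the full post explores more rigorous approaches to minimizing total distance.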

CONTINUE READING

Keras is quickly becoming the go-to prototyping solution for computer vision problems, and this post provides an overview of how to rapidly build a Convolutional Neural Network in R with the Keras library.

CONTINUE READING

Advanced machine learning algorithms like Artificial Neural Networks (ANNs) can’t model time-dependent data without some pre-processing. The additional processing hurdle often deters forecasters from implementing advanced methods in favor of classic (but less powerful) approaches. However, I’ve observed some notable accuracy gains applying ANNs to forecasting problems. Accordingly, this post provides a basic playbook for data cleaning, feature engineering, model selection, prediction, and risk assessment when forecasting with Neural Nets.
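The pre-processing step mentioned above usually means converting a raw series into lagged feature rows a network can train on. A minimal sketch of that idea (the function name is my own, not from the post):

```python
def make_lag_features(series, n_lags=3):
    """Convert a plain series into supervised-learning rows: each row pairs
    the previous n_lags values (features) with the next value (target),
    which is the shape a neural net needs to learn from time-dependent data."""
    rows = []
    for i in range(n_lags, len(series)):
        rows.append((series[i - n_lags:i], series[i]))
    return rows

series = list(range(10))   # toy series: 0, 1, ..., 9
rows = make_lag_features(series)
print(rows[0])    # ([0, 1, 2], 3)
print(len(rows))  # 7
```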

CONTINUE READING

Aaron Rodgers or Tom Brady? Carson Wentz or Drew Brees? Choosing the right Fantasy Football QB each week is challenging. To remove some of the guesswork from the decision-making process, I devised an approach that’s worked well over the past few seasons. Read on to learn more about using the Beta Distribution to pick your weekly starting QB.
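As a hedged sketch of the general idea, here is one way to compare two QBs by sampling from Beta posteriors over each player’s “good game” rate. The game counts below are invented for illustration and are not the post’s actual method or data:

```python
import random

random.seed(42)

# Hypothetical records: (good games, bad games) relative to some
# fantasy-point threshold. The counts are made up for illustration.
qb_a = (11, 5)
qb_b = (8, 8)

def prob_a_beats_b(a, b, draws=20_000):
    """Monte Carlo estimate of P(rate_A > rate_B), modeling each QB's
    'good game' rate as a Beta posterior under a flat Beta(1, 1) prior."""
    wins = 0
    for _ in range(draws):
        rate_a = random.betavariate(1 + a[0], 1 + a[1])
        rate_b = random.betavariate(1 + b[0], 1 + b[1])
        wins += rate_a > rate_b
    return wins / draws

p = prob_a_beats_b(qb_a, qb_b)
print(round(p, 3))
```

The appeal of the Beta here is that it naturally expresses uncertainty from small sample sizes: a QB with few games gets a wide posterior rather than an overconfident point estimate.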

CONTINUE READING

Drafting a rookie in Fantasy Football can be a risky move, but it can pay huge dividends if you happen to snag a diamond in the rough. After accounting for a player’s draft position, do physical attributes (height/weight) and combine performance (40-yard dash, bench press, etc.) provide any additional explanatory power for points scored during a player’s first NFL season? I’ll explore this question for rookie Running Backs and Wide Receivers.

CONTINUE READING

Tired of waiting around for your simulations to finish? Run them in parallel! This post covers how to use Spark and R’s foreach package to add parallelism to your R code.

CONTINUE READING

This post focuses on some of my favorite things – football and forecasting – and will outline how to leverage external regressors when creating forecasts. We’ll do some web scraping in R and Python to create our dataset, and then forecast how many people will visit Tom Brady’s Wikipedia page.

CONTINUE READING

Feature selection is an integral part of machine learning and this post explores what happens when lots of irrelevant features are added to the modeling process. We’ll also identify which algorithms are affected the most by such features. These questions will be addressed as we build a classifier and try to predict which wines we’ll like based on their chemical properties. So pour yourself a glass of Pinot Noir and fire up your R terminal!

CONTINUE READING

Sometimes a controlled experiment isn’t an option, yet you still want to establish causality. This post outlines a method for quantifying the effects of an intervention via counterfactual predictions.
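The core pattern is: fit a model on the pre-intervention period, extrapolate it forward as the counterfactual, and treat the gap between actuals and that forecast as the estimated effect. A minimal sketch with made-up numbers (the real post uses a proper forecasting model, not a straight line):

```python
# Made-up weekly metric with an intervention at week 10.
pre = [100 + 2 * t for t in range(10)]   # pre-intervention: a clean trend
post_actual = [130, 133, 137, 140]       # observed weeks 10-13

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = fit_line(list(range(10)), pre)
counterfactual = [intercept + slope * t for t in range(10, 14)]
estimated_lift = [a - c for a, c in zip(post_actual, counterfactual)]
print(estimated_lift)  # actual minus the "no intervention" forecast, per week
```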

CONTINUE READING

That’s a dense title – Monte Carlo Simulation, Power, Mixed-Effect models. Each of these topics could be its own post. However, I’m going to discuss their interrelations in the context of experimental power and keep everything high-level. The goal is to get an intuitive idea of how we can leverage simulation to provide sample size estimates for experiments with nested data.
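Stripped of the nested-data complexity, the simulate-then-test idea looks like this. The sketch below uses a plain two-group comparison as a stand-in for the mixed-effect models the post deals with:

```python
import random
import statistics

def power_estimate(n_per_group, effect, sims=500, z_crit=1.96, seed=1):
    """Estimate power by simulation: generate many fake experiments under an
    assumed effect size and count how often a simple two-sample z-style
    test rejects the null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, 1) for _ in range(n_per_group)]
        se = (statistics.variance(control) / n_per_group
              + statistics.variance(treated) / n_per_group) ** 0.5
        z = (statistics.mean(treated) - statistics.mean(control)) / se
        rejections += abs(z) > z_crit  # ~alpha = 0.05, two-sided
    return rejections / sims

power = power_estimate(n_per_group=50, effect=0.5)
print(round(power, 2))
```

To find a sample size, you would rerun this over a grid of n_per_group values until the estimated power clears your target (e.g. 0.80); with nested data, the simulation step generates group-level effects too.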

CONTINUE READING

This post covers a straightforward approach for detecting and replacing outliers in order to improve forecasting accuracy.
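One simple version of detect-and-replace, sketched here with a median/MAD rule and neighbor interpolation (an illustrative stand-in, not necessarily the post’s exact method):

```python
import statistics

def replace_outliers(series, k=3.0):
    """Flag points far from the median (measured in MAD units) and replace
    each one with the median of its immediate neighbors."""
    med = statistics.median(series)
    mad = statistics.median([abs(x - med) for x in series]) or 1.0
    cleaned = list(series)
    for i, x in enumerate(series):
        if abs(x - med) / mad > k:
            window = [series[j]
                      for j in range(max(i - 1, 0), min(i + 2, len(series)))
                      if j != i]
            cleaned[i] = statistics.median(window)
    return cleaned

data = [10, 11, 10, 12, 95, 11, 10]   # 95 is an obvious spike
cleaned = replace_outliers(data)
print(cleaned)  # the spike is replaced; everything else is untouched
```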

CONTINUE READING

Exception handling is a critical component of any data science workflow. You write code. It breaks. You build logic to deal with the exceptions. Repeat. In my experience, one point of confusion for new R users is how to handle exceptions, something that’s a bit more intuitive in Python. Accordingly, this post provides a practical overview of how to handle exceptions in R by first illustrating the concept in Python.
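In Python, the pattern is a try/except block; a minimal example of the concept (which R expresses with tryCatch()):

```python
import math

def safe_divide(a, b):
    """Try the risky operation, trap the specific failure, and fall back
    gracefully: the pattern R expresses with tryCatch()."""
    try:
        return a / b
    except ZeroDivisionError:
        return math.nan

print(safe_divide(10, 2))   # 5.0
print(safe_divide(10, 0))   # nan
```

Catching the specific exception class (rather than a bare except) keeps unrelated bugs from being silently swallowed, a habit that carries over to specifying error vs. warning handlers in R.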

CONTINUE READING

Early trend detection is a major area of focus in the analytics realm because it can inform key business strategy, yet it remains an extremely difficult task. This post outlines one trend-detection method in an effort to predict where a stock’s price will go in the future.

CONTINUE READING

This post covers how quantile regression and prediction intervals can be used to determine how much ‘wiggle room’ there is for a home’s price.
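The machinery behind quantile regression can be glimpsed from its loss function alone. The home-price numbers below are invented, and this is a simplified illustration rather than the post’s full approach:

```python
def pinball_loss(y_true, y_pred, tau):
    """Quantile ('pinball') loss: the asymmetric penalty that makes a model
    predict the tau-th quantile instead of the mean."""
    diff = y_true - y_pred
    return tau * diff if diff >= 0 else (tau - 1) * diff

# For the 90th percentile, under-predicting costs 9x more than over-predicting:
print(round(pinball_loss(300_000, 280_000, tau=0.9), 2))  # 18000.0
print(round(pinball_loss(300_000, 320_000, tau=0.9), 2))  # 2000.0
```

Fitting one model at a low tau and another at a high tau brackets the home’s price, and the gap between the two predictions is the “wiggle room.”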

CONTINUE READING