R

Combine Analysis

Drafting a rookie in Fantasy Football can be a risky move, but it can pay huge dividends if you happen to snag a diamond in the rough. After accounting for a player’s draft position, do physical attributes (height/weight) and combine performance (40-yard dash, bench press, etc.) provide any additional explanatory power for points scored during a player’s first NFL season? I’ll explore this question for rookie Running Backs and Wide Receivers.

Two Flavors of Parallel Simulation

Tired of waiting around for your simulations to finish? Run them in parallel! This post covers how to use Spark and ForEach to add parallelism to your R code.

Forecasting with Tom Brady

This post focuses on some of my favorite things – football and forecasting – and will outline how to leverage external regressors when creating forecasts. We’ll do some web scraping in R and Python to create our dataset, and then forecast how many people will visit Tom Brady’s Wikipedia page.

Feature Selection for the Wine Connoisseur

Feature selection is an integral part of machine learning and this post explores what happens when lots of irrelevant features are added to the modeling process. We’ll also identify which algorithms are affected the most by such features. These questions will be addressed as we build a classifier and try to predict which wines we’ll like based on their chemical properties. So pour yourself a glass of Pinot Noir and fire up your R terminal!

Establishing Causality with Counterfactual Prediction

Sometimes a controlled experiment isn’t an option, yet you still want to establish causality. This post outlines a method for quantifying the effects of an intervention via counterfactual predictions.

Monte Carlo Power Calculations for Mixed Effects Models

That’s a dense title – Monte Carlo Simulation, Power, Mixed Effects Models. Each of these topics could be its own post. However, I’m going to discuss their interrelations in the context of experimental power and keep everything high-level. The goal is to get an intuitive idea of how we can leverage simulation to provide sample size estimates for experiments with nested data.

Time Series Outlier Detection

This post covers a straightforward approach for detecting and replacing outliers in order to improve forecasting accuracy.

Exception Handling with Ron Burgundy

Exception handling is a critical component of any data science workflow. You write code. It breaks. You build logic to deal with the exceptions. Repeat. In my experience, one point of confusion for new R users is how to handle exceptions, which is a bit more intuitive in Python. Accordingly, this post provides a practical overview of how to handle exceptions in R by first illustrating the concept in Python.

Early Trend Detection

Early trend detection is a major area of focus in the analytics realm because it can inform key business strategy, yet it remains an extremely difficult task. This post outlines one trend-detection method in an effort to predict where a stock’s price will go in the future.

Is that Home Price Negotiable?

This post covers how quantile regression and prediction intervals can be used to determine how much ‘wiggle room’ there is for a home’s price.