The Optimal Portland Pub Crawl

Overview

The premise of a Pub Crawl is quite simple: visit several bars in an afternoon or evening without a clear plan of where you’ll go next. While this sort of spontaneous, unstructured approach may work for some people, I’ve always been a fan of having a plan – in this case, an optimal plan. If we want to maximize the number of places visited (and beers tasted) in a finite period of time, then there is simply no room for shoddy planning. Accordingly, this post provides a framework for designing the optimal Portland Pub Crawl by working through the following steps:

🍺 Web Scrape the top 100 Portland bars from here

🍺 Geocode each bar’s location

🍺 Find the optimal route between a subsample of the bars, because visiting 100 in a day would make the following day very bad

🍺 Determine a walking path between the bars

🍺 Create a map of the walking path, which can be use as a field guide to impress your friends once the pub crawl is under way

🍺 Promptly spill beer on the map at the 2nd bar, rendering it unreadable, followed by a game of darts and some popcorn

If that sounds like a plan, let’s get started!

Defining the Top 100 Bars

First, let’s identify the stops during our tour of Portland’s pub scene. In the section below, we’ll load up the R libraries, identify which version of Python we’d like to use, and then do some web scraping. Note that all of the python modules and R-scripts are contained in the same directory for simplicity.

libs = c('leaflet', 
         'reticulate',
         'tspmeta',
         'qdapRegex',
         'kableExtra',
         'knitr',
         'dplyr'
         )
lapply(libs, require, character.only = TRUE, quietly = TRUE)

# specify which version of Python to use
reticulate::use_python('//anaconda/bin/python', required = TRUE)

The function below will do the webscraping.

import urllib2
from bs4 import BeautifulSoup
import re
def find_best_bars():
    base_url = 'http://www.oregonlive.com/dining/index.ssf/2014/10/portlands_100_best_bars_bar_ta.html'
    page = urllib2.urlopen(base_url)
    soup = BeautifulSoup(page, 'html.parser')
    bar_descriptors = soup.find_all("div",class_ = 'entry-content')
    bar_descriptors = str(bar_descriptors).split("<p>")
    bar_descriptors
    bar_list = []
    for entry in bar_descriptors:
        try:
            bar_name = re.search('<strong>(.*)</strong>', entry)\
                                                          .group(1)
            bar_address = re.search('<em>(.*).', entry)\
                                                 .group(1)\
                                                 .split(",")[0]\
                                                 .split(";")[0]
            bar_list.append([bar_name, bar_address])
        except Exception as e:
            print(e)
    bar_list = [x for x in bar_list if x[1][0].isdigit()]
    bar_list = map(lambda x: [x[0], x[1] + " Portland, OR"], bar_list)
    best_bars_dict = {x[0] : x[1] for x in bar_list}
    return(best_bars_dict)

Historically, this operation would require executing a python script, writing the results out (in an .txt or .csv file), and then reading the result back into R. However, with the advent of reticulate, we can execute a python function and pull the output back without ever having to leave the cozy confines of R (or R-studio in this case). Recall that the actual python module find_best_bars.py is located in the same directory as our R-script. Below we’ll first “source” this function via source_python (which simply means to bring it into the R environment), then we’ll execute it. Note that we aren’t passing in any arguments; this function is designed for a very specific purpose, which is to find the best watering holes in Portland.

# brings our function into the R Environment
reticulate::source_python('find_best_bars.py')

# executes and stores the output  in our variable 'best_bars'
best_bars = find_best_bars()

Let’s examine the output.

print(head(best_bars))
## $`Sunshine Tavern`
## [1] "3111 S.E. Division St. Portland, OR"
## 
## $`The Fireside`
## [1] "801 N.W. 23rd Ave. Portland, OR"
## 
## $`Dan &amp; Louis Oyster Bar`
## [1] "208 S.W. Ankeny St. Portland, OR"
## 
## $`La Taq`
## [1] "1625 N.E. Killingsworth St. Portland, OR"
## 
## $`Oven &amp; Shaker`
## [1] "1134 N.W. Everett St. Portland, OR"
## 
## $Ataula
## [1] "1818 N.W. 23rd Place Portland, OR"

Looks good! The output from Python was a dictionary, which is automatically converted into R’s equivalent (a list). Now that we have our bars, the next step is to convert each of the addresses into a lat/long coordinate, a process known as “geocoding”.

Geocoding with Google Maps API

We’ll rely on the Google Maps API for this step. If you don’t have an API Key, you’ll have set up an account first, which allows for up to 2000 free calls per day. We are geocoding around 100 locations, so no worries about reaching the limit.

Once you have your API key, store it somewhere safe. One approach is to define it in your .bash_profile. This way it’s not in your codebase. I’ve created a little function below, collect_key_from_profile, that makes this transition possible.

collect_key_from_profile = function(profile_location, prefix){
  # The function assumes that keys in .bash_profile are formatted as follows:
  # export <key_name>="<actual_key>"
  # export GOOGLEMAPSKEY="KEY12345"
  # Extracts all text between "" 
  sys_command = paste("cat",
                      file.path(profile_location,
                                ".bash_profile")
  )
  profile_lines = system(sys_command, intern = TRUE)
  profile_lines = sapply(profile_lines, function(x) qdapRegex::ex_between(x,
                                                                          prefix,
                                                                          '"'
                                                                          )
                        )
  profile_key = unname(Filter(Negate(is.na), profile_lines))[[1]]
  return(profile_key)
}
bash_location = paste0(stringr::str_split(getwd(), '/')[[1]][1:3], 
                       collapse = "/")
bash_prefix = 'export GOOGLEMAPSKEY="'
gmaps_api_key = collect_key_from_profile(bash_location,
                                         bash_prefix
                                         )

Now that we have our API key and dictionary of best bars, we’ll pass these into the geocode_address function below.

import pandas as pd
import googlemaps
def geocode_address(address_dict, gmaps_key):
    gmaps = googlemaps.Client(key=gmaps_key)
    geocoded_addresses = []
    for name, address in address_dict.items():
        print("Geocoding Address {tmp_address}".format(tmp_address = address))
        try:
            geocode_result = gmaps.geocode(address)
            lat_long = geocode_result[0]['geometry']['location'].values()
            lat_long.extend([address, name])
            geocoded_addresses.append(lat_long)
        except Exception as e:
            print(e)
            geocoded_addresses.append(['NA', 'NA', address, name])
    geocode_address_df = pd.DataFrame(geocoded_addresses)
    geocode_address_df.columns = ['lat',
                                               'long',
                                               'address',
                                               'name'
                                               ]
    return(geocode_address_df)
# source our geocoding function
reticulate::source_python('geocode_address.py')
# pass in the best bars dictionary and our API key
bar_locations = geocode_address(best_bars,
                                gmaps_api_key)
lat long address name
45.52243 -122.6726 208 S.W. Ankeny St. Portland, OR Dan &amp; Louis Oyster Bar
45.52496 -122.6827 1134 N.W. Everett St. Portland, OR Oven &amp; Shaker
45.52890 -122.6983 836 N.W. 23rd Ave. Portland, OR Bamboo Sushi NW
45.54089 -122.6749 816 N. Russell St. Portland, OR Mint/820
45.51926 -122.5822 7907 S.E. Stark Ave. Portland, OR Vintage Cocktail Lounge
45.50525 -122.6541 1125 S.E. Division St. Portland, OR The Beer Mongers
45.51900 -122.6641 107 S.E. Washington St. Portland, OR Ambonnay
45.58896 -122.7514 8218 N. Lombard St. Portland, OR The Fixin’ To
45.51557 -122.6820 1239 S.W. Broadway Portland, OR Higgins
45.51893 -122.6579 720 SE Sandy Blvd. Portland, OR Rum Club

We have successfully geocoded all addresses. In the following section, we’ll solve a classic routing optimization problem: The Traveling Salesman Problem (TSP). The goal is to find the most direct route between all of the bars we decide to visit during the pub crawl.

Route Optimization

The goal of any routing optimization problem is simple: minimize the total distance travelled between different nodes (locations) in space while ensuring that each node is visited once. There are many algorithms to solve this type of problem, but we’ll leverage the 2-optimization or 2-opt method due to its simplicity. This algorithm finds the lowest cost route (i.e., the route with the shortest distance that ensures each node is visited once) by swapping the ‘edges’ (the path that connects two nodes) between different nodes. If a swap reduces the total length of our tour, then the swap is maintained; otherwise the swap is reversed and we try again with different edges. Note that the swap must ensure that a single route is always possible between all nodes. The algorithm stops when a tour is reached that cannot be improved with any more swaps (see here for a more in-depth explanation).

Before going any further, let’s plot out our locations to see what we’re working with. We’ll also define our starting point, which is often referred to as the ‘depot’.

depot_lat = 45.525915
depot_long = -122.684957

bar_map = leaflet(data = bar_locations) %>% 
          setView(lng = depot_long + 0.05, lat = depot_lat, zoom = 13) %>% 
          addProviderTiles("CartoDB.Positron") %>%
          addMarkers(lng=depot_long, lat=depot_lat) %>% 
          addCircleMarkers(lat=~lat, 
                           lng=~long,
                           color = "orange",
                           radius = 4,
                           weight = 10,
                           stroke = FALSE,
                           opacity = 4,
                           fillOpacity = 4
                           )

Each bar is represented as an orange dot, and the pointer indicates our starting position (the depot). Given that we are walking, let’s limit the potential distance to a maximum of three miles from our starting location. The function below calculates the total feet between two points defined by a latitude/longitude coordinate.

earth_dist = function (lat1, long1, lat2, long2)
{
  rad = pi/180
  a1 = lat1 * rad
  a2 = long1 * rad
  b1 = lat2 * rad
  b2 = long2 * rad
  dlon = b2 - a2
  dlat = b1 - a1
  a = (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c = 2 * atan2(sqrt(a), sqrt(1 - a))
  R = 6378.145
  d = R * c
  return(d* 3280.8)
}

Below we’ll filter to all locations based on the maximum distance we’re willing to travel.

feet_in_mile = 5280
# maximum distance is 3 miles
max_miles_away = 3

bar_locations_nearby = bar_locations %>% 
                       dplyr::mutate(distance_from_depot = earth_dist(depot_lat,
                                                               depot_long,
                                                               lat,
                                                               long
                                                               )
                              ) %>% 
                       dplyr::filter(distance_from_depot <= feet_in_mile * max_miles_away)

After applying our distance filter, there is still ~70 bars remaining. That’s going to be a challenge to complete in a reasonable amount of time, so we’ll further reduce our consideration set down to 24, which means we can spend a little more time at each bar. Below we’ll randomly select 24 bars from the 70.

set.seed(1)

# we'll visit 24 bars
n_bars = 24

# randomly select 24 bars to visit
bar_locations_nearby = bar_locations_nearby %>% 
                       dplyr::sample_n(n_bars)

Next we’ll transform the lat/long locations into a distance matrix. The distance matrix specifies the euclidean distance of each bar from every other bar.

# now find optimal route
coordinates = bar_locations_nearby %>% 
              dplyr::select(lat:name) %>% 
              dplyr::mutate(location_index = 2:(n() + 1)) %>% 
              dplyr::bind_rows(data.frame(lat = depot_lat,
                                          long = depot_long,
                                          address = 'depot',
                                          name = 'depot',
                                          location_index = 1
                                          )
                               ) %>% 
              dplyr::arrange(location_index)

coords_matrix = coordinates %>% 
                dplyr::select(lat, long) %>% 
                as.matrix()

dist_matrix = dist(coords_matrix)

The two functions below tsp_instance and run_solver will do the heavy lifting and find the optimal route between bars.

# create tsp instance
tsp_ins = tspmeta::tsp_instance(coords_matrix,
                                dist_matrix)

# find optimal route based on 2-opt method
opt_tour = as.integer(tspmeta::run_solver(tsp_ins, 
                                          method="2-opt")
                      )

# sort to start at depot
sorted_tour = c(opt_tour[which(opt_tour == 1):length(opt_tour)],
                opt_tour[1:(which(opt_tour == 1) - 1)]
                )

# join route order back to original data
coordinates = coordinates %>% 
              dplyr::inner_join(data.frame(location_index = sorted_tour,
                                           route_order = 1:length(sorted_tour)
                                           )
                                ) %>% 
              dplyr::arrange(route_order)

# reformat so each row has a starting lat/long and ending lat/long
route_df = coordinates %>% 
            dplyr::rename(start_lat = lat,
                          start_long = long
                          ) %>% 
            dplyr::mutate(end_lat = c(start_lat[2:n()], NA),
                          end_long = c(start_long[2:n()], NA)
                          ) %>% 
            na.omit()

Let’s take a peak at our data to see how everything turned out.

start_lat start_long address name location_index route_order end_lat end_long
45.52591 -122.6850 depot depot 1 1 45.52201 -122.6816
45.52201 -122.6816 1014 S.W. Stark St. Portland, OR Clyde Common 12 2 45.52254 -122.6781
45.52254 -122.6781 213 S.W. Broadway Portland, OR Bailey’s Taproom and The Upper Lip 14 3 45.52218 -122.6772
45.52218 -122.6772 219 S.W. 6th Ave. Portland, OR Little Bird 15 4 45.52540 -122.6782
45.52540 -122.6782 733 N.W. Everett St. Portland, OR Remedy Wine Bar 7 5 45.54089 -122.6749
45.54089 -122.6749 816 N. Russell St. Portland, OR Mint/820 11 6 45.55254 -122.6807
45.55254 -122.6807 4024 N. Interstate Ave. Portland, OR Alibi Tiki Lounge 5 7 45.56255 -122.6868
45.56255 -122.6868 1914 N. Killingsworth St. Portland, OR Hop &amp; Vine 10 8 45.54936 -122.6611
45.54936 -122.6611 412 N.E. Beech St. Portland, OR Beech Street Parlor 2 9 45.54837 -122.6524
45.54837 -122.6524 1325 N.E. Fremont St. Portland, OR Free House 16 10 45.53943 -122.6619

Sweet! Almost there. The final step is to convert these points into an actual travel path.

Creating a Walking Path

Currently, the path between different nodes (i.e., bars) are straight lines. We’ll be walking this tour, so a sidewalk travel path is required. We’ll call on the Google Maps API one last time to convert each of the straight-line edges to actual walking paths via the convert_route_to_path.py module. This module consists of two functions: find_path and extract_polyline. find_path takes a starting lat/long, ending lat/long, and method of travel (walking in our case) and returns step-by-step lat/long coordinates along with distance and time estimates. extract_polyline is a helper function that will format each of the step-by-step coordinates into pandas DataFrame. The output will then be returned as an R DataFrame. We’ll specify the python module below.

import googlemaps
import pandas as pd
import pytz
import polyline
def extract_polyline(travel_dict):
    gmaps_polyline = travel_dict['overview_polyline']['points']
    polyline_df = pd.DataFrame(polyline.decode(gmaps_polyline))
    polyline_df.columns = ['lat', 'long']
    polyline_df['path_order'] = range(1, polyline_df.shape[0] + 1)
    return(polyline_df)
def find_path(route_df, gmaps_key, travel_mode):
    gmaps = googlemaps.Client(key=gmaps_key)
    route_df = pd.DataFrame(route_df)   
    row_count = 1 
    out_route_df = pd.DataFrame()
    for index, row in route_df.iterrows():
        print("Processing Stop {n} of {n_rows}".format(n = str(index),
                                                                             n_rows = str(route_df.shape[0])
                                                                             )
                 )  
        row_count += 1
        try:
            travel_data = gmaps.directions(origin = list(row[['start_lat', 
                                                          'start_long'
                                                        ]]
                                                     ),
                                     destination = list(row[['end_lat', 
                                                           'end_long'
                                                         ]]
                                                     ),
                                      mode = travel_mode
                                      )
            path_df = extract_polyline(travel_data[0])
            path_df['location_index'] = row['location_index']
            path_df['travel_time'] = travel_data[0]['legs'][0]['duration']['value']
            path_df['miles'] = travel_data[0]['legs'][0]['distance']['text']
            path_df['route_order'] = row['route_order']
            out_route_df = out_route_df.append(path_df)
        except Exception as e:
            print(e)
            out_route_df = out_route_df.append(['NA'] * 7)
    out_route_df = out_route_df.reset_index(drop = True)
    return(out_route_df)

Next, we’ll read the convert_route_to_path.py module into R and pass in our route DataFrame, the Google Maps API key, and our preferred method of travel.

reticulate::source_python('convert_route_to_path.py')
travel_mode = 'walking'

path_df = calc_route(route_df,
                     gmaps_api_key,
                     travel_mode
                    )

The data indicating the path between our depot and the first bar should look like this:

lat long path_order location_index travel_time miles route_order
45.52576 -122.6849 1 1 82 371 ft 1
45.52579 -122.6843 2 1 82 371 ft 1
45.52616 -122.6843 3 1 82 371 ft 1
45.52616 -122.6845 4 1 82 371 ft 1
45.52619 -122.6845 5 1 82 371 ft 1
45.52624 -122.6845 6 1 82 371 ft 1

Note the small changes between each of the successive lat/long coordinates. This is the path we’ll be walking to obtain our first frosty mug of beer. Before mapping our data, let’s get a general idea of total walking time and distance.

travel_time_in_hours = round(path_df %>% 
                             dplyr::select(location_index, travel_time) %>% 
                             dplyr::distinct() %>% 
                             dplyr::pull(travel_time) %>% 
                             sum() / 3600, 1)

print(glue::glue("Total Travel Time Is: ",
                 travel_time_in_hours,
                 " Hours"
                 )
      )
## Total Travel Time Is: 5.2 Hours

It looks like this walk will take a little over five hours, so we’ll need to bring some comfy shoes. What about distance (we’ll need some way to work off those calories)?

travel_distance_in_miles = round(path_df %>% 
  dplyr::mutate(feet_numeric = 
                case_when(stringr::str_detect(miles, 'ft') == TRUE ~ 
                          as.numeric(stringr::str_replace(miles, 
                                                          " ft", 
                                                          ""
                                                          )
                                     ),
                          stringr::str_detect(miles, " mi") == TRUE ~ 
                          as.numeric(stringr::str_replace(miles, 
                                                          " mi", 
                                                          "")
                                     ) * feet_in_mile
                         )
                ) %>% 
  dplyr::select(location_index, feet_numeric) %>% 
  dplyr::distinct() %>% 
  dplyr::pull(feet_numeric) %>% 
  sum() / feet_in_mile, 1)

print(glue::glue("Total Travel Distance Is: ",
                 travel_distance_in_miles,
                 " Miles"
                 )
      )
## Total Travel Distance Is: 15.3 Miles

OK, this is more of a Pub Crawl half-marathon. That’s some serious distance to cover. Let’s bring it all together with some visualization.

Mapping the Route

The last step is to bring this analysis to life with everyone’s favorite visualization: MAPS! Indeed, we’ll plot the walking path across downtown Portland so we can actually see the Pub Crawl route.

# We'll use this to identify the labels for each stop 
label_df = path_df %>% 
           dplyr::filter(path_order == 1)

# Bar crawl visualization
final_route = leaflet(data = path_df) %>%
  setView(lng = depot_long + 0.02, lat = depot_lat, zoom = 13) %>% 
  addProviderTiles("CartoDB.Positron") %>%
  addPolylines(data = path_df %>% 
                      filter(route_order < 24),
               lng = ~long,
               lat = ~lat,
               color = "orange",
               opacity = 4
               ) %>% 
  addMarkers(lng = depot_long,
             lat = depot_lat
             ) %>% 
  addCircleMarkers(data = label_df,
                   lng = ~long,
                   lat = ~lat,
                   radius = 4,
                   label = ~as.character(route_order),
                   labelOptions = labelOptions(noHide = T,
                                               textOnly = T,
                                               direction = 'top',
                                               textsize = "14px",
                                               offset=c(0,-30),
                                               size = 1
                                               )
                  )

Whew! That was a lot of work but it looks like we have a reasonable solution. We’ll start in downtown Portland, take a quick jaunt over to the Eastside of the city and eventually return back to the Northwest. So, there you have it – a real-world application of optimization that supports an efficient Pub Crawl. Prost!

Related

comments powered by Disqus