This notebook is available here: https://gist.github.com/linanqiu/87a0c708b3f7383fc56c2eb87ab8b087

Cheap Rides

My friend came up to me with a startup idea.

“Hey what if you could compare Uber and Lyft prices and get the cheapest one? What if you make that a service? …more pitching…”

Beyond the clear terms-of-service violation (the Uber API literally has a clause saying that you aren’t allowed to do that) and the fact that Google Maps already has something like this, I was curious how easy or hard this would be via the Uber and Lyft APIs. Turns out it can be done in about half an hour. Both have insanely clearly documented APIs and developer interfaces.

import requests
import json
import pint
import itertools
import pandas

import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

Get your client tokens (CLIENT_TOKEN_UBER and CLIENT_TOKEN_LYFT in the code below) from the Uber and Lyft developer sites. It’s literally fewer than 5 clicks each.
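If you’d rather not paste tokens straight into a notebook you might share, one option is to read them from environment variables. This is only a sketch: the helper load_token and the variable names UBER_CLIENT_TOKEN and LYFT_CLIENT_TOKEN are made up for illustration, not something either API prescribes.

```python
import os


def load_token(env_var):
    """Return the token stored in env_var, or raise a readable error if it is unset."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError('Set the %s environment variable first' % env_var)
    return token


# Hypothetical variable names; use whatever you exported in your shell.
# CLIENT_TOKEN_UBER = load_token('UBER_CLIENT_TOKEN')
# CLIENT_TOKEN_LYFT = load_token('LYFT_CLIENT_TOKEN')
```

Uncomment the last two lines once the variables are set; the rest of the notebook only needs the two CLIENT_TOKEN_* names to exist.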

Now let’s try to compare prices for a sample trip: Columbia University to Wow Karaoke on 34th St in Manhattan, a trip that I should probably have taken less when I was a student (and a perfect use case for this, given that I make most of these trips drunk and unable to navigate two apps). Use Google Maps to get the coordinates.

payload = {
    'start_lat': 40.8075395,
    'start_lng': -73.964761417,
    'end_lat': 40.7473683,
    'end_lng': -73.988627417
}

We grab estimates from both APIs. They have really similar API endpoints – it’s almost as if they’re poaching engineers from each other. Or just following really good API standards. I wouldn’t know – I come from the finance industry where the last thing everyone agreed on was FIX and it’s absolute manure.

# Lyft takes the coordinates under the same key names we used above.
payload_lyft = dict(payload)

headers_lyft = {
    'Authorization': 'Bearer %s' % CLIENT_TOKEN_LYFT,
    'Content-type': 'application/json'
}

estimate_lyft = requests.get('https://api.lyft.com/v1/cost', params=payload_lyft, headers=headers_lyft)

# Uber wants start_latitude / start_longitude etc. instead of start_lat / start_lng.
payload_uber = {k.replace('lat', 'latitude').replace('lng', 'longitude'): v for k, v in payload.items()}

headers_uber = {
    'Authorization': 'Bearer %s' % CLIENT_TOKEN_UBER,
    'Content-type': 'application/json'
}

estimate_uber = requests.get('https://api.uber.com/v1.2/estimates/price', params=payload_uber, headers=headers_uber)

print(estimate_lyft.content)
print(estimate_uber.content)
b'{"cost_estimates": [{"currency": "USD", "ride_type": "lyft_line", "display_name": "Shared", "primetime_percentage": "0%", "primetime_confirmation_token": null, "cost_token": null, "price_quote_id": "ENq21tjTLA==", "is_valid_estimate": true, "estimated_duration_seconds": 1498, "estimated_distance_miles": 6.25, "estimated_cost_cents_min": 900, "estimated_cost_cents_max": 1200, "can_request_ride": true}, {"currency": "USD", "ride_type": "lyft", "display_name": "Lyft", "primetime_percentage": "0%", "primetime_confirmation_token": null, "cost_token": null, "price_quote_id": "ENq21tjTLA==", "is_valid_estimate": true, "estimated_duration_seconds": 1498, "estimated_distance_miles": 6.25, "estimated_cost_cents_min": 2000, "estimated_cost_cents_max": 2500, "can_request_ride": true}, {"currency": "USD", "ride_type": "lyft_plus", "display_name": "Lyft XL", "primetime_percentage": "0%", "primetime_confirmation_token": null, "cost_token": null, "price_quote_id": "ENq21tjTLA==", "is_valid_estimate": true, "estimated_duration_seconds": 1498, "estimated_distance_miles": 6.25, "estimated_cost_cents_min": 3000, "estimated_cost_cents_max": 3500, "can_request_ride": true}, {"currency": "USD", "ride_type": "lyft_lux", "display_name": "Lux Black", "primetime_percentage": "0%", "primetime_confirmation_token": null, "cost_token": null, "price_quote_id": "ENq21tjTLA==", "is_valid_estimate": true, "estimated_duration_seconds": 1498, "estimated_distance_miles": 6.25, "estimated_cost_cents_min": 4200, "estimated_cost_cents_max": 4900, "can_request_ride": true}, {"currency": "USD", "ride_type": "lyft_luxsuv", "display_name": "Lux Black XL", "primetime_percentage": "0%", "primetime_confirmation_token": null, "cost_token": null, "price_quote_id": "ENq21tjTLA==", "is_valid_estimate": true, "estimated_duration_seconds": 1498, "estimated_distance_miles": 6.25, "estimated_cost_cents_min": 6000, "estimated_cost_cents_max": 7000, "can_request_ride": true}]}\n'
b'{"prices":[{"localized_display_name":"UberPool","distance":5.62,"display_name":"UberPool","product_id":"929fcc19-8cb4-4007-a54f-3ab34473700f","high_estimate":16.0,"low_estimate":11.0,"duration":1380,"estimate":"$11-15","currency_code":"USD"},{"localized_display_name":"UberXL","distance":5.62,"display_name":"UberXL","product_id":"1e0ce2df-4a1e-4333-86dd-dc0c67aaabe1","high_estimate":36.0,"low_estimate":29.0,"duration":1380,"estimate":"$29-36","currency_code":"USD"},{"localized_display_name":"UberX","distance":5.62,"display_name":"UberX","product_id":"b8e5c464-5de2-4539-a35a-986d6e58f186","high_estimate":28.0,"low_estimate":22.0,"duration":1380,"estimate":"$22-28","currency_code":"USD"},{"localized_display_name":"Car Seat","distance":5.62,"display_name":"Car Seat","product_id":"d6d6d7ad-67f9-43ef-a8de-86bd6224613a","high_estimate":39.0,"low_estimate":31.0,"duration":1380,"estimate":"$31-39","currency_code":"USD"},{"localized_display_name":"Black","distance":5.62,"display_name":"Black","product_id":"0e9d8dd3-ffec-4c2b-9714-537e6174bb88","high_estimate":50.0,"low_estimate":40.0,"duration":1380,"estimate":"$40-50","currency_code":"USD"},{"localized_display_name":"Black SUV","distance":5.62,"display_name":"Black SUV","product_id":"56487469-0d3d-4f19-b662-234b7576a562","high_estimate":66.0,"low_estimate":53.0,"duration":1380,"estimate":"$53-66","currency_code":"USD"}]}'
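Those byte strings are hard to eyeball, so here is a small sketch of pulling the interesting bits out with the standard json module. The sample is a hand-shortened copy of the Uber response above (two of the six products, most fields dropped), and summarize_uber is a name I made up for this example. In the notebook itself, estimate_uber.json() does the decoding for you.

```python
import json

# Hand-shortened copy of the Uber response printed above (two of the six products).
raw_uber = (
    b'{"prices":['
    b'{"display_name":"UberPool","high_estimate":16.0,"low_estimate":11.0,"currency_code":"USD"},'
    b'{"display_name":"UberX","high_estimate":28.0,"low_estimate":22.0,"currency_code":"USD"}'
    b']}'
)


def summarize_uber(raw):
    """Decode the raw bytes and return one 'name: low-high currency' string per product."""
    body = json.loads(raw.decode('utf-8'))
    return [
        '%s: %.0f-%.0f %s' % (p['display_name'], p['low_estimate'], p['high_estimate'], p['currency_code'])
        for p in body['prices']
    ]


for line in summarize_uber(raw_uber):
    print(line)
# UberPool: 11-16 USD
# UberX: 22-28 USD
```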

Now we can normalize these JSON blobs into a canonical form I care about: the id for a quote, the app the quote came from, the ride_type (e.g. Lyft Line vs Uber XL), distance, duration, cost_min, cost_max, and cost_currency, the currency of the quote. This information exists in both responses, but under different field names and in different formats (Lyft reports cost in cents, Uber in dollars). I use the pint package to represent units instead of doing stupid integer math on my own. Strong typing ftw. Even in Python.

ureg = pint.UnitRegistry()

estimate_lyft_obj = estimate_lyft.json()
estimate_lyft_norm = ({
    'id': est['price_quote_id'],
    'app': 'lyft',
    'ride_type': est['ride_type'],
    'distance': est['estimated_distance_miles'] * ureg.mile,
    'duration': est['estimated_duration_seconds'] * ureg.second,
    'cost_min': est['estimated_cost_cents_min'] / 100,
    'cost_max': est['estimated_cost_cents_max'] / 100,
    'cost_currency': est['currency']
} for est in estimate_lyft_obj['cost_estimates'])

estimate_uber_obj = estimate_uber.json()
estimate_uber_norm = ({
    'id': est['product_id'],
    'app': 'uber',
    'ride_type': est['display_name'],
    'distance': est['distance'] * ureg.mile,
    'duration': est['duration'] * ureg.second,
    'cost_min': est['low_estimate'],
    'cost_max': est['high_estimate'],
    'cost_currency': est['currency_code']
} for est in estimate_uber_obj['prices'])

estimate_norm = list(itertools.chain(estimate_lyft_norm, estimate_uber_norm))

Now let’s see what lovely data we have. Turns out we have tons of that.

estimate_df = pandas.DataFrame(estimate_norm)
estimate_df
app   cost_currency  cost_max  cost_min  distance   duration     id                                    ride_type
lyft  USD            12.0      9.0       6.25 mile  1498 second  ENq21tjTLA==                          lyft_line
lyft  USD            25.0      20.0      6.25 mile  1498 second  ENq21tjTLA==                          lyft
lyft  USD            35.0      30.0      6.25 mile  1498 second  ENq21tjTLA==                          lyft_plus
lyft  USD            49.0      42.0      6.25 mile  1498 second  ENq21tjTLA==                          lyft_lux
lyft  USD            70.0      60.0      6.25 mile  1498 second  ENq21tjTLA==                          lyft_luxsuv
uber  USD            16.0      11.0      5.62 mile  1380 second  929fcc19-8cb4-4007-a54f-3ab34473700f  UberPool
uber  USD            36.0      29.0      5.62 mile  1380 second  1e0ce2df-4a1e-4333-86dd-dc0c67aaabe1  UberXL
uber  USD            28.0      22.0      5.62 mile  1380 second  b8e5c464-5de2-4539-a35a-986d6e58f186  UberX
uber  USD            39.0      31.0      5.62 mile  1380 second  d6d6d7ad-67f9-43ef-a8de-86bd6224613a  Car Seat
uber  USD            50.0      40.0      5.62 mile  1380 second  0e9d8dd3-ffec-4c2b-9714-537e6174bb88  Black
uber  USD            66.0      53.0      5.62 mile  1380 second  56487469-0d3d-4f19-b662-234b7576a562  Black SUV

We get pretty interesting observations: right now, comparing like for like (Lyft Line vs UberPool, Lyft vs UberX), Uber’s low-end estimate is two whole bucks more than Lyft’s. That’s interesting! No? Yes? Whatever.
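To make the comparison concrete, here is a plain-Python sketch over four rows copied from the table above. Which tiers count as “the same product” is my own assumption (neither API tells you that), and low_end_gap and cheapest are names invented for this example.

```python
# Rows copied from the table above: (app, ride_type, cost_min, cost_max).
quotes = [
    ('lyft', 'lyft_line', 9.0, 12.0),
    ('lyft', 'lyft', 20.0, 25.0),
    ('uber', 'UberPool', 11.0, 16.0),
    ('uber', 'UberX', 22.0, 28.0),
]

# Assumption: these tiers are roughly equivalent. The pairing is mine, not either API's.
comparable = [('lyft_line', 'UberPool'), ('lyft', 'UberX')]


def low_end_gap(quotes, lyft_type, uber_type):
    """How much more Uber's low estimate is than Lyft's for one pair of tiers."""
    cost_min = {ride_type: low for _, ride_type, low, _ in quotes}
    return cost_min[uber_type] - cost_min[lyft_type]


def cheapest(quotes):
    """The quote with the lowest low-end estimate, ties broken by the high end."""
    return min(quotes, key=lambda q: (q[2], q[3]))


for lyft_type, uber_type in comparable:
    print(lyft_type, 'vs', uber_type, low_end_gap(quotes, lyft_type, uber_type))
# lyft_line vs UberPool 2.0
# lyft vs UberX 2.0

print(cheapest(quotes))
# ('lyft', 'lyft_line', 9.0, 12.0)
```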

fig, ax = plt.subplots()
# sorry for ugly indents. fucking python.
estimate_df[
    ['app', 'ride_type', 'cost_min', 'cost_max']
].set_index(
    ['app', 'ride_type']
).sort_values(
    ['app', 'cost_min', 'cost_max']
).plot(kind='bar', ax=ax)
plt.show()

I showed this to my roommate and he’s still convinced this is a great startup idea.