Foursquare Data and Explore Final
Overview
Data intro
Explore recommendation engine
Technology stack
More infographics
Check-ins
> 10 million; > 15 million; > 750 million; > 3 million / day
[Figure: check-ins at 8 PM]
Our data
Human mobility patterns
Explicit geo-spatial/temporal data
Many ways to interpret, filter, slice, visualize
Growth
Week of Jan 21, 2011
Firehose of ~10.9 million check-ins
Examined NYC vs. SF Bay Area
WSJ SF Heatmap
WSJ factoids
NYC vs. SF Bay Area, male vs. female
Women check into sushi, libraries, and karaoke more
Men check into burgers, tech startups, and gay bars more
50/50 split on Mexican, ramen, beaches, and desserts
http://graphicsweb.wsj.com/documents/FOURSQUAREWEEK1104/
Explore
Social recommendation engine
Coffee?
Real-time recommendations based on:
Time of day
Previous check-ins
Friend history
User history
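The slide lists four signals but does not say how they are combined. One simple option is a weighted sum of per-signal scores for each candidate venue. The sketch below uses that approach. The weights, the `Candidate` type, and the [0, 1] score scale are all hypothetical and are not Foursquare's method.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL weights: the slide names the signals, not how they are blended.
WEIGHTS = {
    "time_of_day": 0.3,
    "previous_checkins": 0.2,
    "friend_history": 0.3,
    "user_history": 0.2,
}

@dataclass
class Candidate:
    venue: str
    signals: dict = field(default_factory=dict)  # signal name -> score in [0, 1]

def blend(candidate: Candidate) -> float:
    """Weighted sum of per-signal scores; a missing signal contributes 0."""
    return sum(w * candidate.signals.get(name, 0.0) for name, w in WEIGHTS.items())

def rank(candidates):
    """Order candidate venues by blended score, best first."""
    return sorted(candidates, key=blend, reverse=True)

cands = [
    Candidate("Blue Bottle", {"time_of_day": 0.9, "friend_history": 0.8}),
    Candidate("DNA Lounge", {"time_of_day": 0.1, "user_history": 0.6}),
]
print([c.venue for c in rank(cands)])  # ['Blue Bottle', 'DNA Lounge']
```

A linear blend is easy to tune with experiments. A learned ranker could replace it without changing the interface.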
7/28/2011 Check-in and Waffles
Friday, July 29, 2011
Time of day
Relevant for the time of day and day of week
Affinities for different categories at different times are useful
Tartine Bakery
DNA Lounge
Blue Bottle
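A bakery fits the morning, coffee fits the daytime, and a lounge fits the night. One way to quantify these category affinities is lift: how much more likely a category is at a given hour than overall. This is a minimal sketch. The `(category, hour)` input format and the lift measure are assumptions, not the deck's stated method.

```python
from collections import Counter, defaultdict

def hourly_affinity(checkins):
    """checkins: iterable of (category, hour) pairs, hour in 0..23.
    Returns {hour: {category: lift}} with lift = P(category | hour) / P(category).
    Lift > 1 means the category is over-represented at that hour."""
    checkins = list(checkins)
    total = len(checkins)
    overall = Counter(cat for cat, _ in checkins)
    by_hour = defaultdict(Counter)
    for cat, hour in checkins:
        by_hour[hour][cat] += 1
    result = {}
    for hour, counts in by_hour.items():
        n = sum(counts.values())
        result[hour] = {
            cat: (c / n) / (overall[cat] / total) for cat, c in counts.items()
        }
    return result

# Toy data, not real check-ins.
sample = [("Coffee", 8), ("Coffee", 8), ("Bakery", 8),
          ("Nightlife", 22), ("Nightlife", 22), ("Coffee", 22)]
aff = hourly_affinity(sample)
print(round(aff[8]["Coffee"], 2))  # 1.33
```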
Previous check-ins
We are not unique snowflakes
Repeat check-in %
Check-ins to a place that the user's social circle has been to before: > 60%
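The two percentages on this slide can be computed in a single time-ordered pass over check-ins. During the pass, the code tracks which venues each user has already visited. This is a sketch that assumes `(timestamp, user, venue)` tuples and a friend map. Neither is the deck's actual data format.

```python
def checkin_stats(checkins, friends):
    """checkins: list of (timestamp, user, venue) in any order.
    friends: dict user -> set of friend user ids (assumed format).
    Returns (repeat_pct, social_pct):
      repeat_pct: % of check-ins at a venue the same user checked into earlier;
      social_pct: % of check-ins at a venue some friend checked into earlier."""
    seen = {}  # user -> venues visited so far
    repeat = social = 0
    ordered = sorted(checkins, key=lambda c: c[0])
    for _, user, venue in ordered:
        if venue in seen.get(user, set()):
            repeat += 1
        if any(venue in seen.get(f, set()) for f in friends.get(user, ())):
            social += 1
        seen.setdefault(user, set()).add(venue)
    n = len(ordered)
    return (100 * repeat / n, 100 * social / n) if n else (0.0, 0.0)

checkins = [(1, "ann", "Tartine"), (2, "bob", "Tartine"),
            (3, "ann", "Tartine"), (4, "bob", "Blue Bottle")]
friends = {"ann": {"bob"}, "bob": {"ann"}}
print(checkin_stats(checkins, friends))  # (25.0, 50.0)
```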
Friend history
Social justification: call out similar friends
User history
Places I've been
User history
Highlight places similar to places I've been
Venue similarity
People that go to Blue Bottle, also go to:
Venue similarity
People that go to Tartine, also go to:
Computing similarity
NYC food sample
[Figure: users × venues check-in matrix, incredibly sparse]
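Most users visit only a tiny fraction of venues, so the users × venues matrix is best stored as adjacency sets. In this form, only the non-zero cells take memory. The sketch below assumes `(user, venue)` pairs as input and keeps both orientations, so user rows and venue columns are each one lookup away.

```python
from collections import defaultdict

def build_matrix(checkins):
    """checkins: iterable of (user, venue) pairs.
    Returns the sparse users x venues matrix as two adjacency maps;
    only non-zero cells are stored."""
    by_user = defaultdict(set)   # user -> venues (matrix rows)
    by_venue = defaultdict(set)  # venue -> users (matrix columns)
    for user, venue in checkins:
        by_user[user].add(venue)
        by_venue[venue].add(user)
    return by_user, by_venue

def density(by_user, by_venue):
    """Fraction of matrix cells that are non-zero."""
    cells = len(by_user) * len(by_venue)
    nonzero = sum(len(v) for v in by_user.values())
    return nonzero / cells if cells else 0.0

sample = [("u1", "v1"), ("u1", "v2"), ("u2", "v2"), ("u3", "v3")]
by_user, by_venue = build_matrix(sample)
print(density(by_user, by_venue))  # 4 of 9 cells are non-zero
```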
Computing similarity
Venue similarity
[Figure: users × venues check-in matrix, incredibly sparse]
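Venue similarity compares venues by the sets of users who check in there. For binary check-in vectors, cosine similarity reduces to |A ∩ B| / sqrt(|A| · |B|). The deck does not name its similarity measure, so cosine here is an assumed choice, and the toy data is illustrative only.

```python
import math

def venue_cosine(users_a, users_b):
    """Cosine similarity of two binary venue columns, given as user sets."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))

def most_similar(venue, by_venue, k=5):
    """Top-k (other_venue, score) pairs with a non-zero score."""
    target = by_venue.get(venue, set())
    scored = [(other, venue_cosine(target, users))
              for other, users in by_venue.items() if other != venue]
    scored = [(v, s) for v, s in scored if s > 0]
    return sorted(scored, key=lambda vs: (-vs[1], vs[0]))[:k]

by_venue = {
    "Blue Bottle": {"u1", "u2", "u3"},
    "Tartine Bakery": {"u1", "u2"},
    "DNA Lounge": {"u4"},
}
print(most_similar("Blue Bottle", by_venue))  # Tartine Bakery, about 0.816
```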
Computing similarity
User similarity
For all i, j: sim(u_i, u_j), where u_i and u_j are user rows of the incredibly sparse users × venues matrix
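Computing sim(u_i, u_j) for all i, j naively takes O(n²) pair comparisons. On a sparse matrix, though, most pairs share no venue, and their similarity is 0. An inverted index (venue → users) generates only the pairs that overlap. This sketch uses cosine similarity as an assumed measure.

```python
import math
from collections import defaultdict
from itertools import combinations

def all_pairs_user_cosine(by_user):
    """by_user: dict user -> set of venues.
    Returns {(u_i, u_j): cosine} for every user pair sharing >= 1 venue;
    pairs with no overlap (similarity 0) are never materialized."""
    by_venue = defaultdict(set)
    for user, venues in by_user.items():
        for v in venues:
            by_venue[v].add(user)
    overlap = defaultdict(int)  # (u_i, u_j) -> number of shared venues
    for users in by_venue.values():
        for ui, uj in combinations(sorted(users), 2):
            overlap[(ui, uj)] += 1
    return {
        (ui, uj): n / math.sqrt(len(by_user[ui]) * len(by_user[uj]))
        for (ui, uj), n in overlap.items()
    }

by_user = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"d"}}
print(all_pairs_user_cosine(by_user))  # {('u1', 'u2'): 0.5}
```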
Similarity pipeline
Input: Mongo dumps on S3 or HDFS
Java MapReduce jobs via Hadoop / Elastic MapReduce
MapReduce overview
Map: key = user, value = visited venues; emit all pairs of visited venues for each user
Reduce: score each venue pair
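The real pipeline runs as Java MapReduce jobs on Hadoop. The single-process Python sketch below only simulates the same data flow: map per user, shuffle by venue pair, reduce per pair. The deck does not state how pairs are scored, so the reducer here assumes a plain co-visit count (one point per user who visited both venues).

```python
from collections import defaultdict
from itertools import combinations

def map_user(user, venues):
    """Map: key = user, value = visited venues.
    Emits ((venue_a, venue_b), 1) for each unordered pair of distinct venues."""
    for a, b in combinations(sorted(set(venues)), 2):
        yield (a, b), 1

def reduce_pair(pair, scores):
    """Reduce: combine every partial score emitted for one venue pair."""
    return pair, sum(scores)

def run(user_venues):
    shuffled = defaultdict(list)  # in-memory stand-in for Hadoop's shuffle/sort
    for user, venues in user_venues.items():
        for key, value in map_user(user, venues):
            shuffled[key].append(value)
    return dict(reduce_pair(k, vs) for k, vs in shuffled.items())

result = run({"u1": ["Blue Bottle", "Tartine Bakery"],
              "u2": ["Blue Bottle", "Tartine Bakery", "DNA Lounge"]})
print(result[("Blue Bottle", "Tartine Bakery")])  # 2
```

Sorting each user's venues before pairing keeps (a, b) and (b, a) from landing on different reducers.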
Similar Venues
< 200 ms
MOAR Signals
Explore performance
Track behavior over time
Measure improvements
Model updated
Run experiments
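Running experiments requires stable assignment: a user should land in the same arm every time they open Explore. A common technique hashes the user ID together with an experiment name. The sketch below follows that pattern. The experiment names and the 50% default split are hypothetical, and MD5 is used only for even bucketing, not for security.

```python
import hashlib

def bucket(user_id, experiment, treatment_pct=50):
    """Deterministically assign a user to 'treatment' or 'control'.
    Hashing experiment + user keeps assignments independent across experiments."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < treatment_pct else "control"

print(bucket(42, "explore_ranking_v2"))  # same answer on every call
```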
Data Stack
MongoDB (production)
Amazon S3, Elastic MapReduce
Hadoop
Hive
Flume
R/RStudio
Hive interface
Text
Rudest cities
Boca is Nice!
Happiness analysis
Aditya Mukerjee's awesome work
Happy vs. sad
SF: "stupid at&t and no reception"
NYC: "fuck mondays"
*pocalypse
A tradition of hyperbole
Snowpocalypse 2011
Heatpocalypse 2011
Marriage Equalitocalypse
Love data?