Want to try predicting business categories with a fancy clustering algorithm? How about predicting star ratings using sentiment analysis? Or maybe you want to build a cool visualization of great local businesses?

Yelp is providing all the data and reviews of the 250 closest businesses for 30 universities for students and academics to explore and research. We've provided some examples on GitHub to get you started. To get them running, you will need to install mrjob, our Python framework for MapReduce computing.

Check out our GitHub page for more fun and useful projects!

Access

To access the dataset, you'll need an active Yelp account and access to the Yelp API, and you'll need to agree to the dataset access agreement. Once you've completed all those steps, you can download the dataset from this page.

To request access, you'll need a valid Yelp API token, called a YWSID.

Create a Yelp API Account

Usage of this dataset is governed by the Academic Dataset Terms of Use.

Usage

The dataset is a single gzip-compressed file containing one JSON object per line. Every object contains a 'type' field, which tells you whether it is a business, a user, or a review.
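
As a rough sketch of how the file could be read in Python, the snippet below streams the gzip file line by line and groups the decoded objects by their 'type' field. The function names and the file name in the usage note are illustrative assumptions, not part of the dataset or of our GitHub examples.

```python
import gzip
import json
from collections import defaultdict


def load_objects(path):
    """Yield one decoded JSON object per non-empty line of a gzip file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)


def split_by_type(objects):
    """Group objects into lists keyed by their 'type' field."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj["type"]].append(obj)
    return groups
```

For example, `split_by_type(load_objects("dataset.json.gz"))` (a hypothetical file name) would return a mapping whose 'business', 'user', and 'review' keys hold the corresponding objects. This keeps everything in memory, so for larger experiments you may prefer to process the generator one object at a time.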

Business Objects

Business objects contain basic information about local businesses. The 'business_id' field can be used with the Yelp API to fetch even more information for visualizations, but note that you'll still need to comply with the API Terms of Service. The fields are as follows:

{
  'type': 'business',
  'business_id': (a unique identifier for this business),
  'name': (the full business name),
  'neighborhoods': (a list of neighborhood names, might be empty),
  'full_address': (localized address),
  'city': (city),
  'state': (state),
  'latitude': (latitude),
  'longitude': (longitude),
  'stars': (star rating, rounded to half-stars),
  'review_count': (review count),
  'photo_url': (photo url),
  'categories': [(localized category names)],
  'open': (is the business still open for business?),
  'schools': (nearby universities),
  'url': (yelp url)
}
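
To illustrate how these fields might be used together, here is a minimal sketch that selects open businesses in a given category, optionally restricted to one school, and ranks them by star rating. It assumes that 'open' decodes to a boolean and that 'categories' and 'schools' decode to lists of strings; the function name and the category used in the usage note are hypothetical.

```python
def open_businesses_in_category(businesses, category, school=None):
    """Return open businesses in `category`, best rated first.

    Assumes 'open' is a boolean and that 'categories' and 'schools'
    are lists of strings.
    """
    matches = []
    for biz in businesses:
        if not biz.get("open"):
            continue
        if category not in biz.get("categories", []):
            continue
        if school is not None and school not in biz.get("schools", []):
            continue
        matches.append(biz)
    # Sort by stars, then by review count, both descending.
    return sorted(
        matches,
        key=lambda b: (-b.get("stars", 0), -b.get("review_count", 0)),
    )
```

Calling `open_businesses_in_category(businesses, "Coffee & Tea", school="Rice University")` would, under these assumptions, list the open coffee shops near that campus, which could serve as a starting point for a visualization.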
        

Review Objects

Review objects contain the review text, the star rating, and information on votes Yelp users have cast on the review. Use user_id to associate this review with others by the same user. Use business_id to associate this review with others of the same business.

{
  'type': 'review',
  'business_id': (the identifier of the reviewed business),
  'user_id': (the identifier of the authoring user),
  'stars': (star rating, integer 1-5),
  'text': (review text),
  'date': (date, formatted like '2011-04-19'),
  'votes': {
    'useful': (count of useful votes),
    'funny': (count of funny votes),
    'cool': (count of cool votes)
  }
}
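
Grouping reviews by business_id can be sketched as follows. This is one possible in-memory approach using only the standard library, not the mrjob-based examples on GitHub; the function name and the output keys are assumptions made for illustration.

```python
from collections import defaultdict


def review_stats_by_business(reviews):
    """Aggregate review count, mean stars, and useful votes per business."""
    totals = defaultdict(lambda: {"count": 0, "stars": 0, "useful": 0})
    for review in reviews:
        entry = totals[review["business_id"]]
        entry["count"] += 1
        entry["stars"] += review["stars"]
        # 'votes' is a nested object; tolerate it being absent.
        entry["useful"] += review.get("votes", {}).get("useful", 0)
    return {
        business_id: {
            "review_count": entry["count"],
            "average_stars": entry["stars"] / entry["count"],
            "useful_votes": entry["useful"],
        }
        for business_id, entry in totals.items()
    }
```

Note that an average computed this way covers only the reviews in the dataset and is not rounded, so it need not match the half-star 'stars' value on the business object.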
        

User Objects

User objects contain aggregate information about a single user across all of Yelp (including businesses and reviews not in this dataset).

{
  'type': 'user',
  'user_id': (unique user identifier),
  'name': (first name, last initial, like 'Matt J.'),
  'review_count': (review count),
  'average_stars': (floating point average, like 4.31),
  'votes': {
    'useful': (count of useful votes across all reviews),
    'funny': (count of funny votes across all reviews),
    'cool': (count of cool votes across all reviews)
  }
}
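
Because 'average_stars' reflects a user's activity across all of Yelp, it can serve as a per-user baseline. The sketch below joins reviews to users on user_id and reports how far each review's rating sits from its author's average. It is an illustrative assumption about how one might use the field, not a method prescribed by the dataset, and the function name is hypothetical.

```python
def star_deviations(reviews, users):
    """Yield (business_id, user_id, stars - author's average_stars).

    Reviews whose author has no user object are skipped.
    """
    average_by_user = {u["user_id"]: u["average_stars"] for u in users}
    for review in reviews:
        average = average_by_user.get(review["user_id"])
        if average is None:
            continue
        yield (
            review["business_id"],
            review["user_id"],
            review["stars"] - average,
        )
```

A positive deviation suggests the author rated the business higher than they usually rate, which may be a more informative signal for sentiment experiments than the raw star value.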
        

Dataset Challenge

Not only would we like to give you our data, we'd also like to announce a new Yelp Dataset Challenge.

Want More?

We're hiring! Check out available jobs at our jobs page.

Schools

Yelp's dataset includes information for businesses near these 30 schools:

  • Brown University
  • California Institute of Technology
  • California Polytechnic State University
  • Carnegie Mellon University
  • Columbia University
  • Cornell University
  • Georgia Institute of Technology
  • Harvard University
  • Harvey Mudd College
  • Massachusetts Institute of Technology
  • Princeton University
  • Purdue University
  • Rensselaer Polytechnic Institute
  • Rice University
  • Stanford University
  • University of California - Los Angeles
  • University of California - San Diego
  • University of California at Berkeley
  • University of Illinois - Urbana-Champaign
  • University of Maryland - College Park
  • University of Massachusetts - Amherst
  • University of Michigan - Ann Arbor
  • University of North Carolina - Chapel Hill
  • University of Pennsylvania
  • University of Southern California
  • University of Texas - Austin
  • University of Washington
  • University of Waterloo
  • University of Wisconsin - Madison
  • Virginia Tech

Questions?

Don't see your school on the list? Any other feedback? Send us an email at dataset@yelp.com.