# this imports the API key used to scrape Yelp
# import json package
import json
# opens the file that stores the API key and loads it
with open('/Users/rachnarawalpally/project-rachnarawalpally/technical-details/data-collection/api-key.json') as f:
    keys = json.load(f)
# labels the API key so it can be called in the code
API_KEY = keys['yelp']
Data Collection
Introduction
The first step in any project is to collect data. The idea for this project stemmed from restaurants: where to eat, what food is good, and which neighborhood has the best food are questions people ask every day. One of the first places many people check when deciding where to eat is Yelp. Here, we collect data from Yelp, the same business information you would see in the Yelp app on your phone. These features should help us dig deeper into our story.
The first step is to create and load an API key from Yelp. For security reasons, this key is stored in a separate file and is not included directly here. To create your own API key, go to Yelp's developer portal.
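The key-loading cell at the top of this page can be wrapped in a small function so that a missing entry fails with a clear message instead of sending an empty Bearer token later. This is a minimal sketch, assuming the key file is a JSON object like {"yelp": "YOUR_KEY"}, which is the layout implied by keys['yelp']; load_api_key is a hypothetical helper name, not part of Yelp's tooling.

```python
import json
from pathlib import Path


def load_api_key(path, service="yelp"):
    """Return the API key stored under `service` in a JSON file.

    Assumes a file shaped like {"yelp": "YOUR_KEY"}; load_api_key is a
    hypothetical helper, not part of Yelp's tooling.
    """
    with Path(path).open() as f:
        keys = json.load(f)
    if service not in keys:
        # Fail loudly instead of sending an empty Bearer token to the API
        raise KeyError(f"no '{service}' entry in {path}")
    return keys[service]
```

For example, API_KEY = load_api_key('api-key.json') reads the key from a file that can be kept out of version control.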
Method: Obtaining Data
The following code blocks outline the process of fetching data from the Yelp API and cleaning it into a readable CSV file.
The first block contains the code needed to call the Yelp API and retrieve data. Some of this code is taken directly from Yelp's developer website; its getting-started guide for the API is at https://docs.developer.yelp.com/docs/fusion-intro.
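To see exactly what the first block sends, the search request can be assembled offline with Python's standard library. This sketch assumes the same endpoint, Authorization header, and parameters as fetch_restaurant_data; build_search_request is a hypothetical helper, and requests performs its own query-string encoding, so its URL may differ in minor details.

```python
from urllib.parse import urlencode

# Endpoint used by fetch_restaurant_data in this notebook
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(api_key, location, term="restaurants", limit=50, offset=0):
    """Return (full_url, headers) for one Yelp business-search call.

    build_search_request is a hypothetical helper; it only assembles the
    request and does not contact Yelp.
    """
    params = {"term": term, "location": location, "limit": limit, "offset": offset}
    headers = {"Authorization": f"Bearer {api_key}"}
    return f"{SEARCH_URL}?{urlencode(params)}", headers
```

Calling build_search_request('YOUR_KEY', 'Washington D.C.') gives a URL ending in ?term=restaurants&location=Washington+D.C.&limit=50&offset=0, which makes it easy to check the parameters before using up API calls.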
The second block processes the received data by extracting only the relevant information needed for the CSV file. This includes gathering key details that are readily available on Yelp, such as restaurant name, rating, and location. The extracted data is then stored in a Pandas DataFrame.
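The extraction step can be checked without calling the API by running it on a small hand-written response. The sample below is made up for illustration; only its field names (businesses, categories, price, rating, review_count, coordinates, location.zip_code) match what process_data reads. extract_businesses is a hypothetical pure-Python variant that returns a list of dicts instead of a DataFrame and tolerates missing categories, coordinates, or location.

```python
def extract_businesses(response):
    """Flatten a Yelp search response (a dict) into a list of row dicts.

    Hypothetical helper mirroring the fields picked out by process_data.
    """
    rows = []
    for business in response.get("businesses", []):
        categories = business.get("categories") or []
        coords = business.get("coordinates") or {}
        rows.append({
            "name": business["name"],
            # the first listed category stands in for cuisine
            "cuisine": categories[0]["title"] if categories else "Unknown",
            "price_range": business.get("price", "N/A"),
            "rating": business.get("rating", "N/A"),
            "review_count": business.get("review_count", "N/A"),
            "neighborhoods": business.get("neighborhoods", "N/A"),
            "latitude": coords.get("latitude"),
            "longitude": coords.get("longitude"),
            "zip_code": (business.get("location") or {}).get("zip_code"),
        })
    return rows


# A made-up, trimmed response used only to exercise the function
sample = {
    "businesses": [
        {
            "name": "Example Cafe",
            "categories": [{"alias": "coffee", "title": "Coffee & Tea"}],
            "rating": 4.5,
            "review_count": 12,
            "coordinates": {"latitude": 38.9, "longitude": -77.0},
            "location": {"zip_code": "20001"},
        }
    ]
}
rows = extract_businesses(sample)
```

Passing the result to pd.DataFrame(rows) produces the same kind of DataFrame that process_data builds.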
The final block converts the Pandas DataFrame into a CSV file. Since each API response contains information for up to 50 businesses, each CSV file includes data for at most 50 Yelp-rated restaurants.
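In the notebook, pandas' to_csv writes the file. The sketch below shows the same step with the standard-library csv module and makes the 50-row limit explicit; rows_to_csv and MAX_ROWS_PER_FILE are illustrative names, and the cap simply mirrors the 50-result maximum of a single search request.

```python
import csv
import io

MAX_ROWS_PER_FILE = 50  # one Yelp search response holds at most 50 businesses


def rows_to_csv(rows, out):
    """Write row dicts to the file-like object `out` as CSV; return rows written."""
    if len(rows) > MAX_ROWS_PER_FILE:
        raise ValueError(f"expected at most {MAX_ROWS_PER_FILE} rows, got {len(rows)}")
    if not rows:
        return 0
    # Column order follows the keys of the first row
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


buf = io.StringIO()
n = rows_to_csv([{"name": "Example Cafe", "rating": 4.5}], buf)
```

To write to disk instead, open the file with open(path, "w", newline="") and pass that handle as out.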
# reads in the requests package
import requests
# this sets up the API request
headers = {'Authorization': f'Bearer {API_KEY}'}

# this function fetches restaurant data from Yelp's API
# location = which area to search in
# term = picks between restaurants, coffee shops, or bars (used to gather as much Yelp review information about DC as possible)
# limit and offset = Yelp-specific paging parameters: limit returns up to 50 businesses per request,
# and offset sets which result to start from (0 is the first), like going to the next page
def fetch_restaurant_data(location, term="restaurants", limit=50, offset=0):
    url = 'https://api.yelp.com/v3/businesses/search'
    params = {
        'term': term,
        'location': location,
        'limit': limit,
        'offset': offset,  # include offset in the params
    }
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()  # stop with an error if Yelp rejects the request (e.g., an invalid key)
    data = response.json()
    return data

offset = 0  # start from the first result; use 50 for the next set of results
limit = 50  # get 50 results at a time

# Fetch data for restaurants in Washington D.C.
restaurants = fetch_restaurant_data('Washington D.C.', offset=offset, limit=limit)
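fetch_restaurant_data returns a single page of at most 50 businesses. Collecting more means calling it again with offset advanced by limit (0, 50, 100, ...). This is a sketch, assuming a page-fetching function with the same signature as fetch_restaurant_data; fetch_all_pages and its max_results cap are hypothetical, and Yelp also limits how far offset can go, so check its documentation before raising the cap.

```python
def fetch_all_pages(fetch_page, location, term="restaurants", limit=50, max_results=200):
    """Collect businesses across pages by stepping `offset` by `limit`.

    fetch_page is any callable shaped like fetch_restaurant_data
    (location, term=..., limit=..., offset=...) that returns a dict with a
    'businesses' list. fetch_all_pages and max_results are hypothetical.
    """
    businesses = []
    for offset in range(0, max_results, limit):
        page = fetch_page(location, term=term, limit=limit, offset=offset)
        batch = page.get("businesses", [])
        businesses.extend(batch)
        if len(batch) < limit:
            # A short or empty page (including an error payload) means no more results
            break
    return businesses[:max_results]
```

With the function defined in the notebook, fetch_all_pages(fetch_restaurant_data, 'Washington D.C.') would request up to four pages. Passing the fetcher in as an argument also lets the loop be tested with a fake function instead of live API calls.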
# import package
import pandas as pd

# this function takes the JSON response from above and cleans it into a pandas DataFrame
def process_data(restaurants_data):
    # creates an empty list to store the data in
    restaurant_list = []
    for business in restaurants_data['businesses']:
        # collects these specific fields for each business
        restaurant_list.append({
            'name': business['name'],
            # the first listed category is used as the cuisine
            'cuisine': business['categories'][0]['title'] if business['categories'] else 'Unknown',
            'price_range': business.get('price', 'N/A'),
            'rating': business.get('rating', 'N/A'),
            'review_count': business.get('review_count', 'N/A'),
            # every row in the output below is N/A, so the search response does not appear to include this key
            'neighborhoods': business.get('neighborhoods', 'N/A'),
            'latitude': business['coordinates']['latitude'],
            'longitude': business['coordinates']['longitude'],
            'zip_code': business['location']['zip_code'],
        })
    # saves it as a dataframe
    df = pd.DataFrame(restaurant_list)
    return df

# uses the function above to create a dataframe from the JSON response
df_restaurants = process_data(restaurants)
# prints the first few results to ensure everything looks good
print(df_restaurants.head())
   name                  cuisine             price_range  rating  review_count  \
0  Unconventional Diner  New American        $$              4.4          2946
1  L'Ardente             Italian             $$$             4.5          1242
2  Grazie Nonna          Italian             $$              4.1           536
3  Old Ebbitt Grill      Bars                $$              4.2         11086
4  Gypsy Kitchen DC      Tapas/Small Plates  $$              4.3           919

   neighborhoods   latitude   longitude  zip_code
0            N/A  38.906139  -77.023800     20001
1            N/A  38.898919  -77.014074     20001
2            N/A  38.904010  -77.035000     20005
3            N/A  38.897967  -77.033342     20005
4            N/A  38.914880  -77.031550     20009
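Every row in the output above shows N/A for neighborhoods, which suggests the search response does not supply that field and process_data falls back to its default. A quick check can flag such columns before analysis. This is a minimal sketch on plain row dicts; all_missing_columns is a hypothetical helper.

```python
def all_missing_columns(rows, missing="N/A"):
    """Return the column names whose value equals `missing` in every row."""
    if not rows:
        return []
    # Column names are taken from the first row
    return [col for col in rows[0] if all(row.get(col) == missing for row in rows)]


sample_rows = [
    {"name": "A", "neighborhoods": "N/A", "price_range": "$$"},
    {"name": "B", "neighborhoods": "N/A", "price_range": "N/A"},
]
flagged = all_missing_columns(sample_rows)
```

On the DataFrame itself, df_restaurants.loc[:, (df_restaurants == 'N/A').all()].columns should list the same kind of all-N/A columns.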
# prints the full results again to ensure everything is correct
#print(df_restaurants)
# saves the data frame to a csv file in the raw-data folder
df_restaurants.to_csv('../../data/raw-data/df_coffee5.csv')  # /Users/rachnarawalpally/project-rachnarawalpally/data/raw-data
    name                                 cuisine                price_range  \
 0  Pitango Gelato & Coffee              Coffee & Tea           $$
 1  Capital One Café                     Coffee & Tea           N/A
 2  Mah-Ze-Dahr Bakery                   Bakeries               $$
 3  Junction Bistro Bar and Bakery       Bakeries               $$
 4  Coffee Alley                         Coffee & Tea           N/A
 5  Gregorys Coffee                      Bakeries               $$
 6  Atrium Cafe                          Cafes                  $
 7  Mitsitam Cafe                        American               $$
 8  Cafe Levantine                       Lebanese               N/A
 9  Capital One Café                     Banks & Credit Unions  N/A
10  Zeleno                               Sandwiches             $$
11  Le Caprice DC                        Cafes                  $
12  Union Kitchen                        Coffee & Tea           $$
13  Bluestone Lane                       Coffee & Tea           N/A
14  Corella Café and Lounge              Coffee & Tea           $$
15  Casey's Coffee & Sandwiches          Coffee & Tea           $$
16  Union Kitchen Grocery                Coffee & Tea           $$
17  L.A. Burdick Handmade Chocolates     Chocolatiers & Shops   N/A
18  Point Chaud Cafe & Crepes            Creperies              $$
19  Vigilante Coffee                     Coffee & Tea           $$
20  Three Whistles                       Shared Office Spaces   $
21  Adulis Coffee and Roastery           Coffee & Tea           N/A
22  Timgad Café                          Cafes                  N/A
23  Café du Parc                         French                 $$
24  Morsel's                             Coffee & Tea           $$$
25  Sheba Café                           Cafes                  N/A
26  Licht Cafe                           Cafes                  N/A
27  Blank Street                         Coffee & Tea           N/A
28  Colada Shop                          Coffee & Tea           $$
29  Commonwealth Joe                     Cafes                  $
30  Uptown Cafe                          Coffee & Tea           $
31  Morning My Day                       Bakeries               $
32  Caseys Coffee                        Coffee & Tea           N/A
33  Merriweather Cafe                    Cafes                  N/A
34  Soricha Tea & Theater                Coffee & Tea           $$
35  Peet's Coffee                        Coffee & Tea           $$
36  La Bohemia Bakery                    Bakeries               $
37  Milk + Honey Café                    Coffee & Tea           N/A
38  Baker’s Daughter                     Breakfast & Brunch     N/A
39  Turkish Coffee Lady                  Coffee & Tea           $$
40  Cortado Cafe                         Cafes                  $$
41  Tiger Sugar Boba Bubble Tea shop DC  Bubble Tea             $$
42  Call Your Mother Deli - Georgetown   Delis                  $$
43  Cafe Integral                        Coffee & Tea           $$
44  Black Coffee                         Coffee & Tea           $$
45  Coffee Nature                        Coffee & Tea           $
46  Bread & Chocolate                    Breakfast & Brunch     $$
47  Le Bon Cafe                          Coffee & Tea           $$
48  Mo Mo Bakery                         Bakeries               $
49  The Hill Cafe                        Coffee & Tea           N/A

    rating  review_count  neighborhoods   latitude   longitude  zip_code
 0     4.2          1040            N/A  38.895058  -77.021854     20004
 1     4.3            24            N/A  38.867232  -76.988468     20020
 2     4.3           137            N/A  38.858644  -77.049471     22202
 3     4.2            95            N/A  38.894935  -77.002259     20002
 4     4.0             1            N/A  38.898571  -77.021774     20001
 5     3.8            87            N/A  38.876988  -77.004496     20003
 6     3.8           106            N/A  38.884277  -77.018194     20024
 7     3.5           538            N/A  38.888184  -77.016863     20560
 8     4.7            29            N/A  38.935452  -77.179605     22101
 9     4.0            63            N/A  38.904992  -77.062633     20007
10     4.3            95            N/A  38.911483  -77.044128     20009
11     3.5           349            N/A  38.932815  -77.032744     20010
12     4.1            95            N/A  38.906762  -77.023699     20001
13     3.3            25            N/A  38.894308  -77.029739     20004
14     3.9            39            N/A  38.983660  -77.092950     20814
15     2.9            25            N/A  38.883440  -77.016027     20024
16     4.1            16            N/A  38.912090  -77.003690     20002
17     4.3            89            N/A  38.907180  -77.063050     20007
18     3.7            57            N/A  38.920154  -77.071873     20007
19     4.3           199            N/A  38.992035  -76.933845     20740
20     4.4           110            N/A  38.889560  -77.091200     22201
21     4.6             7            N/A  38.985367  -77.027355     20910
22     0.0             0            N/A  38.897080  -77.010790     20001
23     3.4           496            N/A  38.896491  -77.032656     20004
24     3.0             3            N/A  38.922855  -77.053824     20008
25     4.6             8            N/A  38.934610  -77.033200     20010
26     4.9            11            N/A  38.916837  -77.035461     20009
27     3.4            18            N/A  38.906160  -77.063010     20007
28     4.0            68            N/A  38.907064  -77.043662     20036
29     4.6           471            N/A  38.862669  -77.054934     22202
30     3.6            83            N/A  38.905458  -77.005096     20002
31     4.8            27            N/A  38.998113  -77.031311     20910
32     3.5            10            N/A  38.899840  -77.007540     20002
33     3.8            12            N/A  38.943640  -77.052620     20008
34     4.6           469            N/A  38.833030  -77.191437     22003
35     3.8           108            N/A  38.899230  -77.039980     20006
36     4.3           258            N/A  39.057995  -77.112355     20852
37     3.2             9            N/A  38.884751  -77.017456     20472
38     3.6            23            N/A  38.904533  -77.062452     20007
39     4.6           187            N/A  38.805617  -77.050411     22314
40     4.6           116            N/A  38.813170  -77.111088     22304
41     3.9            23            N/A  38.922077  -76.996569     20002
42     4.4           382            N/A  38.907617  -77.068837     20007
43     3.7            23            N/A  38.916040  -77.046870     20009
44     4.1           106            N/A  38.917900  -77.096820     20007
45     4.2           159            N/A  38.954480  -77.083120     20016
46     3.3           399            N/A  38.905660  -77.050450     20037
47     3.7           235            N/A  38.887260  -77.003357     20003
48     3.9            16            N/A  39.052131  -77.051026     20902
49     4.3            56            N/A  38.891074  -76.983391     20002
Final Thoughts
This code demonstrates how to retrieve data from Yelp's API to gather information on businesses such as restaurants, coffee shops, and bars (as used in this project), and it can be modified to query any other type of business listed on Yelp. It provides a simple way to fetch Yelp data based on specific parameters, convert the raw JSON response into a pandas DataFrame, and save the DataFrame as a CSV file for later analysis. By changing the search term (e.g., “restaurants”, “coffee shops”, or “bars”) and adjusting parameters such as limit and offset, you can tailor the data retrieval to your needs. This approach makes it straightforward to gather Yelp business details such as names, categories, price ranges, ratings, review counts, and locations for analysis.
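Changing the search term and stepping through offsets, as described above, amounts to building one parameter set per (term, page) pair. This sketch only builds those parameter dicts, using the same keys as the params in fetch_restaurant_data; plan_requests and its pages argument are hypothetical, and each dict would still need to be sent to the Yelp endpoint.

```python
def plan_requests(terms, location, pages=1, limit=50):
    """List the search parameters for every (term, page) combination.

    plan_requests is a hypothetical helper; each dict uses the same keys
    (term, location, limit, offset) as fetch_restaurant_data.
    """
    plan = []
    for term in terms:
        for page in range(pages):
            # each page starts `limit` results after the previous one
            plan.append({"term": term, "location": location,
                         "limit": limit, "offset": page * limit})
    return plan


plan = plan_requests(["restaurants", "coffee shops", "bars"], "Washington D.C.", pages=2)
```

Because the keys match the function's parameter names, each entry p can be passed straight through as fetch_restaurant_data(**p).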