
Teaching a robot to cook pizza. Part 1: Getting the data


Image By: Chuchilko


Not long ago, after finishing yet another Kaggle competition, I had the idea of building a test ML application.
For example, this one: "help a robot make a pizza".


Of course, the main goal is the same as always: learning something new.


I wanted to understand how Generative Adversarial Networks (GAN) work.


The key idea was to train a GAN that assembles a picture of a pizza from the selected ingredients.


Well, let's get started.


Start


Of course, to train any machine learning algorithm, we first need data.
In our case there are not many options: either find a ready-made dataset or pull the data from the Internet ourselves.


And then I thought: why not pull the data from the Dodo-Pizza site?


Disclaimer

I have nothing to do with this pizzeria chain.
Honestly, I don't particularly like their pizza, especially for the price (and size); in my city (Kaliningrad) there are more attractive pizzerias.


So the first item of the action plan was:


  1. get the data from the site

Data loading


Since all the information we need is available on the Dodo-Pizza website, we will use site parsing (also known as web scraping).


The article "Web Scraping with python" will help us here.


And just two libraries:


    import requests
    import bs4

Open the Dodo-Pizza site, click "View code" in the browser, and find the element with the data we need.


On the main page you can get only a basic list of pizzas and their composition.
More information can be obtained by clicking on the product you like. A pop-up window will then appear, with detailed information and beautiful pizza pictures.

This window appears as a result of a GET request, which can be emulated by passing the necessary headers:


    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
        'Referer': siteurl,
        'x-requested-with': 'XMLHttpRequest'
    }
    res = requests.get(siteurl, headers = headers)

In response, we get a piece of HTML that can already be parsed.
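As a rough illustration of parsing such a fragment, here is a dependency-free sketch that uses the standard library's html.parser instead of bs4. The markup, the class name `product-name`, and the `NameCollector` helper are all made up for the example; the real page structure has to be looked up in the browser inspector.

```python
from html.parser import HTMLParser


class NameCollector(HTMLParser):
    """Collects the text of every tag that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.names = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._inside = True

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the capture (no nesting support)
        self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.names.append(data.strip())


def extract_names(html, css_class="product-name"):
    parser = NameCollector(css_class)
    parser.feed(html)
    return parser.names


# Invented fragment standing in for the server response
sample = '<div><h2 class="product-name">Pepperoni</h2><p>Tomato sauce</p></div>'
print(extract_names(sample))
```

With bs4 the same lookup would normally be a one-line search by tag and class; the hand-written parser is used here only so that the sketch runs without extra packages.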


Note right away that static content is served through the akamaihd.net CDN.


After a short experiment, the result was the dodo_scrapping.py script, which collects the pizza names and their composition from the Dodo-Pizza site and also saves three photos of each pizza in separate directories.


The output of the script is several csv-files and directories with photos.
For this, the following actions are performed:



Information about each pizza is stored in the following format:
city, city URL, name, English name, pizza URL, composition, price, calories, carbohydrates, proteins, fats, diameter, weight


The nice thing about writing automation scripts is that you can start them, lean back in your chair, and watch them work...


The result was only 20 pizzas.
Each pizza comes with 3 pictures. We are only interested in the third one, which shows the pizza from above.


Of course, the pictures then need further processing: the pizza has to be cropped and centered.
I think this should not be much of a problem, since all the pictures are the same size, 710x380.
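The arithmetic behind such a crop is simple enough to sketch. The helper below is hypothetical (its name is not from the original scripts) and assumes the pizza sits in the middle of the frame; it returns the largest centered square for a given picture size.

```python
def center_square_box(width, height):
    """Return (x, y, side) of the largest centered square in a width x height image."""
    side = min(width, height)
    x = (width - side) // 2   # horizontal offset of the square
    y = (height - side) // 2  # vertical offset of the square
    return x, y, side


x, y, side = center_square_box(710, 380)
print(x, y, side)  # 165 0 380
```

For a 710x380 picture this gives a 380x380 square starting at x = 165, which agrees with the offset used for cropping later in the article.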


Data processing


After scraping the site, we have a csv file with the data, a format familiar from Kaggle (plus the directories with pictures).
It is time to explore the pizzas.


Import the necessary libraries.


    import numpy as np
    import pandas as pd

    import matplotlib.pyplot as plt
    %matplotlib inline
    import seaborn as sns

    np.random.seed(42)

    import cv2
    import os
    import sys

    df = pd.read_csv('pizzas.csv', encoding='cp1251')
    print(df.shape)

 (20, 13) 

    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 20 entries, 0 to 19
    Data columns (total 13 columns):
    city_name         20 non-null object
    city_url          20 non-null object
    pizza_name        20 non-null object
    pizza_eng_name    20 non-null object
    pizza_url         20 non-null object
    pizza_contain     20 non-null object
    pizza_price       20 non-null int64
    kiloCalories      20 non-null object
    carbohydrates     20 non-null object
    proteins          20 non-null object
    fats              20 non-null object
    size              20 non-null int64
    weight            20 non-null object
    dtypes: int64(2), object(11)
    memory usage: 2.1+ KB

 df.head() 


|   | city_name | city_url | pizza_name | pizza_eng_name | pizza_url | pizza_contain | pizza_price | kiloCalories | carbohydrates | proteins | fats | size | weight |
| 0 | Kaliningrad | /Kaliningrad | Double pepperoni | double-pepperoni | https://dodopizza.ru/Kaliningrad/Product/doubl ... | Tomato sauce, a double portion of pepperoni and ... | 395 | 257.52 | 26.04 | 10.77 | 12.11 | 25 | 470 ± 50 |
| 1 | Kaliningrad | /Kaliningrad | Crazy Pizza | crazy-pizza | https://dodopizza.ru/Kaliningrad/Product/crazy ... | Tomato sauce, increased portions of chicken and ... | 395 | 232.37 | 31.33 | 9.08 | 7.64 | 25 | 410 ± 50 |
| 2 | Kaliningrad | /Kaliningrad | Don Bacon | pizza-don-bekon | https://dodopizza.ru/Kaliningrad/Product/pizza ... | Tomato sauce, bacon, pepperoni, chicken, cur ... | 395 | 274 | 25.2 | 9.8 | 14.8 | 25 | 454 ± 50 |
| 3 | Kaliningrad | /Kaliningrad | Mushrooms and ham | gribvetchina | https://dodopizza.ru/Kaliningrad/Product/gribv ... | Tomato sauce, ham, mushrooms, mozzarella | 315 | 189 | 23.9 | 9.3 | 6.1 | 25 | 370 ± 50 |
| 4 | Kaliningrad | /Kaliningrad | Pizza pie | pizza-pirog | https://dodopizza.ru/Kaliningrad/Product/pizza ... | Condensed milk, cranberries, pineapples | 315 | 144.9 | 29.8 | 2.9 | 2.7 | 25 | 420 ± 50 |


Let's convert the data to a more convenient form.


    # decimal commas -> decimal points
    df['kiloCalories'] = df.kiloCalories.apply(lambda x: x.replace(',','.'))
    df['carbohydrates'] = df.carbohydrates.apply(lambda x: x.replace(',','.'))
    df['proteins'] = df.proteins.apply(lambda x: x.replace(',','.'))
    df['fats'] = df.fats.apply(lambda x: x.replace(',','.'))

    # split "470 ± 50" into the weight and its tolerance
    df[['weight', 'weight_err']] = df['weight'].str.split('±', n=1, expand=True)

    df['kiloCalories'] = df.kiloCalories.astype('float32')
    df['carbohydrates'] = df.carbohydrates.astype('float32')
    df['proteins'] = df.proteins.astype('float32')
    df['fats'] = df.fats.astype('float32')
    df['weight'] = df.weight.astype('int64')
    df['weight_err'] = df.weight_err.astype('int64')
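To make the two conversions concrete, here is a small pandas-free sketch of what happens to a single cell. The helper names `parse_decimal` and `parse_weight` are invented for the illustration and are not part of the original notebook.

```python
def parse_decimal(text):
    """Convert a number written with a decimal comma, e.g. '257,52', to float."""
    return float(text.replace(",", "."))


def parse_weight(text):
    """Split a string like '470 ± 50' into (weight, tolerance) as integers."""
    weight, tolerance = text.split("±", 1)
    # int() ignores the surrounding spaces left over after the split
    return int(weight), int(tolerance)


print(parse_decimal("257,52"))
print(parse_weight("470 ± 50"))
```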

 df.head() 

|   | city_name | city_url | pizza_name | pizza_eng_name | pizza_url | pizza_contain | pizza_price | kiloCalories | carbohydrates | proteins | fats | size | weight | weight_err |
| 0 | Kaliningrad | /Kaliningrad | Double pepperoni | double-pepperoni | https://dodopizza.ru/Kaliningrad/Product/doubl ... | Tomato sauce, a double portion of pepperoni and ... | 395 | 257.519989 | 26.040001 | 10.77 | 12.11 | 25 | 470 | 50 |
| 1 | Kaliningrad | /Kaliningrad | Crazy Pizza | crazy-pizza | https://dodopizza.ru/Kaliningrad/Product/crazy ... | Tomato sauce, increased portions of chicken and ... | 395 | 232.369995 | 31.330000 | 9.08 | 7.64 | 25 | 410 | 50 |
| 2 | Kaliningrad | /Kaliningrad | Don Bacon | pizza-don-bekon | https://dodopizza.ru/Kaliningrad/Product/pizza ... | Tomato sauce, bacon, pepperoni, chicken, cur ... | 395 | 274.000000 | 25.200001 | 9.80 | 14.80 | 25 | 454 | 50 |
| 3 | Kaliningrad | /Kaliningrad | Mushrooms and ham | gribvetchina | https://dodopizza.ru/Kaliningrad/Product/gribv ... | Tomato sauce, ham, mushrooms, mozzarella | 315 | 189.000000 | 23.900000 | 9.30 | 6.10 | 25 | 370 | 50 |
| 4 | Kaliningrad | /Kaliningrad | Pizza pie | pizza-pirog | https://dodopizza.ru/Kaliningrad/Product/pizza ... | Condensed milk, cranberries, pineapples | 315 | 144.899994 | 29.799999 | 2.90 | 2.70 | 25 | 420 | 50 |


The nutritional values are given per 100 grams, so to get a better picture we scale them to the weight of the whole pizza.


    df['pizza_kiloCalories'] = df.kiloCalories * df.weight / 100
    df['pizza_carbohydrates'] = df.carbohydrates * df.weight / 100
    df['pizza_proteins'] = df.proteins * df.weight / 100
    df['pizza_fats'] = df.fats * df.weight / 100
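As a worked example of this scaling, take the Don Bacon row from the table above: 274 kcal per 100 g and a weight of 454 g. The function name `per_pizza` is only an illustration.

```python
def per_pizza(value_per_100g, weight_g):
    """Scale a per-100-gram nutritional value to the whole pizza."""
    return value_per_100g * weight_g / 100


kcal = per_pizza(274, 454)
print(round(kcal, 2))  # 1243.96
```

The result matches the pizza_kiloCalories value that the notebook reports for this pizza.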

 df.describe() 


|       | pizza_price | kiloCalories | carbohydrates | proteins | fats | size | weight | weight_err | pizza_kiloCalories | pizza_carbohydrates | pizza_proteins | pizza_fats |
| count | 20.00000 | 20.000000 | 20.000000 | 20.000000 | 20.00000 | 20.0 | 20.000000 | 20.0 | 20.000000 | 20.000000 | 20.000000 | 20.000000 |
| mean | 370.50000 | 212.134491 | 25.443501 | 8.692500 | 8.44250 | 25.0 | 457.700000 | 50.0 | 969.942043 | 115.867950 | 39.857950 | 38.736650 |
| std | 33.16228 | 34.959122 | 2.204143 | 1.976283 | 3.20358 | 0.0 | 43.727746 | 0.0 | 175.835991 | 8.295421 | 9.989803 | 15.206275 |
| min | 315.00000 | 144.899994 | 22.100000 | 2.900000 | 2.70000 | 25.0 | 370.000000 | 50.0 | 608.579974 | 88.429999 | 12.180000 | 11.340000 |
| 25% | 367.50000 | 188.250000 | 23.975000 | 7.975000 | 5.05000 | 25.0 | 420.000000 | 50.0 | 858.525000 | 113.010003 | 35.625000 | 28.159999 |
| 50% | 385.00000 | 212.500000 | 24.950000 | 9.090000 | 8.20000 | 25.0 | 460.000000 | 50.0 | 966.358490 | 114.779002 | 39.580000 | 35.930001 |
| 75% | 395.00000 | 235.527496 | 26.280001 | 9.800000 | 9.77500 | 25.0 | 485.000000 | 50.0 | 1095.459991 | 120.597001 | 45.707500 | 47.020001 |
| max | 395.00000 | 274.000000 | 31.330000 | 12.200000 | 14.80000 | 25.0 | 560.000000 | 50.0 | 1243.960000 | 128.453000 | 60.999999 | 68.080001 |


Now let's find the record holders among the pizzas...


So, the most high-calorie pizza:


 df[df.pizza_kiloCalories == np.max(df.pizza_kiloCalories)] 


|   | city_name | city_url | pizza_name | pizza_eng_name | pizza_url | pizza_contain | pizza_price | kiloCalories | carbohydrates | proteins | fats | size | weight | weight_err | pizza_kiloCalories | pizza_carbohydrates | pizza_proteins | pizza_fats |
| 2 | Kaliningrad | /Kaliningrad | Don Bacon | pizza-don-bekon | https://dodopizza.ru/Kaliningrad/Product/pizza ... | Tomato sauce, bacon, pepperoni, chicken, cur ... | 395 | 274.0 | 25.200001 | 9.8 | 14.8 | 25 | 454 | 50 | 1243.96 | 114.408003 | 44.492001 | 67.192001 |


The fattest pizza:


 df[df.pizza_fats == np.max(df.pizza_fats)] 


|   | city_name | city_url | pizza_name | pizza_eng_name | pizza_url | pizza_contain | pizza_price | kiloCalories | carbohydrates | proteins | fats | size | weight | weight_err | pizza_kiloCalories | pizza_carbohydrates | pizza_proteins | pizza_fats |
| 14 | Kaliningrad | /Kaliningrad | Meat | myasnaya-pizza | https://dodopizza.ru/Kaliningrad/Product/myasn ... | Tomato sauce, hunting sausages, bacon, ham ... | 395 | 268.0 | 24.200001 | 9.1 | 14.8 | 25 | 460 | 50 | 1232.8 | 111.320004 | 41.860002 | 68.080001 |


The richest in carbohydrates:


 df[df.pizza_carbohydrates == np.max(df.pizza_carbohydrates)] 


|   | city_name | city_url | pizza_name | pizza_eng_name | pizza_url | pizza_contain | pizza_price | kiloCalories | carbohydrates | proteins | fats | size | weight | weight_err | pizza_kiloCalories | pizza_carbohydrates | pizza_proteins | pizza_fats |
| 1 | Kaliningrad | /Kaliningrad | Crazy Pizza | crazy-pizza | https://dodopizza.ru/Kaliningrad/Product/crazy ... | Tomato sauce, increased portions of chicken and ... | 395 | 232.369995 | 31.33 | 9.08 | 7.64 | 25 | 410 | 50 | 952.71698 | 128.453 | 37.228 | 31.323999 |


The richest in proteins:


 df[df.pizza_proteins == np.max(df.pizza_proteins)] 


|   | city_name | city_url | pizza_name | pizza_eng_name | pizza_url | pizza_contain | pizza_price | kiloCalories | carbohydrates | proteins | fats | size | weight | weight_err | pizza_kiloCalories | pizza_carbohydrates | pizza_proteins | pizza_fats |
| 7 | Kaliningrad | /Kaliningrad | Hawaiian | gavayskaya-pizza | https://dodopizza.ru/Kaliningrad/Product/gavay ... | Tomato sauce, pineapples, chicken, mozzarella | 315 | 216.0 | 25.0 | 12.2 | 7.4 | 25 | 500 | 50 | 1080.0 | 125.0 | 60.999999 | 37.0 |


The heaviest pizza by weight:


 df[df.weight == np.max(df.weight)] 


|   | city_name | city_url | pizza_name | pizza_eng_name | pizza_url | pizza_contain | pizza_price | kiloCalories | carbohydrates | proteins | fats | size | weight | weight_err | pizza_kiloCalories | pizza_carbohydrates | pizza_proteins | pizza_fats |
| 8 | Kaliningrad | /Kaliningrad | Dodo | pizza-dodo | https://dodopizza.ru/Kaliningrad/Product/pizza ... | Tomato sauce, beef (minced meat), ham, peppa ... | 395 | 203.899994 | 22.1 | 8.6 | 8.9 | 25 | 560 | 50 | 1141.839966 | 123.760002 | 48.160002 | 49.839998 |


The lightest pizza:


 df[df.weight == np.min(df.weight)] 


|   | city_name | city_url | pizza_name | pizza_eng_name | pizza_url | pizza_contain | pizza_price | kiloCalories | carbohydrates | proteins | fats | size | weight | weight_err | pizza_kiloCalories | pizza_carbohydrates | pizza_proteins | pizza_fats |
| 3 | Kaliningrad | /Kaliningrad | Mushrooms and ham | gribvetchina | https://dodopizza.ru/Kaliningrad/Product/gribv ... | Tomato sauce, ham, mushrooms, mozzarella | 315 | 189.0 | 23.9 | 9.3 | 6.1 | 25 | 370 | 50 | 699.3 | 88.429999 | 34.410001 | 22.57 |


Get pizza names


    pizza_names = df['pizza_name'].tolist()
    pizza_eng_names = df['pizza_eng_name'].tolist()
    print( pizza_eng_names )

 ['double-pepperoni', 'crazy-pizza', 'pizza-don-bekon', 'gribvetchina', 'pizza-pirog', 'pizza-margarita', 'syrnaya-pizza', 'gavayskaya-pizza', 'pizza-dodo', 'pizza-chetyre-sezona', 'ovoshi-i-griby', 'italyanskaya-pizza', 'meksikanskaya-pizza', 'morskaya-pizza', 'myasnaya-pizza', 'pizza-pepperoni', 'ranch-pizza', 'pizza-syrnyi-cyplenok', 'pizza-cyplenok-barbekyu', 'chizburger-pizza'] 

Build the paths to the pictures; we are interested in the 3rd picture (top view).


    image_paths = []
    for name in pizza_eng_names:
        path = os.path.join(name, name+'3.jpg')
        image_paths.append(path)
    print(image_paths)

 ['double-pepperoni\\double-pepperoni3.jpg', 'crazy-pizza\\crazy-pizza3.jpg', 'pizza-don-bekon\\pizza-don-bekon3.jpg', 'gribvetchina\\gribvetchina3.jpg', 'pizza-pirog\\pizza-pirog3.jpg', 'pizza-margarita\\pizza-margarita3.jpg', 'syrnaya-pizza\\syrnaya-pizza3.jpg', 'gavayskaya-pizza\\gavayskaya-pizza3.jpg', 'pizza-dodo\\pizza-dodo3.jpg', 'pizza-chetyre-sezona\\pizza-chetyre-sezona3.jpg', 'ovoshi-i-griby\\ovoshi-i-griby3.jpg', 'italyanskaya-pizza\\italyanskaya-pizza3.jpg', 'meksikanskaya-pizza\\meksikanskaya-pizza3.jpg', 'morskaya-pizza\\morskaya-pizza3.jpg', 'myasnaya-pizza\\myasnaya-pizza3.jpg', 'pizza-pepperoni\\pizza-pepperoni3.jpg', 'ranch-pizza\\ranch-pizza3.jpg', 'pizza-syrnyi-cyplenok\\pizza-syrnyi-cyplenok3.jpg', 'pizza-cyplenok-barbekyu\\pizza-cyplenok-barbekyu3.jpg', 'chizburger-pizza\\chizburger-pizza3.jpg'] 

Loading pictures


    images = []
    for path in image_paths:
        print('Load image:', path)
        image = cv2.imread(path)
        if image is not None:
            images.append(image)
        else:
            print('Error read image:', path)

    Load image: double-pepperoni\double-pepperoni3.jpg
    Load image: crazy-pizza\crazy-pizza3.jpg
    Load image: pizza-don-bekon\pizza-don-bekon3.jpg
    Load image: gribvetchina\gribvetchina3.jpg
    Load image: pizza-pirog\pizza-pirog3.jpg
    Load image: pizza-margarita\pizza-margarita3.jpg
    Load image: syrnaya-pizza\syrnaya-pizza3.jpg
    Load image: gavayskaya-pizza\gavayskaya-pizza3.jpg
    Load image: pizza-dodo\pizza-dodo3.jpg
    Load image: pizza-chetyre-sezona\pizza-chetyre-sezona3.jpg
    Load image: ovoshi-i-griby\ovoshi-i-griby3.jpg
    Load image: italyanskaya-pizza\italyanskaya-pizza3.jpg
    Load image: meksikanskaya-pizza\meksikanskaya-pizza3.jpg
    Load image: morskaya-pizza\morskaya-pizza3.jpg
    Load image: myasnaya-pizza\myasnaya-pizza3.jpg
    Load image: pizza-pepperoni\pizza-pepperoni3.jpg
    Load image: ranch-pizza\ranch-pizza3.jpg
    Load image: pizza-syrnyi-cyplenok\pizza-syrnyi-cyplenok3.jpg
    Load image: pizza-cyplenok-barbekyu\pizza-cyplenok-barbekyu3.jpg
    Load image: chizburger-pizza\chizburger-pizza3.jpg

Look at the picture


    def plot_img(img):
        # OpenCV loads images as BGR, matplotlib expects RGB
        img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        plt.imshow(img_rgb)

    print(images[0].shape)
    plot_img(images[0])

 (380, 710, 3) 


The pizzas are located in the same area of every picture, so we cut that area out.


    pizza_imgs = []
    for img in images:
        y, x, height, width = 0, 165, 380, 380
        pizza_crop = img[y:y+height, x:x+width]
        pizza_imgs.append(pizza_crop)

    print(pizza_imgs[0].shape)
    print(len(pizza_imgs))
    plot_img(pizza_imgs[0])

 (380, 380, 3) 20 


See all the pictures


    fig = plt.figure(figsize=(12,15))
    for i in range(0, len(pizza_imgs)):
        fig.add_subplot(4,5,i+1)
        plot_img(pizza_imgs[i])


The Four Seasons pizza clearly stands out in its structure, since it actually consists of four different pizzas.


Study the ingredients


    def split_contain(contain):
        lst = contain.split(',')
        print(len(lst),':', lst)

    for i, row in df.iterrows():
        split_contain(row.pizza_contain)

 2 : [' ', '       '] 4 : [' ', '     ', ' ', ' - '] 6 : [' ', ' ', ' ', ' ', '  ', ' '] 4 : [' ', ' ', ' ', ' '] 3 : [' ', ' ', ' '] 4 : [' ', ' ', '   ', ' '] 4 : [' ', ' ', '    ', ' '] 4 : [' ', ' ', ' ', ' '] 9 : [' ', '  ()', ' ', ' ', '  ', ' ', '  ', ' ', ' '] 8 : [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '] 9 : [' ', ' ', ' ', '  ', ' ', ' ', '  ', ' ', ' '] 6 : [' ', ' ', ' ', ' ', ' ', ' '] 8 : [' ', ' ', '  ', ' ', ' ', ' ', '  ', ' '] 6 : [' ', ' ', ' ', '  ', '  ', ' '] 5 : [' ', '  ', ' ', ' ', ' '] 3 : [' ', ' ', '   '] 6 : [' ', ' ', ' ', ' ', ' ', ' '] 4 : [' ', ' ', ' ', ' '] 6 : [' ', ' ', ' ', '  ', ' ', '  '] 7 : [' ', ' ', ' ', '  ', ' ', '  ', ' '] 

The problem is that in several pizzas the following modifiers are indicated:



In addition, a modifier may be followed by a list of ingredients joined by the conjunction "and".


Hypotheses:



We see that the modifier "increased portion" refers only to mozzarella cheese.
It also occurs once:
"increased portion of mozzarella cheese"


By the way, it is immediately noticeable that the main sauce is tomato and the main cheese is mozzarella.


Normalize the ingredients


    def split_contain2(contain):
        lst = contain.split(',')
        #print(len(lst),':', lst)
        for i in range(len(lst)):
            item = lst[i]
            # strip the portion modifiers
            # (the Russian string literals in this function did not survive translation)
            item = item.replace(' ', '')
            item = item.replace(' ', '')
            item = item.replace(' ', '')
            item = item.replace('', '')
            item = item.replace('', '')
            # two ingredients joined by "and" become two list items
            and_pl = item.find('  ')
            if and_pl != -1:
                item1 = item[0:and_pl]
                item2 = item[and_pl+3:]
                item = item1
                lst.insert(i+1, item2.strip())
            # a double portion is listed twice
            double_pl = item.find('  ')
            if double_pl != -1:
                item = item[double_pl+15:]
                lst.insert(i+1, item.strip())
            lst[i] = item.strip()
        # last one
        for i in range(len(lst)):
            lst[i] = lst[i].strip()
        print(len(lst),':', lst)
        return lst

    ingredients = []
    ingredients_count = []
    for i, row in df.iterrows():
        print(row.pizza_name)
        lst = split_contain2(row.pizza_contain)
        ingredients.append(lst)
        ingredients_count.append(len(lst))
    ingredients_count
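The same cleaning idea can be sketched with English stand-in strings. Everything here is an assumption made for illustration: the constants `AND`, `DOUBLE` and `EXTRA` replace the original Russian phrases, and `split_ingredients` is only loosely modeled on the function above, not a copy of it.

```python
AND = " and "                    # stand-in for the conjunction
DOUBLE = "double portion of "    # stand-in for the doubling modifier
EXTRA = "increased portion of "  # stand-in for the "extra" modifier


def split_ingredients(text):
    """Turn a comma-separated composition string into a flat ingredient list."""
    result = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        # Items joined by the conjunction become separate entries
        for item in part.split(AND):
            item = item.strip()
            if item.startswith(DOUBLE):
                name = item[len(DOUBLE):]
                result.extend([name, name])       # doubled ingredient is listed twice
            elif item.startswith(EXTRA):
                result.append(item[len(EXTRA):])  # the modifier is simply dropped
            else:
                result.append(item)
    return result


print(split_ingredients(
    "tomato sauce, double portion of pepperoni and increased portion of mozzarella"
))
```

One limitation of the English stand-ins: a name that itself contains " and ", such as "sweet and sour sauce", would be split in two, so a real implementation would need a list of exceptions or a smarter rule.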

   4 : [' ', '', '', '']   5 : [' ', '', '', '', '- ']   6 : [' ', '', '', '', ' ', '']    4 : [' ', '', '', ''] - 3 : [' ', '', '']  4 : [' ', '', '', '']  4 : [' ', '', '', '']  4 : [' ', '', '', '']  9 : [' ', ' ()', '', '', ' ', '', ' ', '', '']   8 : [' ', '', '', '', '', '', '', '']    9 : [' ', '', '', ' ', '', '', ' ', '', '']  6 : [' ', '', '', '', '', '']  8 : [' ', '', ' ', '', '', '', ' ', '']  6 : [' ', '', '', ' ', ' ', '']  5 : [' ', ' ', '', '', '']  3 : [' ', '', '']   6 : [' ', '', '', '', '', '']   4 : [' ', '', '', '']   6 : [' ', '', '', ' ', '', ' '] - 7 : [' ', '', '', ' ', '', ' ', ''] [4, 5, 6, 4, 3, 4, 4, 4, 9, 8, 9, 6, 8, 6, 5, 3, 6, 4, 6, 7] 

Let's look at the minimum and maximum number of ingredients.


    min_count = np.min(ingredients_count)
    print('min:', min_count)
    max_count = np.max(ingredients_count)
    print('max:', max_count)

    min: 3
    max: 9

    print('min:', np.array(pizza_names)[ingredients_count == min_count] )
    print('max:', np.array(pizza_names)[ingredients_count == max_count] )

 min: ['-' ''] max: ['' '  '] 

Interestingly, the pizzas with the most ingredients (9 each) are Dodo and Vegetables and mushrooms.


Build a table of ingredients.


    df_ingredients = pd.DataFrame(ingredients)
    df_ingredients.fillna(value='0', inplace=True)
    df_ingredients


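|    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 0 | Tomato sauce | pepperoni | pepperoni | Mozzarella | 0 | 0 | 0 | 0 | 0 |
| 1 | Tomato sauce | chicken | pepperoni | Mozzarella | sweet and sour sauce | 0 | 0 | 0 | 0 |
| 2 | Tomato sauce | bacon | pepperoni | chicken | Red onion | Mozzarella | 0 | 0 | 0 |
| 3 | Tomato sauce | ham | Champignon | Mozzarella | 0 | 0 | 0 | 0 | 0 |
| 4 | Condensed milk | cowberry | pineapples | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Tomato sauce | tomatoes | Mozzarella | oregano | 0 | 0 | 0 | 0 | 0 |
| 6 | Tomato sauce | white cheese | Mozzarella | oregano | 0 | 0 | 0 | 0 | 0 |
| 7 | Tomato sauce | pineapples | chicken | Mozzarella | 0 | 0 | 0 | 0 | 0 |
| 8 | Tomato sauce | beef (mince) | ham | pepperoni | Red onion | olives | Bell pepper | Champignon | Mozzarella |
| 9 | Tomato sauce | pepperoni | ham | white cheese | tomatoes | Champignon | Mozzarella | oregano | 0 |
| 10 | Tomato sauce | white cheese | olives | Bell pepper | tomatoes | Champignon | Red onion | Mozzarella | basil |
| 11 | Tomato sauce | pepperoni | olives | Champignon | Mozzarella | oregano | 0 | 0 | 0 |
| 12 | Tomato sauce | jalapeno | Bell pepper | chicken | tomatoes | Champignon | Red onion | Mozzarella | 0 |
| 13 | Tomato sauce | shrimp | olives | Bell pepper | Red onion | Mozzarella | 0 | 0 | 0 |
| 14 | Tomato sauce | hunting sausages | bacon | ham | Mozzarella | 0 | 0 | 0 | 0 |
| 15 | Tomato sauce | pepperoni | Mozzarella | 0 | 0 | 0 | 0 | 0 | 0 |
| 16 | Ranch Sauce | chicken | ham | tomatoes | garlic | Mozzarella | 0 | 0 | 0 |
| 17 | Cheese sauce | chicken | tomatoes | Mozzarella | 0 | 0 | 0 | 0 | 0 |
| 18 | Tomato sauce | chicken | bacon | Red onion | Mozzarella | barbecue sauce | 0 | 0 | 0 |
| 19 | Cheese sauce | beef | bacon | salted cucumbers | tomatoes | Red onion | Mozzarella | 0 | 0 |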


 df_ingredients.describe() 


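|        | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| count | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 |
| unique | 4 | 13 | 10 | 12 | 6 | 7 | 4 | 4 | 3 |
| top | Tomato sauce | pepperoni | pepperoni | Mozzarella | 0 | 0 | 0 | 0 | 0 |
| freq | 16 | 4 | 3 | 5 | 8 | 10 | 15 | 16 | 18 |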


As expected, the most used sauce is tomato. The standard recipe consists of 4 ingredients.


It's funny that the "top" row forms a new pizza recipe of its own.


Let's see how many times a particular ingredient occurs:


 df_ingredients.stack().value_counts() 

 0 69  19   16  8  7  7   7  6  5   4  4  4  4  3  2   2  1 -  1  1   1  1  1   1   1  () 1   1  1  1   1 dtype: int64 

Again: mozzarella, tomato sauce, pepperoni.


 df_ingredients.stack().value_counts().drop('0').plot.pie() 

 <matplotlib.axes._subplots.AxesSubplot at 0xea8d358> 


Now encode the ingredients.


    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import OneHotEncoder

    ingredients_full = df_ingredients.values.tolist()

    # flatten lists
    flat_ingredients = [item for sublist in ingredients_full for item in sublist]
    print(flat_ingredients)
    print(len(flat_ingredients))

    np_ingredients = np.array(flat_ingredients)
    #print(np_ingredients)

    labelencoder = LabelEncoder()
    ingredients_encoded = labelencoder.fit_transform(np_ingredients)
    print(ingredients_encoded)

    label_max = np.max(ingredients_encoded)
    print('max:', label_max)

 [' ', '', '', '', '0', '0', '0', '0', '0', ' ', '', '', '', '- ', '0', '0', '0', '0', ' ', '', '', '', ' ', '', '0', '0', '0', ' ', '', '', '', '0', '0', '0', '0', '0', ' ', '', '', '0', '0', '0', '0', '0', '0', ' ', '', '', '', '0', '0', '0', '0', '0', ' ', '', '', '', '0', '0', '0', '0', '0', ' ', '', '', '', '0', '0', '0', '0', '0', ' ', ' ()', '', '', ' ', '', ' ', '', '', ' ', '', '', '', '', '', '', '', '0', ' ', '', '', ' ', '', '', ' ', '', '', ' ', '', '', '', '', '', '0', '0', '0', ' ', '', ' ', '', '', '', ' ', '', '0', ' ', '', '', ' ', ' ', '', '0', '0', '0', ' ', ' ', '', '', '', '0', '0', '0', '0', ' ', '', '', '0', '0', '0', '0', '0', '0', ' ', '', '', '', '', '', '0', '0', '0', ' ', '', '', '', '0', '0', '0', '0', '0', ' ', '', '', ' ', '', ' ', '0', '0', '0', ' ', '', '', ' ', '', ' ', '', '0', '0'] 180 [ 4 20 20 17 0 0 0 0 0 4 26 20 17 13 0 0 0 0 4 7 20 26 14 17 0 0 0 4 10 28 17 0 0 0 0 0 1 8 5 0 0 0 0 0 0 4 24 17 18 0 0 0 0 0 4 9 17 18 0 0 0 0 0 4 5 26 17 0 0 0 0 0 4 12 10 20 14 16 21 28 17 4 20 10 9 24 28 17 18 0 4 9 16 21 24 28 14 17 6 4 20 16 28 17 18 0 0 0 4 25 21 26 24 28 14 17 0 4 15 16 21 14 17 0 0 0 4 19 7 10 17 0 0 0 0 4 20 17 0 0 0 0 0 0 2 26 10 24 27 17 0 0 0 3 26 24 17 0 0 0 0 0 4 26 7 14 17 23 0 0 0 3 11 7 22 24 14 17 0 0] max: 28 
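What the label encoding step does can be shown without scikit-learn: LabelEncoder numbers the distinct values in sorted order. The sketch below reproduces that idea in plain Python on two made-up rows; `fit_labels` and `encode` are hypothetical helper names.

```python
def fit_labels(values):
    """Map each distinct value to an integer code, in sorted order."""
    classes = sorted(set(values))
    return {name: code for code, name in enumerate(classes)}


def encode(values, mapping):
    """Replace every value with its integer code."""
    return [mapping[v] for v in values]


# Two invented rows, padded with '0' like the ingredient table
rows = [
    ["tomato sauce", "pepperoni", "mozzarella", "0"],
    ["cheese sauce", "chicken", "mozzarella", "0"],
]
flat = [item for row in rows for item in row]
mapping = fit_labels(flat)
print(encode(flat, mapping))  # [5, 4, 3, 0, 1, 2, 3, 0]
```

Because the digit '0' sorts before letters, the padding value receives code 0 here, just as it does in the encoded output above.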

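It turns out that as many as 28 ingredients are used: labels 1 to 28, since label 0 is the placeholder for an empty slot.


    # label_max itself is a valid label, hence the + 1;
    # inverse_transform expects an array-like, not a scalar
    for label in range(label_max + 1):
        print(label, labelencoder.inverse_transform([label])[0])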

 0 0 1   2   3   4   5  6  7  8  9  10  11  12  () 13 -  14   15  16  17  18  19   20  21   22   23   24  25  26  27  

    lb_ingredients = []
    for lst in ingredients_full:
        lb_ingredients.append(labelencoder.transform(lst).tolist())
    #lb_ingredients = np.array(lb_ingredients)
    lb_ingredients

 [[4, 20, 20, 17, 0, 0, 0, 0, 0], [4, 26, 20, 17, 13, 0, 0, 0, 0], [4, 7, 20, 26, 14, 17, 0, 0, 0], [4, 10, 28, 17, 0, 0, 0, 0, 0], [1, 8, 5, 0, 0, 0, 0, 0, 0], [4, 24, 17, 18, 0, 0, 0, 0, 0], [4, 9, 17, 18, 0, 0, 0, 0, 0], [4, 5, 26, 17, 0, 0, 0, 0, 0], [4, 12, 10, 20, 14, 16, 21, 28, 17], [4, 20, 10, 9, 24, 28, 17, 18, 0], [4, 9, 16, 21, 24, 28, 14, 17, 6], [4, 20, 16, 28, 17, 18, 0, 0, 0], [4, 25, 21, 26, 24, 28, 14, 17, 0], [4, 15, 16, 21, 14, 17, 0, 0, 0], [4, 19, 7, 10, 17, 0, 0, 0, 0], [4, 20, 17, 0, 0, 0, 0, 0, 0], [2, 26, 10, 24, 27, 17, 0, 0, 0], [3, 26, 24, 17, 0, 0, 0, 0, 0], [4, 26, 7, 14, 17, 23, 0, 0, 0], [3, 11, 7, 22, 24, 14, 17, 0, 0]] 

    # scikit-learn 1.2 and later call this argument sparse_output
    onehotencoder = OneHotEncoder(sparse=False)
    ingredients_onehotencoded = onehotencoder.fit_transform(ingredients_encoded.reshape(-1, 1))
    print(ingredients_onehotencoded.shape)
    ingredients_onehotencoded[0]

    (180, 29)
    array([ 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
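The one-hot step is equally easy to picture in plain Python: every integer label becomes a vector of zeros with a single 1 at the label's position. The `one_hot` helper below is a minimal sketch, not the scikit-learn implementation.

```python
def one_hot(codes, num_classes):
    """Turn integer labels into one-hot rows of length num_classes."""
    vectors = []
    for code in codes:
        row = [0.0] * num_classes
        row[code] = 1.0
        vectors.append(row)
    return vectors


# Label 4 (the first entry of the encoded list) among 29 classes (labels 0..28)
first = one_hot([4], 29)[0]
print(first.index(1.0), len(first))  # 4 29
```

Applied to all 180 encoded labels, this would give the same 180 x 29 shape as the encoder output above.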

Now we have data with which we can work.


Autoencoder


Let's load the pizza photos (top view) and try to train a simple compressing autoencoder.


    import numpy as np
    import pandas as pd

    import matplotlib.pyplot as plt
    %matplotlib inline
    import seaborn as sns

    np.random.seed(42)

    import cv2
    import os
    import sys

    import load_data
    import prepare_images


 pizza_eng_names, pizza_imgs = prepare_images.load_photos() 

 Read csv... (20, 13) <class 'pandas.core.frame.DataFrame'> RangeIndex: 20 entries, 0 to 19 Data columns (total 13 columns): city_name 20 non-null object city_url 20 non-null object pizza_name 20 non-null object pizza_eng_name 20 non-null object pizza_url 20 non-null object pizza_contain 20 non-null object pizza_price 20 non-null int64 kiloCalories 20 non-null object carbohydrates 20 non-null object proteins 20 non-null object fats 20 non-null object size 20 non-null int64 weight 20 non-null object dtypes: int64(2), object(11) memory usage: 2.1+ KB None ['double-pepperoni', 'crazy-pizza', 'pizza-don-bekon', 'gribvetchina', 'pizza-pirog', 'pizza-margarita', 'syrnaya-pizza', 'gavayskaya-pizza', 'pizza-dodo', 'pizza-chetyre-sezona', 'ovoshi-i-griby', 'italyanskaya-pizza', 'meksikanskaya-pizza', 'morskaya-pizza', 'myasnaya-pizza', 'pizza-pepperoni', 'ranch-pizza', 'pizza-syrnyi-cyplenok', 'pizza-cyplenok-barbekyu', 'chizburger-pizza'] ['double-pepperoni\\double-pepperoni3.jpg', 'crazy-pizza\\crazy-pizza3.jpg', 'pizza-don-bekon\\pizza-don-bekon3.jpg', 'gribvetchina\\gribvetchina3.jpg', 'pizza-pirog\\pizza-pirog3.jpg', 'pizza-margarita\\pizza-margarita3.jpg', 'syrnaya-pizza\\syrnaya-pizza3.jpg', 'gavayskaya-pizza\\gavayskaya-pizza3.jpg', 'pizza-dodo\\pizza-dodo3.jpg', 'pizza-chetyre-sezona\\pizza-chetyre-sezona3.jpg', 'ovoshi-i-griby\\ovoshi-i-griby3.jpg', 'italyanskaya-pizza\\italyanskaya-pizza3.jpg', 'meksikanskaya-pizza\\meksikanskaya-pizza3.jpg', 'morskaya-pizza\\morskaya-pizza3.jpg', 'myasnaya-pizza\\myasnaya-pizza3.jpg', 'pizza-pepperoni\\pizza-pepperoni3.jpg', 'ranch-pizza\\ranch-pizza3.jpg', 'pizza-syrnyi-cyplenok\\pizza-syrnyi-cyplenok3.jpg', 'pizza-cyplenok-barbekyu\\pizza-cyplenok-barbekyu3.jpg', 'chizburger-pizza\\chizburger-pizza3.jpg'] Load images... 
Load image: double-pepperoni\double-pepperoni3.jpg Load image: crazy-pizza\crazy-pizza3.jpg Load image: pizza-don-bekon\pizza-don-bekon3.jpg Load image: gribvetchina\gribvetchina3.jpg Load image: pizza-pirog\pizza-pirog3.jpg Load image: pizza-margarita\pizza-margarita3.jpg Load image: syrnaya-pizza\syrnaya-pizza3.jpg Load image: gavayskaya-pizza\gavayskaya-pizza3.jpg Load image: pizza-dodo\pizza-dodo3.jpg Load image: pizza-chetyre-sezona\pizza-chetyre-sezona3.jpg Load image: ovoshi-i-griby\ovoshi-i-griby3.jpg Load image: italyanskaya-pizza\italyanskaya-pizza3.jpg Load image: meksikanskaya-pizza\meksikanskaya-pizza3.jpg Load image: morskaya-pizza\morskaya-pizza3.jpg Load image: myasnaya-pizza\myasnaya-pizza3.jpg Load image: pizza-pepperoni\pizza-pepperoni3.jpg Load image: ranch-pizza\ranch-pizza3.jpg Load image: pizza-syrnyi-cyplenok\pizza-syrnyi-cyplenok3.jpg Load image: pizza-cyplenok-barbekyu\pizza-cyplenok-barbekyu3.jpg Load image: chizburger-pizza\chizburger-pizza3.jpg Cut pizza from images... (380, 380, 3) 20 

    def plot_img(img):
        img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        plt.imshow(img_rgb)

    plot_img(pizza_imgs[0])


Mirror the picture horizontally:


    img_flipy = cv2.flip(pizza_imgs[0], 1)
    plot_img(img_flipy)


Rotate the picture by 15 degrees:


    img_rot15 = load_data.rotate(pizza_imgs[0], 15)
    plot_img(img_rot15)


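Resize the picture (size: 56 by 56) and generate rotated and mirrored copies, rotating through 360 degrees in steps of 1 degree.


    channels, height, width = 3, 56, 56

    lst0 = load_data.resize_rotate_flip(pizza_imgs[0], (height, width))
    print(len(lst0))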

 720 
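The source of load_data.resize_rotate_flip is not shown in the article, so the following is only a guess at how 720 variants can come out of a single photo: 360 rotation angles in 1-degree steps, each kept both as is and mirrored. The function `augmentation_plan` is hypothetical and only enumerates the parameters; the actual image operations would be done with OpenCV.

```python
def augmentation_plan(step_deg=1):
    """List (angle, mirrored) pairs: every rotation angle, with and without a mirror flip."""
    plan = []
    for angle in range(0, 360, step_deg):
        plan.append((angle, False))  # rotated only
        plan.append((angle, True))   # rotated and mirrored
    return plan


print(len(augmentation_plan()))  # 720
```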

 plot_img(lst0[0]) 



Convert the pictures to a float32 array in channels-first layout and scale the values to [0, 1].


    image_list = lst0
    image_list = np.array(image_list, dtype=np.float32)
    image_list = image_list.transpose((0, 3, 1, 2))
    image_list /= 255.0
    print(image_list.shape)

 (720, 3, 56, 56) 

    x_train = image_list[:600]
    x_test = image_list[600:]
    print(x_train.shape, x_test.shape)

 (600, 3, 56, 56) (120, 3, 56, 56) 

    from keras.models import Model
    from keras.layers import Input, Dense, Flatten, Reshape
    from keras.layers import Conv2D, MaxPooling2D, UpSampling2D

    from keras import backend as K

    # For 2D data (e.g. image), "channels_last" assumes (rows, cols, channels)
    # while "channels_first" assumes (channels, rows, cols).
    K.set_image_data_format('channels_first')

 Using Theano backend. 


    def create_deep_conv_ae(channels, height, width):
        input_img = Input(shape=(channels, height, width))

        x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
        x = MaxPooling2D(pool_size=(2, 2), padding='same')(x)
        x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
        encoded = MaxPooling2D(pool_size=(2, 2), padding='same')(x)

        # at this point the representation is (8, 14, 14)

        input_encoded = Input(shape=(8, 14, 14))
        x = Conv2D(8, (3, 3), activation='relu', padding='same')(input_encoded)
        x = UpSampling2D((2, 2))(x)
        x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
        x = UpSampling2D((2, 2))(x)
        decoded = Conv2D(channels, (3, 3), activation='sigmoid', padding='same')(x)

        # Models
        encoder = Model(input_img, encoded, name="encoder")
        decoder = Model(input_encoded, decoded, name="decoder")
        autoencoder = Model(input_img, decoder(encoder(input_img)), name="autoencoder")
        return encoder, decoder, autoencoder

    c_encoder, c_decoder, c_autoencoder = create_deep_conv_ae(channels, height, width)
    c_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

    c_encoder.summary()
    c_decoder.summary()
    c_autoencoder.summary()

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         (None, 3, 56, 56)         0
    _________________________________________________________________
    conv2d_1 (Conv2D)            (None, 16, 56, 56)        448
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 16, 28, 28)        0
    _________________________________________________________________
    conv2d_2 (Conv2D)            (None, 8, 28, 28)         1160
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 8, 14, 14)         0
    =================================================================
    Total params: 1,608
    Trainable params: 1,608
    Non-trainable params: 0
    _________________________________________________________________
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_2 (InputLayer)         (None, 8, 14, 14)         0
    _________________________________________________________________
    conv2d_3 (Conv2D)            (None, 8, 14, 14)         584
    _________________________________________________________________
    up_sampling2d_1 (UpSampling2 (None, 8, 28, 28)         0
    _________________________________________________________________
    conv2d_4 (Conv2D)            (None, 16, 28, 28)        1168
    _________________________________________________________________
    up_sampling2d_2 (UpSampling2 (None, 16, 56, 56)        0
    _________________________________________________________________
    conv2d_5 (Conv2D)            (None, 3, 56, 56)         435
    =================================================================
    Total params: 2,187
    Trainable params: 2,187
    Non-trainable params: 0
    _________________________________________________________________
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         (None, 3, 56, 56)         0
    _________________________________________________________________
    encoder (Model)              (None, 8, 14, 14)         1608
    _________________________________________________________________
    decoder (Model)              (None, 3, 56, 56)         2187
    =================================================================
    Total params: 3,795
    Trainable params: 3,795
    Non-trainable params: 0
    _________________________________________________________________
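The parameter counts in the summary can be checked by hand. A convolution layer with a bias has (kernel height x kernel width x input channels + 1) x output channels parameters; pooling and upsampling layers have none. The helper `conv2d_params` is written just for this check.

```python
def conv2d_params(in_channels, out_channels, kernel=(3, 3)):
    """Number of trainable parameters in a Conv2D layer with bias."""
    kh, kw = kernel
    return (kh * kw * in_channels + 1) * out_channels


# Encoder: 3 -> 16 -> 8 channels
encoder = conv2d_params(3, 16) + conv2d_params(16, 8)
# Decoder: 8 -> 8 -> 16 -> 3 channels
decoder = conv2d_params(8, 8) + conv2d_params(8, 16) + conv2d_params(16, 3)

print(encoder, decoder, encoder + decoder)  # 1608 2187 3795
```

The totals agree with the summary: 1,608 parameters in the encoder, 2,187 in the decoder and 3,795 in the full autoencoder.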

 c_autoencoder.fit(x_train, x_train, epochs=20, batch_size=16, shuffle=True, verbose=2, validation_data=(x_test, x_test)) 

    Train on 600 samples, validate on 120 samples
    Epoch 1/20   10s - loss: 0.5840 - val_loss: 0.5305
    Epoch 2/20   10s - loss: 0.4571 - val_loss: 0.4162
    Epoch 3/20   9s - loss: 0.4032 - val_loss: 0.3956
    Epoch 4/20   8s - loss: 0.3884 - val_loss: 0.3855
    Epoch 5/20   10s - loss: 0.3829 - val_loss: 0.3829
    Epoch 6/20   11s - loss: 0.3808 - val_loss: 0.3815
    Epoch 7/20   9s - loss: 0.3795 - val_loss: 0.3804
    Epoch 8/20   8s - loss: 0.3785 - val_loss: 0.3797
    Epoch 9/20   10s - loss: 0.3778 - val_loss: 0.3787
    Epoch 10/20  10s - loss: 0.3771 - val_loss: 0.3781
    Epoch 11/20  9s - loss: 0.3764 - val_loss: 0.3779
    Epoch 12/20  8s - loss: 0.3760 - val_loss: 0.3773
    Epoch 13/20  9s - loss: 0.3756 - val_loss: 0.3768
    Epoch 14/20  10s - loss: 0.3751 - val_loss: 0.3766
    Epoch 15/20  10s - loss: 0.3748 - val_loss: 0.3768
    Epoch 16/20  9s - loss: 0.3745 - val_loss: 0.3762
    Epoch 17/20  10s - loss: 0.3741 - val_loss: 0.3755
    Epoch 18/20  9s - loss: 0.3738 - val_loss: 0.3754
    Epoch 19/20  11s - loss: 0.3735 - val_loss: 0.3752
    Epoch 20/20  8s - loss: 0.3733 - val_loss: 0.3748
    <keras.callbacks.History at 0x262db6a0>
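A note on reading these numbers: binary cross-entropy on pixel intensities scaled to [0, 1] does not reach zero even for a perfect reconstruction, unless every pixel is exactly 0 or 1. That may be part of why the loss levels off well above zero. The plain-Python sketch below (the function and the three pixel values are invented for the illustration) shows the effect.

```python
import math


def binary_crossentropy(target, predicted, eps=1e-7):
    """Mean binary cross-entropy between two equally long lists of values in [0, 1]."""
    total = 0.0
    for t, p in zip(target, predicted):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(target)


pixels = [0.2, 0.5, 0.9]
perfect = binary_crossentropy(pixels, pixels)   # reconstruction equals the input
gray = binary_crossentropy(pixels, [0.5] * 3)   # constant mid-gray guess
print(perfect > 0, perfect < gray)  # True True
```

So the absolute loss value says little on its own; comparing it between epochs, or looking at the reconstructed pictures, is more informative.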

    #c_autoencoder.save_weights('c_autoencoder_weights.h5')
    #c_autoencoder.load_weights('c_autoencoder_weights.h5')


    n = 5

    imgs = x_test[:n]
    encoded_imgs = c_encoder.predict(imgs, batch_size=n)
    decoded_imgs = c_decoder.predict(encoded_imgs, batch_size=n)

    def get_image_from_net_data(data):
        # (channels, height, width) -> (height, width, channels);
        # multiplying into a new array leaves the input (a view of x_test) untouched
        res = data.transpose((1, 2, 0)) * 255.0
        res = np.array(res, dtype=np.uint8)
        return res

    #image0 = get_image_from_net_data(decoded_imgs[0])
    #plot_img(image0)

    fig = plt.figure()
    j = 0
    for i in range(0, len(imgs)):
        j += 1
        fig.add_subplot(n,2,j)
        plot_img( get_image_from_net_data(imgs[i]) )
        j += 1
        fig.add_subplot(n,2,j)
        plot_img( get_image_from_net_data(decoded_imgs[i]) )


Continued in Part 2.


Links


Source: https://habr.com/ru/post/335444/

