Food Recipe Recommendation Based On Ingredients de
ABSTRACT
Food is essential for human survival, and people always try to taste different types of delicious recipes. Frequently, people choose food ingredients without even knowing their names, or pick up ingredients at a grocery store that are unfamiliar to them. Knowing which ingredients can be combined to make a delicious recipe is essential. Selecting the right recipe from a list of ingredients is very difficult for a beginner cook, and it can be a problem even for experts. Machine learning is in constant use in our everyday lives; one example is recognizing objects through image processing. Because food ingredients vary widely, this process is complex, and traditional approaches lead to inaccurate results. These problems can be addressed with machine learning and deep learning approaches. In this paper, we implemented a model for food ingredient recognition and designed an algorithm for recommending recipes based on the recognized ingredients. We built a custom dataset consisting of 9856 images belonging to 32 different food ingredient classes. A Convolutional Neural Network (CNN) model was used to identify food ingredients, and machine learning was used for recipe recommendation. We achieved an accuracy of 94%.

KEYWORDS
Deep Learning, CNN, Ingredients Detection, Recipe Recommendation, ResNet50

ACM Reference format:
Md. Shafaat Jamil Rokon, Md Kishor Morol, Ishra Binte Hasan, A. M. Saif, Rafid Hussain Khan. 2022. Food Recipe Recommendation Based on Ingredients Detection Using Deep Learning. In International Conference on Computing Advancements (ICCA 2022), March 10–12, 2022, Dhaka, Bangladesh.

1 INTRODUCTION
People nowadays have become very health conscious and try to include as much healthy food as possible in their meals. A good recipe should deliver the real taste of the food while supporting a balanced diet and a healthy life. As we all know, health is wealth, so food selection is essential for better health. Sometimes people pick ingredients whose names they do not even know, or pick up ingredients at a grocery store without knowing how to make recipes with them. People should know which ingredients can be combined to make delicious recipes. For a beginner, it is very challenging to select a recipe just by looking at the ingredients, and it can be difficult even for an expert cook. In this paper, we try to solve this problem. We implemented an approach that recognizes food ingredients and, after analyzing and classifying the recognized ingredients, recommends recipes that can be cooked with them. This recommendation helps people find a suitable recipe. Online cooking recipe sites have become very popular, but people sometimes get confused when they try to cook by following these sites. This may discourage users from referring to such sites while searching for ingredients at grocery stores. Technology had to improve to solve these problems, and object recognition technology has since emerged. There has been significant progress in object recognition technology. Object recognition is a technology that identifies the objects shown in an image based on their specific properties [1]. In other words, object recognition finds real-world objects by analyzing an image. Open-source software libraries such as the Open Computer Vision Library (OpenCV) [2], TensorFlow [3], NumPy [4], and Keras [5] are widely used to implement object recognition, computer vision, and deep learning models. With these libraries, we can implement object recognition schemes on PCs and on mobile devices such as iPhones and Android smartphones. With the recent progress of mobile devices and their explosive spread, it is now possible to recognize objects at any time on a mobile device. OpenCV [2] can be used for image processing and representation. Libraries like NumPy handle numerical computation, making machine learning faster and easier. The TensorFlow [3] API can be used to recognize images and to train deep learning architectures for image classification [4]. There are pre-trained deep learning architectures
such as MobileNet [6], ResNet [7], Inception [8], and VGG16 [9] that can be used too. The advantage of using a pre-trained architecture is that we can adapt it using transfer learning [10]. With the help of these libraries, APIs, and architectures, we can easily recognize objects in an image.

For recipe identification, very little work has been done because no proper dataset exists, and recognizing multiple objects is still challenging. Most of the papers we found address only recipe identification or food recommendation. We aim instead to detect food ingredients with the highest possible classification rate and to recommend suitable recipes. In the following section, we discuss related previous work.

2 RELATED WORKS
Food detection, ingredient detection, and recipe recommendation have received increasing attention in recent years. All of these deep learning-based works relate to object detection and classification. Adequate deep learning models and methods are available for object detection and classification. The use of color histograms, BoF, linear kernel SVM classifiers, and K-nearest neighbors is well known.

Kawano et al. [12] presented a real-time food recognition system for smartphones. Two types of real-time image recognition methods were used. The first combines bag-of-features (BoF) and color histograms with χ² kernel feature maps. The second uses a HOG patch descriptor and a color patch descriptor with the state-of-the-art Fisher Vector representation. A linear SVM was used as the classifier. They achieved a classification rate of 79.2%.

Bolaños et al. [13] used CNNs for food image recognition. Their method uses two different inputs. The first provides a low-level description of the food image, taken from the penultimate layer of the InceptionResNetV2 CNN. The second provides a high-level description of the food image using the LogMeals API, which supplies three different CNNs that predict food groups, ingredients, and dishes.

Chang Liu et al. [14] proposed a novel CNN-based approach for visual food image recognition with a 7-layer architecture and achieved 93.7% top-5 accuracy on the UEC-100 dataset, whereas existing approaches such as SURF-BoF + Color Histogram and MKL achieved 68.3% to 76.8% top-5 accuracy on the same dataset.

Raboy-McGowan and L. Gonzalez et al. [15][16] both used the Recipe 1M dataset and introduced Recipe Net, a food-to-recipe generator trained on the Recipe 1M dataset. They used ResNet-50 and DenseNet-121 convolutional neural networks to classify food images and encode their features. After that, they used K-nearest neighbors to recommend recipes from the Recipe 1M dataset.

Chen, J. et al. [17] used a dataset composed of 61,139 image-recipe pairs gathered from the Go Cooking website. They explored recent advances in cross-modality learning to address the problems mentioned earlier. Their system adopts and revises a deep model, the stacked attention network (SAN).

Keiji Yanai et al. [18] built a food ingredient recognition system for Android OS. The system used color-histogram-based bag-of-features extracted from several frames as the image representation and a linear kernel SVM as the classifier. They stored 30 videos in the database, and their accuracy was 83.93%.

Suyash Maheshwari et al. [19] proposed two algorithms to recommend ingredients. Cooking and tasting experiments showed the proposed methods were effective for each purpose. Through cooking demonstrations with the recommended alternative ingredients and subjective evaluation experiments, it was confirmed that both algorithms recommended acceptable ingredients in over 90 percent of cases. However, algorithm one might not recommend an alternative ingredient similar to the exchanged ingredient.

Mona Mishra, Yifan Gong et al. [20] trained a CNN for image recognition on thousands of images across various categories and used it to predict the object in an image uploaded by a user.

So, we see good scope for building a model that can identify ingredients and then recommend recipes based on those ingredients. In the following section, we discuss our proposed model.

3 METHODOLOGY

3.1 Proposed Method
We divided our work into two parts. The first is identifying 32 ingredients, for which we used a Convolutional Neural Network (CNN). After getting the result, we proceed to the second part, recipe recommendation over 19 different classes.

We used transfer learning to train our CNN model for the first part. Transfer learning is the reuse of a pre-trained model; as the pre-trained model, we used ResNet50. Transfer learning is immensely popular in deep learning because it can train deep neural networks with relatively little data [11]. It is efficient because most real-world problems don't have millions of labeled data points to train such complex deep neural networks.

As the base model, we used ResNet50, a convolutional neural network that is 50 layers deep. ResNet50 was trained on more than
a million images in the ImageNet dataset. The pre-trained network can classify images into 1000 categories, but we need 32 object categories such as Fried Rice, Chotpoti, Subway, etc. The pre-trained ResNet50 does not classify images into those 32 specific categories. We could create a new model from scratch for this purpose, but good results would require many labeled images of Fried Rice, Chotpoti, etc. With transfer learning, it was possible to get good results with less data, which is why we chose it. Early layers of a deep learning model identify simple shapes, later layers identify more complex visual patterns, and the final layer makes predictions. Because most computer vision problems involve similar low-level patterns, most layers of a pre-trained model are helpful for new applications. This means that most of the layers of the pre-trained ResNet50 model can be reused, and we need to replace only the final layer that generates predictions. That is why we cut off the last prediction layer of the ResNet50 model trained for 1000 classes and added a flatten layer and a new prediction layer for our required 32 classes.

Fig. 3. Connection between last remaining layer and new prediction layer

There are many connections here. As Figure 3 shows, the ResNet50 model has many layers, and we have cut off only the last one. What remains in the last remaining layer is information about the image content, stored as a series of numbers in a tensor. It should be a one-dimensional tensor, also known as a vector. It can be drawn as a series of dots, called nodes. The first node represents the first number in the vector, and the second node represents the
second number, and so on. We want to classify the images into 32 categories: Beef Meat, Bell Pepper, Burger Bun, Burger Patty, Chickpea, Cheese, Chicken, Chocolate, Chocolate Syrup, Donut Bun, Egg, Ground Beef, Hotdog Bun, Ice Cream, Lasagna Noodles, Lettuce, Mayonnaise, Milk, Mushrooms, Nachos Chips, Noodles, Onion, Pasta, Pizza Dough, Potato, Rice, Roll Bread, Sandwich Bread, Sausage, Shawarma Bread, Subway Bun, and Tomato. We keep the pre-trained model up to its last remaining layer and add another layer with 32 nodes. The first node captures how closely the image matches Beef Meat, the second node captures how closely it matches Bell Pepper, the third node captures how closely it matches Burger Bun, and so on. We used training data to determine which nodes suggest that an image is Burger Bun, which suggest Pasta, and so on.

3.2 Architecture
ResNet50 is a variant of ResNet [7]. In ResNet50 there are 48 convolution layers as well as 1 MaxPool and 1 Average Pool layer. In this network, there is a total of 50 layers.

The next step is average pooling, followed by a fully connected layer of 1000 nodes with a softmax function, which contributes 1 layer. Max-pooling layers and activation functions are not counted. In total, this gives a 1 + 9 + 12 + 18 + 9 + 1 = 50 layer deep convolutional network.

3.3 Recipe Recommendation
As mentioned before, our aim is to recommend cooking recipes from recognized food ingredients. We built our own recipe database and algorithm for recipe recommendation. We selected 19 kinds of cooking recipes associated with the 32 ingredients and designed a 2D matrix (19 rows × 32 columns) for the recipe recommendation algorithm, in which the 19 rows represent the 19 recipes and the 32 columns represent the 32 food ingredients.
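
The recipe-by-ingredient matrix described in Section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the recipe names and their ingredient lists, the shortened ingredient list, the helper names build_matrix and recommend, and the scoring rule (the fraction of a recipe's ingredients that were detected) are all hypothetical choices made for the example.

```python
from typing import Dict, List, Set, Tuple

# A subset of the 32 ingredient classes, kept short for readability.
INGREDIENTS: List[str] = [
    "Burger Bun", "Burger Patty", "Cheese", "Chicken",
    "Onion", "Pasta", "Pizza Dough", "Tomato",
]

# Hypothetical recipes and ingredient lists (assumed, not taken from the paper).
RECIPES: Dict[str, Set[str]] = {
    "Burger": {"Burger Bun", "Burger Patty", "Cheese", "Onion", "Tomato"},
    "Pasta": {"Pasta", "Cheese", "Tomato", "Chicken"},
    "Pizza": {"Pizza Dough", "Cheese", "Tomato", "Onion"},
}


def build_matrix(recipes: Dict[str, Set[str]],
                 ingredients: List[str]) -> List[List[int]]:
    """Rows are recipes, columns are ingredients; 1 means the recipe uses it."""
    return [[1 if ing in needed else 0 for ing in ingredients]
            for needed in recipes.values()]


def recommend(detected: Set[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank recipes by the share of their ingredients that were detected.

    The scoring rule is an assumption for illustration only.
    """
    matrix = build_matrix(RECIPES, INGREDIENTS)
    detected_row = [1 if ing in detected else 0 for ing in INGREDIENTS]
    scored: List[Tuple[str, float]] = []
    for name, row in zip(RECIPES, matrix):
        matched = sum(r * d for r, d in zip(row, detected_row))
        if matched > 0:
            scored.append((name, matched / sum(row)))
    # Highest coverage first; ties are broken alphabetically by recipe name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    print(recommend({"Burger Bun", "Burger Patty", "Cheese"}))
```

With the full data, the matrix would have 19 rows and 32 columns, and the detected set would come from the ingredient classes predicted by the CNN.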
4 CONCLUSION