Anuraag Rath MBA Dissertation
PURCHASE DATASET OF
HOTEL GRAND CENTRAL
AT
HOTEL GRAND CENTRAL
BHUBANESHWAR
BONAFIDE CERTIFICATE
Date:
Place: Chennai
DEPARTMENT SEAL
DECLARATION
Date:
Place: Chennai
ACKNOWLEDGEMENT
I would like to thank Mr. TARAK MISHRA (Managing Director) for his
guidance, effort and suggestions. I am also thankful to Mr. SUBHRANSU
RATH (Director). Without his help and suggestions, it would have been
impossible for me to complete this project.
- Anuraag Rath
ABSTRACT
The objective of this project is to implement Machine Learning algorithms on the Customers’
Sales dataset of Hotel Grand Central. In this project, Machine Learning systems have been
explored to help predict the prices of the services preferred by Customers for the year
2019. The model would help provide insights into the pricing methods employed by the
Hotel and aid Customers in predicting the bill amount based on their preferred services.
Chapter I introduces the project and its objectives, scope, problems, needs and benefits. It
also covers the Company and Industry profiles. Chapter II consists of the Review of
Literature, covering previous studies in which predictive/classification Machine Learning
systems have been implemented on Hotel datasets. Chapter III covers the research
methodology of the study. Chapter IV outlines the Data Science implementation and
analysis, which is the Machine Learning process applied to the Customers’ sales dataset.
Chapter V discusses the Findings, Conclusion and future scope of the study.
TABLE OF CONTENTS
1 INTRODUCTION
2 REVIEW OF LITERATURE
3 RESEARCH METHODOLOGY
BIBLIOGRAPHY
ANNEXURES
LIST OF CHARTS
Sno CHART PAGE
I
INTRODUCTION
1.1 INFORMATION SYSTEMS:
A computer is an inherently diverse tool. Businesses generally use a group of networked
computers to collect, organize, store, and transmit information. This network is also known
as a computer information system. In the field of computer information systems,
professionals work to optimize the application of networked computers in business
environments.
To be effective in this effort, these professionals must learn how to improve business
processes by implementing a computer information system that can accommodate the
specific needs of their organization. For example, if an organization is concerned with the
productivity of its employees, IT professionals could use the existing computer information
system to track and measure relevant metrics. The data from such a system could then be
used to design workplace policies that better promote optimal use of labor hours.
There are several major categories of computer information systems, each with specific
characteristics that make them unique. Here’s a look at the seven most commonly used
systems.
Transaction Processing Systems
The operations handled by transaction processing systems are usually the straightforward,
day-to-day transactions that businesses conduct. These computerized systems perform
simple functions and record them. As an example, a transaction processing system would
likely be used to control inventory or track payroll.
Office Automation Systems
The primary use of office automation systems is creating, storing, and transmitting data
throughout an organization’s network. This simplifies office tasks by keeping team
members connected and also provides management personnel with more control over the
flow of information within the company. When connected to these systems, users are able
to instantly interact with their colleagues using various forms of communication, such as
voice, email, videoconferencing, file transfers, or instant text messaging.
Management Information Systems
Businesses that collect large volumes of data rely on management information systems to
process that data into usable forms, such as reports and data summaries. These systems are
designed to help organizational managers and supervisors make decisions by providing
them with information about the various activities that occur within the business.
Decision Support Systems
Decision support systems are advanced computer information systems that help
organizational leaders make decisions when the potential outcomes are uncertain.
Computer information systems specialists design these systems to perform complex
(usually mathematical) tasks, such as executing calculations, modeling data, comparing
datasets, and predicting the outcomes of scenarios based on available information.
Executive Information Systems
Executive information systems are specifically designed for use by senior leaders, as they
usually compile a vast array of data regarding the internal and external affairs of an
organization. An executive information system distills massive amounts of detailed data
into structured, comprehensible formats. This helps senior managers stay up to date about
the overall status of their organization, allowing them to make informed strategic and
tactical decisions.
Expert Systems
Expert systems emulate the decision-making ability of a human by using reasoning to learn
facts based on the rules set by the individuals who designed them. These systems are some
of the earliest examples of basic artificial intelligence, and business leaders can use them to
develop solutions to complex problems, even within specialized professional domains, such
as medicine or engineering.
Accounting Information Systems
To track their financial data, such as investments, revenue, and tax obligations,
organizations may use an accounting information system. These systems can be used to
perform financial audits and generate accounting reports. This helps finance specialists and
business leaders streamline the process of compiling or tracking accounting data.
One of the most recent trends is artificial intelligence and the use of machine learning. With
this, networks and systems have the ability to improve themselves without having a
programmer change anything. Networks have the ability to access and analyze data by
themselves.
Another trend, which also points to the future of computer information systems, is the
growing need for cybersecurity. As more businesses and organizations develop an online
presence and more people use the internet, the opportunity for cyber attacks also grows. To
protect the information of professionals working across myriad industries, cybersecurity
becomes critical.
Computer information systems have opened many doors in the public and private sectors.
By allowing organizations to communicate more effectively, these systems stimulate
creative innovation and make collaboration easier than ever before. These systems serve as
the foundation for cloud computing, which allows users to store data and use software that
is not installed on their own computers but instead hosted on a remote server elsewhere.
This allows businesses to immediately boost their efficiency without incurring massive
overhead costs.
Computer information systems give businesses a unique ability to customize the way they
use technology, allowing them to adapt to market factors in real time. The downside to
computer information systems is that they are subject to cyber threats, such as hackers,
malware, and viruses. Depending on the size of these systems, maintaining them may also
be costly on the macro level. Still, the benefits of using a computer information system are
likely to outweigh the costs.
Machine Learning is an application of artificial intelligence (AI) that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that can
access data and use it to learn for themselves.
The process of learning begins with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in data and make better decisions in
the future based on the examples that we provide. The primary aim is to allow
computers to learn automatically without human intervention or assistance and adjust
actions accordingly.
However, classic machine learning algorithms treat text as a sequence of keywords, whereas
an approach based on semantic analysis mimics the human ability to understand the
meaning of a text.
As humans, we have many different ways we learn things. The way you learned calculus,
for example, is probably not the same way you learned to stack blocks. The way you
learned the alphabet is probably wildly different from the way you learned how to tell if
objects are approaching you or going away from you. The latter you might not even realize
you learned at all!
Similarly, when we think about making programs that can learn, we have to think about
these programs learning in different ways. Two main ways that we can approach machine
learning are Supervised Learning and Unsupervised Learning. Both are useful for
different situations or kinds of data available.
Supervised Learning:
Let’s imagine you’re first learning about different genres in music. Your music teacher
plays you an indie rock song and says “This is an indie rock song”. Then, they play you a
K-pop song and tell you “This is a K-pop song”. Then, they play you a techno track and say
“This is techno”. You go through many examples of these genres.
The next time you’re listening to the radio, and you hear techno, you may think “This is
similar to the 5 techno tracks I heard in class today. This must be techno!”
Even though the teacher didn’t tell you about this techno track, she gave you enough
examples of songs that were techno, so you could recognize more examples of it.
When we explicitly tell a program what we expect the output to be, and let it learn the rules
that produce expected outputs from given inputs, we are performing supervised learning.
A common example of this is image classification. Often, we want to build systems that
will be able to describe a picture. To do this, we normally show a program thousands of
examples of pictures, with labels that describe them. During this process, the program
adjusts its internal parameters. Then, when we show it a new example of a photo with an
unknown description, it should be able to produce a reasonable description of the photo.
When you complete a Captcha and identify the images that have cars, you’re labeling
images! A supervised machine learning algorithm can now use those pictures that you’ve
tagged to make its car-image predictor more accurate.
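The music-genre analogy can be turned into a small program. The following is only a minimal sketch of supervised learning, assuming each song is described by two invented features (tempo and share of electronic instruments) and using a simple 1-nearest-neighbour rule; the feature names, values and helper functions are hypothetical and are not part of this study's dataset.

```python
import math

# Hypothetical labelled examples: (tempo in BPM, share of electronic instruments) -> genre.
# These numbers are invented purely for illustration.
labelled_songs = [
    ((120.0, 0.10), "indie rock"),
    ((125.0, 0.20), "indie rock"),
    ((100.0, 0.60), "k-pop"),
    ((105.0, 0.70), "k-pop"),
    ((140.0, 0.95), "techno"),
    ((138.0, 0.90), "techno"),
]


def scale(features):
    """Put tempo and electronic share on comparable scales."""
    tempo, electronic = features
    return (tempo / 200.0, electronic)


def predict_genre(features):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    query = scale(features)
    nearest = min(labelled_songs, key=lambda ex: math.dist(scale(ex[0]), query))
    return nearest[1]


# A new, unlabelled song that resembles the techno examples.
print(predict_genre((142.0, 0.92)))
```

The program is never told a rule for what makes a song techno; it only compares the new song with the labelled examples it was given, which mirrors the classroom example above.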
Unsupervised Learning:
Let’s say you are an Indian who has been observing the meals Americans eat. You see
people eating breakfasts, lunches, and snacks. Over the course of a couple weeks, you
surmise that for breakfast people mostly eat foods like:
• Cereals
• Bagels
• Granola bars
Lunch is usually a combination of:
Snacks are usually a piece of fruit or a handful of nuts. No one explicitly told you what
kinds of foods go with each meal, but you learned from natural observation and put the
patterns together. In unsupervised learning, we don’t tell the program anything about what
we expect the output to be. The program itself analyzes the data it encounters and tries to
pick out patterns and group the data in meaningful ways.
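The idea of grouping observations without labels can be sketched with a very small clustering routine. This is an illustrative sketch only: the meal times are invented, the choice of three groups and the starting centres are assumptions, and the plain one-dimensional k-means below is one of many possible unsupervised methods.

```python
from statistics import mean

# Hypothetical observations: hour of day at which a meal was seen being eaten.
# No observation carries a label such as "breakfast" or "lunch".
meal_hours = [7.5, 8.0, 8.5, 12.0, 12.5, 13.0, 15.5, 16.0, 16.5]


def k_means_1d(values, centres, iterations=10):
    """Plain 1-D k-means: assign each value to its nearest centre, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster happens to be empty.
        centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


# The three starting centres are an arbitrary assumption.
centres, clusters = k_means_1d(meal_hours, centres=[6.0, 11.0, 18.0])
print(centres)
print(clusters)
```

The routine ends up with three groups of meal times even though it was never told which meal each observation belongs to.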
When people think of Machine Learning, they often think of a program that is taking in
data and generating predictions and insights. The process of performing Machine Learning
often requires many more steps before and after the predictive analytics.
1. Formulating a Question
2. Finding and Understanding the Data
3. Cleaning the Data and Feature Engineering
4. Choosing a Model
5. Tuning and Evaluating
6. Using the Model and Presenting Results
1. Formulating a Question
Let’s say we are performing machine learning for a high-traffic fast-casual restaurant chain,
and our goal is to improve the customer experience. We can serve this goal in many ways.
When we’re thinking about creating a model, we have to narrow down to one measurable,
specific task. For example, we might say we want to predict the wait times for customers’
food orders within 2 minutes, so that we can give them an accurate time estimate.
Arguably the largest chunk of time in any machine learning process is finding the relevant
data to help answer your question, and getting it into the format necessary for performing
predictive analysis.
We know that for supervised learning, we need labeled datasets, or datasets that have clear
labels of what their ground truth is. For an example like the restaurant wait time, this would
mean we would need many examples of past orders, tagged with how long the wait time
was. Maybe the restaurant already tracks this data, but we might need to augment the data
collection with a timer that starts when the customer orders, stops when the customer
receives their food, and records that information.
Creating this system of recording data, as well as gathering enough data to be able to train
our model will take time.
Once we have our data, we want to understand it so that we know what model to
apply and what the outputs will mean. First, we will want to examine the summary
statistics.
We may also want to visualize the data, perhaps using box plots to identify outliers,
histograms to show the basic structure of the data, and scatter plots to examine relationships
between variables.
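As a rough illustration of examining summary statistics, the sketch below uses Python's standard `statistics` module on a small list of wait times. The values are invented for illustration and are not taken from any real restaurant or from the hotel dataset.

```python
import statistics

# Hypothetical wait times in minutes; invented for illustration only.
wait_times = [3.0, 3.5, 4.0, 4.0, 4.0, 4.0, 4.5, 5.0, 10.0, 10.5, 11.0, 11.5]

summary = {
    "count": len(wait_times),
    "mean": statistics.mean(wait_times),
    "median": statistics.median(wait_times),
    "stdev": statistics.stdev(wait_times),  # sample standard deviation
    "min": min(wait_times),
    "max": max(wait_times),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A noticeable gap between the mean and the median, as in this made-up sample, is often a first hint that the data are skewed or fall into more than one group.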
Let’s say we’re examining the existing distribution of wait times. We see that the overall
average is 6.25 minutes per order. But we also produce this histogram:
We might glean from this that there are two main groups of orders. One group seems to
cluster around 4 minutes, while another, smaller, group seems to cluster around 11 minutes.
We could use this to modify our question and build a model that will classify whether or
not an order will be in this “short” timeframe, or in the “long” timeframe. Is it dependent on
the food that was ordered? The time of day of the order?
Perhaps we just become aware of the bimodality of our data. If our model consistently
predicts a wait time of around 6 or 7 minutes, then we are not taking into account the true
structure of our data.
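The bimodality described above can be made visible with a simple text histogram. The sketch below uses invented wait times chosen so that the overall mean is 6.25 minutes, with one group near 4 minutes and a smaller group near 11 minutes; the bin width of 2 minutes is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical wait times in minutes; invented for illustration only.
wait_times = [3.0, 3.5, 4.0, 4.0, 4.0, 4.0, 4.5, 5.0, 10.0, 10.5, 11.0, 11.5]

# Bin each wait time into a 2-minute bucket, e.g. 4.5 falls in the bucket starting at 4.
bin_width = 2
histogram = Counter(int(t // bin_width) * bin_width for t in wait_times)

for start in range(0, 14, bin_width):
    print(f"{start:2d}-{start + bin_width:2d} min | {'#' * histogram[start]}")

overall_mean = sum(wait_times) / len(wait_times)
print(f"overall mean: {overall_mean:.2f} minutes")
```

In this made-up sample the mean falls in the 6 to 8 minute bucket, which contains no orders at all, so a model that always predicts the average would describe none of the actual orders well.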
Real data is messy! Data may have errors. Some columns may be empty. The features
we’re interested in might require string manipulation to extract. Cleaning the data refers to
the process by which we address missing values and outliers, among other things that may
affect our insights.
We may see that we have a group of orders that took over 20 minutes, due to an emergency
in the kitchen one afternoon. This is pushing our average wait time up, and may skew our
predictions. If we want to model the more general functioning of the restaurant, we may
want to remove these values.
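Removing such extreme values can be sketched in a few lines. The data below are invented, and the 20-minute cut-off is an assumption taken from the example in the text; in practice the threshold would need to be justified from the data.

```python
import statistics

# Hypothetical wait times (minutes); the last three stand in for the kitchen emergency.
wait_times = [3.5, 4.0, 4.5, 5.0, 10.5, 11.0, 22.0, 25.0, 31.0]

# Assumed cut-off, taken from the "over 20 minutes" example in the text.
OUTLIER_THRESHOLD = 20.0

# Keep only the orders at or below the threshold.
cleaned = [t for t in wait_times if t <= OUTLIER_THRESHOLD]

print(f"mean with outliers:    {statistics.mean(wait_times):.2f}")
print(f"mean without outliers: {statistics.mean(cleaned):.2f}")
```

Comparing the two means shows how strongly a handful of unusual orders can pull the average upwards.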
Feature Engineering refers to the process by which we choose the important features (or
columns) to look at, and make the appropriate transformations to prepare our data for our
model.
We might try:
After we test our model on the data we have, we might go back and reengineer features to
see if we get a better result.
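As one possible illustration of feature engineering, the sketch below turns raw order records into numeric features. The record layout, the field names and the three derived features are all hypothetical and serve only to show the kind of transformation involved.

```python
from datetime import datetime

# Hypothetical raw order records; the field names are assumptions for illustration.
raw_orders = [
    {"ordered_at": "2019-03-04 12:15:00", "items": "burrito;chips;soda"},
    {"ordered_at": "2019-03-04 15:40:00", "items": "salad"},
]


def engineer_features(order):
    """Turn one raw record into numeric features a model could consume."""
    ordered_at = datetime.strptime(order["ordered_at"], "%Y-%m-%d %H:%M:%S")
    return {
        "hour_of_day": ordered_at.hour,
        "day_of_week": ordered_at.weekday(),  # Monday == 0
        "item_count": len(order["items"].split(";")),
    }


features = [engineer_features(o) for o in raw_orders]
print(features)
```

Here a timestamp string and a free-text item list, which a model cannot use directly, become three numbers per order.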
4. Choosing a Model
Once we understand our dataset and know the problem we are trying to solve, we can begin
to choose a model that will help us tackle our problem.
If we are attempting to find a continuous output, like predicting the number of minutes
someone should wait for their order, we would use a regression algorithm.
If we are attempting to classify an input, like determining if an order will take under 5
minutes or over 10 minutes, then we would use a classification algorithm.
The different classification and regression algorithms work better on different types of
datasets. We use different models on categorical and numerical data, and different models
on datasets with many features and datasets with few features.
We often want to set a metric of success, so that we know the model we’ve chosen is good
enough. Are we looking for accuracy? Precision? Some combination of the two?
Each model has a variety of parameters that change how it makes decisions. We can adjust
these and compare the chosen evaluation metrics of the different variants to find the most
accurate model.
For example, let’s say we’re using a K-Nearest Neighbors regression algorithm to solve the
wait time prediction problem. This algorithm uses a parameter k, the number of neighboring
data points considered for each prediction. We can adjust k to get different results.
Is it ideal to compare against 3 nearest neighbors? 10? 1? We can try many different values
of k and see which one gives us the highest level of accuracy:
From this analysis, we would set our k to be 26, which got the highest level of accuracy.
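The search over k can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the orders and their two features are invented, the candidate values of k are arbitrary, and because wait time is a continuous quantity the sketch scores each k by leave-one-out mean absolute error rather than by accuracy.

```python
import math

# Hypothetical past orders: (number of items, staff on shift) -> wait in minutes.
orders = [
    ((1, 4), 3.5), ((2, 4), 4.0), ((1, 3), 4.0), ((2, 3), 4.5),
    ((5, 2), 10.5), ((6, 2), 11.0), ((5, 3), 10.0), ((6, 3), 11.5),
]


def knn_predict(train, query, k):
    """Average the targets of the k training points closest to the query."""
    by_distance = sorted(train, key=lambda ex: math.dist(ex[0], query))
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)


def leave_one_out_error(data, k):
    """Mean absolute error when each point is predicted from all the others."""
    errors = []
    for i, (features, target) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors.append(abs(knn_predict(rest, features, k) - target))
    return sum(errors) / len(errors)


# Try several candidate values of k and keep the one with the lowest error.
errors_by_k = {k: leave_one_out_error(orders, k) for k in (1, 2, 3, 5, 7)}
best_k = min(errors_by_k, key=errors_by_k.get)
print(errors_by_k)
print("best k:", best_k)
```

Scoring each k on points that were left out of the prediction avoids the trap of k = 1 looking perfect simply because every point is its own nearest neighbour.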
When we achieve the level of accuracy we want on our training set, we can use the model
on the data we actually care about analyzing.
For our example, we can now start inputting new orders. The input could be an order, with
features like:
The output would be how long the order is expected to take. This information could be
displayed to users.
An important step is being able to convey what you’ve learned and created, so that people
can use it in the future.
Sometimes we learn more about our data by looking at the model. For example, using
Multiple Linear Regression can give us insights into the importance of each feature. We
can create a feature importance graph to visualize this for those unfamiliar with our model:
1.8 HOTEL INDUSTRY
The hotel industry comprises all forms of business that provide lodging accommodation,
food and drink, and a range of related services to the public, whether to guests who use the
lodging facilities or to those who use only certain services or products of the hotel.
Hotels offer an enormous range of guest services such as banqueting, conference, fitness
and sports facilities, beauty spas, bars, sophisticated restaurants, casinos and night clubs.
The hotel sector accounts for more than 15% of all the people who work in the hospitality
sector. Hotels fall into a number of different categories, which include glamorous five-star
resorts, international luxury chains, trendy boutiques, country houses, and conference,
leisure or guest houses. Many are owner-run and offer personalized service to guests. This
very dynamic sector offers good quality accommodation and a great variety of food and
beverage, together with other services for all types of customers.
Offering every kind of accommodation and catering for every type of taste, the hotel sector
is constantly growing and evolving, while refining its offering, improving its experience and
creating new products to serve and satisfy customers on a local and global level. The hotel
sector is always striving to offer excellent customer service throughout its operations.
1.8.2 HISTORY OF THE HOTEL INDUSTRY:
The history of the hospitality industry dates all the way back to the Colonial Period in the
late 1700s. The hotel industry has undergone significant development and growth over the
years as it has faced the World Wars, the Depression and various social changes. However,
the hotel industry as seen today took form in the 1950s and 60s, leading the way for growth
into the dynamic industry it is now. More and more people began traveling not only for
business but also for leisure, leading to the development which can be seen nowadays.
The idea of renting accommodation to visitors has existed since ancient times, and the
modern concept of a hotel as we know it dates from 1794, when the City Hotel opened in
New York City; the City Hotel is claimed to be the first building designed exclusively
for hotel operations. The City Hotel then had 73 rooms and offered different
types of service. Similar operations soon appeared in such nearby cities as Baltimore,
Boston in 1809 and Philadelphia.
The industrial revolution, which started in the 1760s, facilitated the construction of hotels
everywhere, in mainland Europe, in England and in America.
With the advent of new modes of transportation, hotels and resorts were built in the
countryside, outside of major cities, and began promoting their scenery and other
attractions. The concept of the vacation developed and became available to more and more
of the population. In the 1920s, hotel building entered a boom phase and many famous
hotels were opened. From there, a surge of hotels flooded America and the rest of the world
with prominent names such as Radisson, Marriott, Hilton and others.
The rise in levels of income and standards of living, coupled with an increase in leisure
time, has been especially beneficial to the tourism industry. Technological progress,
particularly higher capacity cruise ships and aircraft, computerized reservation systems and
better road transport facilities, has played a key role in the global growth of the hotel
industry. Moreover, enhanced productivity has been favourable to the industry by helping to
cut costs and making travel and tourism products more affordable, while travel and tourism
are now safer and more secure despite the threat that terrorist attacks pose to the industry.
As competition in the industry increases worldwide, customers have reaped great benefits
in terms of lower prices coupled with a wider choice, as organizations have to differentiate
their products from the crowd to appeal to specific market segments while also striving to
enhance the quality of their services. More and more innovative approaches to marketing
and promotion and the creation of new products are pulling demand to the destinations.
Governments, as facilitators, fund providers and legislators, have also played their part in
the development of the industry. New consumer needs and attitudes have also fuelled the
growth of specific segments; for instance, ecotourism is booming. Last but not least, the
increased level of economic activity has led to an increase in business travel and to the
growing trend of international mobility.
Despite global economic challenges, hotel developments continue to progress, with new
rooms injected into global supply by both independent hotels and groups.
1.8.4 STATISTICS:
In an update of forecasts made at the beginning of the year, the World Travel & Tourism
Council (WTTC) predicts global Travel & Tourism growth of 2.7%, only slightly
downgraded from the 2.8% that was expected for the industry at the beginning of the year.
The main reason for the downgrade is that WTTC expects world GDP growth to be
2.3% in 2012, down 0.2% from the beginning of the year.
The trend for Travel & Tourism figures has been positive for the beginning of 2012 and has
surpassed expectations from the start of the year. International tourist arrivals have grown
4.9% in the year from January to June, airline passenger traffic is up 6.8%, and hotel
occupancy rates are up in many markets.
In 2011 Travel & Tourism accounted for 255 million jobs globally generating 9 per cent of
world GDP while generating billions for host economies; explaining why the sector is a key
driver for investment and economic growth.
According to statistics from the World Tourism Organization (WTO), 2008 saw an estimated
924 million international tourist arrivals, an increase of 1.76% compared to 2007.
According to statistics from the World Tourism Organization (WTO), in 2008 international
tourist arrivals amounted to 917 million visitors, representing an increase of 1.76%
compared to 2007. In 2009, international tourist arrivals fell to 882 million, representing a
worldwide decline of 4.4% over 2008.
The worldwide destinations recorded a total of 600 million arrivals. International tourist
arrivals in the whole world fell by 7% between January and August 2009, but the rate of
decline has eased in recent months. These results, together with recent economic data,
confirm UNWTO’s initial forecast of a 5% decrease in international tourist arrivals during
2009. Global tourism in 2011 grew by 4.4 per cent, reaching 980 million international
tourist arrivals. For 2012, UNWTO expects growth at a somewhat lower rate, but one that
still allows the total to reach 1,000 million international tourists.
1.9.1 ABOUT:
Hotel Grand Central is the ideal accommodation option for business as well as leisure
travelers.
Hotel Grand Central is one of the most easily accessible hotels in Bhubaneswar. Centrally
located just off the Bhubaneswar Railway Station (exit, platform No 6) and 4 Km from the
Airport, it has close proximity to all the business and tourist places of interest at
Bhubaneswar.
This budget hotel offers 31 A/C rooms. The rooms at Hotel Grand Central are
categorized as Deluxe, Super Deluxe and Executive Deluxe Rooms. All the rooms are
lavishly furnished with Satellite LED TV, direct dial telephone, study table, mini bar and
Free Wi-Fi connectivity. The attached washrooms are equipped with bathroom toiletries
and receive a continuous supply of hot & cold water. The Executive Deluxe rooms
additionally have a fruit basket, Tea/Coffee Maker and shaving kits available.
The multi cuisine restaurant at Hotel Grand Central has a wide choice of mouth watering
Indian, Continental and Chinese cuisine that is bound to get guests’ taste buds tingling with
delight.
The Travel Desk takes care of all travel related plans, whether by Air, Train or Road. It can
also assist in making reservations of accommodation at other destinations. Car Rental
facilities are also available. On request, the Travel Desk also arranges special sightseeing
tours to distant heritage sites, namely Puri and Konark, for individuals and organisations.
The hotel has one Conference Hall (up to 120 people) and one Board Room (up to 40
people), equipped with state of the art communication equipment for business and private
conferences and parties.
Other facilities include Free Parking, 24 hours Security & surveillance under CCTV,
24-hour concierge, in house same day laundry service, Doctor on call, Currency Exchange
and acceptance of major credit cards. The hotel is equipped with Complimentary Wi-Fi
Internet access & CCTV.
1.9.2 FACILITIES:
The Travel Desk takes care of all your travel related plans, whether by Air, Train or Road.
It can also assist in making reservations of accommodation at other destinations. Car Rental
facilities are also available. On request, the Travel Desk also arranges special sightseeing
tours to distant heritage sites, namely Puri and Konark, for you or your organisation.
1.9.3 ROOMS:
Deluxe Rooms:
Services: Buffet Breakfast at Restaurant, Daily 1 ltr. Packaged drinking water as per
occupancy, Morning Newspaper, Free Wi-Fi Internet (2 GB Daily), Direct Dial phone,
Individually controlled Air Conditioning, Satellite LED TV, Custom made Toiletries,
Shower over bath, 24 hrs hot/cold water.
Executive Deluxe Room
Services: Buffet Breakfast at Restaurant, Daily 1 ltr. Packaged drinking water as per
occupancy, Morning Newspaper, Free Wi-Fi Internet (2 GB Daily), Direct Dial phone,
Individually controlled Air Conditioning, Satellite LED TV, Custom made Toiletries,
Shower over bath, 24 hrs hot/cold water.
Extra Tariff: Tea & coffee-making facilities, Fruit Basket, Shaving Kit.
1.9.4 RESTAURANT:
The multi cuisine restaurant at Hotel Grand Central has a wide choice of mouth watering
Indian, Continental and Chinese cuisine that are bound to get your taste buds tingling with
delight.
1.9.5 HOTEL POLICY:
• The hotel does not allow unmarried / unrelated couples or guests residing in the same
city to check in. This is at the full discretion of the hotel management. No refund would
be applicable in case the hotel denies check-in under such circumstances.
• The primary guest must be at least 18 years of age to be able to check in to the hotel.
• It is mandatory for guests to present valid photo identification at the time of check-in.
According to government regulations, a valid Photo ID has to be carried by
every person above the age of 18 staying at the hotel. The identification proofs
accepted are Aadhaar Card, Driving License, Voter ID Card, and Passport. Without
an original valid ID the guest will not be allowed to check in.
Protective Equipment for Guests: Face masks and gloves to be made available on request
for the guests.
Screening of Guests: Temperature check of all guests at the entry point. Any guest with a
temperature above 99.1℉ will be refused admission and may be politely redirected to the
Daily Disinfection of Rooms: To be disinfected with WHO- recommended phenolic
disinfectants every day.
Fresh Room Linen: Room linen to be changed once a day, or on request.
Soap Dispensers in Rooms: All washrooms to be equipped with liquid soap dispenser(s) or
packed soap bars.
Sanitization of Common Areas: Sanitization of common areas including reception,
elevators and lounge to be done every 6 hours with phenolic disinfectant.
Mandatory Masks & Gloves: House-keeping and service staff to wear masks (3-ply) and
gloves (single-use) at all times, and restaurant staff to wear a mask (3-ply) and hair net at
all times.
Mandatory Staff Training: Staff training to be done at least twice a week on social
distancing, hand hygiene and respiratory etiquette.
Mandatory Temperature Checks: Temperature check for staff twice a day and mandatory
leave for any employee having a temperature above 99.1℉.
1.11 PROBLEMS IN STUDY:
• Implementation of the right Supervised Learning Algorithm
• Time Period for conducting the Research is limited
• Less Amount of Training Data
• Irrelevant/Unwanted Features
• The possibility of overfitting the Model
• Unclean Data
Machine learning is a booming technology because it benefits every type of business across
every industry. The applications are limitless. From healthcare to financial services,
transportation to cyber security, and marketing to government, machine learning can help
every type of business adapt and move forward in an agile manner.
You might be good at sifting through a massive organized spreadsheet and identifying a
pattern, but thanks to machine learning and artificial intelligence, algorithms can examine
much larger datasets and understand connective patterns even faster than any human, or
any human-created spreadsheet function, ever could. Machine learning allows businesses to
collect insights quickly and efficiently, speeding the time to business value. That’s why
machine learning is important for every organization.
Machine learning also takes the guesswork out of decisions. While you may be able to
make assumptions based on data averages from spreadsheets or databases, machine
learning algorithms can analyze massive volumes of data to provide exhaustive insights
from a comprehensive picture. In short, machine learning allows for higher-accuracy
outputs across an ever-growing amount of inputs.
Machine Learning enables machines to make data-driven decisions, which is more efficient
than explicitly programming them to carry out certain tasks. These algorithms are designed
so that exposure to new data helps organizations learn and improve their strategies.
Machine Learning can provide various insights and help pose questions that were once
unimaginable, from which new solutions can be generated.
II
REVIEW OF LITERATURE
2.1 INTRODUCTION:
The basic objective of this chapter is to gain insight into previous findings, which helps
to identify the gaps in earlier studies and to justify the research problem selected by the
researcher for the study. The literature reviewed covers Machine Learning implementations
on various Hotel datasets. The prominent areas covered are studies related to concepts,
models, systems, functions, Marketing and Sales, Reviews, HR, recruitment and selection,
rewards and recognition and other issues in the Hotel Industry, and how these were
addressed using Machine Learning systems.
humans in each type of task/job. They conclude that “AI job replacement
provides a road map about how AI advances to take over tasks requiring
different intelligences, how AI can and should be used to perform service
tasks, and finally how workers can and should shift their skills to achieve a
win–win between humans and machines. We conclude that the advance of
AI in all four intelligences creates opportunities for innovative human–
machine integration for providing service”.
• Purvika Bajaj, Renesa Ray, Shivani Shedge, Shravani Vidhate and Prof.
Dr. Nikhilkumar Shardoor in their paper “Sales prediction using
machine learning algorithms” mention that, with traditional methods not
being of much help to business organizations in revenue growth, the use of
Machine Learning approaches proves to be an important aspect of shaping
business strategies, taking into consideration the purchase patterns of
consumers.
• Binru Zhang, Yulian Pu, Yuanyuan Wang and Jueyou Li in their study
“Forecasting Hotel Accommodation Demand Based on LSTM Model
Incorporating Internet Search Index” conclude that the research in their
paper has prominent theoretical significance. First, an empirical framework
based on web queries was constructed for the ever-growing sample of
tourism data. Secondly, the LSTM deep learning model was introduced for
the first time to forecast hotel accommodation demand, extending the
application of DL methods in hotel demand forecasting.
REGRESSION:
• Bohdan M. Pavlyshenko, in his study “Machine-Learning Models for
Sales Time Series Forecasting”, mentions that the use of regression
approaches for sales forecasting can often give better results compared to
time series methods. One of the main assumptions of regression methods is
that the patterns in the historical data will be repeated in the future.
economic and non-economic model and trying to find where their strength
lies when starting a business using the personality threat.
• Purvika Bajaj, Renesa Ray, Shivani Shedge, Shravani Vidhate and Prof.
Dr. Nikhilkumar Shardoor in their paper “Sales prediction using
machine learning algorithms” conclude that, with traditional methods not
being of much help to business organizations in revenue growth, the use of
Machine Learning approaches proves to be an important aspect of shaping
business strategies, taking into consideration the purchase patterns of
consumers. Prediction of sales with respect to various factors, including the
sales of previous years, helps businesses adopt suitable strategies for
increasing sales and set foot undaunted in the competitive world.
HOSPITALITY INDUSTRY:
• Nuno Antonio, Ana de Almeida and Luis Nunes in their study “An
Automated Machine Learning Based Decision Support System to
Predict Hotel Booking Cancellations” conclude that “The decrease in the
number of actual cancellations on bookings where customers were
contacted, a total in excess of 37 percentage points, corresponds to a
relative cancellation decrease of 82% for H1 and 83% for H2. These
findings indicate that the actions taken for preventing cancellations in
identified as cancellable bookings amounted in a total revenue in the
order of approximately € 39,000.00.”
DATA:
ALGORITHMS/MODELS IMPLEMENTED:
• Binru Zhang, Yulian Pu, Yuanyuan Wang and Jueyou Li in their study
“Forecasting Hotel Accommodation Demand Based on LSTM Model
Incorporating Internet Search Index” mention that the LSTM deep
learning model was introduced for the first time to forecast the hotel
accommodation demands, extending the application of DL methods in hotel
demand forecasting.
PATTERNS IN FEATURES:
• Binru Zhang, Yulian Pu, Yuanyuan Wang and Jueyou Li in their study
“Forecasting Hotel Accommodation Demand Based on LSTM Model
Incorporating Internet Search Index” mention that Machine Learning
approaches prove to be an important aspect of shaping business strategies,
taking into consideration the purchase patterns of consumers.
PYTHON IMPLEMENTATION:
III
RESEARCH METHODOLOGY
3.1 DATA COLLECTION:
The Data for this project is collected from the Hotel’s Database. The Data collected
consists of prior Customer Sales Data on the purchases of various rooms/suites and
additional services. With this Data, a simple Machine Learning algorithm can be
implemented to predict the price of Hotel packages for customers. Using this Machine
Learning Model, Customers would be able to estimate the price of the rooms/suites
available in the Hotel. The Machine Learning Model to be implemented is
Multiple Linear Regression.
Linear Regression:
The purpose of machine learning is often to create a model that explains some real-world
data, so that we can predict what may happen next, with different inputs.
The simplest model that we can fit to data is a line. When we are trying to find a line that
fits a set of data best, we are performing Linear Regression.
We often want to find lines to fit data, so that we can predict unknowns. For example:
• The market price of a house vs. the square footage of a house. Can we predict how
much a house will sell for, given its size?
• The tax rate of a country vs. its GDP. Can we predict taxation based on a country’s
GDP?
• The amount of chips left in the bag vs. number of chips taken. Can we predict how
much longer this bag of chips will last, given how much people at this party have
been eating?
Imagine that we had this set of weights plotted against heights of a large set of professional
baseball players:
To create a linear model to explain this data, we might draw this line:
Now, if we wanted to estimate the weight of a player with a height of 73 inches, we could
estimate that it is around 143 pounds.
A line is a rough approximation, but it allows us the ability to explain and predict variables
that have a linear relationship with each other.
A line is determined by its slope and its intercept. In other words, for each point y on a line
we can say:
y=mx+b
Where m is the slope, and b is the intercept. y is a given point on the y-axis, and it
corresponds to a given x on the x-axis.
The slope is a measure of how steep the line is, while the intercept is a measure of where
the line hits the y-axis.
When we perform Linear Regression, the goal is to get the “best” m and b for our data.
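One common way to make “best” concrete is to measure the loss of a candidate line as the sum of the squared vertical distances between each data point and the line; the smaller the loss, the better the fit. The sketch below assumes that definition of loss. The sample points and the function name are hypothetical and are used only for illustration.

```python
def total_squared_loss(xs, ys, m, b):
    """Sum of squared differences between each observed y and the line m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample points
xs = [1, 2, 3]
ys = [5, 1, 3]

loss_a = total_squared_loss(xs, ys, m=1, b=0)    # candidate line y = x
loss_b = total_squared_loss(xs, ys, m=0.5, b=1)  # candidate line y = 0.5x + 1

# The candidate with the smaller loss fits these points better
better = "y = 0.5x + 1" if loss_b < loss_a else "y = x"
print(loss_a, loss_b, better)
```

Comparing the two losses shows which candidate line sits closer to the points overall, which is the comparison Linear Regression automates.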
As we try to minimize loss, we take each parameter we are changing, and move it as long
as we are decreasing loss. It’s like we are moving down a hill, and stop once we reach the
bottom:
The process by which we do this is called gradient descent. We move in the direction that
decreases our loss the most. Gradient refers to the slope of the curve at any point.
For example, let’s say we are trying to find the intercept for a line. We currently have a
guess of 10 for the intercept. At the point of 10 on the curve, the slope is downward.
Therefore, if we increase the intercept, we should be lowering the loss. So we follow the
gradient downwards.
To find the gradient of the loss as the intercept changes, we use a formula with the
following terms:
• N is the number of points we have in our dataset
• m is the current slope guess
• b is the current intercept guess
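Assuming the loss is the mean of the squared differences between each observed value y_i and the prediction m x_i + b (an assumption, since the loss is not written out here), the gradient with respect to the intercept takes the standard form:

```latex
L(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2
\qquad\Longrightarrow\qquad
\frac{\partial L}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} -\bigl(y_i - (m x_i + b)\bigr)
```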
To find the m gradient, or the way the loss changes as the slope of our line changes, we can
use this formula:
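Assuming the loss is the mean of the squared differences between each observed value y_i and the prediction m x_i + b over the N data points (an assumption, since the loss is not written out here), the gradient with respect to the slope takes the standard form:

```latex
\frac{\partial L}{\partial m} = \frac{2}{N} \sum_{i=1}^{N} -x_i \bigl(y_i - (m x_i + b)\bigr)
```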
Learning Rate:
We want our program to be able to iteratively learn what the best m and b values are. So
for each m and b pair that we guess, we want to move them in the direction of the gradients
we’ve calculated. But how far do we move in that direction?
We have to choose a learning rate, which will determine how far down the loss curve we
go.
A small learning rate will take a long time to converge; one might run out of time or cycles
before getting an answer. A large learning rate might skip over the best value.
Finding the absolute best learning rate is not necessary for training a model. One just has
to find a learning rate large enough that gradient descent converges with the efficiency
needed, and not so large that convergence never happens.
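The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the loss is the mean squared error; the function names, the sample data, the learning rate and the number of iterations are all hypothetical choices made for the example.

```python
def gradient_b(xs, ys, m, b):
    """Gradient of the mean squared error with respect to the intercept b."""
    n = len(xs)
    return (2 / n) * sum(-(y - (m * x + b)) for x, y in zip(xs, ys))


def gradient_m(xs, ys, m, b):
    """Gradient of the mean squared error with respect to the slope m."""
    n = len(xs)
    return (2 / n) * sum(-x * (y - (m * x + b)) for x, y in zip(xs, ys))


def gradient_descent(xs, ys, learning_rate, num_iterations):
    """Start from m = 0, b = 0 and repeatedly step against the gradients."""
    m, b = 0.0, 0.0
    for _ in range(num_iterations):
        # Compute both gradients before updating either parameter
        step_b = gradient_b(xs, ys, m, b)
        step_m = gradient_m(xs, ys, m, b)
        b -= learning_rate * step_b
        m -= learning_rate * step_m
    return m, b


# Hypothetical points that lie exactly on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, b = gradient_descent(xs, ys, learning_rate=0.05, num_iterations=5000)
print(round(m, 4), round(b, 4))
```

With these sample points, a much larger learning rate (for example 0.5) makes the updates overshoot and diverge, while a much smaller one needs many more iterations to reach the same values.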
3.3 IMPLEMENTATION PLAN AND METHODOLOGY:
Data Collection:
The Data will be collected from the databases of Hotel Grand Central. The Data on which
the Data Science implementation will be executed will be the Customers’ purchase dataset.
Data Preprocessing:
The data collected will not be clean. It has to be converted and cleaned before Machine
Learning can be implemented on it efficiently. In particular, values stored as the String
datatype have to be transformed into Integers.
Feature Engineering:
Before the implementation of Machine Learning, the most relevant features have to be
selected. Choosing the most important features improves the efficiency and accuracy of the
Model. Feature Engineering is implemented using Pearson’s Correlation between each of the
features X and the y variable. From the Correlation heatmap, the features that correlate
best with the y variable are selected for Machine Learning implementation.
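As a rough illustration of this selection step, the sketch below computes Pearson’s correlation of each feature with the target using pandas and keeps the features above a cutoff. The toy values and the 0.5 threshold are assumptions made for the example; in this study the features are chosen by reading the heatmap rather than by applying a fixed cutoff.

```python
import pandas as pd

# Toy data, not taken from the Hotel dataset
df = pd.DataFrame({
    "daysSpent": [1, 2, 3, 4, 5],
    "floor": [3, 1, 4, 1, 5],
    "pay": [1000, 2000, 3000, 4000, 5000],
})

# Pearson correlation of every feature with the target column "pay"
correlations = df.corr(method="pearson")["pay"].drop("pay")

# Keep features whose absolute correlation reaches a hypothetical cutoff
threshold = 0.5
selected = correlations[correlations.abs() >= threshold].index.tolist()
print(correlations)
print(selected)
```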
Scikit-learn is a Python library that provides many unsupervised and supervised learning
algorithms. It is built on NumPy, SciPy and Matplotlib, and works well alongside pandas.
Algorithm to be implemented:
IV
DATA SCIENCE
IMPLEMENTATION
AND INTERPRETATION
4.1 DATA PREPROCESSING:
The dataset is first accessed from the external servers of Hotel Grand Central. Using
FTP (File Transfer Protocol), the file “2019 Customer data.sql” is downloaded from the
server.
The platform on which the Data Science process will be implemented is Jupyter Notebook.
We use Python 3.7 for the entire process.
The Python Libraries to be used are:
• Pandas (Data Cleaning and Manipulation)
• NumPy (Data Manipulation)
• Scikit-Learn (Machine Learning Library)
• Matplotlib (Data Visualization)
4.1.1 DATASET:
The Dataset on which we are implementing our Data Science process contains the
purchase information of Customers.
4.1.2 DATA TRANSFORMATION:
The data that has been collected from Hotel Grand Central has complete information on their
previous customers. The problem is that most of the Data is stored as Strings. The Data
Science process cannot be implemented on the String data-type.
For example, the Feature RoomType lists the 3 types of rooms, which are Business
rooms, Deluxe suites and Super Deluxe suites. These have to be converted to 0, 1 and 2
respectively. Similar processes have to be carried out for the other Features.
For this implementation we make use of the map method of a pandas Series. Using map,
we can convert the values to Integers by passing a Python 3 Dictionary that pairs each
String value with its Integer code.
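A minimal sketch of this conversion is shown below. The sample rows, the exact label strings and the integer codes are assumptions for illustration; the labels and codes stored in the actual dataset may differ.

```python
import pandas as pd

# Hypothetical sample of the RoomType column
rooms = pd.DataFrame(
    {"RoomType": ["Business Rooms", "Deluxe", "Super Deluxe", "Business Rooms"]}
)

# Dictionary pairing each String label with an assumed Integer code
room_codes = {"Business Rooms": 0, "Deluxe": 1, "Super Deluxe": 2}

# Series.map looks up every value of the column in the dictionary
rooms["RoomType"] = rooms["RoomType"].map(room_codes)
print(rooms)
```

Any value that is missing from the dictionary becomes NaN after the mapping, so it is worth checking the column for missing values afterwards.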
Before After
The same process will be carried out for the other Features.
A single variable containing a Dictionary of Yes: 1 and No: 0 is created which is used for
all the features that contain additional services.
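The reuse of a single dictionary across several service columns might look like the sketch below. The sample rows and the variable name `yes_no` are hypothetical, and only two of the service columns are shown.

```python
import pandas as pd

# Hypothetical sample of two additional-service columns
services = pd.DataFrame({
    "spa": ["Yes", "No", "Yes"],
    "has_gym": ["No", "No", "Yes"],
})

# One dictionary shared by every Yes/No feature
yes_no = {"Yes": 1, "No": 0}

# Apply the same mapping to each service column in turn
for column in ["spa", "has_gym"]:
    services[column] = services[column].map(yes_no)

print(services)
```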
Before Data Engineering:
4.2 FEATURE ENGINEERING:
After the Data Engineering process, the Feature Engineering process is implemented, in
which the right Features for Machine Learning are selected. For the Visualization of the
Features, we use the Data Visualization libraries Matplotlib and Seaborn.
• Our X variable will contain the Room/Suites prices, all the Services provided by
Hotel Grand Central and other features.
• Our y variable will be the Bill amount.
CHART 4.1 - THE CORRELATION HEATMAP:
Interpretation:
From the Heatmap chart generated above, we can observe the level of Correlation between
y (pay) and X (all services provided and other features). We can see that the Bill amount
has a high Correlation with Room Type (0.76), Days Spent (0.63), Bath Service (0.62), Bar
Service (0.57), Gym Service (0.59), Washer/Dryer (0.63) and Personalized Service (0.59).
Interpretation:
From the above Chart, we can notice that there is almost no Correlation between the Floor
on which the Rooms/Suites are located and the Bill Amount. The Bill amount is affected by
the Floor location only in a very few exclusive cases, such as the availability of
other Services.
CHART 4.4 - ROOM TYPE AND BILL AMOUNT CORRELATION:
Interpretation:
From the above Scatter plot chart we can observe that there is a strong Correlation (0.76)
between the Room Types and the Bill Amount. The few varying cases occur because of
Customers’ preferences to purchase additional services.
4.3 MACHINE LEARNING IMPLEMENTATION:
With the Data Engineering and Feature Engineering processes complete, the Machine
Learning process can be implemented. Using Scikit-learn, the Linear Regression model can
be imported and applied to the best Features selected. Based on the Feature Engineering
process, the best features selected were:
• "daysSpent"
• "bathService"
• "RoomType"
• "has_washer_dryer"
• "has_personalizedService"
• "barservice"
• "has_gym"
The data is then split into Training and Testing sets: 80% of the data is used for
Training and the remaining 20% for Testing.
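The split and the model fit can be sketched with Scikit-learn as follows. The data here is synthetic and generated only so that the example runs on its own; the three columns, the coefficients used to generate the bill amount, and the `random_state` value are assumptions, not values from the Hotel dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset (hypothetical values)
rng = np.random.default_rng(0)
n_rows = 200
features = pd.DataFrame({
    "daysSpent": rng.integers(1, 8, n_rows),
    "RoomType": rng.integers(0, 3, n_rows),
    "has_gym": rng.integers(0, 2, n_rows),
})
pay = (
    1000 * features["daysSpent"]
    + 1500 * features["RoomType"]
    + 300 * features["has_gym"]
    + rng.normal(0, 200, n_rows)
)

# 80% of the rows for Training, 20% for Testing
x_train, x_test, y_train, y_test = train_test_split(
    features, pay, test_size=0.2, random_state=42
)

# Fit Multiple Linear Regression on the Training set, predict on the Testing set
model = LinearRegression()
model.fit(x_train, y_train)
predicted = model.predict(x_test)
print(len(x_train), len(x_test), predicted[:3])
```

Fixing `random_state` makes the split reproducible, which helps when comparing runs with different feature sets.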
4.3.1 ACCURACY OF ALGORITHM:
The accuracy of the Model is about 86%. The accuracy could be increased if there were
more Data.
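The text does not state how the 86% figure was computed. If it came from the `score` method of Scikit-learn’s `LinearRegression`, that method returns the coefficient of determination (R²) rather than a classification-style accuracy. The sketch below, on hypothetical toy values, shows that `score` and `r2_score` report the same quantity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical toy data that is close to a straight line
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression().fit(x, y)

# Both calls compute R^2, the share of variance in y explained by the model
r2_from_score = model.score(x, y)
r2_from_metric = r2_score(y, model.predict(x))
print(r2_from_score, r2_from_metric)
```

In practice the score would be computed on the held-out Testing set rather than on the data used for fitting.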
4.3.3 CREATING THE APP:
The Customers of Hotel Grand Central could use a full Application that predicts the Prices.
An Initial crude version of the Application has been created. This is the Mark 1 version of
the Application. The Application is created using Python3 and Shell Scripting.
V
FINDINGS, SUGGESTIONS
AND CONCLUSION
5.1 FINDINGS OF THE STUDY:
The findings of this Research are as follows:
• Implementing a Machine Learning algorithm without Data engineering does not
produce accurate results.
• Without implementing Feature Engineering, the Machine Learning Model’s
accuracy is very low.
• In the Feature Engineering process we found from the Heatmap that the features
Room Type (0.76), Days Spent (0.63), Bath Service (0.62), Bar Service (0.57), Gym
Service (0.59), Washer/Dryer (0.63) and Personalized Service (0.59) have the highest
levels of Correlation with the Bill Amount.
• The Feature Floor Level had a correlation of just 0.15 with Bill amount because it
depended upon Customers’ preferences for additional services.
• The level of Correlation between Room Type and Bill Amount was the highest,
at 0.76.
• The feature Minutes For Service had a negative correlation with Bill Amount,
i.e. -0.3.
• The feature Damages Incurred proved to have almost no correlation with Bill amount,
i.e. 0.012.
• Minutes For Service proved to be the worst Feature in the Dataset because it has
negative correlations with every other feature.
• The features Spa, Dishwasher and Patio proved to have no Correlation with the
Bill amount.
• When a Dataset is made available to predict prices using past prices of the
Customers, the Machine Learning model, Multiple Linear Regression proves to be
the best model for implementation.
• It is very important to Train and Test the Dataset.
• The Model provides best results when 80% of the data is segregated for Training
set and 20% of the data for the Testing set.
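• Multiple Linear Regression looks for the Slope (m) and Intercept (b) values that
minimize the loss. Gradient Descent is one way to find them; the LinearRegression
class of Scikit-Learn solves the least-squares problem directly.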
• Having more data would significantly improve the accuracy of the Model.
• With a limited amount of Data, i.e. 1012 records, the accuracy of the Model proved to be
86%.
• The Scikit-Learn library for Python 3.7 makes the implementation of Machine Learning
models easy and efficient.
• The Machine Learning Model estimates the Bill amount with reasonable accuracy
(about 86%).
• The App makes it easy for the Hotel and the Customers to find the possible prices
of Hotel Grand Central.
5.2 SUGGESTIONS:
• Hotel Grand Central could implement Machine Learning processes on a regular
basis to Classify, Predict and Cluster information using the Data collected from
their customers.
• A similar app can be deployed to understand the Sales of their services.
• Other Machine Learning algorithms like K Nearest Neighbors could be used to
classify and deploy a recommender engine for Customers to select services
preferred by them.
• Data Science processes could be implemented on a daily basis for Sales, Marketing,
HR and Operations/Production processes.
5.3 CONCLUSION:
The customers’ sales data collected from Hotel Grand Central has been used to
implement a Linear Regression Machine Learning Model to predict the Price of the
services. Data Engineering made it possible to clean the data and to replace values
of the String data-type with Integers, after which the Model performance
improved significantly. Next, Feature Engineering was performed, where the best
features were selected using Pearson’s correlation. The Correlation Heatmap showed
that Room Type (0.76), Days Spent (0.63), Bath Service (0.62), Bar Service (0.57), Gym
Service (0.59), Washer/Dryer (0.63) and Personalized Service (0.59) are the best features
in the Dataset. These features were then fit into the Model as the X variable and the Bill
amount was fit as the y variable. For the Machine Learning process the dataset was
divided in an 80:20 ratio for Training and Testing. The LinearRegression()
class of Scikit-Learn was then used to execute the Machine Learning Model.
The Model proved to predict prices with about 86% accuracy. New data was then
entered to predict the price for a Customer. Therefore, the Multiple Linear Regression
algorithm proves to be the best Regression model to predict Prices using past data.
BIBLIOGRAPHY
References:
1. http://www.hotelgrandcentral.com/
2. https://www.codecademy.com/paths/machine-learning/tracks/introduction-to-machine-learning-skill-path/modules/introduction-to-machine-learning-skill-path/articles/the-ml-process
3. https://www.codecademy.com/paths/machine-learning/tracks/introduction-to-machine-learning-skill-path/modules/introduction-to-machine-learning-skill-path/articles/scikit-learn
4. https://www.codecademy.com/paths/machine-learning/tracks/introduction-to-machine-learning-skill-path/modules/introduction-to-machine-learning-skill-path/articles/machine-learning-supervised-vs-unsupervised
5. https://scikit-learn.org/stable/about.html
6. https://online.maryville.edu/online-bachelors-degrees/management-information-systems/what-is/
7. https://www.ripublication.com/ijaer18/ijaerv13n22_50.pdf
8. https://sites.insead.edu/facultyresearch/research/file.cfm?fid=64810
9. https://www.researchgate.net/publication/334306392_An_Automated_Machine_Learning_Based_Decision_Support_System_to_Predict_Hotel_Booking_Cancellations
10. https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwi3v8aC2dnvAhVcKysKHdguBYs4ChAWMAF6BAgCEAM&url=https%3A%2F%2Fwww.mdpi.com%2F2071-1050%2F11%2F17%2F4708%2Fpdf&usg=AOvVaw3rEthDu9cD4DFPyQSbXTn8
11. https://www.ijitee.org/wp-content/uploads/papers/v8i6/F3835048619.pdf
12. https://ecommons.cornell.edu/bitstream/handle/1813/67733/Zhang_cornell_0058O_10695.pdf?sequence=1&isAllowed=y
13. https://sites.insead.edu/facultyresearch/research/file.cfm?fid=64810
14. https://www.researchgate.net/publication/336330683_A_Model_for_Business_Success_Prediction_using_Machine_Learning_Algorithms
15. https://www.researchgate.net/publication/331606166_Intelligent_Sales_Prediction_Using_Machine_Learning_Techniques
16. https://www.irjet.net/archives/V7/i6/IRJET-V7I6676.pdf
17. http://www.diva-portal.org/smash/get/diva2:1366957/FULLTEXT02
18. https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwi76--QlMPvAhUA8HMBHd-hD044ChAWMAF6BAgCEAM&url=https%3A%2F%2Fwww.mdpi.com%2F2306-5729%2F4%2F1%2F15%2Fpdf&usg=AOvVaw3m4CpJP8jV2uonKfuuAJEL
19. https://journals.sagepub.com/doi/epub/10.1177/1094670517752459
20. http://www.ijsred.com/volume2/issue2/IJSRED-V2I2P83.pdf
ANNEXURES
Hotel Grand Central by Anuraag Rath - Jupyter Notebook http://localhost:8888/notebooks/Hotel Grand Central by Anur...
      Unnamed: 0  customer_id    pay  daysSpent  bathService        RoomType  minForService  floor
0              0         2869   3600          1            2  Business Rooms              4      1
1              1         4318   3900          1            2  Business Rooms              4      9
2              2         6265   2700          1            1  Business Rooms              4      2
3              3           24   4900          1            1    Super Deluxe              2      3
4              4         9481   3900          1            1  Business Rooms              3      4
...
1008        1008         7329   2300          1            1  Business Rooms              5      2
1009        1009        10286   3750          1            2  Business Rooms              1      1
1011        1011         1721   4200          1            2    Super Deluxe              2      5
1012        1012         1676  18000          2            2       Executive              2      5
dataHotel
Out[3]:
   Unnamed: 0  customer_id   pay  daysSpent  bathService  RoomType  minForService  floor
0           0         2869  3600          1            2         1              4      1
1           1         4318  3900          1            2         1              4      9
2           2         6265  2700          1            1         1              4      2
3           3           24  4900          1            1         3              2      3
4           4         9481  3900          1            1         1              3      4
dataHotel["Damages"] = dataHotel["Damages"].map(ouiNon)
dataHotel["spa"] = dataHotel["spa"].map(ouiNon)
dataHotel["has_washer_dryer"] = dataHotel["has_washer_dryer"].map(ouiNon)
dataHotel["has_personalizedService"] = dataHotel["has_personalizedService"].map(ouiNon)
dataHotel["barservice"] = dataHotel["barservice"].map(ouiNon)
dataHotel["has_dishwasher"] = dataHotel["has_dishwasher"].map(ouiNon)
dataHotel["has_patio"] = dataHotel["has_patio"].map(ouiNon)
dataHotel["has_gym"] = dataHotel["has_gym"].map(ouiNon)
dataHotel
Out[4]:
   Unnamed: 0  customer_id   pay  daysSpent  bathService  RoomType  minForService  floor
0           0         2869  3600          1            2         1              4      1
1           1         4318  3900          1            2         1              4      9
2           2         6265  2700          1            1         1              4      2
3           3           24  4900          1            1         3              2      3
4           4         9481  3900          1            1         1              3      4
0 3600 1 2 1 4 1 1 1
1 3900 1 2 1 4 9 0 1
2 2700 1 1 1 4 2 0 1
3 4900 1 1 3 2 3 0 1
4 3900 1 1 1 3 4 1 1
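In [10]: import matplotlib.pyplot as plt
         import seaborn as sns

         # Candidate features (X) and the target, the Bill amount (y)
         X = hotelDataMain[["daysSpent",
                            "bathService",
                            "RoomType",
                            "minForService",
                            "floor",
                            "Damages",
                            "spa",
                            "has_washer_dryer",
                            "has_personalizedService",
                            "barservice",
                            "has_dishwasher",
                            "has_patio",
                            "has_gym"]]
         y = hotelDataMain["pay"]
         # Pearson correlation between every pair of columns
         correlationMatrix = hotelDataMain.corr()
         top_corr_features = correlationMatrix.index
         plt.figure(figsize=(20,20))
         # Annotate each cell with its correlation value; any arguments after annot
         # are cut off in the printout and are not reproduced here
         heatMap = sns.heatmap(hotelDataMain[top_corr_features].corr(),annot=True)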
plt.scatter(hotelDataMain["floor"], hotelDataMain.pay)
plt.xlabel('Floor')
plt.ylabel('Bill Amount')
plt.show()
-------------------------------------------------
Linear Regression
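In [85]: from sklearn.linear_model import LinearRegression
         from sklearn.model_selection import train_test_split

         # Features selected from the correlation heatmap
         Xfeatures = hotelDataMain[["daysSpent",
                                    "bathService",
                                    "RoomType",
                                    "has_washer_dryer",
                                    "has_personalizedService",
                                    "barservice",
                                    "has_gym"]]
         yFeature = hotelDataMain.pay
         # Assumed reconstruction: the split cell is not shown in the printout;
         # test_size=0.2 follows the 80:20 split described in section 4.3
         xTrain, xTest, yTrain, yTest = train_test_split(Xfeatures, yFeature, test_size=0.2)
         hotelRegression = LinearRegression()
         hotelRegression.fit(xTrain, yTrain)
         predictPrices = hotelRegression.predict(xTest)
         predictPrices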
Out[85]:
Predict price
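In [81]: yourName = "Anuraag Rath"
         daysSpentu = 1
         yourbathService = 2
         yourroomType = 2
         washer_dryer = 1
         personalizedService = 0
         yourbarservice = 1
         gymService = 1
         # Assumed reconstruction: myOptions is not defined in the printout; one row of
         # inputs is built here in the same column order as Xfeatures
         myOptions = [[daysSpentu, yourbathService, yourroomType, washer_dryer,
                       personalizedService, yourbarservice, gymService]]
         predictMyPrice = hotelRegression.predict(myOptions)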