Anuraag Rath MBA Dissertation


AN IMPLEMENTATION OF MACHINE LEARNING ON CUSTOMERS

PURCHASE DATASET OF
HOTEL GRAND CENTRAL

AT
HOTEL GRAND CENTRAL
BHUBANESHWAR

Submitted as a part of MBA II year course requirement


BY
ANURAAG RATH
RA1952001040075

Under the guidance of


Dr. R. NANDHINI
Assistant Professor

DEPARTMENT OF MANAGEMENT STUDIES


SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
VADAPALANI, CHENNAI
2021

 
BONAFIDE CERTIFICATE

This is to certify that Anuraag Rath is a bonafide student of the Department of
Management Studies, SRMIST, Vadapalani, Chennai. He is in the II year of the Master's
Degree Program in Business Administration (MBA). He has done this project under
my guidance and supervision towards partial fulfillment of the II year MBA course.

Project Guide: Dr. R. NANDHINI
Head of the Department (HOD): Dr. C. PRASEEDA

Date:
Place: Chennai

Signature of the Internal Examiner Signature of the External Examiner

DEPARTMENT SEAL

 
DECLARATION

I, Anuraag Rath (RA1952001040075), studying in the II year MBA program at the
Department of Management Studies, SRMIST, Vadapalani, Chennai, hereby declare
that this project is an original work of mine and I have not verbatim
copied/duplicated any material from sources like the Internet or print media, excepting
some vital company information/statistics, which are provided by the company itself.

Signature of the Student

Date:
Place: Chennai

 
ACKNOWLEDGEMENT

I wish to take this opportunity to express my sincere gratitude to each and every one
who helped me in the completion of this work.

I am very much obliged and indebted to the Head of the Department of Management
Studies, Dr. C. PRASEEDA, for her valuable suggestions, guidance and encouragement
to complete this project report successfully.

My sincere thanks to my guide, Dr. R. NANDHINI, Assistant Professor, Department of
Management Studies, for her immense support, guidance, advice and suggestions, which
made it possible for me to complete this project.

I would like to thank Mr. TARAK MISHRA (Managing Director) for his guidance, effort
and suggestions. I am also thankful to Mr. SUBHRANSU RATH (Director). Without his
help and suggestions, it would have been impossible for me to complete this project.

I also acknowledge, with a deep sense of reverence, my gratitude towards my parents and
the members of my family, who have always supported me morally as well as economically.

- Anuraag Rath

 
 
ABSTRACT
The objective of this project is to implement Machine Learning algorithms on the Customers’
Sales dataset of Hotel Grand Central. In this project, Machine Learning systems have been
explored to help predict the prices of all the services preferred by the Customers for the
year 2019. This model would help to provide insights into the pricing methods employed by
the Hotel and aid Customers in predicting the bill amount based on their preferences of
services. Chapter I outlines the introduction to the project and the objectives, scope,
problems, needs and benefits of the study. It also encompasses the Company and Industry
profiles. Chapter II consists of the Review of Literature, covering multiple previous
studies in which predictive/classification Machine Learning systems have been implemented
on Hotel datasets. Chapter III encompasses the research methodology of the study.
Chapter IV outlines the Data Science implementation and analysis, which is the Machine
Learning process and implementation on the Customers’ sales dataset. Chapter V discusses
the Findings, Conclusion and the future scope of the study.

 
TABLE OF CONTENTS

CHAPTER CONTENTS PAGE

1 INTRODUCTION 1

2 REVIEW OF LITERATURE 24

3 RESEARCH METHODOLOGY 33

4 DATA SCIENCE IMPLEMENTATION AND INTERPRETATION 40

5 FINDINGS, SUGGESTIONS AND CONCLUSION 53

BIBLIOGRAPHY 56

ANNEXURES 58

 
LIST OF CHARTS
Sno CHART PAGE

4.1 THE CORRELATION HEATMAP 47

4.2 CORRELATION VALUES 47

4.3 FLOORS AND BILL AMOUNT CORRELATION 48

4.4 ROOM TYPE AND BILL AMOUNT CORRELATION 49

 
I
INTRODUCTION

 
1.1 INFORMATION SYSTEMS:
A computer is an inherently diverse tool. Businesses generally use a group of networked
computers to collect, organize, store, and transmit information. This network is also known
as a computer information system. In the field of computer information systems,
professionals work to optimize the application of networked computers in business
environments.
To be effective in this effort, these professionals must learn how to improve business
processes by implementing a computer information system that can accommodate the
specific needs of their organization. For example, if an organization is concerned with the
productivity of its employees, IT professionals could use the existing computer information
system to track and measure relevant metrics. The data from such a system could then be
used to design workplace policies that better promote optimal use of labor hours.

1.2 TYPES OF COMPUTER INFORMATION SYSTEMS:

There are several major categories of computer information systems, each with specific
characteristics that make them unique. Here’s a look at the seven most commonly used
systems.

Transaction Processing Systems

The operations handled by transaction processing systems are usually the straightforward,
day-to-day transactions that businesses conduct. These computerized systems perform
simple functions and record them. As an example, a transaction processing system would
likely be used to control inventory or track payroll.

Office Automation (Enterprise Collaboration) Systems

The primary use of office automation systems is creating, storing, and transmitting data
throughout an organization’s network. This simplifies office tasks by keeping team
members connected and also provides management personnel with more control over the
flow of information within the company. When connected to these systems, users are able
to instantly interact with their colleagues using various forms of communication, such as
voice, email, videoconferencing, file transfers, or instant text messaging.

Management Information Systems

Businesses that collect large volumes of data rely on management information systems to
process that data into usable forms, such as reports and data summaries. These systems are
designed to help organizational managers and supervisors make decisions by providing
them with information about the various activities that occur within the business.

Decision Support Systems

Decision support systems are advanced computer information systems that help
organizational leaders make decisions when the potential outcomes are uncertain.
Computer information systems specialists design these systems to perform complex
(usually mathematical) tasks, such as executing calculations, modeling data, comparing
datasets, and predicting the outcomes of scenarios based on available information.

Executive Information Systems

Executive information systems are specifically designed for use by senior leaders, as they
usually compile a vast array of data regarding the internal and external affairs of an
organization. An executive information system distills massive amounts of detailed data
into structured, comprehensible formats. This helps senior managers stay up to date about
the overall status of their organization, allowing them to make informed strategic and
tactical decisions.

Expert Systems

Expert systems emulate the decision-making ability of a human by using reasoning to learn
facts based on the rules set by the individuals who designed them. These systems are some
of the earliest examples of basic artificial intelligence, and business leaders can use them to
develop solutions to complex problems, even within specialized professional domains, such
as medicine or engineering.

Finance and Accounting Systems

To track their financial data, such as investments, revenue, and tax obligations,
organizations may use an accounting information system. These systems can be used to
perform financial audits and generate accounting reports. This helps finance specialists and
business leaders streamline the process of compiling or tracking accounting data.

1.3 TRENDS IN COMPUTER INFORMATION SYSTEMS

To fully understand what computer information systems are, it is important to stay up to
date with trends in the industry. Trends in computer information systems change frequently
as IT is constantly evolving and changing.

One of the most recent trends is artificial intelligence and the use of machine learning. With
this, networks and systems have the ability to improve themselves without having a
programmer change anything. Networks have the ability to access and analyze data by
themselves.

Another trend, which also points to the future of computer information systems, is the
growing need for cybersecurity. As more businesses and organizations develop an online
presence and more people use the internet, the opportunity for cyber attacks also grows. To
protect the information of professionals working across myriad industries, cybersecurity
becomes critical.

1.4 INDUSTRY BENEFITS AND CHALLENGES

Computer information systems have opened many doors in the public and private sectors.
By allowing organizations to communicate more effectively, these systems stimulate
creative innovation and make collaboration easier than ever before. These systems serve as
the foundation for cloud computing, which allows users to store data and use software that
is not installed on their own computers but instead hosted on a remote server elsewhere.
This allows businesses to immediately boost their efficiency without incurring massive
overhead costs.

Computer information systems give businesses a unique ability to customize the way they
use technology, allowing them to adapt to market factors in real time. The downside to
computer information systems is that they are subject to cyber threats, such as hackers,
malware, and viruses. Depending on the size of these systems, maintaining them may also
be costly on the macro level. Still, the benefits of using a computer information system are
likely to outweigh the costs.

1.5 MACHINE LEARNING:

Machine Learning is an application of artificial intelligence (AI) that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that can
access data and use it to learn for themselves.

The process of learning begins with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in data and make better decisions in
the future based on the examples that we provide. The primary aim is to allow
computers to learn automatically without human intervention or assistance and adjust
actions accordingly.

However, classic machine learning algorithms treat text as a sequence of keywords,
whereas an approach based on semantic analysis mimics the human ability to understand
the meaning of a text.

1.6 TYPES OF LEARNING SYSTEMS:

As humans, we have many different ways we learn things. The way you learned calculus,
for example, is probably not the same way you learned to stack blocks. The way you
learned the alphabet is probably wildly different from the way you learned how to tell if
objects are approaching you or going away from you. The latter you might not even realize
you learned at all!

Similarly, when we think about making programs that can learn, we have to think about
these programs learning in different ways. Two main ways that we can approach machine
learning are Supervised Learning and Unsupervised Learning. Both are useful for
different situations or kinds of data available.

Supervised Learning:

Let’s imagine you’re first learning about different genres in music. Your music teacher
plays you an indie rock song and says “This is an indie rock song”. Then, they play you a
K-pop song and tell you “This is a K-pop song”. Then, they play you a techno track and say
“This is techno”. You go through many examples of these genres.

The next time you’re listening to the radio, and you hear techno, you may think “This is
similar to the 5 techno tracks I heard in class today. This must be techno!”

Even though the teacher didn’t tell you about this techno track, they gave you enough
examples of songs that were techno, so you could recognize more examples of it.

When we explicitly tell a program what we expect the output to be, and let it learn the rules
that produce expected outputs from given inputs, we are performing supervised learning.

A common example of this is image classification. Often, we want to build systems that
will be able to describe a picture. To do this, we normally show a program thousands of
examples of pictures, with labels that describe them. During this process, the program
adjusts its internal parameters. Then, when we show it a new example of a photo with an
unknown description, it should be able to produce a reasonable description of the photo.

When you complete a Captcha and identify the images that have cars, you’re labeling
images! A supervised machine learning algorithm can now use those pictures that you’ve
tagged to make its car-image predictor more accurate.
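
The music-genre example above can be sketched as a very small supervised learner. This is only an illustration: the two features (tempo and share of electronic sounds), the numbers and the function name predict_genre are assumptions made for the sketch, and a nearest-neighbour rule is just one of many possible supervised methods.

```python
import math

# Hypothetical labeled examples: (tempo in BPM, share of electronic sounds 0-1) -> genre.
# All numbers are invented for illustration only.
labeled_songs = [
    ((120.0, 0.20), "indie rock"),
    ((115.0, 0.30), "indie rock"),
    ((105.0, 0.60), "k-pop"),
    ((100.0, 0.65), "k-pop"),
    ((135.0, 0.95), "techno"),
    ((140.0, 0.90), "techno"),
]


def predict_genre(song, examples):
    """Return the label of the labeled example that is closest to `song`."""
    def distance(a, b):
        # Divide tempo by 100 so both features vary over a comparable range.
        return math.hypot((a[0] - b[0]) / 100.0, a[1] - b[1])

    _, nearest_label = min(examples, key=lambda ex: distance(song, ex[0]))
    return nearest_label


new_song = (138.0, 0.92)  # a track that was never labeled by the "teacher"
print(predict_genre(new_song, labeled_songs))
```

The labels play the role of the teacher: the program is never told a rule for techno, it only compares the new song with examples whose genre is already known.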

Unsupervised Learning:

Let’s say you are an Indian who has been observing the meals Americans eat. You see
people eating breakfasts, lunches, and snacks. Over the course of a couple weeks, you
surmise that for breakfast people mostly eat foods like:

• Cereals
• Bagels
• Granola bars

Lunch is usually a combination of:

• Some sort of vegetable
• Some sort of protein
• Some sort of grain

Snacks are usually a piece of fruit or a handful of nuts. No one explicitly told you what
kinds of foods go with each meal, but you learned from natural observation and put the
patterns together. In unsupervised learning, we don’t tell the program anything about what
we expect the output to be. The program itself analyzes the data it encounters and tries to
pick out patterns and group the data in meaningful ways.

An example of this is clustering to create segments in a business’s user population.
In this case, an unsupervised learning algorithm would probably create groups (or clusters)
based on parameters that a human may not even consider.
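
As a minimal sketch of such clustering, the following groups customers by a single number without any labels. The spend figures, the function name k_means_1d and the choice of k-means with two clusters are assumptions made for illustration; they are not taken from the hotel dataset.

```python
def k_means_1d(values, k, iterations=20):
    """Group numbers into k clusters with a plain k-means loop."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres evenly over the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


# Hypothetical monthly spend per customer; no labels are supplied.
spend = [12, 15, 14, 80, 85, 90, 13, 88]
centres, clusters = k_means_1d(spend, k=2)
print(centres)
print(clusters)
```

Nobody tells the program which customers are "low spenders" or "high spenders"; the two groups emerge from the data alone.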

1.7 MACHINE LEARNING PROCESS:

When people think of Machine Learning, they often think of a program that is taking in
data and generating predictions and insights. The process of performing Machine Learning
often requires many more steps before and after the predictive analytics.

We try to think of the Machine Learning process as:

1. Formulating a Question
2. Finding and Understanding the Data
3. Cleaning the Data and Feature Engineering
4. Choosing a Model
5. Tuning and Evaluating
6. Using the Model and Presenting Results

1. Formulating a Question

Let’s say we are performing machine learning for a high-traffic fast-casual restaurant chain,
and our goal is to improve the customer experience. We can serve this goal in many ways.
When we’re thinking about creating a model, we have to narrow down to one measurable,
specific task. For example, we might say we want to predict the wait times for customers’
food orders within 2 minutes, so that we can give them an accurate time estimate.

2. Finding and Understanding the Data

Arguably the largest chunk of time in any machine learning process is finding the relevant
data to help answer your question, and getting it into the format necessary for performing
predictive analysis.

We know that for supervised learning, we need labeled datasets, or datasets that have clear
labels of what their ground truth is. For an example like the restaurant wait time, this would
mean we would need many examples of past orders, tagged with how long the wait time
was. Maybe the restaurant already tracks this data, but we might need to augment the data
collection with a timer that starts when the customer orders, stops when the customer
receives their food, and records that information.

Creating this system of recording data, as well as gathering enough data to be able to train
our model will take time.

Once we have our data, we want to understand it so that we know what model to
apply and what the outputs will mean. First, we will want to examine the summary
statistics:

• Calculate means and medians to understand the distribution
• Calculate percentiles
• Find correlations that indicate relationships

You may also want to visualize the data, perhaps using box plots to identify outliers,
histograms to show the basic structure of the data, and scatter plots to examine relationships
between variables.
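
The summary statistics listed above can be computed with Python's standard library. The wait times and order sizes below are invented for illustration (chosen so that the mean equals 6.25 minutes), and the helper pearson is a hypothetical name for a hand-written Pearson correlation.

```python
import math
import statistics

# Hypothetical sample: wait time in minutes and number of items for ten orders.
wait_minutes = [3.5, 4.0, 4.2, 3.8, 4.5, 10.5, 11.0, 11.5, 4.1, 5.4]
items_in_order = [1, 2, 2, 1, 2, 6, 7, 7, 2, 3]

mean_wait = statistics.mean(wait_minutes)
median_wait = statistics.median(wait_minutes)
quartiles = statistics.quantiles(wait_minutes, n=4)  # three cut points: Q1, Q2, Q3


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print("mean:", mean_wait, "median:", median_wait)
print("quartiles:", quartiles)
print("correlation with order size:", pearson(wait_minutes, items_in_order))
```

A mean that sits well above the median, as here, is already a hint that the distribution is not symmetric.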

Let’s say we’re examining the existing distribution of wait times. We see that the overall
average is 6.25 minutes per order. But we also produce this histogram:

We might glean from this that there are two main groups of orders. One group seems to
cluster around 4 minutes, while another, smaller, group seems to cluster around 11 minutes.
We could use this to modify our question and build a model that will classify whether or
not an order will be in this “short” timeframe, or in the “long” timeframe. Is it dependent on
the food that is ordered? The time of day of the order?

Perhaps we just become aware of the bimodality of our data. If our model consistently
predicts a wait time of around 6 or 7 minutes, then we are not taking into account the true
structure of our data.
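
A rough text histogram makes this point concrete. The wait times are again invented for illustration and the 2-minute bin width is an arbitrary assumption; the sketch only shows that the average can fall in a range where almost no orders lie.

```python
from collections import Counter

# Hypothetical wait times in minutes with a "short" and a "long" group.
wait_minutes = [3.5, 4.0, 4.2, 3.8, 4.5, 10.5, 11.0, 11.5, 4.1, 5.4]
bin_width = 2  # minutes per histogram bin (arbitrary choice)

# Count how many orders fall into each bin, keyed by the bin's lower edge.
bins = Counter(int(w // bin_width) * bin_width for w in wait_minutes)
for start in range(0, 14, bin_width):
    print(f"{start:>2}-{start + bin_width:<2} min | " + "#" * bins[start])

mean_wait = sum(wait_minutes) / len(wait_minutes)
mean_bin = int(mean_wait // bin_width) * bin_width
print("mean:", round(mean_wait, 2), "orders in the mean's bin:", bins[mean_bin])
```

In this sample the mean lies in the 6 to 8 minute bin, which contains no orders at all, so a model that always predicts the mean would be wrong for both groups.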

3. Cleaning the Data and Feature Engineering

Real data is messy! Data may have errors. Some columns may be empty. The features
we’re interested in might require string manipulation to extract. Cleaning the data refers to
the process by which we address missing values and outliers, among other things that may
affect our insights.

We may see that we have a group of orders that took over 20 minutes, due to an emergency
in the kitchen one afternoon. This is pushing our average wait time up, and may skew our
predictions. If we want to model the more general functioning of the restaurant, we may
want to remove these values.
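
A minimal sketch of this cleaning step follows. The data are invented, and the fixed 20-minute cutoff simply mirrors the example above; in practice the cutoff is a judgement call and other rules (for example, one based on the interquartile range) may be preferred.

```python
import statistics

# Hypothetical wait times in minutes, including three orders from the kitchen emergency.
wait_minutes = [4.0, 4.5, 3.5, 11.0, 10.5, 4.2, 25.0, 31.0, 22.5, 3.8]
CUTOFF_MINUTES = 20  # assumption taken from the example in the text

typical = [w for w in wait_minutes if w <= CUTOFF_MINUTES]
removed = [w for w in wait_minutes if w > CUTOFF_MINUTES]

print("mean with outliers:   ", statistics.mean(wait_minutes))
print("mean without outliers:", statistics.mean(typical))
print("removed orders:", removed)
```

Keeping the removed values in a separate list, rather than discarding them silently, makes it easy to report what was excluded and why.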

Feature Engineering refers to the process by which we choose the important features (or
columns) to look at, and make the appropriate transformations to prepare our data for our
model.

We might try:

• Normalizing or standardizing the data
• Augmenting the data by adding new columns
• Removing unnecessary columns

After we test our model on the data we have, we might go back and reengineer features to
see if we get a better result.
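
The first of these transformations can be sketched in a few lines. Here normalizing is taken to mean min-max scaling to the range 0 to 1, and standardizing to mean subtracting the mean and dividing by the standard deviation; the function names and the sample column are assumptions for illustration.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    # Note: this divides by zero if all values are identical.
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


quantity = [1, 2, 4, 5, 8]  # hypothetical "items per order" column
normalized = min_max_normalize(quantity)
standardized = standardize(quantity)
print(normalized)
print(standardized)
```

Both transformations keep the ordering of the values; they only change the scale, which matters for distance-based models such as K-Nearest Neighbors.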

4. Choosing a Model

Once we understand our dataset and know the problem we are trying to solve, we can begin
to choose a model that will help us tackle our problem.

If we are attempting to find a continuous output, like predicting the number of minutes
someone should wait for their order, we would use a regression algorithm.

If we are attempting to classify an input, like determining if an order will take under 5
minutes or over 10 minutes, then we would use a classification algorithm.

The different classification and regression algorithms work better on different types of
datasets. We use different models on categorical and numerical data, and different models
on datasets with many features and datasets with few features.
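
The difference between the two problem types lies in the target that the model is asked to predict. In the sketch below the same invented wait times are framed once as a regression target and once as a classification target; the 7.5-minute threshold and the function name to_class are assumptions for illustration.

```python
# Hypothetical wait times in minutes for five past orders.
wait_minutes = [3.5, 4.2, 11.0, 4.8, 10.4]

# Regression: the target is the number itself.
regression_targets = wait_minutes


def to_class(minutes, threshold=7.5):
    """Turn a wait time into a category; the threshold is an arbitrary assumption."""
    return "short" if minutes < threshold else "long"


# Classification: the target is a category derived from the number.
classification_targets = [to_class(w) for w in wait_minutes]

print(regression_targets)
print(classification_targets)
```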

5. Tuning and Evaluating

We often want to set a metric of success, so that we know the model we’ve chosen is good
enough. Are we looking for accuracy? Precision? Some combination of the two?

Each model has a variety of parameters that change how it makes decisions. We can adjust
these and compare the chosen evaluation metrics of the different variants to find the most
accurate model.

For example, let’s say we’re using a K-Nearest Neighbors regression algorithm to solve the
wait time prediction problem. This algorithm uses a parameter k, the number of nearest
neighbors considered when making each prediction. We can adjust k to get different results.

Is it ideal to compare against 3 nearest neighbors? 10? 1? We can try many different values
of k and see which one gives us the highest level of accuracy:

From this analysis, we would set our k to be 26, which got the highest level of accuracy.
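
The search over k can be sketched as follows. This is a simplified illustration, not the analysis that produced the value 26: the data are invented, there is a single feature, and because the target is a number the sketch scores each k by mean absolute error on a small validation set (lower is better) rather than by accuracy.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points whose feature is closest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(target for _, target in neighbours) / k


def mean_absolute_error(train, validation, k):
    """Average absolute difference between predictions and true validation targets."""
    errors = [abs(knn_predict(train, x, k) - y) for x, y in validation]
    return sum(errors) / len(errors)


# Hypothetical data: (items in the order, wait time in minutes).
train = [(1, 3.5), (1, 3.9), (2, 4.2), (2, 4.4), (3, 5.0),
         (5, 9.8), (6, 10.5), (6, 11.0), (7, 11.4), (8, 12.1)]
validation = [(2, 4.0), (4, 7.5), (7, 11.8)]

# Try several values of k and keep the one with the lowest validation error.
errors_by_k = {k: mean_absolute_error(train, validation, k) for k in (1, 3, 5, 10)}
best_k = min(errors_by_k, key=errors_by_k.get)

for k, err in errors_by_k.items():
    print(f"k={k:>2}  mean absolute error={err:.3f}")
print("best k:", best_k)
```

With k equal to the size of the training set, every prediction collapses to the overall average, which is why very large values of k tend to perform poorly on data like this.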

6. Using the Model and Presenting Results

When the model reaches the level of accuracy we want, we can use it on the data we
actually care about analyzing.

For our example, we can now start inputting new orders. The input could be an order, with
features like:

• The type of item ordered
• The quantity
• The time of day
• The number of employees working

The output would be how long the order is expected to take. This information could be
displayed to users.

An important step is being able to convey what you’ve learned and created, so that people
can use it in the future.

Sometimes we learn more about our data by looking at the model. For example, using
Multiple Linear Regression can give us insights into the importance of each feature. We
can create a feature importance graph to visualize this for those unfamiliar with our model:
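
As a rough sketch of how regression coefficients can hint at feature importance, the following fits an ordinary least squares model with two features. Everything here is an assumption for illustration: the data are synthetic, generated from known coefficients so that the fit can be checked, and fit_linear_regression is a hypothetical helper that solves the normal equations directly.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [value / pivot for value in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Synthetic orders: (quantity ordered, employees working).
rows = [(1, 3), (2, 3), (3, 2), (4, 4), (5, 2), (6, 3)]
# Synthetic wait times generated from known coefficients: 1 + 2*quantity - 0.5*staff.
targets = [1 + 2 * q - 0.5 * s for q, s in rows]

coefficients = fit_linear_regression(rows, targets)
for name, value in zip(["intercept", "quantity", "employees working"], coefficients):
    print(f"{name:>18}: {value:+.3f}")
```

Coefficient sizes are only comparable as a measure of importance when the features are on similar scales, which is one reason to standardize the columns before fitting.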

1.8 HOTEL INDUSTRY

1.8.1 INTRODUCTION ABOUT HOTEL INDUSTRY:

The hotel industry comprises all forms of business relating to the provision of lodging,
food and drink, and various other interconnected services intended for the public, both
for guests who use the lodging facilities and for those who simply use certain services or
products of the hotel.

Hotels offer an enormous range of guest services such as banqueting, conference, fitness
and sports facilities, beauty spas, bars, sophisticated restaurants, night clubs and
casinos. The hotel sector accounts for more than 15% of all the people who work in the
hospitality sector. Hotels fall into a number of different categories, which include
glamorous five-star resorts, international luxury chains, trendy boutiques, country houses,
conference and leisure hotels, and guest houses. Many are owner-run and offer personalized
service to guests. This very dynamic sector offers good quality accommodation and a great
variety of food and beverage, together with other services for all types of customers.

By offering every kind of accommodation and catering for every type of taste, the hotel
sector is constantly growing and evolving, while refining its offering, improving its
experience and creating new products to serve and satisfy customers on a local and global
level. The hotel sector is always striving to offer excellent customer service throughout its
operations.

1.8.2 HISTORY OF THE HOTEL INDUSTRY:

The history of the hospitality industry dates all the way back to the Colonial Period in the
late 1700s. The hotel industry has been the subject of important development and growth
over the years as it has faced World Wars, the Depression and various social changes.
However, the hotel industry as seen today took form in the early 1950s and 60s, leading
the way for growth into a dynamic industry. This led to more and more people
traveling not only for business but also for leisure, leading to the development
which can be seen nowadays.

The idea of renting accommodation to visitors has existed since ancient times, and the
modern concept of a hotel as we know it dates from 1794, when the City Hotel opened in
New York City; the City Hotel was claimed to be the first building designed exclusively
for hotel operations. The City Hotel back then possessed 73 rooms and offered different
types of service. Similar operations soon appeared in nearby cities such as Baltimore,
Boston (in 1809) and Philadelphia.

The industrial revolution, which started in the 1760s, facilitated the construction of hotels
everywhere, in mainland Europe, in England and in America.

With the advent of new modes of transportation, hotels and resorts were built outside of
major cities in the countryside and began promoting their scenery and other attractions. The
concept of the vacation was developed and became available to more and more of the
population. In the 1920s, hotel building entered a boom phase and many famous hotels were
opened.

From there a surge of hotels flooded America and the rest of the world, with prominent
names such as Radisson, Marriott, Hilton and others.

1.8.3 GROWTH OF HOTEL INDUSTRY WORLDWIDE:

The rise in levels of income and standard of living, coupled with an increase in leisure
time, has been especially beneficial to the tourism industry. Technological progress,
particularly higher-capacity cruise ships and aircraft, computerized reservation systems
and better road transport facilities, has played a key role in the global growth of the
hotel industry. Moreover, enhanced productivity has been favourable to the industry by
helping to cut costs and making travel and tourism products more affordable, and travel
and tourism is now safer and more secure despite the threat that terrorist attacks pose to
the industry.

As competition in the industry increases worldwide, customers have reaped great benefits
in terms of lower prices coupled with a wider choice, as organizations have to
differentiate their products from the crowd to appeal to specific market segments and also
strive to enhance the quality of their services. More and more innovative approaches to
marketing and promotion and the creation of new products are pulling demand to the
destinations. Governments, as facilitators, fund providers and legislators, have also
played their part in the development of the industry. New consumer needs and attitudes have
also fuelled the growth of specific segments; for instance, ecotourism is booming. Last but
not least, the increased level of economic activity has led to an increase in business
travel and to a growing trend of international mobility.

Despite global economic challenges, hotel developments continue to progress, with new
rooms injected into global supply by both independent hotels and groups.

1.8.4 STATISTICS:

According to the UN World Tourism Organization, the number of international tourist
arrivals worldwide, which was only 25 million in 1950, is set to reach 1 billion in 2012 and
1.8 billion by 2020.

In an update of forecasts made at the beginning of the year, the World Travel & Tourism
Council (WTTC) predicts global growth for Travel & Tourism of 2.7%, only slightly
downgraded from the 2.8% that was expected for the industry at the beginning of the year.

The main reason for the downgrade is that WTTC expects world GDP growth to be
2.3% in 2012, down 0.2% from the beginning of the year.

The trend for Travel & Tourism figures has been positive for the beginning of 2012 and has
surpassed expectations from the start of the year. International tourist arrivals have grown
4.9% in the year from January to June, airline passenger traffic is up 6.8%, and hotel
occupancy rates are up in many markets.

In 2011 Travel & Tourism accounted for 255 million jobs globally generating 9 per cent of
world GDP while generating billions for host economies; explaining why the sector is a key
driver for investment and economic growth.

According to statistics from the World Tourism Organization (WTO), there were an estimated
917 to 924 million international tourist arrivals in 2008, an increase of 1.76% compared to
2007. In 2009, international tourist arrivals fell to 882 million, representing a
worldwide decline of 4.4% over 2008.

Destinations worldwide recorded a total of 600 million arrivals, and international tourist
arrivals worldwide fell by 7% between January and August 2009, but the rate of
decline eased in the later months. These results, together with recent economic data,
confirm UNWTO’s initial forecast of a 5% decrease in international tourist arrivals during
the year 2009. Global tourism in 2011 grew by 4.4 per cent, reaching 980 million
international tourist arrivals. For 2012, UNWTO expects growth at a somewhat lower rate,
but still sufficient to reach 1,000 million international tourists.

1.9 COMPANY PROFILE:

Hotel Grand Central, Bhubaneswar


 

 
1.9.1 ABOUT:

Hotel Grand Central is the ideal accommodation option for business as well as leisure
travelers.

Hotel Grand Central is one of the most easily accessible hotels in Bhubaneswar. Centrally
located just off the Bhubaneswar Railway Station (exit, platform No 6) and 4 Km from the
Airport, it has close proximity to all the business and tourist places of interest at
Bhubaneswar.

This budget hotel offers 31 A/C rooms. The rooms at Hotel Grand Central are
categorized as Deluxe, Super Deluxe and Executive Deluxe Rooms. All the rooms are
lavishly furnished with Satellite LED TV, direct dial telephone, study table, mini bar and
free Wi-Fi connectivity. The attached washrooms are equipped with bathroom toiletries
and receive a continuous supply of hot & cold water. The Executive Deluxe rooms
additionally have a fruit basket, Tea/Coffee Maker and shaving kits available.

The multi-cuisine restaurant at Hotel Grand Central has a wide choice of mouth-watering
Indian, Continental and Chinese cuisine that is bound to get our taste buds tingling with
delight.

The Travel Desk takes care of all our travel-related plans, whether by air, train or road.
It can also assist in making reservations of accommodation at other destinations. Car rental
facilities are also available. On request, the Travel Desk also arranges special sightseeing
tours for us / our organisation to distant heritage sites, namely Puri and Konark.

One Conference Hall (up to 120 people) and one Board Room (up to 40 people) are equipped
with state-of-the-art communication equipment to take care of business and private
conferences and parties.

Other facilities include free parking, 24-hour security and CCTV surveillance, 24-hour
concierge, in-house same-day laundry service, doctor on call, currency exchange and
acceptance of major credit cards. The hotel also provides complimentary Wi-Fi Internet
access.

1.9.2 FACILITIES:

The Travel Desk takes care of all your travel-related plans, whether by air, train or road.
It can also assist in making reservations of accommodation at other destinations. Car rental
facilities are also available. On request, the Travel Desk also arranges special sightseeing
tours for you / your organisation to distant heritage sites, namely Puri and Konark.

1.9.3 ROOMS:

Deluxe Rooms:

Services: Buffet Breakfast at Restaurant, Daily 1 ltr. Packaged drinking water as per
occupancy, Morning Newspaper, Free Wi-Fi Internet (2 GB Daily), Direct Dial phone,
Individually controlled Air Conditioning, Satellite LED TV, Custom made Toiletries,
Shower over bath, 24 hrs hot/cold water.

Super Deluxe Rooms:

Services: Buffet Breakfast at Restaurant, Daily 1 ltr. Packaged drinking water as per
occupancy, Morning Newspaper, Free Wi-Fi Internet (2 GB Daily), Direct Dial phone,
Individually controlled Air Conditioning, Satellite LED TV, Custom made Toiletries,
Shower over bath, 24 hrs hot/cold water.

Extra Tariff: Tea & coffee-making facilities

 
 

Executive Deluxe Room

Services: Buffet Breakfast at Restaurant, Daily 1 ltr. Packaged drinking water as per
occupancy, Morning Newspaper, Free Wi-Fi Internet (2 GB Daily), Direct Dial phone,
Individually controlled Air Conditioning, Satellite LED TV, Custom made Toiletries,
Shower over bath, 24 hrs hot/cold water.

Extra Tariff: Tea & coffee-making facilities, Fruit Basket, Shaving Kit.

1.9.4 RESTAURANT:

The multi-cuisine restaurant at Hotel Grand Central has a wide choice of mouth-watering
Indian, Continental and Chinese dishes that are bound to get your taste buds tingling with
delight.
 

  19  
 
1.9.5 HOTEL POLICY:

• The hotel does not allow unmarried or unrelated couples, or guests residing in the same
city, to check in. This is at the full discretion of the hotel management. No refund is
applicable if the hotel denies check-in under such circumstances.
• The primary guest must be at least 18 years of age to check in to the hotel.
• Guests must present valid photo identification at the time of check-in. According to
government regulations, a valid photo ID has to be carried by every person above the age
of 18 staying at the hotel. The identification proofs accepted are Aadhaar Card, Driving
Licence, Voter ID Card and Passport. Without the original of a valid ID, the guest will
not be allowed to check in.

1.9.6 STANDARD OPERATING PROCEDURE(SOP):


 

 
 
Protective Equipment for Guests: Face masks and gloves to be made available to guests on
request.
Screening of Guests: Temperature check of all guests at the entry point. Any guest with a
temperature above 99.1℉ will be refused admission and may be politely redirected to the
closest medical facility.
Social Distancing Norms: Staff stationed at the entrance, reception area, lobby, elevators,
etc. to ensure social distancing at all times.
Safe Kitchen Practices: WHO or Govt. approved sanitizing agents used to disinfect and
clean vegetables, meat and all other material. All kitchen supplies to be fully sanitized
before entering the stores and refrigerators.
Guest Travel History Record: Record of the recent travel history of all guests to be kept
as per government guidelines to aid contact tracing.

 
 

  20  
 
Daily Disinfection of Rooms: Rooms to be disinfected with WHO-recommended phenolic
disinfectants every day.
Fresh Room Linen: Room linen to be changed once a day, or on request.
Soap Dispensers in Rooms: All washrooms to be equipped with liquid soap dispenser(s) or
packed soap bars.
Sanitization of Common Areas: Sanitization of common areas including reception,
elevators and lounge to be done every 6 hours with phenolic disinfectant.
Mandatory Masks & Gloves: Housekeeping and service staff to wear masks (3-ply) and
gloves (single-use) at all times, and restaurant staff to wear a mask (3-ply) and hair net at
all times.
Mandatory Staff Training: Staff training to be done at least twice a week on social
distancing, hand hygiene and respiratory etiquette.
Mandatory Temperature Checks: Temperature check for staff twice a day and mandatory
leave for any employee with a temperature above 99.1℉.

1.10 OBJECTIVES OF STUDY:


Primary Objectives:
• To use Customer Sales data to predict the prices of the rooms/suites of the
Hotel.
• To aid customers in understanding the factors that determine the cost of the
Rooms/Suites in Hotel Grand Central.
• To collect precise Data to help predict the sales of the Hotel.
• To implement Feature Engineering to select the right Features of the Dataset.
• To understand how Machine Learning algorithms such as Linear Regression work.
• To establish a standard for the frequent use of Machine Learning to predict, classify
and cluster data.

  21  
 
1.11 PROBLEMS IN STUDY:
• Choosing and implementing the right Supervised Learning Algorithm
• Limited Time Period for conducting the Research
• Small Amount of Training Data
• Irrelevant/Unwanted Features
• The possibility of overfitting the Model
• Unclean Data

1.12 NEED FOR STUDY:


Machine Learning is being implemented by many corporations to predict, classify or
cluster data. However, it has not yet been widely adopted in industries such as the
Hospitality Industry. Medium-sized companies in particular rarely use these modern
methods to assess where they stand. Linear Regression, the most basic Machine Learning
algorithm, is a powerful tool for prediction. With the collected dataset, a Machine
Learning implementation can provide further insights, improve the standards of the
Company and help Customers estimate Prices for different Rooms/Suites and the
accompanying additional services.

1.13 BENEFITS FOR STUDY:

Machine learning is a booming technology because it benefits every type of business across
every industry. The applications are limitless. From healthcare to financial services,
transportation to cyber security, and marketing to government, machine learning can help
every type of business adapt and move forward in an agile manner.

You might be good at sifting through a massive organized spreadsheet and identifying a
pattern, but thanks to machine learning and artificial intelligence, algorithms can examine
much larger datasets and understand connective patterns even faster than any human, or
any human-created spreadsheet function, ever could. Machine learning allows businesses to
collect insights quickly and efficiently, speeding the time to business value. That’s why
machine learning is important for every organization.

Machine learning also takes the guesswork out of decisions. While you may be able to
make assumptions based on data averages from spreadsheets or databases, machine
learning algorithms can analyze massive volumes of data to provide exhaustive insights
from a comprehensive picture. In short, machine learning allows for higher-accuracy
outputs across an ever-growing number of inputs.

1.14 SCOPE FOR STUDY:

Machine Learning enables machines to make data-driven decisions, which can be more
efficient than explicitly programming them to carry out certain tasks. These algorithms
are designed to learn from new data, which can help organizations improve their
strategies. Machine Learning can provide various insights and prompt questions that were
once unimaginable, from which new solutions can be generated.

  23  
 
 
 
 
 
 
 
 

II
REVIEW OF LITERATURE

 
2.1 INTRODUCTION:
 
The basic objective of this chapter is to gain insight into previous findings, in order to
identify the gaps in earlier studies and to justify the research problem selected for this
study. The literature reviewed covers Machine Learning implementations for various Hotel
Datasets. The prominent areas covered are studies related to concepts, models, systems,
functions, Marketing and Sales, Reviews, HR, recruitment and selection, rewards and
recognition, and other issues in the Hotel Industry, and how these were addressed using
Machine Learning systems.

2.2 REVIEW OF LITERATURE:


TRAINING AND TESTING DATASET:

• William Caicedo-Torres and Fabian Payares in their research “A Machine Learning
Model for Occupancy Rates and Demand Forecasting in the Hospitality Industry” report
that grid search, together with training and testing on the dataset, was employed to find
optimal parameters for the models. It is worth noting that models trained on time series
plus additional variables data showed a modest increase in performance, compared to
those trained on time series data only.

• In the paper “Sales Prediction System using Machine Learning”, Archisha Chandel,
Akanksha Dubey, Saurabh Dhawale and Madhuri Ghuge state that “Experiments have
shown that our approach predicts demand at least as good as single classifiers do, even
better using much less training data (only 20% of the dataset). We think that our approach
will predict much better when more data is used.”

AUTOMATED MACHINE LEARNING:

• Ming-Hui Huang and Roland T. Rust in their research “Artificial Intelligence in
Service” develop a theory for understanding the nature of service work and how/why AI
can substitute for or ultimately replace humans in each type of task/job. They conclude
that “AI job replacement provides a road map about how AI advances to take over tasks
requiring different intelligences, how AI can and should be used to perform service
tasks, and finally how workers can and should shift their skills to achieve a win–win
between humans and machines. We conclude that the advance of AI in all four
intelligences creates opportunities for innovative human–machine integration for
providing service”.

• Bohdan M. Pavlyshenko, in his study “Machine-Learning Models for Sales Time Series
Forecasting”, mentions that he has considered different machine-learning approaches for
time series forecasting. The effect of machine-learning generalization consists in
capturing the patterns in the whole set of data. This effect can be used to make sales
predictions when there is only a small amount of historical data for a specific sales time
series, as in the case when a new product or store is launched.

• Venishetty Sai Vineeth in his research “Machine Learning Approach for Forecasting
the Sales of Truck Components” mentions that, after performing various statistical tests
and evaluating performance metrics, Ridge Regression was found to be a suitable
algorithm for the chosen dataset for Sales forecasting.

• Purvika Bajaj, Renesa Ray, Shivani Shedge, Shravani Vidhate and Prof.
Dr. Nikhilkumar Shardoor in their paper “Sales prediction using machine learning
algorithms” mention that, with traditional methods not being of much help to business
organizations in revenue growth, the use of Machine Learning approaches proves to be an
important aspect of shaping business strategies, taking into consideration the purchase
patterns of consumers.

• Binru Zhang, Yulian Pu, Yuanyuan Wang and Jueyou Li in their study
“Forecasting Hotel Accommodation Demand Based on LSTM Model Incorporating
Internet Search Index” conclude that the research in this paper has a prominent
theoretical significance. First, an empirical framework based on web queries was
constructed for the ever-growing sample of tourism data. Secondly, the LSTM deep
learning model was introduced for the first time to forecast hotel accommodation
demand, extending the application of DL methods in hotel demand forecasting.

FEATURE SELECTION AND ENGINEERING:

• William Caicedo-Torres and Fabian Payares in their research “A Machine Learning
Model for Occupancy Rates and Demand Forecasting in the Hospitality Industry”
mention that the presence of additional inputs allowed the models to leverage contextual
information and improve their predictions. Finally, the use of bookings and reservations
known in advance offered the best performance.

• P. Sanjay Bhargav, G. Nagarjuna Reddy, R.V. Ravi Chand, K. Pujitha and Anjali
Mathur in their study “Sentiment Analysis for Hotel Rating using Machine Learning
Algorithm” mention that the content-based recommender implies matching of attributes
from a user profile, in which preferences and interests are stored, with attributes of a
content object. If a string, or some morphological variant, is found in both the profile
and the document, a match is made and the document is considered relevant.

REGRESSION:

• Venishetty Sai Vineeth in his research “Machine Learning Approach for Forecasting
the Sales of Truck Components” mentions that, in seeking a solution for sales forecasts,
machine learning algorithms such as Random Forest Regressor, Support Vector Regressor,
Ridge Regressor, and Gradient Boosting Regressor were evaluated on Volvo truck
components sales data; these can forecast short-term sales and help the organization in
making key decisions.

  27  
 
• Bohdan M. Pavlyshenko, in his study “Machine-Learning Models for Sales Time Series
Forecasting”, mentions that the use of regression approaches for sales forecasting can
often give better results compared to time series methods. One of the main assumptions
of regression methods is that the patterns in the historical data will be repeated in
future.

MACHINE LEARNING FOR BUSINESS DEVELOPMENT:

• In the paper “Sales Prediction System using Machine Learning”, Archisha Chandel,
Akanksha Dubey, Saurabh Dhawale and Madhuri Ghuge examine the problem of demand
forecasting on an e-commerce website. “We proposed stacked generalization method
consists of sub-level regressors. We have also tested results of single classifiers
separately together with the general model. Experiments have shown that our approach
predicts demand at least as good as single classifiers do, even better using much less
training data (only 20% of the dataset).”

• The researchers Sunita Cheriyan and Saju Mohanan in their paper “Intelligent Sales
Prediction Using Machine Learning Techniques” have concluded that an intelligent sales
prediction system is required for business organizations to handle enormous volumes of
data. Business decisions are based on the speed and accuracy of data processing
techniques. The Machine Learning approaches highlighted in their paper can provide an
effective mechanism for data tuning and decision making. In order to be competitive,
organizations need to equip themselves with modern approaches that accommodate
different types of customer behavior by forecasting attractive sales turnover.

• Ibukun Afolabi, T. Cordelia Ifunaya, Funmilayo G. Ojo and Chinonye Moses in their
research “A Model for Business Success Prediction using Machine Learning
Algorithms” conclude that the prediction of business success performance assists
business owners and entrepreneurs who are seeking to improve the structure of their
economic and non-economic business model and trying to find where their strength
lies when starting a business, using personality traits.

• Purvika Bajaj, Renesa Ray, Shivani Shedge, Shravani Vidhate and Prof.
Dr. Nikhilkumar Shardoor in their paper “Sales prediction using machine learning
algorithms” conclude that, with traditional methods not being of much help to business
organizations in revenue growth, the use of Machine Learning approaches proves to be an
important aspect of shaping business strategies, taking into consideration the purchase
patterns of consumers. Prediction of sales with respect to various factors, including
the sales of previous years, helps businesses adopt suitable strategies for increasing
sales and set their foot undaunted in the competitive world.

HOSPITALITY INDUSTRY:

• Shruthi C G and Gowrishankar S in their research “Machine Learning Based
Comprehensive Analysis of Hospitality Industry in the State of Karnataka” state that
“Our research helps in quick analysis of the best hotels in different districts of
Karnataka. The process started from extraction of reviews from the website and
analyzing in different ways to suggest the best hotels among the available hotels. We
then applied different machine learning tools and algorithm to our research in order to
produce better results.”

• William Caicedo-Torres and Fabian Payares in their research “A Machine Learning
Model for Occupancy Rates and Demand Forecasting in the Hospitality Industry”
mention that “The results obtained are promising and support the use of black-box
Machine Learning based tools for estimating hotel occupation, which require little
statistical expertise by the hotel staff; allowing for a more effective employment of
Revenue Management techniques in the hospitality sector.”

  29  
 
• Nuno Antonio, Ana de Almeida and Luis Nunes in their study “An
Automated Machine Learning Based Decision Support System to
Predict Hotel Booking Cancellations” conclude that “The decrease in the
number of actual cancellations on bookings where customers were
contacted, a total in excess of 37 percentage points, corresponds to a
relative cancellation decrease of 82% for H1 and 83% for H2. These
findings indicate that the actions taken for preventing cancellations in
identified as cancellable bookings amounted in a total revenue in the
order of approximately € 39,000.00.”

DATA:

• The researchers Sunita Cheriyan and Saju Mohanan in their paper “Intelligent Sales
Prediction Using Machine Learning Techniques” have concluded that “In our studies,
we used almost 85,000 records for the comparison of algorithms. Since the time of
execution was huge and to manage such a large set of records are complex, some of the
records were discarded, during the analysis phase. The current studies can be expedited
by using Big Data as a tool for the predictive analytics in sales forecasting. The big
data analysis and forecasting are measured as the vital fields in the modern business
scenario.”

• William Caicedo-Torres and Fabian Payares in their research “A Machine Learning
Model for Occupancy Rates and Demand Forecasting in the Hospitality Industry”
mention that three data sets were constructed using occupation time series data,
occupation time series data plus additional variables, and reservations data. Grid search
was employed to find optimal parameters for the models. It is also worth noting that
models trained on time series plus additional variables data showed a modest increase in
performance, compared to those trained on time series data only. The presence of
additional inputs allowed the models to leverage contextual information and improve
their predictions. Finally, the use of bookings and reservations known in advance
offered the best performance.

  30  
 
ALGORITHMS/MODELS IMPLEMENTED:

• William Caicedo-Torres and Fabian Payares in their research “A Machine Learning
Model for Occupancy Rates and Demand Forecasting in the Hospitality Industry”
conclude that different models were trained and validated using Ridge Regression,
Kernel Ridge Regression, Multilayer Perceptron and Radial Basis Function Networks.

• Binru Zhang, Yulian Pu, Yuanyuan Wang and Jueyou Li in their study
“Forecasting Hotel Accommodation Demand Based on LSTM Model
Incorporating Internet Search Index” mention that the LSTM deep
learning model was introduced for the first time to forecast the hotel
accommodation demands, extending the application of DL methods in hotel
demand forecasting.

• Venishetty Sai Vineeth in his research “Machine Learning Approach for Forecasting
the Sales of Truck Components” concludes that, in seeking a solution for sales
forecasts, machine learning algorithms such as Random Forest Regressor, Support Vector
Regressor, Ridge Regressor, and Gradient Boosting Regressor were evaluated on Volvo
truck components sales data; these can forecast short-term sales and help the
organization in making key decisions.

PATTERNS IN FEATURES:

• Bohdan M. Pavlyshenko, in his case study “Machine-Learning Models for Sales Time
Series Forecasting”, concludes that the effect of machine-learning generalization
consists in capturing the patterns in the whole set of data.

• Binru Zhang, Yulian Pu, Yuanyuan Wang and Jueyou Li in their study
“Forecasting Hotel Accommodation Demand Based on LSTM Model Incorporating
Internet Search Index” mention that Machine Learning approaches prove to be an
important aspect of shaping business strategies, taking into consideration the purchase
patterns of consumers.

• P. Sanjay Bhargav, G. Nagarjuna Reddy, R.V. Ravi Chand, K. Pujitha and Anjali
Mathur in their study “Sentiment Analysis for Hotel Rating using Machine Learning
Algorithm” mention that the Naïve Bayes classification algorithm is good for scaling
the dataset and implementing the linear equation on features and predictors.

PYTHON IMPLEMENTATION:

• In the paper “Machine Learning in Python: Main Developments and Technology
Trends in Data Science, Machine Learning, and Artificial Intelligence”, Sebastian
Raschka, Joshua Patterson and Corey Nolet mention that recent years have also seen an
increased interest in probabilistic programming, Bayesian inference, and statistical
modeling in Python. Notable software in this area includes PyStan, the Theano-based
PyMC3 library, the TensorFlow-based Edward library, and Pomegranate, which features a
user-friendly Scikit-learn-like API.

  32  
 
III
RESEARCH METHODOLOGY

  33  
 
3.1 DATA COLLECTION:
The Data for this project was collected from the Hotel’s Database. It consists of prior
Customer Sales Data on the purchases of various rooms/suites and additional services.
With this Data, a simple Machine Learning algorithm can be implemented to predict the
price of Hotel packages for customers. Using this Machine Learning Model, Customers
would be able to estimate the price of the rooms/suites available in the Hotel. The
Machine Learning Model to be implemented is Multiple Linear Regression.

3.2 RESEARCH DESIGN:

Linear Regression:

The purpose of machine learning is often to create a model that explains some real-world
data, so that we can predict what may happen next, with different inputs.

The simplest model that we can fit to data is a line. When we are trying to find a line that
fits a set of data best, we are performing Linear Regression.

We often want to find lines to fit data, so that we can predict unknowns. For example:

• The market price of a house vs. the square footage of a house. Can we predict how
much a house will sell for, given its size?
• The tax rate of a country vs. its GDP. Can we predict taxation based on a country’s
GDP?
• The amount of chips left in the bag vs. number of chips taken. Can we predict how
much longer this bag of chips will last, given how much people at this party have
been eating?

Imagine that we had this set of weights plotted against heights of a large set of professional
baseball players:

  34  
 
To create a linear model to explain this data, we might draw this line:

Now, if we wanted to estimate the weight of a player with a height of 73 inches, we could
estimate that it is around 143 pounds.

A line is a rough approximation, but it allows us the ability to explain and predict variables
that have a linear relationship with each other.

Points and Lines:

A line is determined by its slope and its intercept. In other words, for each point y on a line
we can say:

y=mx+b

Where m is the slope, and b is the intercept. y is a given point on the y-axis, and it
corresponds to a given x on the x-axis.
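As a small illustration of the line equation, the sketch below evaluates y = mx + b for a given x. The slope and intercept values are hypothetical and chosen only to make the arithmetic easy to follow; they are not fitted to any data in this study.

```python
def predict(x, m, b):
    """Return the y value on the line y = m*x + b for a given x."""
    return m * x + b


# Hypothetical slope and intercept, used only to show the arithmetic.
m, b = 2.0, 1.0
print(predict(3.0, m, b))  # 2.0 * 3.0 + 1.0 = 7.0
```

Changing m tilts the line, while changing b shifts it up or down without changing its steepness.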

  35  
 
The slope is a measure of how steep the line is, while the intercept is a measure of where
the line hits the y-axis.

When we perform Linear Regression, the goal is to get the “best” m and b for our data.

Gradient Descent for Intercept (b):

As we try to minimize loss, we take each parameter we are changing, and move it as long
as we are decreasing loss. It’s like we are moving down a hill, and stop once we reach the
bottom:

The process by which we do this is called gradient descent. We move in the direction that
decreases our loss the most. Gradient refers to the slope of the curve at any point.

For example, let’s say we are trying to find the intercept for a line. We currently have a
guess of 10 for the intercept. At the point of 10 on the curve, the slope is downward.
Therefore, if we increase the intercept, we should be lowering the loss. So we follow the
gradient downwards.

To find the gradient of loss as intercept changes, the formula comes out to be:

  36  
 
• N is the number of points we have in our dataset
• m is the current slope guess
• b is the current intercept guess
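The formula itself appears to have been an image and is not reproduced in the text. As a hedged reconstruction, assuming the loss is the mean squared error over the N points (the symbols x_i and y_i for the individual data points are introduced here for illustration), the loss and its gradient with respect to the intercept would be:

```latex
L(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - (m x_i + b) \bigr)^2
\qquad
\frac{\partial L}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \bigl( y_i - (m x_i + b) \bigr)
```

If a different loss was used in the original figure, the constant factor would differ, but the gradient would still be a sum over the prediction errors.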

Gradient Descent for Slope (m):

To find the m gradient, or the way the loss changes as the slope of our line changes, we can
use this formula:

• N is the number of points we have in our dataset
• m is the current slope guess
• b is the current intercept guess
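The two gradients can be sketched in Python as follows. This assumes a mean squared error loss, which the text does not state explicitly; under that assumption the intercept gradient is -(2/N) times the sum of the errors y - (mx + b), and the slope gradient is the same sum with each error multiplied by its x. The function names are illustrative.

```python
def gradient_at_b(xs, ys, m, b):
    """Gradient of the mean squared error with respect to the intercept b."""
    n = len(xs)
    return -(2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))


def gradient_at_m(xs, ys, m, b):
    """Gradient of the mean squared error with respect to the slope m."""
    n = len(xs)
    return -(2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))


# Points lying exactly on y = 2x: both gradients vanish at m = 2, b = 0.
xs, ys = [1, 2, 3], [2, 4, 6]
print(gradient_at_b(xs, ys, 2, 0), gradient_at_m(xs, ys, 2, 0))
```

A negative gradient means that increasing the parameter lowers the loss, which matches the "downward slope" description given for the intercept.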

Learning Rate:

We want our program to be able to iteratively learn what the best m and b values are. So
for each m and b pair that we guess, we want to move them in the direction of the gradients
we’ve calculated. But how far do we move in that direction?

We have to choose a learning rate, which will determine how far down the loss curve we
go.

A small learning rate will take a long time to converge; one might run out of time or
cycles before getting an answer. A large learning rate might skip over the best value.

Finding the absolute best learning rate is not necessary for training a model. One just
has to find a learning rate large enough that gradient descent converges with the
efficiency needed, and not so large that convergence never happens.
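The role of the learning rate can be sketched with a small gradient descent loop. This is an illustration under assumptions, not the procedure used in this study: it assumes a mean squared error loss, starts both parameters at zero, and uses a toy dataset and default values chosen only for the example.

```python
def gradient_descent(xs, ys, learning_rate=0.05, num_iterations=2000):
    """Fit y = m*x + b by repeatedly stepping against the loss gradients."""
    n = len(xs)
    m, b = 0.0, 0.0  # initial guesses (an arbitrary starting point)
    for _ in range(num_iterations):
        errors = [y - (m * x + b) for x, y in zip(xs, ys)]
        grad_b = -(2 / n) * sum(errors)
        grad_m = -(2 / n) * sum(x * e for x, e in zip(xs, errors))
        # The learning rate scales how far each parameter moves per step.
        b -= learning_rate * grad_b
        m -= learning_rate * grad_m
    return m, b


# Toy data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = gradient_descent(xs, ys)
print(round(m, 3), round(b, 3))
```

On this toy data a learning rate of 0.05 settles close to the true slope and intercept, whereas a rate such as 0.2 overshoots on every step and the guesses grow instead of converging.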
  37  
 
3.3 IMPLEMENTATION PLAN AND METHODOLOGY:

Data Collection:

The Data will be collected from the databases of Hotel Grand Central. The Data on which
the Data Science implementation will be executed will be the Customers’ purchase dataset.

Data Preprocessing:

The data collected will not be clean. It has to be converted and cleaned to make it
suitable for the implementation of Machine Learning. In particular, the Data has to be
transformed from the String datatype to Integers.

Feature Engineering:

Before the implementation of Machine Learning, the most relevant features have to be
selected. Selecting the most important features improves the efficiency and accuracy of
the Model. Feature Engineering is implemented using Pearson’s Correlation between all
the features X and the y variable. From the Correlation heatmap, the features that
correlate best with the y variable are selected for Machine Learning implementation.
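To make the selection step concrete, the sketch below computes Pearson's correlation coefficient by hand and keeps the features whose absolute correlation with the target reaches a cutoff. The 0.5 cutoff, the feature names and the sample values are assumptions for illustration; the study does not state a numeric threshold.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def select_features(features, target, threshold=0.5):
    """Keep feature names whose |r| with the target is at least threshold."""
    return [name for name, values in features.items()
            if abs(pearson_r(values, target)) >= threshold]


# Hypothetical sample: room type tracks the bill closely, floor does not.
features = {"room_type": [0, 1, 2, 1, 2], "floor": [3, 1, 2, 1, 3]}
bill = [1000, 2000, 3100, 2100, 2900]
print(select_features(features, bill))
```

The absolute value is used so that a strongly negative correlation would also count as informative.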

ML Implementation using Scikit-learn:


Scikit-learn is a free software machine learning library for the Python programming
language. It features various classification, regression and clustering algorithms including
support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is
designed to interoperate with the Python numerical and scientific libraries NumPy and
SciPy.

  38  
 
This library in Python provides many unsupervised and supervised learning algorithms. It’s
built upon some of the technology you might already be familiar with, like NumPy, pandas,
and Matplotlib.

The functionality that scikit-learn provides includes:

• Regression, including Linear and Logistic Regression
• Classification, including K-Nearest Neighbors
• Clustering, including K-Means and K-Means++
• Model selection
• Preprocessing, including Min-Max Normalization

Algorithm to be implemented:

The algorithm is the LinearRegression model in the linear_model module of sklearn.
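A minimal sketch of how this model is typically used is shown below. The data is synthetic, generated from a known two-feature equation purely for illustration, so that the fitted coefficients can be compared with the values used to create it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only: y = 3*x1 + 2*x2 + 5
X = np.array([[1, 0], [2, 1], [3, 0], [4, 1], [5, 2]], dtype=float)
y = 3 * X[:, 0] + 2 * X[:, 1] + 5

model = LinearRegression()
model.fit(X, y)

# coef_ holds one slope per feature; intercept_ holds the constant term.
print(model.coef_, model.intercept_)
```

With more than one feature, the single slope m of a line becomes one coefficient per feature. Note that scikit-learn's LinearRegression solves the least-squares problem directly rather than iterating with gradient descent; both approaches minimise the same squared-error loss.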

  39  
 
IV
DATA SCIENCE
IMPLEMENTATION
AND INTERPRETATION

  40  
 
4.1 DATA PREPROCESSING:
The dataset is first accessed from the external servers of Hotel Grand Central. Using
FTP (File Transfer Protocol), the file “2019 Customer data.sql” is downloaded from the
server.

Accessing the Dataset File

Downloading the File

  41  
 
The platform on which the Data Science process will be implemented is Jupyter Notebook.
We use Python 3.7 for the entire process.
The Python Libraries to be used are:
• Pandas (Data Cleaning and Manipulation)
• Numpy (Data Manipulation)
• Scikit-Learn (Machine Learning Library)
• Matplotlib (Data Visualization)

4.1.1 DATASET:
The Dataset on which we are implementing our Data Science process contains the
purchase information of Customers.

The Dataset contains data of 1012 customers.

  42  
 
4.1.2 DATA TRANSFORMATION:
The data collected from Hotel Grand Central has complete information on previous
customers. The problem is that most of the Data are stored as Strings, and the Data
Science process cannot be implemented on the String data type.

For example, the Feature Roomtype lists the 3 types of rooms, which are Business
rooms, Deluxe suites and Super Deluxe suites. These have to be converted to 0, 1 and 2
respectively. Similar processes have to be carried out for the other Features.

For this implementation we make use of the map function together with a Python 3
Dictionary: the Dictionary pairs each String value with an Integer, and map applies
that lookup to every entry in the column.
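A sketch of this conversion is given below, assuming that the map function referred to is the pandas Series.map method (the text does not name the library call). The exact label strings in the raw data are not shown in the text, so the ones used here are placeholders.

```python
import pandas as pd

# Placeholder rows; the real labels in the hotel's data may be spelled differently.
df = pd.DataFrame({
    "Roomtype": ["Business room", "Deluxe suite", "Super Deluxe suite", "Deluxe suite"]
})

# Dictionary pairing each String label with its Integer code.
room_type_codes = {"Business room": 0, "Deluxe suite": 1, "Super Deluxe suite": 2}

df["Roomtype"] = df["Roomtype"].map(room_type_codes)
print(df["Roomtype"].tolist())
```

Any label that is missing from the dictionary becomes NaN after mapping, so it is worth checking the column for missing values afterwards.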

  43  
 
Before After

The same process will be carried out for the other Features.

A single variable containing a Dictionary of Yes: 1 and No: 0 is created which is used for
all the features that contain additional services.
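Reusing one Yes/No dictionary across several service columns could look like the sketch below. The column names has_gym and barservice are taken from the feature list used later in this study; the sample rows, and the use of pandas Series.map, are assumptions for illustration.

```python
import pandas as pd

# One shared dictionary for every Yes/No service column.
yes_no_codes = {"Yes": 1, "No": 0}

# Placeholder rows for two of the service columns.
df = pd.DataFrame({
    "has_gym": ["Yes", "No", "Yes"],
    "barservice": ["No", "No", "Yes"],
})

service_columns = ["has_gym", "barservice"]
for column in service_columns:
    df[column] = df[column].map(yes_no_codes)

print(df)
```

Keeping the mapping in a single variable means all service columns are guaranteed to use the same encoding.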

  44  
 
Before Data Engineering:

After Data Engineering:

Dropping unwanted Columns:


There are columns in this table that are not required for our process.
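Dropping columns in pandas can be sketched as follows. The text does not list which columns were removed, so customer_name and booking_id here are hypothetical examples of identifiers that carry no predictive information.

```python
import pandas as pd

# Placeholder table; customer_name and booking_id are hypothetical column names.
df = pd.DataFrame({
    "customer_name": ["A", "B"],
    "booking_id": [101, 102],
    "RoomType": [0, 2],
    "pay": [1500, 4200],
})

# drop returns a new DataFrame without the listed columns.
df = df.drop(columns=["customer_name", "booking_id"])
print(list(df.columns))
```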

  45  
 
4.2 FEATURE ENGINEERING:
After the Data Engineering process, the Feature Engineering process is implemented to
select the right Features for Machine Learning. For the Visualization of the Features,
we use the Data Visualization libraries Matplotlib and Seaborn.

Feature Engineering is done by using Correlation. Pandas has a built-in function to
perform Correlation. Pearson’s correlation will be used to find the most reliable features.

• Our X variable will contain the Room/Suites prices, all the Services provided by
Hotel Grand Central and other features.
• Our y variable will be the Bill amount.
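With pandas, the correlation of every feature with the Bill amount can be obtained from DataFrame.corr, as sketched below. The rows are invented for illustration, and the target column name pay is an assumption based on the label used in the heatmap interpretation.

```python
import pandas as pd

# Invented rows for illustration; "pay" stands for the Bill amount.
df = pd.DataFrame({
    "RoomType": [0, 1, 2, 1, 2, 0],
    "daysSpent": [1, 2, 3, 1, 2, 1],
    "floor": [3, 1, 2, 1, 3, 2],
    "pay": [1000, 2400, 4100, 2000, 3600, 1100],
})

# Pearson correlation of each feature with the target, target row removed.
correlations = df.corr(method="pearson")["pay"].drop("pay")

# Rank features by the strength of their correlation with the Bill amount.
ranked = correlations.abs().sort_values(ascending=False)
print(ranked)

# A heatmap of the full matrix could be drawn with, for example:
# seaborn.heatmap(df.corr(), annot=True)
```

The ranking makes it easy to see which columns are candidates for X and which, like the floor in this invented sample, contribute little.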

4.2.1 PERFORMING CORRELATION:

  46  
 
CHART 4.1 - THE CORRELATION HEATMAP:

CHART 4.2 – CORRELATION VALUES:

  47  
 
Interpretation:
From the Heatmap chart generated above, we can observe the level of Correlation between
y (pay) and X (all services provided and other features). The Bill amount has a high
Correlation with Room Type (0.76), Days Spent (0.63), Bath Service (0.62), Bar Service
(0.57), Gym Service (0.59), Washer/Dryer (0.63) and Personalized Service (0.59).

CHART 4.3 - FLOORS AND BILL AMOUNT CORRELATION:

Interpretation:
From the above Chart, we can see that there is almost no Correlation between the Floor
on which a Room/Suite is located and the Bill Amount. The Floor affects the Bill Amount
only in a few exceptional cases, such as the availability of other Services.

  48  
 
CHART 4.4 - ROOM TYPE AND BILL AMOUNT CORRELATION:

Interpretation:
From the above Scatter plot we can observe a strong positive Correlation (0.76) between
the Room Type and the Bill Amount. The few varying cases occur because some Customers
choose to purchase additional services.

  49  
 
4.3 MACHINE LEARNING IMPLEMENTATION:

With the Data Engineering and Feature Engineering processes complete, the Machine
Learning process can be implemented. Using Scikit-learn, the Linear Regression model is
imported and applied to the best Features selected. Based on the Feature Engineering
process, the features selected were:
• "daysSpent"
• "bathService"
• "RoomType"
• "has_washer_dryer"
• "has_personalizedService"
• "barservice"
• "has_gym"

The data is then split into Training and Testing sets: 80% of the data is used for
Training and the remaining 20% for Testing.
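An 80/20 split of this kind can be sketched with scikit-learn's train_test_split. The data below is randomly generated with seven columns to mirror the seven selected features; the weights, the seed and the random_state value are assumptions for illustration and do not come from the hotel's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 7 selected features (values invented).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 7)).astype(float)
y = X @ np.array([800, 300, 1200, 250, 400, 350, 300]) + 500

# test_size=0.2 keeps 20% of the rows aside for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The model only ever sees the training rows.
model = LinearRegression().fit(X_train, y_train)
print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible, so the reported accuracy does not change from run to run.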

  50  
 
4.3.1 ACCURACY OF ALGORITHM:

The accuracy of the Model is about 86%. The accuracy could be increased if there were
more Data.
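The text does not say how the 86% figure was computed. One common possibility, assumed here, is the coefficient of determination (R²), which is the value returned by the score method of scikit-learn's LinearRegression. A minimal sketch of that calculation:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Perfect predictions give 1.0; always predicting the mean gives 0.0.
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```

Under this reading, a score of 0.86 would mean that the model explains about 86% of the variation in the Bill amount on the Testing set.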

4.3.2 PREDICTING THE PRICE:


Now the Model can be used to predict the Bill amount.

For the feature parameters the entered values are:


• Days Spent = 1 (1 Day)
• Bath Service = 2 (Level 2)
• Room Type = 2 (Deluxe)
• Washer/Dryer Service = 1 (Yes)
• Personalized Service = 0 (No)
• Bar Service = 1 (Yes)
• Gym Service = 1 (Yes)
The Model then generates the Price based on those Parameters. The Predicted price is
Rs. 4508.
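The prediction step can be sketched as below. The model here is trained on synthetic data with invented weights, so its output will not match the Rs. 4508 reported above; the sketch only shows how the seven parameter values are passed to predict in the same column order used for training.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

feature_names = [
    "daysSpent", "bathService", "RoomType", "has_washer_dryer",
    "has_personalizedService", "barservice", "has_gym",
]

# Synthetic training data with invented weights (not the hotel's data).
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.integers(0, 3, size=(50, 7)), columns=feature_names)
weights = np.array([800, 300, 1200, 250, 400, 350, 300])
train["pay"] = train[feature_names].to_numpy() @ weights + 500

model = LinearRegression().fit(train[feature_names], train["pay"])

# The parameter values entered in the example above.
guest = pd.DataFrame([{
    "daysSpent": 1, "bathService": 2, "RoomType": 2, "has_washer_dryer": 1,
    "has_personalizedService": 0, "barservice": 1, "has_gym": 1,
}], columns=feature_names)

predicted_bill = model.predict(guest)[0]
print(round(predicted_bill))
```

Passing the input as a DataFrame with named columns guards against entering the values in the wrong order.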

  51  
 
4.3.3 CREATING THE APP:
The Customers of Hotel Grand Central could use a full Application that predicts the Prices.
An Initial crude version of the Application has been created. This is the Mark 1 version of
the Application. The Application is created using Python3 and Shell Scripting.

  52  
 
V
FINDINGS, SUGGESTIONS
AND CONCLUSION

  53  
 
5.1 FINDINGS OF THE STUDY:
The findings of this Research are as follows:
• Implementing a Machine Learning algorithm without Data engineering does not
produce accurate results.
• Without implementing Feature Engineering, the Machine Learning Model’s
accuracy is very low.
• In the Feature Engineering process we found out from the Heatmap that the features
Room Type(0.76), Days spent(0.63), Bath Service(0.62), Bar Service(0.57), Gym
Service(0.59), Washer/Dryer(0.63) and Personalized Service(0.59) have the highest
levels of Correlation with the Bill Amount.
• The feature Floor Level had a correlation of just 0.15 with the Bill amount, because
the bill depended mainly on the customers' preferences for additional services.
• The level of Correlation between Room Type and Bill Amount was the highest
which is 0.76.
• The feature Minutes For Service showed a negative correlation with the Bill Amount,
i.e. -0.3.
• The feature Damages Incurred proved to have practically no correlation with the Bill
amount, i.e. 0.012.
• Minutes For Service proved to be the worst feature in the Dataset because it has
negative correlations with every other feature.
• The features Spa, Dishwasher and Patio proved to have no meaningful correlation with
the Bill amount.
• When a Dataset of customers' past prices is available for predicting prices, Multiple
Linear Regression proves to be a suitable Machine Learning model for implementation.
• It is very important to split the Dataset into Training and Testing sets.
• The Model provided good results when 80% of the data was used for the Training set
and 20% of the data for the Testing set.
• Multiple Linear Regression finds the best slope (m) for each feature and the
intercept (b). Scikit-learn's LinearRegression obtains these values by ordinary least
squares rather than by Gradient Descent.
• Having more data would likely improve the score of the Model.
• With a limited number of records, i.e. 1013, the score (R²) of the Model proved to be
about 0.86 (86%).
• The Scikit-Learn library for Python 3.7 makes the implementation of Machine Learning
models easy and efficient.
• The Machine Learning Model estimates the Bill amount reasonably well (R² of about
0.86).
• The App makes it easy for the Hotel and the Customers to find the possible prices
of Hotel Grand Central.
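
The correlation-based feature selection described in the findings above can be sketched as follows. The data is synthetic and the cut-off of 0.5 is an assumption chosen for illustration; the study itself read the values off the Heatmap and does not state a numeric threshold.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): two features drive the bill, one is unrelated to it.
rng = np.random.default_rng(3)
n = 1000
room_type = rng.integers(1, 5, n)
days_spent = rng.integers(1, 4, n)
floor = rng.integers(1, 10, n)  # has no effect on the bill in this toy example
pay = 1200 * room_type + 1500 * days_spent + rng.normal(0, 500, n)

df = pd.DataFrame({
    "RoomType": room_type,
    "daysSpent": days_spent,
    "floor": floor,
    "pay": pay,
})

# Pearson correlation of every feature with the target column.
corr_with_pay = df.corr(method="pearson")["pay"].drop("pay")

# Keep the features whose absolute correlation reaches the (hypothetical) threshold.
threshold = 0.5
selected = corr_with_pay[corr_with_pay.abs() >= threshold].index.tolist()

print(corr_with_pay.round(2))
print("Selected features:", selected)
```

Using the absolute value keeps strongly negative correlations as well, since a feature that moves opposite to the Bill amount can still be a useful predictor.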

5.2 SUGGESTIONS:
• Hotel Grand Central could implement Machine Learning processes on a regular
basis to Classify, Predict and Cluster information using the Data collected from
their customers.
• A similar app can be deployed to understand the Sales of their services.
• Other Machine Learning algorithms like K Nearest Neighbors could be used to
classify customers and to deploy a recommender engine that helps Customers select
the services they prefer.
• Data Science processes could be implemented on a daily basis for Sales, Marketing,
HR and Operations/Production.

5.3 CONCLUSION:
The customers’ sales data collected from Hotel Grand Central has been used to
implement a Linear Regression Machine Learning Model to predict the Price of the
services. Data Engineering made it possible to clean the data and to replace String
values with Integers, after which the Model's performance improved significantly.
Next, Feature Engineering was performed, in which the best features were selected
using Pearson’s correlation. The Correlation Heatmap showed that Room Type (0.76),
Days Spent (0.63), Bath Service (0.62), Bar Service (0.57), Gym Service (0.59),
Washer/Dryer (0.63) and Personalized Service (0.59) are the best features in the
Dataset. These features were then fit into the Model as the X variable and the Bill
amount was fit as the y variable. For the Machine Learning process, the Dataset was
divided in an 80:20 ratio for Training and Testing. The LinearRegression() class of
Scikit-Learn was then used to fit the Machine Learning Model. The Model achieved a
score (R²) of about 0.86, i.e. 86%. A new data point was then entered to predict the
price for a Customer. Therefore, the Multiple Linear Regression algorithm proves to be
a suitable Regression model to predict Prices using past data.

ANNEXURES

Hotel Grand Central by Anuraag Rath - Jupyter Notebook http://localhost:8888/notebooks/Hotel Grand Central by Anur...

Machine Learning implementation


by Anuraag Rath
Hotel Grand Central Customers purchase Dataset
In [93]: from IPython import display
display.Image("./HGC.jpg")
Out[93]:

In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

In [2]: dataHotel = pd.read_csv("DataHotel.csv", skipinitialspace=True)


dataHotel
Out[2]:
      Unnamed: 0  customer_id    pay  daysSpent  bathService        RoomType  minForService  floor
0              0         2869   3600          1            2  Business Rooms              4      1
1              1         4318   3900          1            2  Business Rooms              4      9
2              2         6265   2700          1            1  Business Rooms              4      2
3              3           24   4900          1            1    Super Deluxe              2      3
4              4         9481   3900          1            1  Business Rooms              3      4
...          ...          ...    ...        ...          ...             ...            ...    ...
1008        1008         7329   2300          1            1  Business Rooms              5      2
1009        1009        10286   3750          1            2  Business Rooms              1      1
1010        1010         9169   4000          1            1          Deluxe              5      1
1011        1011         1721   4200          1            2    Super Deluxe              2      5
1012        1012         1676  18000          2            2       Executive              2      5

In [3]: #Mapping the Features


roomType = {"Business Rooms": 1, "Deluxe": 2, "Super Deluxe": 3, "Execu
dataHotel["RoomType"] = dataHotel["RoomType"].map(roomType)
#dataHotel["RoomType"] = dataHotel["RoomType"].astype(int)

dataHotel
Out[3]:
      Unnamed: 0  customer_id    pay  daysSpent  bathService  RoomType  minForService  floor
0              0         2869   3600          1            2         1              4      1
1              1         4318   3900          1            2         1              4      9
2              2         6265   2700          1            1         1              4      2
3              3           24   4900          1            1         3              2      3
4              4         9481   3900          1            1         1              3      4
...          ...          ...    ...        ...          ...       ...            ...    ...
1008        1008         7329   2300          1            1         1              5      2
1009        1009        10286   3750          1            2         1              1      1
1010        1010         9169   4000          1            1         2              5      1
1011        1011         1721   4200          1            2         3              2      5
1012        1012         1676  18000          2            2         4              2      5

1013 rows × 16 columns
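
One point worth checking after a map() call like the one above is whether every label was actually covered by the dictionary, because map() leaves NaN for any value that is not a key, and a later astype(int) raises an error when NaN is present. A minimal sketch of such a check is given below. The Series of room labels is hypothetical, including the misspelt entry with a double space, and only the three dictionary keys that are fully visible in the notebook cell are used.

```python
import pandas as pd

# Only the keys that are fully legible in the notebook cell are used here.
room_type_codes = {"Business Rooms": 1, "Deluxe": 2, "Super Deluxe": 3}

# Hypothetical labels; the third one contains a double space and so is not a key.
rooms = pd.Series(["Business Rooms", "Deluxe", "Super  Deluxe", "Super Deluxe"])

mapped = rooms.map(room_type_codes)

# Labels that did not match any key come back as NaN.
unmapped = rooms[mapped.isna()]

print(mapped)
print("Unmapped labels:", unmapped.tolist())
```

Listing the unmapped labels before converting the column to integers makes spelling or spacing differences in the raw data visible early.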

In [4]: ouiNon = {"Yes": 1, "No": 0}

dataHotel["Damages"] = dataHotel["Damages"].map(ouiNon)

dataHotel["spa"] = dataHotel["spa"].map(ouiNon)

dataHotel["has_washer_dryer"] = dataHotel["has_washer_dryer"].map(ouiNo

dataHotel["has_personalizedService"] = dataHotel["has_personalizedServi

dataHotel["barservice"] = dataHotel["barservice"].map(ouiNon)

dataHotel["has_dishwasher"] = dataHotel["has_dishwasher"].map(ouiNon)

dataHotel["has_patio"] = dataHotel["has_patio"].map(ouiNon)

dataHotel["has_gym"] = dataHotel["has_gym"].map(ouiNon)


dataHotel
Out[4]:
      Unnamed: 0  customer_id    pay  daysSpent  bathService  RoomType  minForService  floor
0              0         2869   3600          1            2         1              4      1
1              1         4318   3900          1            2         1              4      9
2              2         6265   2700          1            1         1              4      2
3              3           24   4900          1            1         3              2      3
4              4         9481   3900          1            1         1              3      4
...          ...          ...    ...        ...          ...       ...            ...    ...
1008        1008         7329   2300          1            1         1              5      2
1009        1009        10286   3750          1            2         1              1      1
1010        1010         9169   4000          1            1         2              5      1
1011        1011         1721   4200          1            2         3              2      5
1012        1012         1676  18000          2            2         4              2      5

1013 rows × 16 columns

In [5]: hotelDataMain = dataHotel.drop(["Unnamed: 0", "customer_id"], axis = 1)


hotelDataMain.head()
Out[5]:
    pay  daysSpent  bathService  RoomType  minForService  floor  Damages  spa  has_washer_d
0  3600          1            2         1              4      1        1    1
1  3900          1            2         1              4      9        0    1
2  2700          1            1         1              4      2        0    1
3  4900          1            1         3              2      3        0    1
4  3900          1            1         1              3      4        1    1


In [10]: X = hotelDataMain[["daysSpent",
    "bathService",
    "RoomType",
    "minForService",
    "floor",
    "Damages",
    "spa",
    "has_washer_dryer",
    "has_personalizedService",
    "barservice",
    "has_dishwasher",
    "has_patio",
    "has_gym"]]

y = hotelDataMain["pay"]

correlationMatrix = hotelDataMain.corr()
top_corr_features = correlationMatrix.index
plt.figure(figsize=(20,20))

heatMap = sns.heatmap(hotelDataMain[top_corr_features].corr(),annot=Tru

In [17]: #Correlation Scatter plots

plt.scatter(hotelDataMain["floor"], hotelDataMain.pay)
plt.xlabel('Floor')
plt.ylabel('Bill Amount')
plt.show()

In [20]: plt.scatter(hotelDataMain["RoomType"], hotelDataMain.pay)


plt.xlabel("""RoomType
{1:Business,
2:Deluxe,
3:SuperDeluxe,
4:ExecutiveDeluxe}""")
plt.ylabel('Bill Amount')
plt.show()


In [25]: # Correlation Scatter plots


plt.scatter(hotelDataMain["minForService"], hotelDataMain.pay)
plt.xlabel('minForService')
plt.ylabel('Bill Amount')
plt.show()

Machine Learning Implementation

-------------------------------------------------

Linear Regression
In [85]: Xfeatures = hotelDataMain[["daysSpent",
    "bathService",
    "RoomType",
    "has_washer_dryer",
    "has_personalizedService",
    "barservice",
    "has_gym"]]
yFeature = hotelDataMain.pay

xTrain, xTest, yTrain, yTest = train_test_split(Xfeatures,
    yFeature,
    train_size = 0.8,
    test_size = 0.2, random

hotelRegression = LinearRegression()
hotelRegression.fit(xTrain, yTrain)
predictPrices = hotelRegression.predict(xTest)
predictPrices
Out[85]:

array([2356.04125745, 3325.78080685, 4337.12716716, 2356.04125745,
2356.04125745, 2356.04125745, 2356.04125745, 3731.30607783,
3325.78080685, 3325.78080685, 3175.52066495, 3325.78080685,
2356.04125745, 3325.78080685, 2356.04125745, 2356.04125745,
2356.04125745, 5849.28962962, 3325.78080685, 3325.78080685,
2356.04125745, 3325.78080685, 2356.04125745, 3325.78080685,
3155.29915627, 3325.78080685, 5093.20839839, 2356.04125745,
2356.04125745, 2356.04125745, 2356.04125745, 4487.38730906,
2356.04125745, 2356.04125745, 5093.20839839, 2356.04125745,
2356.04125745, 3325.78080685, 2356.04125745, 2356.04125745,
3325.78080685, 2356.04125745, 3325.78080685, 2356.04125745,
3175.52066495, 3325.78080685, 3325.78080685, 3325.78080685,
3325.78080685, 3325.78080685, 5498.73366937, 2356.04125745,
2356.04125745, 4687.68312741, 2356.04125745, 2761.56652842,
2356.04125745, 2356.04125745, 3325.78080685, 5093.20839839,
2356.04125745, 2356.04125745, 3731.30607783, 2356.04125745,
3731.30607783, 5093.20839839, 5093.20839839, 2356.04125745,
2761.56652842, 2356.04125745, 2356.04125745, 3731.30607783,
3155.29915627, 4687.68312741, 5093.20839839, 3731.30607783,
2356.04125745, 3325.78080685, 2272.33123244, 2356.04125745,
2356.04125745, 3325.78080685, 3325.78080685, 2356.04125745,
5093.20839839, 3155.29915627, 2356.04125745, 4687.68312741,
4687.68312741, 1950.51598647, 4687.68312741, 5093.20839839,
4687.68312741, 2292.55274113, 4687.68312741, 2356.04125745,
2356.04125745, 2356.04125745, 3325.78080685, 3325.78080685,
3325.78080685, 3325.78080685, 2356.04125745, 4687.68312741,
2356.04125745, 3325.78080685, 5093.20839839, 3325.78080685,
3346.00231554, 3325.78080685, 2356.04125745, 2356.04125745,
2356.04125745, 2356.04125745, 2356.04125745, 2185.55960686,
5093.20839839, 2356.04125745, 2185.55960686, 3911.3803875 ,
2185.55960686, 2356.04125745, 3731.30607783, 3325.78080685,
5093.20839839, 2761.56652842, 3731.30607783, 2356.04125745,
3175.52066495, 2356.04125745, 4687.68312741, 2356.04125745,
6660.34017157, 5849.28962962, 2356.04125745, 2356.04125745,
3325.78080685, 5498.73366937, 2356.04125745, 4687.68312741,
5093.20839839, 5093.20839839, 5093.20839839, 3155.29915627,

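In [87]: # score() returns the coefficient of determination (R²) of the regression,
# not a classification accuracy; here it is evaluated on the training split.
theScore = hotelRegression.score(xTrain, yTrain)
print("Accuracy of Model:", theScore)
Accuracy of Model: 0.8557101791261511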

Predict price
In [81]: yourName = "Anuraag Rath"

daysSpentu = 1
yourbathService = 2
yourroomType = 2
washer_dryer = 1
personalizedService = 0
yourbarservice = 1
gymService = 1

myOptions = [[daysSpentu, yourbathService, yourroomType, washer_dryer,

predictMyPrice = hotelRegression.predict(myOptions)

print("Mr. {name}, your Bill amount is Rs. {price}".format(name = yourN

Mr. Anuraag Rath, your Bill amount is Rs. [4507.60881774]


The Machine Learning Model has been created
