Bachelor of Engineering (Computer Science & Engineering)
Artificial Intelligence and Machine Learning(21CSH-316)
Prepared by:
Sitaram patel(E13285)


Computer Science and Engineering Department
Vision Of University
"To be globally recognized as a Centre of Excellence for Research, Innovation, Entrepreneurship and disseminating knowledge by providing inspirational learning to produce professional leaders for serving the society".

Mission Of University
• Providing world class infrastructure, renowned academicians and ideal environment for Research, Innovation, Consultancy and Entrepreneurship relevant to the society.
• Offering programs & courses in consonance with National policies for nation building and meeting global challenges.
• Designing Curriculum to match International standards, needs of Industry, civil society and for inculcation of traits of Creative Thinking and Critical Analysis as well as Human and Ethical values.
• Ensuring students delight by meeting their aspirations through blended learning, corporate mentoring, professional grooming, flexible curriculum and healthy atmosphere based on co-curricular and extra-curricular activities.
• Creating a scientific, transparent and objective examination/evaluation system to ensure an ideal certification.
• Establishing strategic relationships with leading National and International corporates and universities for academic as well as research collaborations.
• Contributing for creation of healthy, vibrant and sustainable society by involving in Institutional Social Responsibility (ISR) activities like rural development, welfare of senior citizens, women empowerment, community service, health and hygiene awareness and environmental protection.

Vision Of Department
• To be recognized as a leading Computer Science and Engineering department through effective teaching practices and excellence in research and innovation for creating competent professionals with ethics, values and entrepreneurial attitude to deliver service to society and to meet the current industry standards at the global level.

Mission Of Department
• M1: To provide practical knowledge using state-of-the-art technological support for the experiential learning of our students.
• M2: To provide industry recommended curriculum and transparent assessment for quality learning experiences.
• M3: To create global linkages for interdisciplinary collaborative learning.
• M4: To nurture advanced learning platform for research and innovation for students' profound future growth.
• M5: To inculcate leadership qualities and strong ethical values through value based education.

PEOs (Program Educational Objectives)
The statements of PEOs (revised from 2022) are given below:
• PEO 1. Graduates of the Computer Science and Engineering Program can contribute to the Nation's growth through their ability to solve diverse and complex computer science & engineering problems across a broad range of application areas.
• PEO 2. Graduates of the Computer Science and Engineering Program can be successful professionals, designing and implementing Products & Services of global standards in the field of Computer Science & Engineering, becoming entrepreneurs, pursuing higher studies & research.
• PEO 3. Graduates of the Computer Science and Engineering Program will be able to adapt to the changing scenario of dynamic technology with an ability to solve larger societal problems using a logical and flexible approach in decision making.

Program Outcomes (POs)
• PO1: Engineering knowledge: Apply the knowledge of Mathematics, Science, Engineering fundamentals and computer science fundamentals and strategies to the solution of complex computer science engineering problems.
• PO2: Problem analysis: Identify, formulate, research literature, and analyze complex computer science engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
• PO3: Design/development of solutions: Design solutions for complex database and software engineering problems and design system components or processes that meet the specified needs with appropriate considerations for the public health and safety, and the cultural, societal, and environmental considerations.

• PO4: Conduct investigations of complex problems: Use research-based knowledge and research methods including design of software engineering & networking based experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
• PO5: Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern Computer science engineering and IT tools including prediction and modeling to complex database or software engineering activities with an understanding of the limitations.
• PO6: The engineer and society: Apply reasoning informed by the contextual knowledge to assess social, health, safety, legal and cultural issues and the consequent responsibilities relevant to the Professional Computer Science & Engineering practice.
• PO7: Environment and sustainability: Understand the impact of the professional computer science and engineering solutions in social and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development goals.
• PO8: Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of computer science engineering practice.
• PO9: Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.


• PO10: Communication: Communicate effectively on complex computer science engineering activities with the engineering community, like CSI, and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
• PO11: Project management and finance: Demonstrate knowledge and understanding of the computer science engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
• PO12: Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of computer science engineering changes.

Program Specific Outcomes (PSOs)

A Graduate of Computer Science and Engineering Program will be able:
• PSO1. To acquire proficiency in developing and implementing efficient solutions using emerging technologies, platforms and Free and Open-Source Software.
• PSO2. To gain critical understanding of hardware and software tools catering to the contemporary needs of IT industry.

Course Outcomes
CO-1: Apply the basic concepts of Machine learning and statistical learning to deal with real-life problems.
CO-2: Understand and implement the Supervised Machine Learning Algorithm.
CO-3: Apply the basics of Python programming to various problems related to AI & ML.
CO-4: Design and implement AI & ML algorithms and models.
CO-5: Apply various unsupervised machine learning models and evaluate their

Course Objectives

• To make familiar with the domain and fundamentals of Artificial intelligence.
• To provide a comprehensive foundation to Machine Learning and Optimization methodology with applications.
• To study learning processes: supervised and unsupervised, deterministic and statistical knowledge of Machine learners, and ensemble learning.
• To understand modern techniques and practical trends of Machine learning.

Syllabus( Theory)
Unit-1 Contact Hours: 15
Concept of AI, history, current status, scope, agents, environments, Problem Formulations, Review of tree and graph structures, State space representation, Search graph and Search tree.
Problem Solving using Search Algorithms. Uninformed search algorithms: Depth-First Search, Breadth-First Search. Informed search algorithms: Heuristics, A* algorithm, AO* algorithm. Natural Language Processing.
Chapter 1.2
Introduction to Machine Learning: scope, limitations and ML model, regression, probability, statistics and linear algebra for machine learning, convex optimization, data visualization, hypothesis function and testing, data distributions, data preprocessing, data augmentation, normalizing data sets.

Unit-2 Contact Hours:15

Chapter 2.1
Supervised Learning with Regression and Classification techniques 1: Linear Regression, Multiple Regression, Bias-Variance Dichotomy, Model Validation Approaches, Evaluation of the performance of an algorithm: Accuracy, Confusion Matrix, Error Rate, Precision, Recall, Specificity, Mean Squared Error, Root Mean Squared Error.
Chapter 2.2
Supervised Learning with Regression and Classification techniques 2: Logistic Regression, Support Vector Machine (SVM), Naive Bayesian Classifier, K-Nearest Neighbor (KNN), and Decision Tree: picking the best splitting attribute, overfitting and underfitting, noisy data and pruning. Ensemble Methods: Random Forest.

Unit-3 Contact Hours: 15

Unsupervised Learning: Clustering, Partitioning Method: K-means, K-medoids. Hierarchical Clustering: Agglomerative and divisive clustering. Evaluation of clustering algorithms. Principal Component Analysis (Eigenvalues, Eigenvectors, Orthogonality). Association Rules: Association Rule mining, Apriori Algorithm, Support and Confidence Parameters, Lift and Leverage. Feature Reduction and Dimensionality Reduction.
Chapter 3.2
Semi-Supervised Learning: Introduction, Assumptions, Working and Real-World Applications. Reinforcement Learning: Introduction, Applications and Examples, Challenges of applying reinforcement learning, Elements of Reinforcement Learning, reinforcement learning algorithm.

Syllabus Practical (List of Experiments)

CO_PO_SO Mapping

Assessment Model Theory

Method of Teaching Practical

Assessment Model for Lab

• Machine learning techniques are used to automatically find the valuable underlying patterns within complex data that we would otherwise struggle to discover.
• The hidden patterns and knowledge about a problem can be used to predict future events and perform all kinds of complex decision making.
• Traditionally, software engineering combined human-created rules with data to create answers to a problem.
• Machine learning instead uses data and answers to discover the rules behind a problem.
• To learn the rules governing a phenomenon, machines have to go through a learning process, trying different rules and learning from how well they perform.
• This is why it is known as Machine Learning.
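The idea of using data and answers to discover the rule can be illustrated with a minimal sketch. It assumes the hidden rule is a straight line y = w*x + b and recovers w and b by ordinary least squares; the function name `fit_line` and the sample data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Estimate slope w and intercept b of y = w*x + b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the inputs and their co-variation with the answers.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

# Data (inputs) and answers (outputs) generated by the hidden rule y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
w, b = fit_line(xs, ys)
print(w, b)  # the learned rule: slope and intercept
```

A traditional program would hard-code the rule; here the rule is recovered from examples, which is the contrast the bullets above describe.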

1. Virtual Personal Assistants: Amazon Echo and Google Home, Siri, Alexa
2. Predictions while Commuting: Traffic Predictions, Weather Predictions
3. Video Surveillance
4. Social Media Services: People You May Know, Face Recognition, Similar Pins
5. Email Spam and Malware Filtering
6. Online Customer Support
7. Search Engine Result Refining
8. Product Recommendations
9. Online Fraud Detection

Books and Journals:

• Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014.
• Osman Omer, Introduction to Machine Learning – the Wikipedia Guide.
• Kevin P. Murphy, Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series), Illustrated Edition, MIT Press, 2012.
• Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar, Foundations of Machine Learning (Adaptive Computation and Machine Learning series), MIT Press, 2012.
• Andreas C. Mueller, Introduction to Machine Learning with Python: A Guide for Data Scientists, O'Reilly, 2016.
• Trevor Hastie, Robert Tibshirani and Jerome H. Friedman, The Elements of Statistical Learning.

Target Companies

• CommScope India Private Limited

• Nutanix India Technologies Pvt. Ltd
• Apisero India Pvt. Ltd
• EPIKInDiFi Software & Solutions Private Limited
• WorkIndia (Eloquent Info Solutions Pvt. Ltd.)
• ShareChat (Mohalla Tech Private Limited)
• Salesforce India Pvt Ltd
• BrowserStack Software Pvt. Ltd
• Zscaler Softech India Private Limited
• Daffodil Software
• Saviynt India Private Limited
• GlobalLogic-Hitachi

• Clarivate Analytics
• Avalara Technologies Private Limited
• F5 Networks Innovation Private Limited
• Innovaccer Analytics Pvt Ltd
• Virtusa -Codelite
• Clumio Technologies

Chapter-1.1 Lecture-1

Creating Machine Intelligence

CO1- Understand the fundamental concepts and techniques of artificial intelligence and machine learning.
CO2- Apply the basics of Python programming to various problems related to AI.


• Chapter-1
Concept of AI, history, current status, scope, agents, environments, Problem Formulations, Review of tree and graph structures, State space representation.

[Figure: areas of AI: Knowledge Representation, Search, Logic, Machine Learning, Expert Systems, NLP, Vision, Robotics]
▪ making computers that think?
▪ the automation of activities we associate with human thinking, like decision making, learning ... ?
▪ the art of creating machines that perform functions that require intelligence when performed by people?
▪ the study of mental faculties through the use of computational models?
▪ the study of computations that make it possible to perceive, reason and act?
▪ a field of study that seeks to explain and emulate intelligent behaviour in terms of computational processes?
▪ a branch of computer science that is concerned with the automation of intelligent behaviour?
▪ anything in Computing Science that we don't yet know how to do properly? (!)
            Like humans                        Rationally
THOUGHT     Systems that think like humans     Systems that think rationally
BEHAVIOUR   Systems that act like humans       Systems that act rationally

▪ To make computers more useful by letting them take over dangerous or tedious tasks from humans
▪ To understand the principles of human intelligence

▪ An AI system is composed of an agent and its environment. The agents act in their environment. The environment may contain other agents.
What are Agent and Environment?
▪ An agent is anything that can perceive its environment through sensors and act upon that environment through effectors.
▪ A human agent has sensory organs such as eyes, ears, nose, tongue and skin for sensors, and other organs such as hands, legs and mouth for effectors.
▪ A robotic agent has cameras and infrared range finders for sensors, and various motors and actuators for effectors.
▪ A software agent has encoded bit strings as its programs and actions.

▪ Agent Terminology
▪ Performance Measure of Agent − It is the criterion that determines how successful an agent is.
▪ Behavior of Agent − It is the action that the agent performs after any given sequence of percepts.
▪ Percept − It is the agent's perceptual input at a given instant.
▪ Percept Sequence − It is the history of all that an agent has perceived to date.
▪ Agent Function − It is a map from the percept sequence to an action.
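The terminology above can be made concrete with a small sketch. It assumes a hypothetical two-square "vacuum world" in which each percept is a (location, status) pair; the names `reflex_vacuum_agent` and `run` are illustrative and not part of the course material. The agent function receives the whole percept sequence, although this simple agent only uses the latest percept.

```python
from typing import List, Tuple

# Hypothetical percept: (location, status), e.g. ("A", "dirty").
Percept = Tuple[str, str]

def reflex_vacuum_agent(percepts: List[Percept]) -> str:
    """Agent function: maps a percept sequence to an action."""
    location, status = percepts[-1]  # only the most recent percept is used
    if status == "dirty":
        return "suck"
    return "right" if location == "A" else "left"

def run(agent, percept_stream):
    """Feed percepts one at a time and record the agent's behaviour."""
    history = []   # the percept sequence so far
    actions = []
    for percept in percept_stream:
        history.append(percept)
        actions.append(agent(history))
    return actions

actions = run(
    reflex_vacuum_agent,
    [("A", "dirty"), ("A", "clean"), ("B", "dirty"), ("B", "clean")],
)
print(actions)  # ['suck', 'right', 'suck', 'left']
```

A performance measure could then be defined over the recorded actions, for example the number of "suck" actions applied to dirty squares.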
Scope of AI (AI Careers)
Freshers should analyze their competencies and skills and choose an AI role with the potential for upward mobility. The future scope of Artificial Intelligence continues to grow due to new job roles and advancements in the AI field. The various roles in an AI career are as follows:
▪ AI Analysts and Developers
▪ AI Engineer and Scientist
▪ AI researcher
▪ AI Algorithms Expert
▪ Robotics specialist
▪ Military and aviation specialist
▪ Maintenance and mechanical engineer
▪ Surgical AI technician
▪ The future of AI
▪ The future of Artificial Intelligence is bright in India, with many organizations opting for AI automation. It is essential to understand the recent developments in AI to find suitable job roles based on your competencies.
The scope of Artificial Intelligence is not limited to domestic and commercial purposes; the medical and aviation sectors are also using AI to improve their services. Where AI outperforms human effort, opting for AI automation will reduce costs in the long run for a business.
Automation in operational vehicles has created a buzz in the logistics industry, as it is expected that automated trucks/vehicles may soon be used.

▪ Myths- Greek giant bronze warrior 'Talos'
▪ 1950- Alan Turing proposed the popular 'Turing Test'
▪ 1951- The year of 'Game AI'
▪ 1956- John McCarthy coined the term 'Artificial Intelligence' at the Dartmouth Conference
▪ 1959- First AI Laboratory 'MIT Lab' established
▪ 1961- First robot introduced to the General Motors assembly line
▪ 1966- First chatbot 'Eliza' invented
▪ 1997- IBM's Deep Blue beat the World Chess Champion
▪ 2005- Robotic car Stanley won the DARPA Grand Challenge
▪ 2011- IBM's Watson won the quiz show Jeopardy!
▪ More Computation Power (advanced technology, GPUs)
▪ Big Data (enormous data generated from all possible sources)
▪ Better Algorithms (new efficient algorithms based on neural networks, deep learning)
▪ Popularity among Tech Giants (universities, governments, startups and leading IT companies like Google, Microsoft, Facebook investing in AI)

▪ The term AI was first coined in the year 1956 by John McCarthy at the Dartmouth Conference.
▪ He defined AI as the science and engineering of making intelligent machines.
▪ AI can be defined as the theory and development of computer systems able to perform tasks that normally require human intelligence.

State space representation

AI Applications

Some Advantages of Artificial Intelligence

▪ more powerful and more useful computers

▪ new and improved interfaces

▪ solving new problems

▪ better handling of information

▪ relieves information overload

▪ conversion of information into knowledge

Some Disadvantages of Artificial Intelligence

▪ increased costs

▪ difficulty with software development - slow and expensive

▪ few experienced programmers

▪ few practical products have reached the market as yet.
▪ Rich E., Artificial Intelligence, Tata McGraw Hill.
▪ George F. Luger, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Pearson Education Asia.
▪ D.W. Patterson, Introduction to AI and Expert Systems, PHI.
▪ N.J. Nilsson, Principles of Artificial Intelligence.


Bachelor of Engineering (Computer Science & Engineering)
Artificial Intelligence(CST-21CSH-316)
Prepared by:
Sitaram patel(E13285)


Course Outcomes
CO1- Understand the fundamental concepts and techniques of artificial intelligence and machine learning.
Design and implement basic AI algorithms and models.
CO2- Apply the basics of Python programming to various problems related to AI.

Agenda of the lecture
• Problem Solving using Search Algorithms. Uninformed search algorithms: Depth-First Search

Problem Solving Technique By Search
• A 'search' refers to the search for a solution in a problem space.
• In general, searching means finding the information one needs.
• Searching is the most commonly used technique of problem solving in artificial intelligence.
• Search proceeds with different types of 'search control strategies'.
• The searching algorithm helps us to search for the solution of a particular problem.

Properties of Search Algorithms in AI
• Completeness: A search algorithm is said to be complete if it guarantees to return a solution whenever at least one solution exists for the given input.
• Optimality: If the solution found by an algorithm is guaranteed to be the best solution (lowest path cost) among all other solutions, then it is said to be an optimal solution.
• Time Complexity: Time complexity is a measure of the time an algorithm takes to complete its task.
• Space Complexity: It is the maximum storage space required at any point during the search, expressed in terms of the complexity of the problem.
Types of Search Strategies
• Uninformed search (also known as blind search): Uninformed search algorithms have no additional information on the goal node other than the one provided in the problem definition.
• Informed search (also known as heuristic search): Search strategies know whether one state is more promising than another.

Types of Search Algorithms

The Uninformed Search
Uninformed search algorithms have no additional information on the goal node other than the one provided in the problem definition. The plans to reach the goal state from the start state differ only by the order and/or length of actions. Uninformed search is also called Blind search.
These types of algorithms will have:
▪ A problem graph, containing the start node S and the goal node G.
▪ A strategy, describing the manner in which the graph will be traversed to get to G.
▪ A fringe, which is a data structure used to store all the possible states (nodes) that can be reached from the current states.
▪ A tree that results while traversing to the goal node.
▪ A solution plan, which is the sequence of nodes from S to G.

Depth First Search
• The DFS algorithm is a recursive algorithm that uses the idea of backtracking. It involves exhaustive searches of all the nodes by going ahead, if possible, else by backtracking.
• Here, the word backtrack means that when you are moving forward and there are no more nodes along the current path, you move backwards on the same path to find nodes to traverse. All the nodes will be visited on the current path till all the unvisited nodes have been traversed, after which the next path will be selected.
• This recursive nature of DFS can be implemented using stacks.

DFS Contd..
• The basic idea is as follows:
• Pick a starting node and push all its adjacent nodes into a stack.
• Pop a node from the stack to select the next node to visit and push all its adjacent nodes into the stack.
• Repeat this process until the stack is empty.
However, ensure that the nodes that are visited are marked. This will prevent you from visiting the same node more than once. If you do not mark the nodes that are visited and you visit the same node more than once, you may end up in an infinite loop.
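The stack-based procedure above can be sketched as follows. This is a minimal illustration, not the slides' own code; the adjacency-list dictionary `graph` is a hypothetical example that deliberately contains a cycle (D points back to S) to show why visited nodes must be marked.

```python
def dfs(graph, start):
    """Iterative depth-first traversal; returns nodes in visiting order."""
    visited = set()      # marks nodes already visited
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue     # already handled via another path
        visited.add(node)
        order.append(node)
        # Push neighbours in reverse so the first-listed one is explored first.
        for neighbour in reversed(graph.get(node, [])):
            if neighbour not in visited:
                stack.append(neighbour)
    return order

# Hypothetical graph with a cycle S -> A -> D -> S.
graph = {"S": ["A", "B"], "A": ["C", "D"], "B": ["D"], "C": [], "D": ["S"]}
order = dfs(graph, "S")
print(order)  # ['S', 'A', 'C', 'D', 'B']
```

Without the `visited` set, the cycle through D and S would keep the stack from ever emptying.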

Example 1
NOTE: Follow the dotted arrows to determine the sequence.

Example 2

Points to Remember
• Time complexity: $T(n) = 1 + n + n^2 + \dots + n^m = O(n^m)$
where m = maximum depth of any node and n = number of child nodes of each node.
• Completeness: DFS is complete when the state space is finite.
• Space complexity: DFS needs to store only a single path from the root node, together with the unexpanded siblings along it, which gives $O(nm)$ (often written $O(bm)$ with branching factor b).
• Optimality: DFS is not optimal; the number of steps, or the cost spent, in reaching the solution it finds may be higher than necessary.

Advantages of DFS

• Memory requirement is linear with respect to the depth of the search.

• It can take less time and space than BFS when a solution lies along an early path.
• A solution can be found without much additional search.

Disadvantages of DFS
• It is not guaranteed to give a solution.
• If the cut-off depth is too small, the search may have to be repeated, which increases the time complexity.
• The required depth cannot be determined until the search has proceeded.

Breadth First Search
• In Breadth First Search (BFS), the root node of the graph is expanded first, then all the successor nodes are expanded, then their successors, and so on; i.e. the nodes are expanded level-wise starting at the root level.
• Breadth-first search (BFS) is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph, sometimes referred to as a 'search key'), and explores all of the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level.
• The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level before moving to nodes of the next level.
• The breadth-first search algorithm is an example of a general-graph search algorithm.
• Breadth-first search is implemented using a FIFO queue data structure.
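The FIFO-queue behaviour described above can be sketched as follows. This is an illustrative sketch; the graph below is a hypothetical example and is not the figure from the slides. The `parent` dictionary doubles as the visited marker and lets the path be reconstructed once the goal is dequeued.

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Breadth-first search; returns a path with the fewest edges, or None."""
    parent = {start: None}       # also records which nodes were discovered
    queue = deque([start])       # FIFO frontier
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical tree-shaped graph.
graph = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [7], 4: [], 5: [], 6: [], 7: []}
path = bfs_path(graph, 0, 7)
print(path)  # [0, 1, 3, 7]
```

Nodes are dequeued level by level (0, then 1 and 2, then 3 to 6, then 7), so the first time the goal is reached it is reached by a shallowest path.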

Example 1
• The traversal order will be 0-1-2-3-4-5-6-7
• Source node =0
• Destination node =7

Example 2
• S= source node
• K= destination node
• Path will be according to BFS
S---> A--->B---->C--->D---->G--->H--->

Points to Remember
• Time Complexity: The time complexity of BFS is given by the number of nodes traversed until the shallowest goal node is reached.
$T(b) = 1 + b + b^2 + \dots + b^d = O(b^d)$
where d = depth of the shallowest solution and b = number of successors of each node (branching factor).
• Space Complexity: The space complexity of BFS is given by the memory size of the frontier, which is $O(b^d)$.
• Completeness: BFS is complete, which means that if the shallowest goal node is at some finite depth, then BFS will find a solution.
• Optimality: BFS is optimal if the path cost of all edges of the tree/graph is the same.
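To get a feel for how quickly this sum grows, the short sketch below evaluates 1 + b + b^2 + ... + b^d directly. The helper name `nodes_generated` and the values b = 10, d = 3 are illustrative assumptions.

```python
def nodes_generated(b, d):
    """Number of nodes in a full tree with branching factor b down to depth d."""
    return sum(b ** level for level in range(d + 1))

total = nodes_generated(10, 3)
print(total)  # 1 + 10 + 100 + 1000 = 1111
```

The last term dominates the sum, which is why the total is written as O(b^d).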

Advantages of BFS
• If a solution exists, BFS will find it.
• It does not follow a single unfruitful path for a long time. It finds the minimal solution in case of multiple paths.
• There is no useless path in BFS, since it searches level by level.

Disadvantages of BFS
• BFS consumes large memory space, and its time complexity is high.
• It takes long when all paths to a destination are at approximately the same search depth.

The Informed Search
• Informed search methods use knowledge about the problem domain and choose promising operators first.
• These heuristic search methods use heuristic functions to evaluate the next state towards the goal state.
To find a solution using the heuristic technique, one should carry out the following steps:
1. Add domain-specific information to select the best path to continue searching along.
2. Define a heuristic function h(n) that estimates the 'goodness' of a node n. Specifically, h(n) = estimated cost (or distance) of the minimal cost path from n to a goal state.
3. The term heuristic means 'serving to aid discovery'; it is an estimate, based on domain-specific information that is computable from the current state description, of how close we are to a goal.
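A heuristic function h(n) of this kind can be sketched for a hypothetical grid world in which states are (x, y) cells and moves go one cell up, down, left or right. Under that assumption the Manhattan distance is a common choice of estimate; the function name and sample states below are illustrative only.

```python
def manhattan(state, goal):
    """h(n): estimated cost from `state` to `goal` on a 4-connected grid."""
    (x1, y1), (x2, y2) = state, goal
    return abs(x1 - x2) + abs(y1 - y2)

goal = (3, 3)
candidates = [(0, 0), (2, 3), (1, 1)]
# An informed search prefers the state that looks closest to the goal.
best = min(candidates, key=lambda s: manhattan(s, goal))
print(best, manhattan(best, goal))  # (2, 3) 1
```

The estimate uses only the current state description and the goal, which is what makes it cheap to compute at every step of the search.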

Heuristic function:
• A heuristic is a function used in Informed Search to find the most promising path. It takes the current state of the agent as its input and produces an estimate of how close the agent is to the goal.
• The heuristic function estimates how close a state is to the goal.
• It is represented by h(n), and it estimates the cost of an optimal path from the current state to the goal.

Characteristics of heuristic search
• Heuristics are knowledge about the domain, which help search and reasoning in that domain.
• Heuristic search incorporates domain knowledge to improve efficiency over blind search.
• A heuristic is a function that, when applied to a state, returns a value as the estimated merit of the state with respect to the goal.
• The heuristic evaluation function estimates the likelihood of a given state leading to the goal state.
• The heuristic search function estimates the cost from the current state to the goal, presuming the function is efficient.

Hill Climbing Algorithm (Gradient Ascent/Descent Algorithm)
• Iteratively maximize the "value" of the current state by replacing it with the successor state that has the highest value, as long as possible.

Note: minimizing a "value" function v(n) is equivalent to maximizing –v(n), thus both notions are used interchangeably.
Hill climbing is a local search algorithm which continuously moves in the direction of increasing elevation/value to find the peak of the mountain or the best solution to the problem. It terminates when it reaches a peak value where no neighbor has a higher value.
• In this algorithm, we don't need to maintain and handle the search tree or graph, as it only keeps a single current state.
• It is also called greedy local search, as it only looks at its good immediate neighbor state and not beyond that.
• A node of the hill climbing algorithm has two components: state and value.
• Hill Climbing is mostly used when a good heuristic is available.

Characteristics of Hill Climbing
• Generate and Test variant: It is a variation of the generate-and-test algorithm which discards all states that do not look promising or seem unlikely to lead us to the goal state. To take such decisions, it uses heuristics (an evaluation function) which indicate how close the current state is to the goal state.
• Greedy approach: The hill-climbing algorithm moves only towards the neighbor state that improves the current value.

State Space diagram of Hill Climbing search Algorithm

Related Terminology
• Local Maximum: A local maximum is a state which is better than its neighbor states, but there is also another state which is higher than it.
• Global Maximum: The global maximum is the best possible state of the state space landscape. It has the highest value of the objective function.
• Current state: It is the state in the landscape where the agent is currently present.

Simple Hill Climbing
• It evaluates only one neighbor node state at a time and selects the first one which optimizes the current cost, setting it as the current state.
• It checks only one successor state; if that state is better than the current state, it moves there, otherwise it stays in the same state.
Algorithm of Simple Hill Climbing
Step 1: Define the current state as the initial state.
Step 2: Loop until the goal state is achieved or no more operators can be applied to the current state:
a. Apply an operator to the current state and get a new state.
b. Compare the new state with the goal.
c. Quit if the goal state is achieved.
d. Evaluate the new state with the heuristic function and compare it with the current state.
e. If the new state is closer to the goal than the current state, update the current state.
f. Else, if it is not better than the current state, return to Step 2.
Step 3: Exit
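The steps above can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the function name `simple_hill_climbing`, the `neighbors` and `value` callbacks, and the one-dimensional toy landscape are assumptions made for this example, and a higher `value` is taken to mean a better state.

```python
def simple_hill_climbing(start, neighbors, value, is_goal=lambda s: False):
    """Move to the first successor that improves on the current state."""
    current = start
    while not is_goal(current):
        for candidate in neighbors(current):
            if value(candidate) > value(current):
                current = candidate  # first improving successor is accepted
                break
        else:
            # No operator yields a better state: we are at a peak
            # (possibly only a local maximum), so stop.
            return current
    return current


# Hypothetical landscape: maximize -(x - 3)^2 over the integers.
def value(x):
    return -(x - 3) ** 2


def neighbors(x):
    return [x - 1, x + 1]


print(simple_hill_climbing(0, neighbors, value))  # 3
```

Because only the first improving successor is examined, the search stops at whatever peak it reaches first; on a landscape with several peaks that may be a local maximum.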

Problems associated with Hill Climbing Algorithm
Hill climbing is a short-sighted technique, as it evaluates only immediate possibilities. So it may end up in a few situations from which it cannot pick any further states. Let's look at these states and some solutions for them.
1. Local maximum: It's a state which is better than all its neighbors, but there exists a better state which is far from the current state; if a local maximum occurs within sight of the solution, it is known as "foothills".
2. Plateau: In this state, all neighboring states have the same heuristic value, so it's unclear which state to choose next by making local comparisons.
3. Ridge: It's an area which is higher than the surrounding states, but it cannot be reached in a single move; for example, we have four possible directions to explore (N, E, W, S) and the area lies in the NE direction.

There are a few solutions to overcome these situations:
1. We can backtrack to one of the previous states and explore other directions.
2. We can skip a few states and make a jump in a new direction.
3. We can explore several directions to figure out the correct path.
Simulated Annealing
• A hill-climbing algorithm which never makes a move towards a lower value is guaranteed to be incomplete, because it can get stuck on a local maximum. If the algorithm instead applies a random walk, moving to a randomly chosen successor, it may be complete but is not efficient. Simulated annealing is an algorithm which aims to yield both efficiency and completeness.
• In metallurgy, annealing is the process of heating a metal or glass to a high temperature and then cooling it gradually, which allows the material to reach a low-energy crystalline state. The same idea is used in simulated annealing, in which the algorithm picks a random move instead of picking the best move. If the random move improves the state, it is accepted. Otherwise, the algorithm accepts the downhill move only with some probability less than 1; if the move is rejected, it chooses another move.
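The acceptance rule described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the only way to implement the method: the acceptance probability e^(delta/T), the geometric cooling schedule, the parameter names (`t0`, `cooling`, `steps`, `seed`) and the toy `heights` landscape are all choices made for this example.

```python
import math
import random


def simulated_annealing(start, neighbors, value, t0=10.0, cooling=0.95,
                        steps=1000, seed=0):
    """Maximize value(state), sometimes accepting worse moves while 'hot'."""
    rng = random.Random(seed)
    current = best = start
    temperature = t0
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))  # random move, not the best
        delta = value(candidate) - value(current)
        # Improvements are always accepted; a worse move is accepted with
        # probability e^(delta / T), which is below 1 and shrinks as T cools.
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
        if value(current) > value(best):
            best = current
        temperature = max(temperature * cooling, 1e-9)  # gradual cooling
    return best


# Hypothetical landscape: index 1 is a local maximum, index 7 the global one.
heights = [1, 3, 2, 1, 0, 2, 5, 9, 4]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]


def value(i):
    return heights[i]


result = simulated_annealing(1, neighbors, value)
print(result, heights[result])
```

Starting on the local maximum at index 1, plain hill climbing would stop immediately; here the early high temperature lets the search walk downhill through the valley, although reaching the global maximum is not guaranteed for every seed or schedule.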

Course Outcomes
CO1- Understand the fundamental concepts and techniques of artificial intelligence and machine learning.
CO2- Apply the basics of Python programming to various problems related to AI.
CO3- Design and implement basic AI algorithms and models.

A* Algorithm
• A* uses a heuristic to achieve optimality and completeness, and is a variant of the best-first algorithm.
• When a search algorithm has the property of optimality, it means it is guaranteed to find the best possible solution, in our case the shortest path to the finish state. When a search algorithm has the property of completeness, it means that if a solution exists, the algorithm is guaranteed to find it.
A* Algorithm Contd.
• Each time A* enters a state, it calculates the cost f(n) of travelling to each neighboring node n, and then enters the node with the lowest value of f(n) among those generated so far.
f(n) = g(n) + h(n)
f(n) = Estimated cost of the cheapest solution through n
g(n) = Cost to reach node n from the initial state
h(n) = Estimated cost to reach the goal node from node n
• The A* algorithm is typically used for graphs and graph traversals. In terms of graphs, A* is used for finding the shortest path to a certain point from a given point.

Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check whether the OPEN list is empty; if it is, return failure and stop.
Step 3: Select the node n from the OPEN list which has the smallest value of the evaluation function (g+h). If node n is the goal node, return success and stop; otherwise continue with Step 4.
Step 4: Expand node n, generate all of its successors, and put n into the CLOSED list. For each successor n', check whether n' is already in the OPEN or CLOSED list; if not, compute the evaluation function for n' and place it into the OPEN list.
Step 5: Else, if node n' is already in OPEN or CLOSED, it should be attached to the back pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2.
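The steps above can be sketched in Python with a priority queue standing in for the OPEN list. This is a minimal sketch under stated assumptions: the function name `a_star`, the dictionary-based graph, and the heuristic table `h` are hypothetical, and the example heuristic was chosen to be admissible and consistent for this particular graph.

```python
import heapq


def a_star(graph, h, start, goal):
    """Return (path, cost) for the cheapest path found, or (None, inf)."""
    open_heap = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
    best_g = {start: 0}                          # lowest g(n) seen per node
    closed = set()
    while open_heap:                             # Step 2: fail when OPEN is empty
        f, g, node, path = heapq.heappop(open_heap)  # Step 3: smallest f = g + h
        if g > best_g[node]:
            continue                             # stale entry, a cheaper one exists
        if node == goal:
            return path, g
        closed.add(node)                         # Step 4: move n to CLOSED
        for succ, cost in graph.get(node, {}).items():
            new_g = g + cost
            # Step 5: keep only the route with the lowest g(n'); a cheaper
            # route re-opens a node that was already closed.
            if new_g < best_g.get(succ, float("inf")):
                best_g[succ] = new_g
                closed.discard(succ)
                heapq.heappush(
                    open_heap, (new_g + h[succ], new_g, succ, path + [succ]))
    return None, float("inf")


# Hypothetical weighted graph and heuristic estimates to the goal "G".
graph = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 3}, "G": {}}
h = {"S": 5, "A": 4, "B": 3, "G": 0}

print(a_star(graph, h, "S", "G"))  # (['S', 'A', 'B', 'G'], 6)
```

Storing the whole path in each heap entry keeps the sketch short; an implementation closer to the description would store back pointers and rebuild the path at the end.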

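Points to Remember
• Completeness: A* algorithm is complete as long as:
a. The branching factor is finite.
b. Every action has a fixed cost.
• Optimality: A* is optimal when the heuristic h(n) satisfies the following two conditions:
a. Admissibility: h(n) never overestimates the true cost of reaching the goal from n.
b. Consistency: h(n) is no greater than the step cost to any successor n' plus h(n').
• Time Complexity: O(b^d)
b = branching factor
d = depth of solution
• Space Complexity: O(b^d)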

• It is complete and optimal.
• It often performs better than other search techniques when a good heuristic is available.
• It is used to solve very complex problems.
• It is optimally efficient, i.e. there is no other optimal algorithm guaranteed to expand fewer nodes than A*.
• This algorithm is complete if the branching factor is finite and every action has a fixed cost.
• The main drawback of A* is its memory requirement: it keeps all generated nodes in memory, so it is not practical for various large-scale problems.
AO* (AND-OR) Algorithm
• When a problem can be divided into a set of subproblems, where each subproblem can be solved separately and a combination of these will be a solution, AND-OR graphs or AND-OR trees are used for representing the solution.
• The decomposition of the problem, or problem reduction, generates AND arcs. One AND arc may point to any number of successor nodes.

• In an AND-OR graph, the AO* algorithm is an efficient method to explore a solution path.
• The AO* algorithm works mainly in two phases. The first phase finds a heuristic value for the nodes and arcs in a particular level. The changes in the values of nodes are propagated back in the next phase.
• To find a solution in an AND-OR graph, the AO* algorithm works similarly to best-first search, with the ability to handle AND arcs appropriately. The algorithm finds an optimal path from the initial node by propagating results, such as the solution and changes in heuristic value, to the ancestors.


In figure (a)
• The top node A has been expanded, producing two arcs, one leading to B and one leading to C-D.
• The numbers at each node represent the value of f ' at that node (the cost of getting to the goal state from the current state). For simplicity, it is assumed that every operation (i.e. applying a rule) has unit cost, i.e., each arc with a single successor has a cost of 1, and each AND arc with multiple successors has a cost of 1 for each of its components.
• With the information available so far, it appears that C is the most promising node to expand since its f ' = 3 is the lowest, but going through B would be better, since to use C we must also use D, and the cost would be 9 (3+4+1+1). Through B it would be 6 (5+1).
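The cost comparison above can be reproduced with a few lines of Python. This is only a worked-arithmetic sketch: the helper names `or_arc_cost` and `and_arc_cost` are hypothetical, and the f ' values (B = 5, C = 3, D = 4) are the ones implied by the sums 5+1 and 3+4+1+1 in the text.

```python
ARC_COST = 1  # unit cost per arc component, as assumed in the text


def or_arc_cost(f_child):
    """Cost of reaching the goal through an arc with a single successor."""
    return f_child + ARC_COST


def and_arc_cost(f_children):
    """Cost through an AND arc: every component must be solved and paid for."""
    return sum(f + ARC_COST for f in f_children)


cost_via_b = or_arc_cost(5)          # 5 + 1 = 6
cost_via_cd = and_arc_cost([3, 4])   # 3 + 4 + 1 + 1 = 9
revised_f_a = min(cost_via_b, cost_via_cd)

print(cost_via_b, cost_via_cd, revised_f_a)  # 6 9 6
```

The revised estimate for A is the cheaper of the two arcs, which is why the path through B is preferred even though C alone has the lowest f ' value.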

Course Outcomes
CO1- Understand the fundamental concepts and techniques of artificial intelligence and machine learning.
CO3- Design and implement basic AI algorithms and models.

• Game playing
– State of the art and resources
– Framework
• Game trees
– Minimax
– Alpha-beta pruning
Why study games?
• Clear criteria for success
• Offer an opportunity to study problems involving {hostile, adversarial, competing} agents.
• Historical reasons
• Fun
• Interesting, hard problems which require minimal “initial structure”
• Games often define very large search spaces
– chess: 35^100 nodes in search tree, 10^40 legal states
State of the art
• How good are computer game players?
– Chess:
• Deep Blue beat Garry Kasparov in 1997
• Garry Kasparov vs. Deep Junior (Feb 2003): tie!
• Kasparov vs. X3D Fritz (November 2003): tie!
– Checkers: Chinook (an AI program with a very large endgame database) is(?) the world champion.
– Go: Computer players were decent, at best, until AlphaGo defeated the professional Lee Sedol in 2016
– Bridge: “Expert-level” computer players exist (but no world champions yet!)
• Good places to learn more:
• Chinook is the World Man-Machine Checkers Champion, developed by researchers at the University of Alberta.
• It earned this title by competing in human tournaments, winning the right to play for the (human) world championship, and eventually defeating the best players in the world.
• Visit to play a version of Chinook over the Internet.
• The developers have fully analyzed the game of checkers and have the complete game tree for it.
– Perfect play on both sides results in a tie.
• “One Jump Ahead: Challenging Human Supremacy in Checkers”, Jonathan Schaeffer, University of Alberta (496 pages, Springer, $34.95, 1998).
Ratings of human and computer chess champions
Typical case
• 2-person game
• Players alternate moves
• Zero-sum: one player’s loss is the other’s gain
• Perfect information: both players have access to complete information about the state of the game. No information is hidden from either player.
• No chance (e.g., using dice) involved
• Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, Othello
• Not: Bridge, Solitaire, Backgammon, ...
How to play a game
• A way to play such a game is to:
– Consider all the legal moves you can make
– Compute the new position resulting from each move
– Evaluate each resulting position and determine which is best
– Make that move
– Wait for your opponent to move and repeat
• Key problems are:
– Representing the “board”
– Generating all legal next boards
– Evaluating a position
Evaluation function
• Evaluation function or static evaluator is used to evaluate the “goodness” of a game position.
– Contrast with heuristic search, where the evaluation function was a non-negative estimate of the cost from the start node to a goal, passing through the given node
Evaluation function examples
• Example of an evaluation function for Tic-Tac-Toe:
f(n) = [# of 3-lengths open for me] - [# of 3-lengths open for you]
where a 3-length is a complete row, column, or diagonal
• Alan Turing’s function for chess
– f(n) = w(n)/b(n) where w(n) = sum of the point value of white’s pieces and b(n) = sum of the point value of black’s pieces
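The Tic-Tac-Toe evaluation function above can be sketched in Python. This is an illustration under an assumption the slides do not spell out: a 3-length is treated as "open" for a player when it contains none of the opponent's marks. The board encoding (a list of nine cells holding "X", "O" or ".") and the function names are also hypothetical.

```python
# The eight 3-lengths: three rows, three columns, two diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def open_lines(board, player):
    """Count the 3-lengths that contain no mark of the player's opponent."""
    opponent = "O" if player == "X" else "X"
    return sum(1 for line in LINES
               if all(board[i] != opponent for i in line))


def evaluate(board, me="X"):
    """f(n) = [# of 3-lengths open for me] - [# of 3-lengths open for you]."""
    you = "O" if me == "X" else "X"
    return open_lines(board, me) - open_lines(board, you)


print(evaluate(list("....X....")))  # 4: X in the center, 8 open for X, 4 for O
print(evaluate(list("O...X....")))  # 1: O takes a corner, 5 open for X, 4 for O
```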
Game trees
• Problem spaces for typical games are represented as trees
• Root node represents the current board configuration; player must decide the best single move to make next
• Static evaluator function rates a board position. f(board) = real number, with f>0 for “white” (me), f<0 for “black” (you)
• Arcs represent the possible legal moves for a player
• If it is my turn to move, then the root is labeled a "MAX" node; otherwise it is labeled a "MIN" node, indicating my opponent's turn.
• Each level of the tree has nodes that are all MAX or all MIN; nodes at level i are of the opposite kind from those at level i+1
Minimax procedure
• Create start node as a MAX node with current board configuration
• Expand nodes down to some depth (a.k.a. ply) of lookahead in the game
• Apply the evaluation function at each of the leaf nodes
• “Back up” values for each of the non-leaf nodes until a value is computed for the root node
Minimax Algorithm
[Figure: minimax on a two-ply tree. The static evaluator gives the leaves the values 2, 7, 1, 8; the two MIN nodes back up 2 and 1; the MAX root backs up 2, which is the move selected by minimax.]
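The backing-up of values on the small tree with leaves 2, 7, 1, 8 can be sketched in Python. This is a minimal sketch, assuming the game tree is given as nested lists whose leaves already hold static evaluator values; the names `minimax` and `best_move` are hypothetical.

```python
def minimax(node, maximizing=True):
    """Back up leaf values: MAX nodes take the maximum, MIN nodes the minimum."""
    if not isinstance(node, list):
        return node  # leaf: value from the static evaluator
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(tree):
    """Index of the root move whose backed-up value is largest (root is MAX)."""
    values = [minimax(child, maximizing=False) for child in tree]
    return values.index(max(values))


tree = [[2, 7], [1, 8]]  # MAX root, two MIN children, four leaves

print(minimax(tree))    # 2
print(best_move(tree))  # 0, the left move
```

The MIN nodes back up min(2, 7) = 2 and min(1, 8) = 1, so the MAX root chooses the left move with value 2.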
Partial Game Tree for Tic-Tac-Toe
• f(n) = +1 if the position is a win for X.
• f(n) = -1 if the position is a win for O.
• f(n) = 0 if the position is a draw.
Minimax Tree
[Figure: minimax tree showing MAX nodes, MIN nodes, the f values at the leaves, and the values computed by minimax.]
Alpha-beta pruning
• We can improve on the performance of the minimax algorithm through alpha-beta pruning
• Basic idea: “If you have an idea that is surely bad, don't take the time to see how truly awful it is.” -- Pat Winston
[Figure: MAX root with value >=2; its left MIN child has leaves 2 and 7 and value =2; its right MIN child has leaves 1 and ? and value <=1.]
• We don’t need to compute the value at this node.
• No matter what it is, it can’t affect the value of the root node.
Alpha-beta pruning
• Traverse the search tree in depth-first order
• At each MAX node n, alpha(n) = maximum value found so far
• At each MIN node n, beta(n) = minimum value found so far
– Note: The alpha values start at -infinity and only increase, while beta values start at +infinity and only decrease.
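Alpha-beta example
[Figure: alpha-beta on a two-ply tree. The first MIN node examines leaves 3, 12, 8 and backs up 3; the second MIN node examines leaf 2 and its remaining leaves are pruned; the third MIN node examines leaves 14 and 1 and its remaining leaf is pruned.]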
Alpha-beta algorithm
function MAX-VALUE (state, α, β)
    ;; α = best MAX so far; β = best MIN
    if TERMINAL-TEST (state) then return UTILITY(state)
    v := -∞
    for each s in SUCCESSORS (state) do
        v := MAX (v, MIN-VALUE (s, α, β))
        if v >= β then return v
        α := MAX (α, v)
    return v

function MIN-VALUE (state, α, β)
    if TERMINAL-TEST (state) then return UTILITY(state)
    v := ∞
    for each s in SUCCESSORS (state) do
        v := MIN (v, MAX-VALUE (s, α, β))
        if v <= α then return v
        β := MIN (β, v)
    return v
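The pseudocode above translates almost line for line into Python. The sketch below is illustrative: the game tree is assumed to be given as nested lists with numeric leaves, the `visited` list is an addition used only to show which leaves are actually examined, and the leaf values 4, 6 and 7 are hypothetical stand-ins for leaves that end up pruned.

```python
import math


def max_value(node, alpha, beta, visited):
    if not isinstance(node, list):      # terminal test
        visited.append(node)
        return node                     # utility of the leaf
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta, visited))
        if v >= beta:
            return v                    # MIN above will never allow this branch
        alpha = max(alpha, v)
    return v


def min_value(node, alpha, beta, visited):
    if not isinstance(node, list):
        visited.append(node)
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta, visited))
        if v <= alpha:
            return v                    # MAX above already has something better
        beta = min(beta, v)
    return v


tree = [[3, 12, 8], [2, 4, 6], [14, 1, 7]]  # MAX root with three MIN children
visited = []
root_value = max_value(tree, -math.inf, math.inf, visited)

print(root_value)  # 3
print(visited)     # [3, 12, 8, 2, 14, 1]
```

The leaves 4, 6 and 7 are never evaluated: once the second MIN node sees 2 and the third sees 1, neither can beat the value 3 that MAX already has.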
Effectiveness of alpha-beta
• Alpha-beta is guaranteed to compute the same value for the root node as computed by minimax, with less or equal computation
• Worst case: no pruning, examining b^d leaf nodes, where each node has b children and a d-ply search is performed
• Best case: examine only about 2b^(d/2) leaf nodes
Course Outcomes
CO1- Understand the fundamental concepts and techniques of artificial intelligence and machine learning.

Course Objectives
• To understand the history and development of Machine Learning.
• To provide a comprehensive foundation to Machine Learning and Optimization methodology with applications.
• To study learning processes: supervised and unsupervised, deterministic and statistical knowledge of Machine learners, and ensemble learning.
• To understand modern techniques and practical trends of Machine learning.

AI - Natural Language Processing
• Natural Language Processing (NLP) refers to the AI method of communicating with an intelligent system using a natural language such as English.
• Processing of natural language is required when you want an intelligent system such as a robot to perform as per your instructions, when you want to hear a decision from a dialogue-based clinical expert system, etc.
• The field of NLP involves making computers perform useful tasks with the natural languages humans use. The input and output of an NLP system can be −
• Speech
• Written Text

Components of NLP
• There are two components of NLP, as given −
• Natural Language Understanding (NLU)
• Understanding involves the following tasks −
• Mapping the given input in natural language into useful representations.
• Analyzing different aspects of the language.
• Natural Language Generation (NLG)
• It is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation.
• It involves −
• Text planning − It includes retrieving the relevant content from the knowledge base.
• Sentence planning − It includes choosing the required words, forming meaningful phrases, and setting the tone of the sentence.
• Text realization − It is mapping the sentence plan into sentence structure.
Applications of NLP
• There are the following applications of NLP -
• Question answering: This focuses on building systems that automatically answer the questions asked by humans in a natural language.
• Text and speech processing: This includes speech recognition, text-&-speech processing, encoding (i.e. converting speech or text to machine-readable language), etc.
• Text classification: This includes sentiment analysis, in which the machine can analyze the qualities, emotions, and sarcasm in text and also classify it accordingly.
• Language generation: This includes tasks such as machine translation, summary writing, essay writing, etc., which aim to produce coherent and fluent text.
• Language interaction: This includes tasks such as dialogue systems, voice assistants, and chatbots, which aim to enable natural communication between humans and computers.
Difficulties in NLU
• NL has an extremely rich form and structure.
• It is very ambiguous. There can be different levels of ambiguity −
• Lexical ambiguity − It is at a very primitive level, such as the word level.
• For example, should the word “board” be treated as a noun or a verb?
• Syntax-level ambiguity − A sentence can be parsed in different ways.
• For example, “He lifted the beetle with red cap.” − Did he use a cap to lift the beetle, or did he lift a beetle that had a red cap?
• Referential ambiguity − Referring to something using pronouns. For example, Rima went to Gauri. She said, “I am tired.” − Exactly who is tired?
• One input can have different meanings.
• Many inputs can mean the same thing.

Phases of NLP
There are the following five phases of NLP:

1. Lexical and Morphological Analysis
The first phase of NLP is lexical analysis. This phase scans the input text as a stream of characters and converts it into meaningful lexemes. It divides the whole text into paragraphs, sentences, and words.
2. Syntactic Analysis (Parsing)
Syntactic analysis is used to check grammar and word arrangement, and to show the relationships among the words.
Example: Agra goes to the Poonam
In the real world, “Agra goes to the Poonam” does not make any sense, so this sentence is rejected by the syntactic analyzer.
3. Semantic Analysis
Semantic analysis is concerned with meaning representation. It mainly focuses on the literal meaning of words, phrases, and sentences.
4. Discourse Integration
Discourse integration depends upon the sentences that precede it and also invokes the meaning of the sentences that follow it.
5. Pragmatic Analysis
Pragmatic analysis is the fifth and last phase of NLP. It helps you to discover the intended effect by applying a set of rules that characterize cooperative dialogues.
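The first phase, dividing text into paragraphs, sentences, and words, can be sketched with Python's standard `re` module. This is a deliberately naive sketch: the function name `lexical_analysis` and the splitting rules (blank lines separate paragraphs, sentence-ending punctuation followed by whitespace separates sentences) are assumptions for illustration, and real systems need to handle abbreviations, numbers, and morphology as well.

```python
import re


def lexical_analysis(text):
    """Split text into paragraphs, each a list of sentences, each a list of words."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        result.append([re.findall(r"[A-Za-z0-9']+", s) for s in sentences])
    return result


text = "Rima went to Gauri. She said, I am tired.\n\nNLP has five phases."
for paragraph in lexical_analysis(text):
    print(paragraph)
# [['Rima', 'went', 'to', 'Gauri'], ['She', 'said', 'I', 'am', 'tired']]
# [['NLP', 'has', 'five', 'phases']]
```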


• Natural Language Processing APIs allow developers to integrate human-to-machine communications and complete several useful tasks such as speech recognition, chatbots, spelling correction, sentiment analysis, etc.
• A list of NLP APIs is given below:
• IBM Watson API
IBM Watson API combines different sophisticated machine learning techniques to enable developers to classify text into various custom categories. It supports multiple languages, such as English, French, Spanish, German, Chinese, etc. With the help of IBM Watson API, you can extract insights from texts, add automation in workflows, enhance search, and understand the sentiment. The main advantage of this API is that it is very easy to use.
Pricing: Firstly, it offers a free 30-day trial IBM Cloud account. You can also opt for its paid plans.
NLP APIs
• Chatbot API
Chatbot API allows you to create intelligent chatbots for any service. It supports Unicode characters, text classification, multiple languages, etc. It is very easy to use. It helps you to create a chatbot for your web applications.
Pricing: Chatbot API is free for 150 requests per month. You can also opt for its paid version, which ranges from $100 to $5,000 per month.
• Speech to text API
Speech to text API is used to convert speech to text.
Pricing: Speech to text API is free for converting 60 minutes per month. Its paid version ranges from $500 to $1,500 per month.
• Sentiment Analysis API
Sentiment analysis, also called 'opinion mining', is used to identify the tone of a user (positive, negative, or neutral).
Pricing: Sentiment Analysis API is free for less than 500 requests per month. Its paid version ranges from $19 to $99 per month.
• Translation API by SYSTRAN
The Translation API by SYSTRAN is used to translate text from the source language to the target language. You can use its NLP APIs for language detection, text segmentation, named entity recognition, tokenization, and many other tasks.
Pricing: This API is available for free. Commercial users need to use its paid version.
• Text Analysis API by AYLIEN
Text Analysis API by AYLIEN is used to derive meaning and insights from textual content. It is available both for free and as a paid version from $119 per month. It is easy to use.
Pricing: This API is available free for 1,000 hits per day. You can also use its paid version, which ranges from $199 to $1,399 per month.
• Cloud NLP API
The Cloud NLP API is used to improve the capabilities of an application using natural language processing technology. It allows you to carry out various natural language processing functions such as sentiment analysis and language detection. It is easy to use.
Pricing: Cloud NLP API is available for free.


• Introduction to Machine Learning: scope, limitations, regression, probability, statistics and linear algebra for machine learning
• Machine Learning Introduction
• History
• Need
• Applications
• Scope
• Advantages
• Limitations
Machine Learning Introduction
• Machine learning is the science of getting computers to perform a task without being explicitly programmed.
• In other words, the big difference between classical and machine learning algorithms lies in the way we define them.
• Classical algorithms are given exact and complete rules to complete a task.
• Machine learning algorithms are given general guidelines that define the model, along with data.
• This data should contain the missing information necessary for the model to complete the task.
• So, a machine learning algorithm can accomplish its task when the model has been adjusted with respect to the data.
• We say that we “fit the model on the data” or that “the model has to be trained on the data”.
• Computers are used in almost all disciplines, including science, technology, and medical science.
• Computing techniques are used to find exact solutions of scientific problems. The solutions are attempted on the basis of two-valued logic and classical mathematics.
• However, not all real-life problems can be handled by conventional methods.
• Zadeh, who is known as the father of fuzzy logic, has mentioned that humans are able to resolve tasks of high complexity without measurement or computation.
• Hence the need arose for developing systems that work on Artificial Intelligence.
• Artificial Intelligence (AI) is the field that studies how machines can be made to act intelligently. The term "Artificial Intelligence" was coined by John McCarthy in 1956.
• Machine learning grew out of the quest for artificial intelligence when some researchers were interested in having machines learn from data. The term machine learning was coined in 1959 by Arthur Samuel.

• AI became a field of research to build models and systems that act intelligently without human intervention.
• In the mid 1980s, Zadeh focused on building systems or making computers think like humans.
• For this purpose, the machine's ability to compute with numbers, which is named hard computing, has to be supplemented by an additional ability more similar to human thinking, which is named soft computing.
• Soft computing is a term coined by Zadeh, combining a collection of computing techniques, spanning many fields, that fall under various categories in computational intelligence.
• Soft computing has three main branches: fuzzy systems, evolutionary computation, and artificial neural computing, which subsumes machine learning.
• The guiding principle of soft computing is “Exploit the tolerance for imprecision, uncertainty and partial truth to achieve tractability, robustness and low solution cost”.
• Machine learning techniques are used to automatically find the valuable underlying patterns within complex data that we would otherwise struggle to discover.
• The hidden patterns and knowledge about a problem can be used to predict future events and perform all kinds of complex decision making.
• Traditionally, software engineering combined human-created rules with data to create answers to a problem.
• Instead, machine learning uses data and answers to discover the rules behind a problem.
• To learn the rules governing a phenomenon, machines have to go through a learning process, trying different rules and learning from how well they perform.
• Hence it is known as machine learning.

Applications
1. Virtual Personal Assistants: Amazon Echo, Google Home, Siri, Alexa
2. Predictions while Commuting: Traffic Predictions, Weather Predictions
3. Video Surveillance
4. Social Media Services: People You May Know, Face Recognition, Similar Pins
5. Email Spam and Malware Filtering
6. Online Customer Support
7. Search Engine Result Refining
8. Product Recommendations
9. Online Fraud Detection

Scope
• Data Analysis: Machine learning allows us to extract insights, patterns, and trends from large datasets.
• Pattern Recognition: ML models can identify complex patterns and make accurate predictions.
• Automation: ML can automate tasks that are repetitive, time-consuming, or require high precision.
• Personalization: ML models can personalize experiences and recommendations based on user preferences.
• Decision Making: ML models assist in making data-driven decisions by analyzing vast amounts of information.

Advantages
1. Easily identifies trends and patterns
Machine learning can review large volumes of data and discover specific trends and patterns that would not be apparent to humans.
2. No human intervention needed (automation)
With ML, you don’t need to babysit your project every step of the way. Since it means giving machines the ability to learn, it lets them make predictions and also improve the algorithms on their own.
3. Continuous Improvement
As ML algorithms gain experience, they keep improving in accuracy and efficiency. This lets them make better decisions.
4. Handling multi-dimensional and multi-variety data
Machine learning algorithms are good at handling data that are multi-dimensional and multi-variety, and they can do this in dynamic or uncertain environments.
5. Wide Applications
It holds the capability to help deliver a much more personal experience to customers while also targeting the right customers.
Limitations
1. Data Acquisition
Machine learning requires massive data sets to train on, and these should be inclusive/unbiased and of good quality. There can also be times when one must wait for new data to be generated.
2. Time and Resources
ML needs enough time to let the algorithms learn and develop enough to fulfill their purpose with a considerable amount of accuracy and relevancy. It also needs massive resources to function. This can mean additional requirements of computing power for you.
3. Interpretation of Results
Another major challenge is the ability to accurately interpret results generated by the algorithms. You must also carefully choose the algorithms for your purpose.
4. High error-susceptibility
Suppose you train an algorithm with data sets small enough to not be inclusive. You end up with biased predictions coming from a biased training set. This leads to irrelevant advertisements being displayed to customers. In the case of ML, such blunders can set off a chain of errors that can go undetected for long periods of time. And when they do get noticed, it takes quite some time to recognize the source of the issue, and even longer to correct it.


• Regression is a supervised learning technique used for predicting a continuous or real-valued output based on input features. It aims to establish a relationship between the independent variables (inputs) and the dependent variable (output) by fitting a mathematical function or model to the training data.
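As a small illustration of fitting a function to training data, the sketch below fits a straight line with one input feature by ordinary least squares, using the closed-form formulas for slope and intercept. The function name `fit_line` and the toy data are assumptions made for this example; practical work would normally rely on a numerical library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]   # independent variable (input)
ys = [3, 5, 7, 9]   # dependent variable (output)

slope, intercept = fit_line(xs, ys)
print(slope, intercept)        # 2.0 1.0
print(slope * 5 + intercept)   # prediction for x = 5: 11.0
```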

Key Concepts in Regression
• Dependent Variable: The variable being predicted or estimated based on the independent variables.
• Independent Variables: The input features or predictors used to predict the dependent variable.
• Training Data: Labeled data consisting of input-output pairs used to train the regression model.
• Regression Model: The mathematical function or algorithm used to model the relationship between the independent and dependent variables.
• Common Types of Regression Models:
• Linear Regression: Linear regression assumes a linear relationship between the independent and dependent variables. It fits a line (in simple linear regression) or a hyperplane (in multiple linear regression) to the data. The goal is to minimize the sum of squared errors between the predicted and actual values.
• Polynomial Regression: Polynomial regression extends linear regression by introducing polynomial terms of higher degrees. It can capture non-linear relationships between variables by fitting a polynomial curve to the data.
• Ridge Regression: Ridge regression is a regularization technique used to address overfitting in linear regression. It adds a penalty term (the sum of squared coefficients) to the loss function, which controls the complexity of the model and helps prevent large coefficient values.
• Lasso Regression: Lasso regression is another regularization technique that adds a penalty term (the sum of absolute coefficient values) to the loss function. It not only reduces overfitting but also performs feature selection by shrinking some coefficients to zero.
• Logistic Regression: Despite its name, logistic regression is a classification algorithm often used for binary classification problems. It models the probability of the outcome using a logistic function and applies a threshold to make predictions.
• Support Vector Regression (SVR): SVR is a variant of support vector machines (SVM) adapted for regression tasks. It finds a hyperplane that maximizes the margin around the predicted values while allowing a certain degree of error.
• Decision Tree Regression: Decision tree regression builds a tree-like model where each internal node represents a decision based on a feature, and each leaf node represents the predicted value. It recursively splits the data based on the values of the features to make predictions.
• Random Forest Regression: Random forest regression is an ensemble method that combines multiple decision tree regressors. It averages or combines the predictions from individual trees to obtain the final prediction.
• Gradient Boosting Regression: Gradient boosting regression is another ensemble technique that builds models in a sequential manner. It combines weak learners (typically decision trees) and focuses on correcting the mistakes made by previous models.
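As a rough illustration of the least-squares idea behind simple linear regression, the sketch below fits a line to a handful of points using the closed-form slope and intercept. The function names and the toy data (generated from y = 2x + 1) are assumptions made for this example, not part of the course material.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(x, slope, intercept):
    """Predict the output for a single input x."""
    return slope * x + intercept


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
print(predict_line(5.0, slope, intercept))
```

With noisy data the fitted line would no longer pass through every point; it would instead be the line with the smallest total squared error.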
• Regression Evaluation Metrics:
• To assess the performance of regression models, several evaluation metrics are commonly used, including:
• Mean Squared Error (MSE): Measures the average squared difference between the predicted and actual values.
• Root Mean Squared Error (RMSE): The square root of the MSE, providing a measure of the average prediction error in the original scale of the data.
• Mean Absolute Error (MAE): Measures the average absolute difference between the predicted and actual values.
• R-squared (Coefficient of Determination): Indicates the proportion of the variance in the dependent variable explained by the regression model.
• Adjusted R-squared: Similar to R-squared, but adjusted for the number of predictors in the model.
• Mean Squared Logarithmic Error (MSLE): Measures the average squared difference between the logarithms of the predicted and actual values, useful for skewed or exponential distributions.
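The metrics above can be computed in a few lines. This is a minimal sketch using only the standard library; the function names and the sample values are illustrative assumptions.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by the predictions."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred))
print(mae(y_true, y_pred), r_squared(y_true, y_pred))
```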

Probability, Statistics, and Linear Algebra for Machine Learning
• Topic: Probability in Machine Learning
• Definition: Probability is the measure of the likelihood of an event occurring.
• Importance in ML:
– Probabilistic models are widely used in ML for making predictions and estimating uncertainty.
– Bayesian inference provides a framework for updating beliefs based on new evidence.
– Probability distributions model uncertainty and variability in data.
• Key Probability Concepts:
– Random Variables: Variables whose values depend on the outcome of a random event.
– Probability Distributions: Mathematical functions that describe the likelihood of different outcomes.
– Bayes' Theorem: A fundamental principle for updating beliefs based on new evidence.
– Expectation and Variance: Measures of the central tendency and spread of a probability distribution.
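To make Bayes' theorem and expectation/variance concrete, here is a small sketch. The numbers (a 1% prior, a 90% detection rate, a 5% false-positive rate) and the function names are hypothetical values chosen for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) computed from P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


def expectation(values, probs):
    """Expected value of a discrete random variable."""
    return sum(v * p for v, p in zip(values, probs))


def variance(values, probs):
    """Variance of a discrete random variable."""
    mu = expectation(values, probs)
    return sum(p * (v - mu) ** 2 for v, p in zip(values, probs))


# Belief that a hypothesis is true after observing positive evidence.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(posterior, 4))

# A fair six-sided die as a random variable.
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
print(expectation(faces, probs), variance(faces, probs))
```

Even with a 90% detection rate, the posterior stays well below 50% here because the prior is so small; this is the kind of belief update the Bayes' Theorem bullet refers to.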

• Statistics in Machine Learning
• Definition: Statistics involves collecting, analyzing, interpreting, presenting, and organizing data.
• Importance in ML:
– Statistical techniques help us make inferences from data and validate ML models.
– Hypothesis testing assesses the significance of relationships between variables.
– Sampling techniques enable efficient data collection and analysis.
• Key Statistical Concepts:
– Descriptive Statistics: Summarizing and visualizing data using measures such as mean, median, mode, and standard deviation.
– Inferential Statistics: Making predictions or drawing conclusions about a population based on sample data.

• Linear Algebra in Machine Learning
• Definition: Linear algebra deals with vectors, matrices, and linear transformations.
• Importance in ML:
– ML models often represent data as matrices and utilize linear algebra operations for optimization and analysis.
– Linear algebra provides the foundation for understanding and implementing various ML algorithms.
• Linear Algebra Concepts:
– Vectors and Matrices: Representing and manipulating multi-dimensional data structures.
– Matrix Operations: Addition, subtraction, multiplication, and transposition.
– Eigenvalues and Eigenvectors: Important concepts for dimensionality reduction and feature extraction.
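The matrix operations listed above can be sketched with plain Python lists. In practice a library such as NumPy would normally be used; the helper names and matrices below are assumptions for illustration only.

```python
def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Matrix product of a (n x k) and b (k x m)."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def matvec(a, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(transpose(A))
print(matmul(A, B))

# v is an eigenvector of S with eigenvalue 3 because S v = 3 v.
S = [[2, 1], [1, 2]]
v = [1, 1]
print(matvec(S, v))
```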

Convex Optimization, Data Visualization, Hypothesis Function and Testing, Data Distributions

• Convex Optimization
• Definition: Convex Optimization refers to the optimization of convex functions over convex sets. It involves finding the global minimum of a convex function (equivalently, the global maximum of a concave function), subject to a set of constraints.
• Applications: Parameter estimation, Support Vector Machines, Linear Regression, Neural Networks, etc.

• Topic: Key Concepts in Convex Optimization
– Convex Sets: Sets that satisfy the condition that the line segment joining any two points of the set lies entirely within the set.
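A convex function has a single global minimum, so a simple method such as gradient descent can find it from any starting point. The sketch below minimizes the hypothetical function f(x) = (x - 3)^2 + 1; the function, step size, and iteration count are assumptions chosen for illustration.

```python
def f(x):
    """A convex function with its global minimum at x = 3."""
    return (x - 3) ** 2 + 1


def grad_f(x):
    """Derivative of f."""
    return 2 * (x - 3)


def gradient_descent(start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient to reduce f."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


x_min = gradient_descent(start=0.0)
print(x_min, f(x_min))
```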
Data Visualization
Definition: Data Visualization is the graphical representation of data to uncover patterns, trends, and relationships that are not immediately evident in raw data.
Importance: Helps in understanding complex datasets, communicating insights effectively, and aiding decision-making.
Techniques: Scatter plots, line graphs, bar charts, histograms, heatmaps, etc.
Topic: Hypothesis Function and Testing
Definition: Hypothesis Function is a function that maps input variables to predicted output values. Hypothesis Testing involves evaluating the validity of a hypothesis or a claim about a population based on sample data.
Steps in Hypothesis Testing:
1. Formulate null and alternative hypotheses.
2. Collect sample data.
3. Determine a statistical test and significance level.
4. Calculate test statistics and p-value.
5. Make a decision based on the p-value and the chosen significance level.
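The steps above can be sketched with a two-sided one-sample z-test, which assumes the population standard deviation is known. The sample values, the hypothesized mean, and the standard deviation below are hypothetical, and the z-test is only one of many possible statistical tests.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0 (sigma assumed known)."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_null = p_value < alpha
    return z, p_value, reject_null


# Hypothetical sample data.
sample = [52, 55, 51, 54, 53, 56, 52, 55, 54]

# H0: mean = 50 versus H1: mean != 50, at a 5% significance level.
z, p_value, reject_null = one_sample_z_test(sample, mu0=50, sigma=3)
print(z, p_value, reject_null)
```

A p-value below the significance level leads to rejecting the null hypothesis; otherwise the data do not provide enough evidence against it.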

• Topic: Data Distributions
• Definition: Data Distributions describe the possible values and their probabilities in a dataset.
• Common Distributions:
– Normal Distribution (Gaussian)
– Uniform Distribution
– Binomial Distribution
– Exponential Distribution
– Poisson Distribution
– Log-Normal Distribution
• Importance: Understanding data distributions helps in making assumptions, selecting appropriate statistical tests, and generating realistic synthetic data.
• Topic: Examples of Data Distributions
• Show visual examples of different data distributions, highlighting their shapes and characteristics.
• Summary: Convex Optimization provides a powerful framework for optimization problems. Data Visualization enables us to gain insights from complex data. Understanding Hypothesis Function and Testing is essential for making statistical inferences. Knowledge of Data Distributions helps in understanding and analyzing datasets effectively.
Data Preprocessing, Data Augmentation, Normalizing Data Sets

• Topic: Introduction to Data Preprocessing
• Definition: Data preprocessing is a crucial step in preparing data for machine learning models by transforming raw data into a clean and structured format.
• Importance: Data preprocessing helps improve the quality and reliability of the data, enhances model performance, and mitigates the impact of noisy or irrelevant information.
• Topic: Common Data Preprocessing Techniques

• Data Augmentation
• Definition: Data augmentation is a technique used to increase the size and diversity of the training dataset by applying various transformations to the existing data.
• Benefits: Data augmentation helps prevent overfitting, improves model generalization, and increases the robustness of the model to variations in the input data.
• Examples of Data Augmentation Techniques: Image rotation, flipping, cropping, zooming, adding noise, etc.
• Topic: Normalizing Data Sets
• Definition: Normalization is the process of scaling numerical features in the dataset to a standard range, usually between 0 and 1 or -1 and 1.
• Purpose: Normalization ensures that features with different scales or units contribute equally to the learning process, prevents dominance of certain features, and aids convergence during model training.
• Popular Normalization Techniques: Min-Max Scaling, Z-score Standardization, Decimal Scaling, etc.
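Two of the normalization techniques named above, Min-Max Scaling and Z-score Standardization, can be sketched as follows. The function names and the feature values are illustrative assumptions.

```python
import statistics


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("all values are identical; cannot rescale")
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


def z_score_standardize(values):
    """Shift and scale values to have mean 0 and standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column.
feature = [10, 20, 30, 40, 50]
scaled = min_max_scale(feature)
standardized = z_score_standardize(feature)
print(scaled)
print(standardized)
```

Min-Max Scaling keeps the values inside a fixed range, while Z-score Standardization centers them around zero; which one suits a dataset depends on the model and on how outliers should be treated.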
• Topic: Data Preprocessing Workflow
• Visual representation of the sequential steps involved in data preprocessing, including data cleaning, integration, transformation, and dimensionality reduction.
• Topic: Data Augmentation Workflow
• Visual representation of the process of data augmentation, illustrating various transformations applied to the original data to create augmented samples.
• Topic: Normalization Techniques Comparison
• Comparative analysis of different normalization techniques, highlighting their advantages, disadvantages, and suitable use cases.
• Summary: Data preprocessing, data augmentation, and normalization are essential techniques in machine learning that contribute to improving model performance, generalization, and robustness.
• Key Takeaways: Preprocessing ensures clean and structured data, augmentation increases data diversity, and normalization standardizes feature scales.
Course Outcomes
CO1 - Understand the fundamental concepts and techniques of artificial intelligence and machine learning.
CO4 - Apply various supervised machine learning models and evaluate their
CO5 - Apply various unsupervised machine learning models and evaluate their


ML model

• Supervised Learning: ML models learn from labeled training data to make predictions or classify new examples.
• Unsupervised Learning: ML models find patterns and relationships in unlabeled data without specific guidance.
• Reinforcement Learning: ML models learn by interacting with an environment and receiving feedback in the form of rewards or penalties.
• Deep Learning: ML models inspired by the structure and function of the human brain, known as artificial neural networks, capable of processing complex patterns.

• Dataset: A set of data examples that contain features important to solving the problem.
• Feature: With respect to a dataset, a feature represents an attribute and value combination. Colour is an attribute. “Colour is blue” is a feature.
• Model: A data structure that stores a representation of a dataset (weights and biases). Models are created/learned when you train an algorithm on a dataset.
• Noise: Any irrelevant information or randomness in a dataset which obscures the underlying pattern.
• Outlier: An observation that deviates significantly from other observations in the dataset.
• Test set: A set of observations used at the end of model training and validation to assess the predictive power of your model. How generalizable is your model to unseen data?
• Training set: A set of observations used to train machine learning models.

• Data Collection: Collect the data that the algorithm will learn from.
• Data Preparation: Format and arrange the data into the optimal format, extracting important features and performing dimensionality reduction.
• Training: Also known as the fitting stage, this is where the Machine Learning algorithm actually learns by being shown the data that has been collected and prepared.
• Evaluation: Test the model to see how well it performs.
• Tuning: Fine-tune the model to maximise its performance.

• Supervised Learning

• Unsupervised Learning

• Semi-supervised Learning

• Reinforcement Learning

Supervised Learning
• In supervised learning, the goal is to learn the mapping (the rules) between a set of inputs and outputs.
• For example, the inputs could be the weather forecast, and the outputs would be the visitors to the beach.
• The goal in supervised learning would be to learn the mapping that describes the relationship between temperature with other weather conditions and number of beach visitors.
• So here number of visitors (dependent variable) will be dependent on weather conditions (independent variable).
• Example labelled data is provided of past input and output pairs during the learning process to teach the model how it should behave, hence, called ‘supervised’ learning.
• For the beach example, new inputs can then be fed in of forecast temperature and the Machine learning algorithm will then output a future prediction for the number of visitors.
• Being able to adapt to new inputs and make predictions is the crucial generalisation part of machine learning.
• In training, we want to maximise generalisation, so the supervised model defines the real ‘general’ underlying relationship.
• If the model is over-trained, we cause over-fitting to the examples used and the model would be unable to adapt to new, previously unseen inputs.
• A side effect to be aware of in supervised learning is that the supervision we provide introduces bias to the learning.
• The model can only imitate exactly what it was shown, so it is very important to show it reliable, unbiased examples.
• Also, supervised learning usually requires a lot of data before it learns.
• Obtaining enough reliably labelled data is often the hardest and most expensive part of using supervised learning.
• The output from a supervised Machine Learning model could be a category from a finite set, e.g. [low, medium, high] for the number of visitors to the beach.
• This is called a classification problem.

• The output from a supervised Machine Learning model could be a numeric value from a continuous range, e.g. [500-2000] for the number of visitors to the beach.
• This is called a regression problem.

• Supervised learning is of two types: Classification and Regression.

Supervised Learning-Classification
• Classification is used to group similar data points into different sections in order to classify them.
• Machine Learning is used to find the rules that explain how to separate the different data points.
• They all focus on using data and answers to discover rules that linearly separate data points.
• Linear separability is a key concept in machine learning.
• Classification approaches try to find the best way to separate data points with a line.
• The lines drawn between classes are known as the decision boundaries.
• The entire area that is chosen to define a class is known as the decision surface.
• The decision surface defines that if a data point falls within its boundaries, it will be assigned a certain class.
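One classic way to learn a linear decision boundary is the perceptron, which is used here purely as an illustration; the slides do not name a specific algorithm. The data points and function names are hypothetical, and the points were chosen to be linearly separable so the training loop terminates.

```python
def classify(weights, bias, point):
    """Return 1 if the point lies on the positive side of the line, else 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    """Adjust the line whenever a training point is misclassified."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for point, label in samples:
            error = label - classify(weights, bias, point)
            if error != 0:
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, point)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side
            break
    return weights, bias


# Hypothetical, linearly separable training data: (features, class label).
samples = [
    ((2.0, 3.0), 1), ((3.0, 3.0), 1), ((4.0, 5.0), 1),
    ((-1.0, -1.0), 0), ((-2.0, -3.0), 0), ((0.0, -2.0), 0),
]
weights, bias = train_perceptron(samples)
print(weights, bias)
print(classify(weights, bias, (5.0, 5.0)), classify(weights, bias, (-3.0, -3.0)))
```

The learned weights and bias define the decision boundary: points with a positive score fall in class 1, all others in class 0.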
Supervised Learning-Classification
• Binary Classification
• Multi-Class Classification
• Multi-Label Classification
• Imbalanced Classification


Binary Classification
• Binary Classification refers to those classification tasks that have two class labels.
• Example: Email spam detection (spam or not).
• Typically, binary classification tasks involve one class that is the normal state and another class that is the abnormal state.
• For example, “not spam” is the normal state and “spam” is the abnormal state.
• Another example is “cancer not detected” is the normal state of a task that involves a medical test and “cancer detected” is the abnormal state.
• The class for the normal state is assigned the class label 0 and the class with the abnormal state is assigned the class label 1.
Multi-Class Classification
• Multi-Class Classification refers to those classification tasks that have more than two class labels.
• Examples include:
• Face classification.
• Plant species classification.
• Optical character recognition.
• Examples are classified as belonging to one among a range of known classes.
• The number of class labels may be very large on some problems.
• For example, a model may predict a photo as belonging to one among thousands or tens of thousands of faces in a face recognition system.
Multi-Label Classification
• Multi-Label Classification refers to those classification tasks that have two or more class labels, where one or more class labels may be predicted for each example.
• Consider the example of photo classification, where a given photo may have multiple objects in the scene and a model may predict the presence of multiple known objects in the photo, such as “bicycle,” “apple,” “person,” etc.
• This is unlike binary classification and multi-class classification, where a single class label is predicted for each example.

Imbalanced Classification
• Imbalanced Classification refers to classification tasks where the number of examples in each class is unequally distributed.
• Typically, imbalanced classification tasks are binary classification tasks where the majority of examples in the training dataset belong to the normal class and a minority of examples belong to the abnormal class.
• Examples include:
• Fraud detection.
• Outlier detection.
• Medical diagnostic tests.

Supervised Learning-Regression
• The difference between classification and regression is that regression outputs a number rather than a class.
• Therefore, regression is useful when predicting number-based problems like stock market prices, the temperature for a given day, or the probability of an event.

Unsupervised Learning
• In unsupervised learning, only input data is provided in the examples.
• There are no labelled example outputs to aim for.
• But it may be surprising to know that it is still possible to find many interesting and complex patterns hidden within data without any labels.
• An example of unsupervised learning in real life would be sorting different colour coins into separate piles. Nobody taught you how to separate them, but by just looking at their features such as colour, you can see which colour coins are associated and cluster them into their correct groups.
• Unsupervised learning can be harder than supervised learning, as the removal of supervision means the problem has become less defined. The algorithm has a less focused idea of what patterns to look for.
• Unsupervised machine learning finds all kinds of unknown patterns in data.
• Unsupervised methods help you to find features which can be useful for categorization.
• It takes place in real time, so all the input data is analyzed and labeled in the presence of learners.
• It is easier to get unlabeled data from a computer than labeled data, which needs manual intervention.

• Unsupervised Learning is of two types: Clustering and Association.

Unsupervised Learning-Clustering
• Unsupervised learning is mostly used for clustering.
• Clustering is the act of creating groups with differing characteristics.
• Clustering attempts to find various subgroups within a dataset.
• As this is unsupervised learning, we are not restricted to any set of labels and are free to choose how many clusters to create.
• This is both a blessing and a curse.
• Picking a model that has the correct number of clusters (complexity) has to be conducted via an empirical model selection process.
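As one possible illustration of clustering, the sketch below runs k-means on one-dimensional data. K-means is an assumption chosen for this example (the slides do not name a clustering algorithm), and the data, starting centroids, and function name are hypothetical.

```python
def kmeans_1d(points, centroids, max_iterations=100):
    """Group 1-D points around centroids by alternating assign/update steps."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data with two visible groups; k = 2 is chosen by hand.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 2.0])
print(centroids)
print(clusters)
```

The number of clusters is an input here, which reflects the point above: choosing it is part of model selection rather than something the algorithm decides.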

Unsupervised Learning-Association
• In Association Learning you want to uncover the rules that describe your data.
• For example, if a person watches video A they will likely watch video B.
• Association rules are perfect for examples such as this where you want to find related items.
• Common example is Market Basket Analysis:
• Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.
• Association Rules are widely used to analyze retail basket or transaction data, and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.
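Two common measures of interestingness for an association rule are support and confidence. The sketch below computes both for a hypothetical set of shopping baskets; the item names, transactions, and function names are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return n_both / n_antecedent


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

A rule such as "bread implies milk" is usually considered strong only when both its support and its confidence exceed thresholds chosen by the analyst.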
Semi-supervised Learning
• Semi-supervised learning is a mix between supervised and unsupervised approaches.
• The learning process isn’t closely supervised with example outputs for every single input, but we also don’t let the algorithm do its own thing and provide no form of feedback.
• Semi-supervised learning takes the middle road.
• By being able to mix together a small amount of labelled data with a much larger unlabeled dataset it reduces the burden of having enough labelled data.
• Therefore, it opens up many more problems to be solved with machine learning.
• Example:
• Internet Content Classification: Labeling each webpage is an impractical and unfeasible process, so Semi-Supervised learning algorithms are used. Even the Google search algorithm uses a variant of Semi-Supervised learning to rank the relevance of a webpage for a given query.
Reinforcement Learning
• In this approach, occasional positive and negative feedback is used to reinforce behaviours.
• Think of it like training a dog: good behaviours are rewarded with a treat and become more common. Bad behaviours are punished and become less common.
• This reward-motivated behaviour is key in reinforcement learning.
• It is less common and much more complex, but it has generated incredible results.
• It doesn’t use labels as such, and instead uses rewards to learn.

• This is very similar to how we as humans also learn.
• Throughout our lives, we receive positive and negative signals and constantly learn from them.
• The chemicals in our brain are one of many ways we get these signals.
• When something good happens, the neurons in our brains provide a hit of positive neurotransmitters such as dopamine which makes us feel good and we become more likely to repeat that specific action.
• We don’t need constant supervision to learn like in supervised learning.
• By only giving the occasional reinforcement signals, we still learn very effectively.

• One of the most exciting parts of Reinforcement Learning is that it is a first step away from training on static datasets, and instead being able to use dynamic, noisy data-rich environments.
• This brings Machine Learning closer to a learning style used by humans. The world is simply our noisy, complex data-rich environment.
• Games are very popular in Reinforcement Learning research. They provide ideal data-rich environments.
• The scores in games are ideal reward signals to train reward-motivated behaviours. Additionally, time can be sped up in a simulated game environment to reduce overall training time.
• A Reinforcement Learning algorithm just aims to maximise its rewards by playing the game over and over again. If you can frame a problem with a frequent ‘score’ as a reward, it is likely to be suited to Reinforcement Learning.
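The idea of learning from rewards by repeated play can be sketched with a multi-armed bandit, one of the simplest reinforcement learning settings. The epsilon-greedy strategy, the fixed reward values, and all names below are assumptions chosen for illustration; real environments usually have noisy rewards and many states.

```python
import random


def run_epsilon_greedy(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn which action pays best using only the rewards received."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions  # running average reward per action
    for t in range(steps):
        if t < n_actions:
            action = t  # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]  # hypothetical fixed reward per action
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Three hypothetical actions; the second one gives the highest reward.
estimates, counts = run_epsilon_greedy(rewards=[0.2, 0.8, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(estimates, counts, best_action)
```

No labels are given anywhere in this loop: the agent only sees the score that follows each action, and gradually favours the action with the highest estimated reward.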
Course Outcomes
CO1 - Understand the fundamental concepts and techniques of artificial intelligence and machine learning.
Design and implement basic AI algorithms and mod
CO2 - Apply the basics of Python programming to various problems related to AI

Agenda of the lecture
• Problem Solving using Search Algorithms.
• Uninformed search algorithms: Depth-First Search, Breadth-First Search
• Informed search algorithms: Heuristics, A* algorithm, AO* algorithm

Problem Solving Technique By Search
• A ‘search’ refers to the search for a solution in a problem space.
• In general, searching refers to finding the information one needs.
• Searching is the most commonly used technique of problem solving in artificial intelligence.
• Search proceeds with different types of ‘search control strategies’.
• The searching algorithm helps us to search for the solution of a particular problem.

Properties of Search Algorithms in AI
• Completeness: A search algorithm is said to be complete if it guarantees to return a solution whenever at least one solution exists for any random input.
• Optimality: If a solution found by an algorithm is guaranteed to be the best solution (lowest path cost) among all other solutions, then such a solution is said to be an optimal solution.
Type of Search Strategies
• Uninformed search (also known as blind search) – Uninformed search algorithms have no additional information on the goal node other than the one provided in the problem definition.
• Informed search (also known as heuristic search) – Search strategies know whether one state is more promising than another.
Types of Search Algorithms
• Uninformed Search
– Breadth First Search
– Depth First Search
• Informed Search
– Simple Hill Climbing
– A* Algorithm
– AO* Algorithm

The Uninformed Search
Uninformed search algorithms have no additional information on the goal node other than the one provided in the problem definition. The plans to reach the goal state from the start state differ only by the order and/or length of actions. Uninformed search is also called Blind search.
These types of algorithms will have:
§ A problem graph, containing the start node S and the goal node G.
§ A strategy, describing the manner in which the graph will be traversed to get to G.
§ A fringe, which is a data structure used to store all the possible states (nodes) that you can go to from the current states.
§ A tree that results while traversing to the goal node.
§ A solution plan, which is the sequence of nodes from S to G.

Breadth First Search
• In Breadth First Search (BFS), the root node of the graph is expanded first, then all the successor nodes are expanded and then their successors and so on, i.e. the nodes are expanded level-wise starting at root level.
• Breadth-first search (BFS) is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph, sometimes referred to as a ‘search key’), and explores all of the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level.
• BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level before moving to nodes of the next level.
• The breadth-first search algorithm is an example of a general-graph search algorithm.
• Breadth-first search is implemented using a FIFO queue data structure.
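A minimal sketch of BFS with a FIFO queue is shown below. The graph, the node names, and the function name are hypothetical and are not taken from the examples in the slides.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    frontier = deque([start])  # FIFO queue of nodes waiting to be expanded
    parent = {start: None}     # also serves as the visited set
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                frontier.append(neighbor)
    return None


# Hypothetical graph as an adjacency list.
graph = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["G"],
    "C": [],
    "D": ["G"],
    "G": [],
}
print(bfs_path(graph, "S", "G"))
```

Because nodes are expanded level by level, the first path found to the goal uses the fewest edges.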

Example 1
• Source node = 0
• Destination node = 7
• The BFS traversal order will be 0-1-2-3-4-5-6-7.

Example 2
• S = source node
• K = destination node
• The path according to BFS will be:
S ---> A ---> B ---> C ---> D ---> G ---> H --->
Points to Remember
• Time Complexity: The time complexity of BFS is given by the number of nodes traversed until the shallowest goal node is reached:
$T(b) = 1 + b + b^2 + b^3 + \dots + b^d = O(b^d)$
where d = depth of the shallowest solution and b = branching factor (the number of successors of each node).
• Space Complexity: The space complexity of BFS is given by the memory size of the frontier, which is $O(b^d)$.
• Completeness: BFS is complete, which means if the shallowest goal node is at some finite depth, then BFS will find a solution.
• Optimality: BFS is optimal if the path cost of all edges of the tree/graph is the same.

Advantages of BFS
• In this procedure, it will always find the solution if one exists.
• It does not follow a single unfruitful path for a long time. It finds the minimal solution in case of multiple paths.
• There is nothing like a useless path in BFS, since it searches level by level.
Disadvantages of BFS
• BFS consumes a large amount of memory, and its time complexity is high.
• It has long pathways when all paths to a destination are at approximately the same search depth.

Depth First Search
• The DFS algorithm is a recursive algorithm that uses the idea of backtracking. It involves exhaustive searches of all the nodes by going ahead, if possible, else by backtracking.
• Here, the word backtrack means that when you are moving forward and there are no more nodes along the current path, you move backwards on the same path to find nodes to traverse. All the nodes will be visited on the current path till all the unvisited nodes have been traversed, after which the next path will be selected.
• This recursive nature of DFS can be implemented using stacks.

DFS Contd..
• The basic idea is as follows:
• Pick a starting node and push all its adjacent nodes into a stack.
• Pop a node from the stack to select the next node to visit and push all its adjacent nodes into the stack.
• Repeat this process until the stack is empty. However, ensure that the nodes that are visited are marked, so that no node is visited more than once.
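The stack-based procedure above can be sketched as follows. The graph and function name are hypothetical; neighbours are pushed in reverse order only so that the first-listed neighbour is explored first.

```python
def dfs_order(graph, start):
    """Return the order in which nodes are visited by an iterative DFS."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already handled via another path
        visited.add(node)  # mark the node so it is not visited again
        order.append(node)
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in visited:
                stack.append(neighbor)
    return order


# Hypothetical graph as an adjacency list.
graph = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["G"],
    "C": [],
    "D": ["G"],
    "G": [],
}
print(dfs_order(graph, "S"))
```

The search follows one branch as deep as possible (S, A, C), backtracks to try D and G, and only then returns to B.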
Example 1
NOTE: Follow the dotted arrows to determine the sequence.

Example 2


Points to Remember
• Time complexity: $T(n) = 1 + n + n^2 + n^3 + \dots + n^m = O(n^m)$
where m = maximum depth of any node and n = number of successors of each node (the branching factor).
• Completeness: DFS is complete when the state space is finite.
• Space complexity: DFS needs to store only a single path from the root node, which takes O(bm) memory, where b is the branching factor (written n above).
• Optimality: DFS is not optimal, meaning the number of steps in reaching the solution, or the cost spent in reaching it, may be high.

Advantages of DFS
• Memory requirement is linear with respect to the nodes.
• Lower space complexity than BFS, and often less time when a solution lies along an early path.
• A solution can be found without much more search.
Disadvantages of DFS
• It is not guaranteed to give you a solution.
• Cut-off depth is smaller, so time complexity is more.
• The depth to which the search should proceed is difficult to determine.
The Informed Search
• Informed search methods use knowledge about the problem domain and choose promising operators first.
• These heuristic search methods use heuristic functions to evaluate the next state towards the goal state.
To find a solution using the heuristic technique, one should carry out the following steps:
1. Add domain-specific information to select the best path along which to continue searching.
2. Define a heuristic function h(n) that estimates the 'goodness' of a node n. Specifically, h(n) = estimated cost (or distance) of the minimal-cost path from n to a goal state.
3. The term heuristic means 'serving to aid discovery'; it is an estimate, based on domain-specific information that is computable from the current state description, of how close we are to a goal.

Heuristic function:
• A heuristic is a function used in informed search to find the most promising path. It takes the current state of the agent as its input and produces an estimate of how close the agent is to the goal.
• It is represented by h(n), and it estimates the cost of an optimal path from state n to the goal state.
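One common illustration of a heuristic h(n) is the Manhattan distance on a grid. This choice, the function name `manhattan` and the (x, y) state encoding are assumptions for the example and do not come from the slides.

```python
def manhattan(state, goal):
    """Estimate the remaining cost as the grid distance between two (x, y) cells."""
    (x1, y1), (x2, y2) = state, goal
    return abs(x1 - x2) + abs(y1 - y2)


print(manhattan((0, 0), (3, 4)))  # 7
print(manhattan((2, 2), (2, 2)))  # 0 at the goal itself
```

The estimate uses only the current state description and the goal, and it is zero exactly when the agent is at the goal.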
Characteristics of heuristic search
• Heuristics are knowledge about a domain, which helps search and reasoning in that domain.
• Heuristic search incorporates domain knowledge to improve efficiency over blind search.
• A heuristic is a function that, when applied to a state, returns a value estimating the merit of that state with respect to the goal.
• A heuristic evaluation function estimates the likelihood of a given state leading to the goal state.
• A heuristic search function estimates the cost from the current state to the goal, presuming the function is efficient.

Hill Climbing Algorithm (Gradient Ascent/Descent Algorithm)
• Iteratively maximize the "value" of the current state by replacing it with the successor state that has the highest value, for as long as possible.

Note: minimizing a "value" function v(n) is equivalent to maximizing -v(n), thus both notions are used interchangeably.
Hill climbing is a local search algorithm which continuously moves in the direction of increasing elevation/value to find the peak of the mountain, or the best solution to the problem. It terminates when it reaches a peak where no neighbor has a higher value.
• In this algorithm, we don't need to maintain and handle a search tree or graph, as it only keeps a single current state.
• It is also called greedy local search, as it only looks at its immediate neighbor states and not beyond.
• A node in hill climbing has two components: state and value.
• Hill climbing is mostly used when a good heuristic is available.
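The idea of repeatedly moving to the highest-valued successor can be sketched as follows. This is a minimal sketch under stated assumptions: the names `hill_climb`, `neighbours` and `value`, and the one-dimensional example landscape, are hypothetical.

```python
def hill_climb(start, neighbours, value):
    """Move to the best neighbour until no neighbour has a higher value."""
    current = start                      # only a single current state is kept
    while True:
        best = max(neighbours(current), key=value, default=current)
        if value(best) <= value(current):
            return current               # peak: no neighbour is higher
        current = best


def value(x):
    # Hypothetical landscape with a single peak at x = 3.
    return -(x - 3) ** 2


def neighbours(x):
    return [x - 1, x + 1]


peak = hill_climb(0, neighbours, value)
print(peak)  # 3
```

No search tree is stored; the loop remembers only `current`, which is why the method cannot backtrack.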

Characteristics of Hill Climbing:
• Generate and Test variant: It is a variation of the generate-and-test algorithm which discards all states which do not look promising or seem unlikely to lead us to the goal state. To take such decisions, it uses heuristics (an evaluation function) which indicate how close the current state is to the goal state.
• Greedy approach: Hill-climbing search moves in the direction which optimizes the cost.
• No backtracking: It does not backtrack through the search space, as it does not remember previous states.
State Space diagram of Hill Climbing search Algorithm

Related Terminology
• Local Maximum: A local maximum is a state which is better than its neighbor states, but there is another state in the landscape which is higher than it.
• Global Maximum: The global maximum is the best possible state of the state space landscape. It has the highest value of the objective function.
• Current state: It is the state in the landscape diagram where the agent is currently present.
• Flat local maximum: It is a flat region of the landscape where all the neighbor states of the current state have the same value.
• Shoulder: It is a plateau region which has an uphill edge.

Simple Hill Climbing
• It evaluates only one neighbor node state at a time and selects the first one which improves the current cost, setting it as the current state.
• It checks only one successor state at a time; if that successor is better than the current state, it moves to it, else it stays in the same state.
Algorithm of Simple Hill Climbing
Step 1: Define the current state as an initial state
Step 2: Loop until the goal state is achieved or no more operators can be applied on the current state:
a. Apply an operation to the current state and get a new state
b. Compare the new state with the goal
c. Quit if the goal state is achieved
d. Evaluate the new state with the heuristic function and compare it with the current state
e. If the new state is closer to the goal than the current state, update the current state
f. Else, if it is not better than the current state, return to step 2
Step 3: Exit
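The steps above can be sketched in Python. This is one possible reading of the algorithm, not a definitive implementation: the names `simple_hill_climb`, `operators`, `h` and `is_goal`, and the integer example problem, are assumptions made for illustration. Here h is a cost estimate, so a smaller h means closer to the goal.

```python
def simple_hill_climb(start, operators, h, is_goal):
    """Accept the first operator result that is closer to the goal (smaller h)."""
    current = start                              # Step 1
    improved = True
    while not is_goal(current) and improved:     # Step 2
        improved = False
        for op in operators:
            candidate = op(current)              # a. apply an operation
            if is_goal(candidate):               # b, c. quit at the goal
                return candidate
            if h(candidate) < h(current):        # d, e. closer than current
                current = candidate
                improved = True
                break                            # take the first improvement
            # f. otherwise try the next operator
    return current                               # Step 3


GOAL = 5
operators = [lambda x: x + 1, lambda x: x - 1]   # hypothetical moves
result = simple_hill_climb(0, operators, lambda x: abs(x - GOAL), lambda x: x == GOAL)
print(result)  # 5
```

Unlike the version that scans all successors for the best one, this variant moves as soon as any improving successor is found.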

Problems associated with Hill Climbing Algorithm
Hill climbing is a short-sighted technique, as it evaluates only immediate possibilities. So it may end up in a few situations from which it cannot pick any further states. Let's look at these states and some solutions for them.
1. Local maximum: It's a state which is better than all its neighbors, but there exists a better state which is far from the current state; if a local maximum occurs within sight of the solution, it is known as "foothills".
2. Plateau: In this state, all neighboring states have the same heuristic value, so it's unclear which state to choose next by making local comparisons.
3. Ridge: It's an area which is higher than the surrounding states, but it cannot be reached in a single move; for example, we have four possible directions to explore (N, E, W, S) and the area exists in the NE direction.

There are a few solutions to overcome these situations:
1. We can backtrack to one of the previous states and explore other directions.
2. We can skip a few states and make a jump in a new direction.
3. We can explore several directions to figure out the correct path.
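The sketch below illustrates the local-maximum problem and one way to read the "jump" remedy: restarting the climb from a different state and keeping the best peak found. The list `heights`, the function names and the chosen start states are hypothetical, and restarting is only one of several possible interpretations of the remedies listed above.

```python
def climb(start, neighbours, value):
    """Plain hill climbing: stop when no neighbour is higher."""
    current = start
    while True:
        best = max(neighbours(current), key=value)
        if value(best) <= value(current):
            return current
        current = best


def climb_with_jumps(starts, neighbours, value):
    """Climb from several start states and keep the highest peak reached."""
    return max((climb(s, neighbours, value) for s in starts), key=value)


heights = [1, 3, 2, 5, 9, 4]   # local maximum at index 1, global maximum at index 4


def value(i):
    return heights[i]


def neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]


stuck = climb(0, neighbours, value)                   # stops at index 1
best = climb_with_jumps([0, 5], neighbours, value)    # reaches index 4
print(stuck, best)  # 1 4
```

Starting from index 0 the climb halts at the local maximum (height 3), while the second start state lets the search reach the global maximum (height 9).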
Simulated Annealing
• A hill-climbing algorithm which never makes a move towards a lower value is guaranteed to be incomplete, because it can get stuck on a local maximum. If instead the algorithm applies a random walk, moving to a randomly chosen successor, it may be complete but not efficient. Simulated annealing is an algorithm which yields both efficiency and completeness.
• In metallurgy, annealing is the process of heating a metal or glass to a high temperature and then cooling it gradually, which allows the material to reach a low-energy crystalline state. The same idea is used in simulated annealing, in which the algorithm picks a random move instead of picking the best move. If the random move improves the state, the move is accepted. Otherwise, the algorithm accepts the move only with some probability less than 1; if the move is rejected, it stays in the current state and picks another move.
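A minimal sketch of this acceptance rule is shown below. The slides do not specify the probability or the cooling schedule, so the acceptance probability exp(delta / t), the geometric cooling factor, the parameter defaults and the example landscape are all assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(start, neighbours, value,
                        t0=10.0, cooling=0.95, steps=500, seed=0):
    """Pick random moves; always accept uphill, accept downhill with probability < 1."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = rng.choice(neighbours(current))
        delta = value(candidate) - value(current)
        # Uphill moves are always taken; downhill moves are taken with
        # probability exp(delta / t), which shrinks as the temperature drops.
        if delta > 0 or rng.random() < math.exp(delta / t):
            current = candidate
        if value(current) > value(best):
            best = current             # remember the best state seen so far
        t = max(t * cooling, 1e-9)     # gradual cooling, kept above zero
    return best


heights = [1, 3, 2, 5, 9, 4]           # hypothetical landscape


def value(i):
    return heights[i]


def neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]


result = simulated_annealing(0, neighbours, value)
print(result, value(result))
```

Early on, the high temperature lets the search step downhill past local maxima; as t falls, the behaviour approaches plain hill climbing.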

A* Algorithm
• A* uses heuristic methods to achieve optimality and completeness, and is a variant of the best-first algorithm.
• When a search algorithm has the property of optimality, it is guaranteed to find the best possible solution, in our case the shortest path to the finish state. When a search algorithm has the property of completeness, it means that if a solution to a given problem exists, the algorithm is guaranteed to find it.
• In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. We combine both costs, and this sum is called the fitness number.

A* Algorithm Contd.
• Each time A* enters a state, it calculates the cost f(n) of travelling to each neighboring node n, and then enters the node with the lowest value of f(n).
f(n) = g(n) + h(n)
f(n) = Estimated cost of the cheapest solution through n
g(n) = Cost to reach node n from the initial state
h(n) = Estimated cost to reach the goal node from node n
• The A* algorithm is typically used for graphs and graph traversals. In terms of graphs, A* is used for finding the shortest path to a certain point from a given point.
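A small worked example of f(n) = g(n) + h(n) follows. The node names and the (g, h) values are made up for illustration only.

```python
# Hypothetical neighbouring nodes with (g, h) values.
candidates = {"B": (2, 6), "C": (4, 3), "D": (5, 5)}

# f(n) = g(n) + h(n) for every neighbouring node n.
f = {node: g + h for node, (g, h) in candidates.items()}

# A* enters the node with the lowest f(n).
chosen = min(f, key=f.get)
print(f)       # {'B': 8, 'C': 7, 'D': 10}
print(chosen)  # C
```

Node B is the cheapest to reach (g = 2), but C wins because its combined estimate is lower.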

Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check whether the OPEN list is empty; if it is, return failure and stop.
Step 3: Select the node n from the OPEN list which has the smallest value of the evaluation function (g+h). If node n is the goal node, return success and stop; otherwise continue to Step 4.
Step 4: Expand node n, generate all of its successors, and put n into the CLOSED list. For each successor n', check whether n' is already in the OPEN or CLOSED list; if not, compute the evaluation function for n' and place it into the OPEN list.
Step 5: Else, if node n' is already in OPEN or CLOSED, attach it to the back pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2.
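The steps above can be sketched with a priority queue standing in for the OPEN list. This is a minimal sketch, not a definitive implementation: the function name `a_star`, the edge-list graph format, the heuristic table `h` and the example values are all assumptions. Instead of updating entries in place, it pushes a new entry when a cheaper g value is found and skips stale entries when they are popped.

```python
import heapq


def a_star(graph, h, start, goal):
    """Return (path, cost) of a cheapest path, or (None, inf) if none exists."""
    open_heap = [(h[start], 0, start)]     # entries are (f, g, node)
    g = {start: 0}
    parent = {start: None}                 # back pointers
    closed = set()
    while open_heap:                       # Step 2: fail when OPEN is empty
        _, g_n, n = heapq.heappop(open_heap)   # Step 3: smallest g + h
        if n in closed or g_n > g[n]:
            continue                       # stale entry, already improved
        if n == goal:
            path = []
            while n is not None:           # follow back pointers to the start
                path.append(n)
                n = parent[n]
            return path[::-1], g_n
        closed.add(n)                      # Step 4: expand n
        for succ, cost in graph.get(n, []):
            new_g = g_n + cost
            if succ not in g or new_g < g[succ]:   # Step 5: keep the lowest g
                g[succ] = new_g
                parent[succ] = n
                closed.discard(succ)       # reopen if it was closed
                heapq.heappush(open_heap, (new_g + h[succ], new_g, succ))
    return None, float("inf")


# Hypothetical weighted graph and heuristic values.
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 3)], "G": []}
h = {"S": 5, "A": 4, "B": 3, "G": 0}
print(a_star(graph, h, "S", "G"))  # (['S', 'A', 'B', 'G'], 6)
```

The direct edge A to G costs 6, giving a total of 7, so the search prefers the route through B with total cost 6.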

Points to Remember
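• Completeness: The A* algorithm is complete as long as:
a. The branching factor is finite.
b. The cost of every action is fixed.
• Optimality: It is optimal when the heuristic satisfies the following two conditions:
a. Admissibility: h(n) never overestimates the true cost of reaching the goal.
b. Consistency: h(n) <= cost(n, n') + h(n') for every successor n' of n.
• Time Complexity: O(b^d)
b = branching factor
d = depth of solution
• Space Complexity: O(b^d)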
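The consistency condition can be checked edge by edge, as in the sketch below. It assumes the usual definition, h(n) <= cost(n, n') + h(n') for every edge with h(goal) = 0; a heuristic meeting both is also admissible. The function name `is_consistent`, the graph format and the values are hypothetical.

```python
def is_consistent(graph, h, goal):
    """Check h(goal) == 0 and h(n) <= cost(n, n') + h(n') on every edge."""
    if h[goal] != 0:
        return False
    return all(h[n] <= cost + h[succ]
               for n, edges in graph.items()
               for succ, cost in edges)


# Hypothetical weighted graph: node -> list of (successor, edge cost).
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 3)], "G": []}

good_h = {"S": 5, "A": 4, "B": 3, "G": 0}
bad_h = {"S": 5, "A": 9, "B": 3, "G": 0}   # h(A) = 9 exceeds 2 + h(B) = 5

print(is_consistent(graph, good_h, "G"))  # True
print(is_consistent(graph, bad_h, "G"))   # False
```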

• It is complete and optimal.
• It generally performs better than other search techniques.
• It is used to solve very complex problems.
• It is optimally efficient, i.e. there is no other optimal algorithm guaranteed to expand fewer nodes than A*.
• This algorithm is complete if the branching factor is finite and every action has a fixed cost.
• The main drawback of A* is its memory requirement: it keeps all generated nodes in memory, so it is not practical for various large-scale problems.
AO* (AND-OR) Algorithm
• When a problem can be divided into a set of subproblems, where each subproblem can be solved separately and a combination of these will be a solution, AND-OR graphs or AND-OR trees are used for representing the solution.
• The decomposition of the problem, or problem reduction, generates AND arcs. One AND arc may point to any number of successor nodes, all of which must be solved for the arc to lead to a solution. A node may also give rise to many arcs, indicating several possible solutions. Hence the graph is known as AND-OR instead of AND.


• In an AND-OR graph, the AO* algorithm is an efficient method to explore a solution path.
• The AO* algorithm works in two phases. The first phase finds a heuristic value for the nodes and arcs at a particular level. The changes in the values of the nodes are propagated back in the next phase.
• To find a solution in an AND-OR graph, the AO* algorithm works much like best-first search, with the ability to handle AND arcs appropriately. The algorithm finds an optimal path from the initial node by propagating results, such as solutions and changes in heuristic value, back to the ancestors, as in the algorithm.


In figure (a)
• The top node A has been expanded, producing two arcs, one leading to B and one leading to C-D.
• The numbers at each node represent the value of f' at that node (the cost of getting to the goal state from the current state). For simplicity, it is assumed that every operation (i.e. applying a rule) has unit cost, i.e., each arc with a single successor has a cost of 1, and each AND arc has a cost of 1 for each of its components.
• With the information available so far, it appears that C is the most promising node to expand, since its f' = 3 is the lowest. But going through B would be better, since to use C we must also use D, and the cost would be 9 (3+4+1+1). Through B it would be 6 (5+1).
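The arithmetic above can be reproduced with a small sketch. It assumes, as stated, that every arc component costs 1 and uses the f' values from the example (B = 5, C = 3, D = 4); the function name `arc_cost` is hypothetical.

```python
def arc_cost(successor_estimates, edge_cost=1):
    """Cost of one arc: every component adds its f' estimate plus one unit edge."""
    return sum(f + edge_cost for f in successor_estimates)


via_b = arc_cost([5])       # OR arc to B: 5 + 1
via_cd = arc_cost([3, 4])   # AND arc to C and D: 3 + 4 + 1 + 1
best = min([("B", via_b), ("C-D", via_cd)], key=lambda pair: pair[1])
print(via_b, via_cd, best)  # 6 9 ('B', 6)
```

Although C alone has the lowest estimate, the AND arc forces D to be solved as well, so the arc through B is cheaper overall.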

