THANDALAM – 602105
FACULTY PREPARATION PROGRAM – REVIEW REPORT (Rev 0.1 wef 01.11.2019)
PART -1
Name of the mentor: ___________________________ Subject Title & Code: ___________________
Names of the members handling the subject : ___________________________ Branch: ___________________
: ___________________________ Sem/Year: ___________________
1. Lesson Plan
Not prepared – to be prepared and submitted
Prepared as per the format and no changes recommended
Prepared but the following changes are recommended
Remarks & Recommendations:
2. Notes On Lesson
Number of units covered in the NOL
Percentage of coverage
Unit 1 Unit 2 Unit 3 Unit 4 Unit 5
Mission
To impart quality technical education imbibed with proficiency and humane values. To provide
the right ambience and opportunities for the students to develop into creative, talented and
globally competent professionals. To promote research and development in technology and
management for the benefit of the society.
Rajalakshmi Engineering College
Department of Information Technology
Vision
To be a Department of Excellence in Information Technology Education, Research and
Development.
Mission
To train the students to become highly knowledgeable in the field of Information Technology.
To promote continuous learning and research in core and emerging areas.
To develop globally competent students with strong foundations, who will be able to adapt to
changing technologies.
PROGRAMME EDUCATIONAL OBJECTIVES
PEO I
To provide essential background in Science, basic Electronics and applied Mathematics.
PEO II
To prepare the students with fundamental knowledge in programming languages and to
develop applications.
PEO III
To engage the students in life-long learning, enabling them to remain current in their
profession and to obtain additional qualifications that enhance their career positions in IT industries.
PEO IV
To enable the students to implement computing solutions for real world problems and carry out
basic and applied research leading to new innovations in Information Technology (IT) and
related interdisciplinary areas.
PEO V
To familiarize the students with the ethical issues in engineering profession, issues related to the
worldwide economy, nurturing of current job related skills and emerging technologies.
PROGRAMME OUTCOMES
PO1: Engineering knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization to the solution of complex
engineering problems.
PO2: Problem analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first principles
of mathematics, natural sciences, and engineering sciences.
PO3: Design/development of solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs
with appropriate consideration for the public health and safety, and the cultural,
societal, and environmental considerations.
PO4: Conduct investigations of complex problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of data,
and synthesis of the information to provide valid conclusions.
PO5: Modern tool usage: Create, select, and apply appropriate techniques, resources,
and modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.
PO6: The engineer and society: Apply reasoning informed by the contextual knowledge
to assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional engineering practice.
PO7: Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for sustainable development.
PO8: Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.
PO9: Individual and team work: Function effectively as an individual, and as a member
or leader in diverse teams, and in multidisciplinary settings.
PO10: Communication: Communicate effectively on complex engineering activities with
the engineering community and with society at large, such as, being able to
comprehend and write effective reports and design documentation, make effective
presentations, and give and receive clear instructions.
PO11: Project management and finance: Demonstrate knowledge and understanding of
the engineering and management principles and apply these to one’s own work, as a
member and leader in a team, to manage projects and in multidisciplinary
environments.
PO12: Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.
LIST OF EXPERIMENTS
1 Study of Prolog.
2 Write simple facts for statements using PROLOG.
3 Write two predicates: one converts Centigrade temperatures to Fahrenheit, and the other checks
whether a temperature is below freezing.
4 Write a program to solve the Monkey Banana problem.
5 Write a program in Turbo Prolog for medical diagnosis and show the advantages and disadvantages
of green and red cuts.
6 Write a program to solve 4-Queen problem.
7 Write a program to solve the travelling salesman problem.
8 Write a program to solve water jug problem.
9 Write a python program to implement linear regression.
10 Write a python program for ML classification algorithms
a. Logistic Regression
b. Decision Tree
11 Write a python program to implement
a. K-Nearest Neighbor algorithm
b. SVM
12 Write a python program to implement a simple Neural Network.
Contact Hours : 30
Course Outcomes:
LESSON PLAN
COURSE OBJECTIVES
Session No | Proposed Date/Period | Actual Date/Period | Topics to be covered | Ref | Teaching Aids
Introduction to AI T1 PPT+Class Notes
Problem formulation T1 Class Notes
Problem Definition T1 Class Notes
Production systems T1 Class Notes
Control strategies T1 Class Notes
Search strategies T1 Class Notes
Problem characteristics T1 Class Notes
Production system characteristics T1 Class Notes
Specialized productions system T1 Class Notes
Problem solving methods T1 Class Notes
Problem graphs T1 Class Notes
Matching T1 Class Notes
Indexing T1 Class Notes
Heuristic functions T1 Class Notes
Hill Climbing T1 Class Notes
Depth first and Breadth first T1 Class Notes
Constraints satisfaction T1 Class Notes
Related algorithms, measure of performance and analysis of search algorithms T1 Class Notes
UNIT – 2 KNOWLEDGE REPRESENTATION AND INFERENCE
Session No | Proposed Date/Period | Actual Date/Period | Topics to be covered | Ref | Teaching Aids
Game playing T1 PPT+Class Notes
Knowledge representation using Predicate logic and calculus T1 PPT+Class Notes
Structured representation of knowledge T1 PPT+Class Notes
Production based system T1 PPT+Class Notes
Frame based system T1 PPT+Class Notes
Inference – Backward chaining T1 Class Notes
Forward chaining T1 Class Notes
Rule value approach T1 Class Notes
Fuzzy reasoning T1 Class Notes
Certainty factors T1 Class Notes
Bayesian Theory – Bayesian Networks T1 Class Notes
Dempster–Shafer theory T1 Class Notes
UNIT – 3 MACHINE LEARNING BASICS
Session No | Proposed Date/Period | Actual Date/Period | Topics to be covered | Ref | Teaching Aids
Learning T2 Class Notes
Designing a learning system T2 Class Notes
Perspectives and issues in machine learning T2 Class Notes
Concept Learning – as task T2 Class Notes
Concept Learning – as search T2 Class Notes
Types of Machine Learning T2 Class Notes
Supervised Learning T2 Class Notes
Regression T2 Class Notes
Classification. T2 Class Notes
Testing Machine Learning Algorithms – Overfitting T2 Class Notes
Training, Testing and Validation Sets, T2 Class Notes
The confusion Matrix, Accuracy Metrics, T2 Class Notes
ROC Curve T2 Class Notes
Unbalanced Datasets, Measurement Precision T2 Class Notes
UNIT – 4 NEURAL NETWORKS
Session No | Proposed Date/Period | Actual Date/Period | Topics to be covered | Ref | Teaching Aids
The Brain and the Neuron T2 Class Notes
Neural Networks T2 Class Notes
The Perceptron T2 Class Notes
Linear Separability T2 Class Notes
Linear Regression-Examples. T2 Class Notes
Unsupervised Learning T2 Class Notes
The K-means algorithm T2 Class Notes
Vector Quantization T2 Class Notes
The Self organizing feature map. T2 Class Notes
UNIT – 5 EXPERT SYSTEMS
Session No | Proposed Date/Period | Actual Date/Period | Topics to be covered | Ref | Teaching Aids
Expert systems T1 PPT+Class Notes
Architecture of expert systems T1 PPT+Class Notes
Roles of expert systems – T1 PPT+Class Notes
Knowledge Acquisition – T1 PPT+Class Notes
Meta knowledge, T1 PPT+Class Notes
Heuristics. T1 PPT+Class Notes
Typical expert systems – MYCIN T1 PPT+Class Notes
DART T1 PPT+Class Notes
XCON T1 PPT+Class Notes
Expert systems shells. T1 PPT+Class Notes
IT19643 - ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
ARTIFICIAL INTELLIGENCE
Definition: AI
Artificial Intelligence is the study of how to make computers do things, which, at the
moment, people do better.
According to the father of Artificial Intelligence, John McCarthy, it is “The science and
engineering of making intelligent machines, especially intelligent computer programs”.
From a business perspective AI is a set of very powerful tools, and methodologies for
using those tools to solve business problems.
AI Vocabulary
Intelligence relates to tasks involving higher mental processes, e.g. creativity, solving
problems, pattern recognition, classification, learning, induction, deduction, building
analogies, optimization, language processing, knowledge and many more. Intelligence is
the computational part of the ability to achieve goals.
AI Techniques depict how we represent, manipulate and reason with knowledge in order
to solve problems. Knowledge is a collection of ‘facts’. To manipulate these facts by a
program, a suitable representation is required. A good representation facilitates problem
solving.
Learning means that programs improve from the facts or behaviours they can represent. Learning
denotes changes in systems that are adaptive; in other words, it enables a system to do the
same task(s) more efficiently the next time.
Problems of AI:
Intelligence does not imply perfect understanding; every intelligent being has limited
perception, memory and computation. Many points on the spectrum of intelligence versus cost are
viable, from insects to humans. AI seeks to understand the computations required for intelligent
behaviour and to produce computer systems that exhibit intelligence. Aspects of intelligence
studied by AI include perception, communication using human languages, reasoning, planning,
learning and memory.
Task Domains of AI
Mundane Tasks:
Perception
-Vision
-Speech
Natural Languages
-Understanding
-Generation
-Translation
Common sense reasoning
Robot Control
Formal Tasks
Games: chess, checkers, etc.
Mathematics: Geometry, logic, Proving properties of programs
Expert Tasks:
Engineering (Design, Fault finding, Manufacturing planning)
Scientific Analysis
Medical Diagnosis
Financial Analysis
AI Technique:
Artificial Intelligence research during the last three decades has concluded that
intelligence requires knowledge. Although indispensable, knowledge possesses some less
desirable properties:
It is huge.
It is difficult to characterize correctly.
It is constantly varying.
It differs from data by being organized in a way that corresponds to its application.
It is complicated.
Knowledge captures generalizations: situations that share properties are grouped together,
rather than each being represented separately.
It can be understood by the people who must provide it. Even though for many programs the bulk of
the data comes automatically from readings, in many AI domains most of the knowledge a program
has must ultimately be supplied by people, in terms they understand.
It can be easily modified to correct errors and reflect changes in real conditions.
It can be widely used even if it is incomplete or inaccurate.
It can be used to help overcome its own sheer bulk, by helping to narrow the range of
possibilities that must usually be considered.
In order to characterize an AI technique let us consider two different problems and a series of
approaches for solving each of them.
1. Tic-Tac-Toe
2. Question Answering
Tic-Tac-Toe
The series of programs increases in complexity, in the use of generalizations, in the clarity of
their knowledge, and in the extensibility of their approach. In this way they move towards being
representations of AI techniques.
The board is stored as a nine-element vector BOARD, with 2 marking a blank square, 3 an X and 5 an O.
MAKE2 returns 5 if the centre square is blank; otherwise it returns any blank non-corner
square, i.e. 2, 4, 6 or 8.
POSSWIN(p) returns 0 if player p cannot win on the next move, and otherwise returns the
number of the square that gives a winning move. It checks each line by multiplying the values of
its three squares: a product of 3*3*2 = 18 means X can win, a product of 5*5*2 = 50 means O can
win, and the winning move is the blank square in that line.
GO (n) makes a move to square n setting BOARD[n] to 3 or 5.
This algorithm is more involved and takes longer, but it is more efficient in storage, which
compensates for the longer time. Its quality depends on the programmer's skill.
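As a small illustration, here is a Python sketch of the POSSWIN check just described (a hypothetical reconstruction, assuming the 2/3/5 encoding of blank/X/O noted above):

LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),   # rows
         (1, 4, 7), (2, 5, 8), (3, 6, 9),   # columns
         (1, 5, 9), (3, 5, 7)]              # diagonals

def posswin(board, p):
    """Return the square where player p (3 for X, 5 for O) can win on the
    next move, or 0 if there is none. board is a 9-element list holding
    2 (blank), 3 (X) or 5 (O) for squares 1..9."""
    target = p * p * 2                     # 18 for X, 50 for O
    for line in LINES:
        values = [board[sq - 1] for sq in line]
        if values[0] * values[1] * values[2] == target:
            for sq in line:                # the winning move fills the blank
                if board[sq - 1] == 2:
                    return sq
    return 0

# Example: X (3) occupies squares 1 and 2, so X can win at square 3.
print(posswin([3, 3, 2, 2, 5, 2, 2, 5, 2], 3))   # -> 3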
In the third program, the data structure consists of BOARD (the nine-element vector), a list of
board positions that could result from the next move, and a number representing an estimate of
how likely the board position is to lead to an ultimate win for the player to move.
This algorithm looks ahead to make a decision on the next move by deciding which the most
promising move or the most suitable move at any stage would be and selects the same.
Consider all possible moves and replies that the program can make. Continue this process for
as long as time permits until a winner emerges, and then choose the move that leads to the
computer program winning, if possible in the shortest time.
This is actually the most difficult of the three approaches to program well, but the technique
extends as far as is needed for any game. The method places relatively few demands on the
programmer in terms of game-specific technique, but the overall game strategy must still be
known to the adviser.
Question Answering
Let us consider Question Answering systems that accept input in English and provide
answers also in English. This problem is harder than the previous one as it is more difficult to
specify the problem properly. Another area of difficulty concerns deciding whether the answer
obtained is correct, or not, and further what is meant by ‘correct’.
For example, consider the following situation:
Rani went shopping for a new coat. She found a red one she really liked.
When she got home, she found that it went perfectly with her favourite dress.
Questions
1. What did Rani go shopping for?
2. What did Rani find that she liked?
3. Did Rani buy anything?
Program 1:
A set of templates that match common questions and produce patterns used to match
against inputs. Templates and patterns are used so that a template that matches a given question is
associated with the corresponding pattern to find the answer in the input text. For example, the
template who did x y generates x y z if a match occurs and z is the answer to the question. The
given text and the question are both stored as strings.
Algorithm
Answering a question requires the following four steps to be followed:
Compare the template against the questions and store all successful matches to produce a
set of text patterns.
Pass these text patterns through a substitution process to change the person or voice and
produce an expanded set of text patterns.
Apply each of these patterns to the text; collect all the answers and then print the answers.
Reply with the set of answers just collected.
In Question 1 we use the template WHAT DID X Y, which generates Rani go shopping for z;
after substitution we get Rani goes shopping for z and Rani went shopping for z, giving
z ≡ a new coat.
In Question 2 we need a very large number of templates, and also a scheme to allow the
insertion of 'find' before 'that she liked', the insertion of 'really' in the text, and the substitution
of 'she' for 'Rani'; this gives the answer 'a red one'.
Question 3 cannot be answered.
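The following Python toy illustrates Program 1's template-and-pattern idea on this story. The template, substitution rule and regular expressions are hypothetical simplifications, far cruder than a real system:

import re

TEXT = "Rani went shopping for a new coat. She found a red one she really liked."

def answer_what_did(question, text):
    """Match 'What did X Y ...?' and search the text for 'X Y ... Z'."""
    m = re.match(r"What did (\w+) (\w+)\s*(.*)\?", question, re.IGNORECASE)
    if not m:
        return None
    x, y, rest = m.groups()
    # crude substitution step: allow simple tense changes of the verb
    verb = {"go": "(?:go|goes|went)"}.get(y, y)
    pattern = rf"{x} {verb} {rest}\s*(.+?)\."
    hit = re.search(pattern, text, re.IGNORECASE)
    return hit.group(1) if hit else None

print(answer_what_did("What did Rani go shopping for?", TEXT))  # -> a new coat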
Program 2:
A structure called English consists of a dictionary, grammar and some semantics about the
vocabulary we are likely to come across. This data structure provides the knowledge to convert
English text into a storable internal form and also to convert the response back into English. The
structured representation of the text is a processed form and defines the context of the input text
by making explicit all references such as pronouns.
Program 3:
World model contains knowledge about objects, actions and situations that are described
in the input text. This structure is used to create integrated text from input text. The diagram
shows how the system’s knowledge of shopping might be represented and stored. This
information is known as a script and in this case is a shopping script.
Algorithm
Convert the question to a structured form using both the knowledge contained in Program 2
and the World model, generating even more possible structures, since even more knowledge is
being used. Sometimes filters are introduced to prune the possible answers.
To answer a question, the scheme followed is:
Convert the question to a structured form as before but use the world model to resolve any
ambiguities that may occur.
The structured form is matched against the text and the requested segments of the question
are returned.
Are we trying to produce programs that do the tasks the same way that people do?
OR
Are we trying to produce programs that simply do the tasks the easiest way that is
possible?
Programs in the first class attempt to solve problems that a computer can easily solve and
do not usually use AI techniques. AI techniques usually include a search, since no direct method is
available; the use of knowledge about the objects involved in the problem area; and abstraction,
which allows an element of pruning to occur and enables a solution to be found in real time,
since otherwise the data could explode in size. An example of these trivial problems in the first
class, which are now of interest only to psychologists, is EPAM (Elementary Perceiver and Memorizer),
which memorized nonsense syllables.
The second class of problems attempts to solve problems that are non-trivial for a
computer and use AI techniques. We wish to model human performance on these:
1. To test psychological theories of human performance. Ex. PARRY a program to simulate
the conversational behavior of a paranoid person.
2. To enable computers to understand human reasoning – for example, programs that answer
questions based upon newspaper articles indicating human behavior.
3. To enable people to understand computer reasoning. Some people are reluctant to accept
computer results unless they understand the mechanisms involved in arriving at the results.
4. To exploit the knowledge gained by people who are best at gathering information. This
persuaded the earlier workers to simulate human behaviour in the SB (simulated behaviour) part
of AISB. Examples of this type of approach led to GPS (General Problem Solver).
PROBLEM FORMULATION
To solve the problem of playing a game, we require the rules of the game and targets for
winning as well as representing positions in the game. The opening position can be defined as the
initial state and a winning position as a goal state. Moves from initial state to other states leading
to the goal state are made legally. However, writing one rule for every possible position makes
the rules far too numerous in most games; in chess, the number of possible move sequences exceeds
the number of particles in the universe. Thus the rules cannot be supplied accurately, and computer
programs cannot handle them easily. Storage also
presents a problem, but searching can be helped by hashing.
The number of rules used must therefore be minimized, and the set can be created by
expressing each rule in as general a form as possible. The representation of games leads to a state space
representation and it is common for well-organized games with some structure. This
representation allows for the formal definition of a problem that needs the movement from a set of
initial positions to one of a set of target positions. It means that the solution involves using known
techniques and a systematic search. This is quite a common method in Artificial Intelligence.
Example: Play Chess
Fig: One legal Chess Move Fig: Another way to Describe Chess Move
Example: The Water Jug Problem
In this problem we use two jugs, called four and three; four holds a maximum of four
gallons of water and three a maximum of three gallons. How can we get exactly two gallons of
water into the four jug?
The state space is the set of ordered pairs giving the number of gallons of water in the
pair of jugs at any time, i.e. (four, three) where four = 0, 1, 2, 3 or 4 and three = 0, 1, 2 or 3.
The start state is (0, 0) and the goal state is (2, n), where n may be any value from 0 to 3,
since the three jug may hold any amount of water or be empty. Here 'three' and 'four' name
the jugs, and the numbers show the amount of water each jug holds.
The major production rules for solving this problem are shown below:
     Initial condition                      Goal                 Comment
1.   (four, three) if four < 4              (4, three)           fill four from tap
2.   (four, three) if three < 3             (four, 3)            fill three from tap
3.   (four, three) if four > 0              (0, three)           empty four into drain
4.   (four, three) if three > 0             (four, 0)            empty three into drain
5.   (four, three) if four + three <= 4     (four + three, 0)    empty three into four
6.   (four, three) if four + three <= 3     (0, four + three)    empty four into three
7.   (0, three) if three > 0                (three, 0)           empty three into four
8.   (four, 0) if four > 0                  (0, four)            empty four into three
9.   (0, 2)                                 (2, 0)               empty three into four
10.  (2, 0)                                 (0, 2)               empty four into three
11.  (four, three) if four < 4              (4, three - diff)    pour diff, 4 - four, into four from three
12.  (four, three) if three < 3             (four - diff, 3)     pour diff, 3 - three, into three from four
One solution, as a sequence of (four, three) states with the rule applied, is:
(0, 0) -> (0, 3) by rule 2 -> (3, 0) by rule 7 -> (3, 3) by rule 2 -> (4, 2) by rule 11 -> (0, 2) by rule 3 -> (2, 0) by rule 9.
Fig: Production Rules for the Water Jug Problem
The problem is solved by using the production rules in combination with an appropriate
control strategy, moving through the problem space until a path from an initial state to a goal state
is found. In this problem-solving process, search is the fundamental concept. For simple problems
it is easy to achieve this by hand, but there will be cases where this is far too difficult.
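A compact Python sketch of such a control strategy is breadth-first search over the (four, three) state space. The pour rules are folded into a successor function, so this is a minimal illustration rather than the full twelve-rule system above:

from collections import deque

def solve_water_jug(goal_four=2):
    """Breadth-first search from (0, 0) to any state with goal_four
    gallons in the four jug. States are (four, three) pairs."""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        four, three = queue.popleft()
        if four == goal_four:                     # goal test
            path, s = [], (four, three)
            while s is not None:                  # walk back to the start
                path.append(s)
                s = parent[s]
            return path[::-1]
        successors = [
            (4, three), (four, 3),                # fill a jug
            (0, three), (four, 0),                # empty a jug
            (min(4, four + three), max(0, three - (4 - four))),  # pour 3 -> 4
            (max(0, four - (3 - three)), min(3, four + three)),  # pour 4 -> 3
        ]
        for s in successors:
            if s not in parent:                   # first visit = shortest path
                parent[s] = (four, three)
                queue.append(s)
    return None

print(solve_water_jug())   # a shortest sequence of states ending with four == 2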
PRODUCTION SYSTEMS
Production systems provide search structures that form the core of many intelligent
processes. Hence it is useful to structure AI programs in a way that facilitates describing and
performing the search process. Do not be confused by other uses of the word production, such as
to describe what is done in factories. A production system consists of:
1. A set of rules, each consisting of a left-hand side and a right-hand side. The left-hand side (the
pattern) determines the applicability of the rule, and the right-hand side describes the operation to be
performed if the rule is applied.
2. One or more knowledge/databases that contain whatever information is appropriate for the
particular task. Some parts of the database may be permanent, while other parts of it may pertain
only to the solution of the current problem. The information in these databases may be structured
in any appropriate way.
3. A control strategy that specifies the order in which the rules will be compared to the database
and a way of resolving the conflicts that arise when several rules match at once.
4. A rule applier.
A simple maze example of this kind can be solved by the operator sequence UP, RIGHT, UP, LEFT, DOWN.
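A minimal Python skeleton of these four components (illustrative only): rules are (condition, action) pairs over a working database, and the control strategy here simply takes the first matching rule.

def run_production_system(database, rules, is_goal, max_steps=100):
    """rules: list of (condition, action) pairs, where condition(db) -> bool
    and action(db) -> new db. Conflict resolution: first match wins."""
    for _ in range(max_steps):
        if is_goal(database):
            return database
        applicable = [act for cond, act in rules if cond(database)]
        if not applicable:
            return None              # no rule applies: the system halts
        database = applicable[0](database)   # the rule applier
    return None                      # step limit reached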
Control Strategies
Heuristic Searches:
A heuristic is a method that
might not always find the best solution
but is guaranteed to find a good solution in reasonable time.
By sacrificing completeness it increases efficiency.
Heuristics are useful in solving tough problems which
could not be solved any other way, or whose
exact solutions would take an infinite time or a very long time to compute.
The classic example of heuristic search methods is the travelling salesman problem.
PROBLEM CHARACTERISTICS
A problem may have different aspects of representation and explanation. In order to
choose the most appropriate method for a particular problem, it is necessary to analyze the
problem along several key dimensions.
Heuristic search is a very general method applicable to a large class of problems. It
includes a variety of techniques. In order to choose an appropriate method, it is necessary to
analyze the problem with respect to the following considerations.
To use the heuristic search for problem solving, we suggest analysis of the problem for the
following considerations:
Decomposability of the problem into a set of independent smaller sub problems
Possibility of undoing solution steps, if they are found to be unwise
Predictability of the problem universe
Possibility of obtaining an obvious solution to a problem without comparison of all other
possible solutions
Type of the solution: whether it is a state or a path to the goal state
Role of knowledge in problem solving
Nature of solution process: with or without interacting with the user
Problem Decomposition
Suppose the problem is to evaluate a large, complicated expression.
This problem can be solved by breaking it into smaller problems, each of which we can
solve by using a small collection of specific rules. Using this technique of problem
decomposition, we can solve very large problems very easily. This can be considered as an
intelligent behaviour.
Method I:
1. Siva was a man.
2. Siva was born in 1905.
3. All men are mortal.
4. Now it is 2008, so Siva’s age is 103 years.
5. No mortal lives longer than 100 years.
Method II:
1. Siva is a worker in the company.
2. All workers in the company died in 1952.
Answer: Siva is not alive. Either method leads to this answer.
We are interested only in answering the question; it does not matter which path we follow. If we
follow one path successfully to the correct answer, then there is no reason to go back and check
another path to the solution.
Consider the sentence 'The bank president ate a dish of pasta salad with the fork.' Each component
of this sentence may have more than one interpretation. Some of the sources of ambiguity are the
following:
1. Bank may refer to a financial institution or to the side of a river.
2. Pasta salad is a salad containing pasta; by contrast, dog food does not contain dog.
We need to produce only the interpretation itself. No record of the processing by which
the interpretation was found is necessary.
Consider the water jug problem; the statement of solution to this problem must be a
sequence of operations that produce the final state.
What is the Role of Knowledge?
B: Boat
T: Tiger
G: Goat
Gr:Grass
Step 1:
According to the problem, this step will be (B, T, G, Gr), as the Boatman, the Tiger, the Goat and the
Grass are all on one side of the bank of the river. The states reachable from this state can be
enumerated as follows.
The states (B, T, O, O) and (B, O, O, Gr) are not allowed: if the Boatman takes the Tiger, the
Goat is left alone with the Grass, and if he takes the Grass, the Tiger is left alone with the Goat.
Step 2:
Now consider the current state of Step 1, i.e. the state (B, O, G, O).
This state is on the right side of the river, so the state on the left side is (O, T, O, Gr):
(B, O, G, O)        (O, T, O, Gr)
(Right)             (Left)
Step 3:
Now proceed according to the left and right sides of the river such that the condition of the
problem must be satisfied.
Step 4:
First, consider the first current state on the right side of step 3 i.e.
Now consider the second current state on the right side of step-3 i.e.
Step 5:
Step 6:
Step 7:
Hence the final state will be (B, T, G, Gr), all on the right side of the river.
Comments:
This problem requires a lot of space for its state implementation.
It takes a lot of time to search the goal node.
The production rules at each level of state are very strict.
Consider the chess game playing. Here the set of valid moves is very large. To reduce the size
of this set only useful moves are identified.
At the time of playing the game, the next move depends very much on the current
move. As the game goes on, there will be only a few moves applicable as the next
move.
Hence it will be a wasteful effort to check the applicability of all moves. Rather the important
and valid legal moves are directly stored as rules and through indexing the applicable rules
are found. Here the indexing will store the current board position.
The indexing makes the matching process easy, at the cost of a lack of generality in the
statement of rules.
Practically there is a trade-off between the ease of writing rules and simplicity of matching
process.
The indexing techniques are not very well suited for rule bases where the rules are written in
high-level predicates.
In PROLOG and many theorem proving systems, rules are indexed by predicates they
contain. Hence all the applicable rules can be indexed quickly.
2. Matching with variables:
In the rule base if the preconditions are not stated as exact descriptions of particular situation,
the indexing technique does not work well.
Rather, the preconditions describe properties that the situation must have. In situations
where a single condition is matched against a single element in the state description,
the unification procedure can be used. However, in practical situations it is required to find the
complete set of rules that match the current state.
In forward and backward chaining system, the depth first search technique is used to select
the individual rule. In the situations where multiple rules are applicable, conflict resolution
technique is used to choose appropriate rule.
In case of the situations requiring multiple matches, the unification can be applied
recursively, but a more efficient method is to use RETE matching algorithm.
3. Complex matching:
A more complex matching process is required when preconditions of a rule specify required
properties that are not stated explicitly in the description of current state.
However the real world is full of uncertainties and sometimes practically it is not possible to
define the rule in exact fashion.
The matching process becomes more complicated in the situation where preconditions
approximately match the current situations
E.g. a speech-understanding program must contain rules that map from a description of a
physical waveform to phones.
Because of the presence of noise the signal is so variable that there can be only an
approximate match between a rule that describes an ideal sound and the input that describes
an actual one.
Approximate matching is particularly difficult to deal with because, as we increase the
tolerance allowed in the match, new rules need to be written, which increases the number of
rules and the size of the main search process.
But approximate matching is nevertheless superior to exact matching in situations such as
speech understanding, where exact matching may result in no rule being matched and the search
process coming to a grinding halt.
SEARCH STRATEGIES
What is Search?
Search is the systematic examination of states to find path from the start/root state to the
goal state.
The set of possible states, together with operators defining their connectivity constitute the
search space.
The output of a search algorithm is a solution, that is, a path from the initial state to a state
that satisfies the goal test.
Search Tree
Having formulated some problems, we now need to solve them. This is done by a search through
the state space. A search tree is generated by the initial state and the successor function that together define
the state space. In general, we may have a search graph rather than a search tree, when the same state can
be reached from multiple paths.
Types of Search
There are three broad classes of search processes:
1) Uninformed- Blind Search
There is no specific reason to prefer one part of the search space to any other, in
finding a path from initial state to goal state.
Systematic, exhaustive Search
• Depth-first-search
• Breadth-first-search
2) Informed – Heuristic search - there is specific information to focus the search.
Hill climbing
Branch and bound
Best first
A*
3) Game playing – there are at least two partners opposing each other.
Minimax (α–β pruning)
Means-ends analysis
UNINFORMED SEARCH STRATEGIES
Uninformed Search Strategies have no additional information about states beyond that
provided in the problem definition.
Strategies that know whether one non-goal state is "more promising" than another are called
informed search or heuristic search strategies.
There are five uninformed search strategies as given below.
Breadth-first search
Uniform-cost search
Depth-first search
Depth-limited search
Iterative deepening search
BREADTH-FIRST SEARCH
Breadth-first search is a simple strategy in which the root node is expanded first, then all successors
of the root node are expanded next, then their successors, and so on. In general, all the nodes are
expanded at a given depth in the search tree before any nodes at the next level are expanded.
Breadth-first search is implemented by calling TREE-SEARCH with an empty fringe that is a first-in-
first-out (FIFO) queue, ensuring that the nodes that are visited first will be expanded first.
In other words, calling TREE-SEARCH (problem, FIFO-QUEUE()) results in breadth-first search.
The FIFO queue puts all newly generated successors at the end of the queue, which means that
shallow nodes are expanded before deeper nodes.
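In Python this amounts to a few lines. The sketch below is a schematic TREE-SEARCH: successors() and goal_test() are supplied by the problem, and, as in TREE-SEARCH, repeated states are not detected.

from collections import deque

def breadth_first_search(initial, successors, goal_test):
    fringe = deque([initial])             # FIFO queue: shallowest node first
    while fringe:
        node = fringe.popleft()
        if goal_test(node):
            return node
        fringe.extend(successors(node))   # new successors go to the back
    return None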
Fig: Breadth-first search on a simple binary tree. At each stage, the node to be expanded next is
indicated by a marker.
Properties of breadth-first-search
Assume every state has b successors. The root of the search tree generates b nodes at the first level,
each of which generates b more nodes, for a total of b^2 at the second level. Each of these generates b more
nodes, yielding b^3 nodes at the third level, and so on. Now suppose that the solution is at depth d. In the
worst case, we would expand all but the last node at level d, generating b^(d+1) - b nodes at level d+1. Then
the total number of nodes generated is
b + b^2 + b^3 + ... + b^d + (b^(d+1) - b) = O(b^(d+1)).
Every node that is generated must remain in memory, because it is either part of the fringe or is an
ancestor of a fringe node. The space complexity is, therefore, the same as the time complexity
Uniform-Cost Search
Instead of expanding the shallowest node, uniform-cost search expands the node n with the
lowest path cost.
Uniform-cost search does not care about the number of steps a path has, but only about their
total cost.
DEPTH-FIRST SEARCH
Depth-first search always expands the deepest node in the current fringe of the search tree.
Fig: Nodes that have been expanded and have no descendants in the fringe can be removed
from memory; these are shown in black. Nodes at depth 3 are assumed to have no
successors and M is the only goal node.
This strategy can be implemented by TREE-SEARCH with a last-in-first-out (LIFO)
queue, also known as a stack.
Depth-first-search has very modest memory requirements. It needs to store only a single
path from the root to a leaf node, along with the remaining unexpanded sibling nodes for
each node on the path. Once the node has been expanded, it can be removed from the
memory, as soon as its descendants have been fully explored (Refer Figure 1.13).
For a state space with a branching factor b and maximum depth m, depth-first search
requires storage of only bm + 1 nodes.
Drawback of Depth-first-search
The drawback of depth-first search is that it can make a wrong choice and get stuck going
down a very long (or even infinite) path when a different choice would lead to a solution near
the root of the search tree.
For example, depth-first-search will explore the entire left sub tree even if node C is a goal
node.
HEURISTIC SEARCH TECHNIQUES
The heuristic function is a way to inform the search about the direction to a goal. It provides an
informed way to guess which neighbour of a node will lead to a goal.
There is nothing magical about a heuristic function. It must use only information that can be
readily obtained about a node. Typically a trade-off exists between the amount of work it takes to
derive a heuristic value for a node and how accurately the heuristic value of a node measures the
actual path cost from the node to a goal.
A heuristic function, h(n), provides an estimate of the cost of the path from a given node to the
closest goal state; h(n) must be zero if n represents a goal state.
Example: Straight-line distance from current location to the goal location in a road navigation
problem.
A standard way to derive a heuristic function is to solve a simpler problem and to use the actual
cost in the simplified problem as the heuristic function of the original problem.
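For instance, the straight-line-distance heuristic for road navigation can be written as follows (a small sketch, assuming each node carries (x, y) coordinates in a dict):

import math

def straight_line_h(coords, goal):
    """coords: dict mapping node -> (x, y). Returns h, where h(n) is the
    Euclidean distance from n to the goal; it is zero when n is the goal."""
    gx, gy = coords[goal]
    return lambda n: math.hypot(coords[n][0] - gx, coords[n][1] - gy)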
Heuristic Search
Direct techniques (blind search) are not always feasible; they require too much time or memory.
Weak methods can be effective if applied correctly to the right kinds of tasks, and typically
require domain-specific information.
Generate-and-Test Strategy
The generate-and-test search algorithm is a very simple algorithm that is guaranteed to find
a solution if it is
done systematically and there exists a solution.
Algorithm: Generate-And-Test
1. Generate a possible solution.
2. Test to see if this is actually a solution, by comparing the chosen point (or the end point of the
chosen path) against the set of acceptable goal states.
3. If a solution has been found, quit; otherwise, return to step 1.
Local heuristic (for the blocks-world problem):
+1 for each block that is resting on the thing it is supposed to be resting on.
-1 for each block that is resting on the wrong thing.
Global heuristic:
For each block that has the correct support structure:
+1 to every block in the support structure.
For each block that has a wrong support structure:
-1 to every block in the support structure.
Simulated Annealing
The problem of local maxima is overcome in simulated annealing search. In ordinary hill-
climbing search, moves downhill are never made, so the search
may get stuck at a local maximum. Thus this search cannot guarantee complete solutions.
In contrast, a random search (or random movement) towards a successor chosen randomly from the
set of successors will be complete, but extremely inefficient. The combination of hill climbing
and random search, which yields both efficiency and completeness, is called simulated annealing.
The method was originally developed by analogy with the physical process of annealing;
hence the name simulated annealing.
In the simulated annealing search algorithm, instead of always picking the best move, a random move is
picked. The standard simulated annealing formulation uses the term objective function instead of heuristic
function. If the move improves the situation it is accepted; otherwise the algorithm accepts the
move with some probability less than 1.
This probability is
P = e^(-ΔE/kT)
where ΔE is the (positive) worsening of the energy level, T is the temperature and k is Boltzmann's
constant. As the equation indicates, the probability decreases with the badness of the move (the
amount ΔE by which the evaluation worsens). The schedule by which T is lowered is called the
annealing schedule, and a proper annealing schedule is maintained to control T.
This process has the following differences from hill climbing search:
• The annealing schedule is maintained.
• Moves to worse states are also accepted.
• In addition to current state, the best state record is maintained. The algorithm of simulated
annealing is presented as follows:
Algorithm: Simulated Annealing
1. Evaluate the initial state and mark it as the current state. If it is a goal state, return it and quit;
otherwise initialize the best state to the current state.
2. Initialize T according to annealing schedule.
3. Repeat the following until a solution is obtained or operators are not left:
a. Apply a yet-unapplied operator to produce a new state.
b. For the new state compute ΔE = value of current state - value of new state. If the new state is
the goal state then stop; if it is better than the current state, make it the current state and record
it as the best state.
c. If it is not better than the current state, then make it current state with probability P.
d. Revise T according to annealing schedule
4. Return best state as answer.
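A bare-bones Python rendering of this loop (value() and random_neighbor() are assumed problem-specific helpers; higher value is better, matching ΔE = value(current) - value(new)):

import math, random

def simulated_annealing(start, value, random_neighbor,
                        t0=1.0, cooling=0.95, steps=1000):
    current = best = start
    t = t0
    for _ in range(steps):
        nxt = random_neighbor(current)
        delta_e = value(current) - value(nxt)    # positive for a worse move
        if delta_e <= 0 or random.random() < math.exp(-delta_e / t):
            current = nxt                        # accept the move
            if value(current) > value(best):
                best = current                   # record the best state seen
        t *= cooling                             # revise T per the schedule
    return best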
Best-First Search:
It is a way of combining the advantages of both depth-first and breadth-first search into a
single method.
1. OR Graphs
Depth-first search is good because it allows a solution to be found without all competing
branches having to be expanded. Breadth-first search is good because it does not get trapped
on dead-end paths.
One way of combining the two is to follow a single path at a time, but switch paths whenever
some competing path looks more promising than the current one does.
At each step of the best-first search process, we select the most promising of the nodes we
have generated so far. This is done by applying an appropriate heuristic function to each of
them. We then expand the chosen node by using the rules to generate its successors.
But eventually, if a solution is not found, that branch will start to look less promising than
one of the top-level branches that had been ignored.
At that point, the now more promising, previously ignored branch will be explored.
But the old branch is not forgotten. Its last node remains in the set of generated but
unexpanded nodes.
Since node D is the most promising, it is expanded next, producing two successor nodes, E
and F. The heuristic function is then applied to them.
Now another path, that going through node B, looks more promising, so it is pursued,
generating nodes G and H.
But again when these new nodes are evaluated they look less promising than another path, so
attention is returned to the path through D to E. E is then expanded, yielding nodes I and J.
At the next step, J will be expanded, since it is the most promising. This process can continue
until a solution is found (see the figure below).
Best-first search is a general heuristic-based search technique. In the graph of the problem
representation, an evaluation function (which corresponds to a heuristic function) is attached to
every node. The value of the evaluation function may depend on the cost or distance of the current
node from the goal node. The decision of which node to expand next depends on the value of this
evaluation function. Best-first search can be understood from the following tree, in which the value
attached to a node indicates its utility value. The expansion of nodes according to best-first search
is illustrated in the following figure.
Fig: Tree getting expansion according to best first search
Here, at any step, the most promising node having least value of utility function is chosen for
expansion.
In the tree shown above, best first search technique is applied; however it is beneficial
sometimes to search a graph instead of tree to avoid the searching of duplicate paths. In the
process to do so, searching is done in a directed graph in which each node represents a point
in the problem space. This graph is known as OR-graph. Each of the branches of an OR
graph represents an alternative problem solving path.
Two lists of nodes are used to implement a graph search procedure discussed above. These are
1. OPEN: nodes that have been generated and have had the heuristic function
applied to them, but have not yet been examined.
2. CLOSED: these are the nodes that have already been examined. These nodes are kept in
the memory if we want to search a graph rather than a tree because whenever a node will
be generated, we will have to check whether it has been generated earlier.
Best-first search is a way of combining the advantages of both depth-first and breadth-first
search. Depth-first search is good because it allows a solution to be found without all
competing branches having to be expanded.
Breadth-first search is good because it does not get trapped on dead-end paths. Best-first search
combines the two by following a single path at a time, but switching between paths whenever
some competing path looks more promising than the current one does.
Hence at each step of the best-first search process, we select the most promising of the
successor nodes that have been generated so far.
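A minimal Python sketch of this process (graph maps a node to a list of (neighbor, cost) pairs and h is the heuristic; OPEN is a priority queue ordered by h alone, CLOSED a set):

import heapq

def best_first_search(graph, h, start, goal):
    open_list = [(h(start), start, [start])]       # OPEN, ordered by h
    closed = set()                                 # CLOSED: examined nodes
    while open_list:
        _, node, path = heapq.heappop(open_list)   # most promising OPEN node
        if node == goal:
            return path
        closed.add(node)
        for nbr, _cost in graph.get(node, []):
            if nbr not in closed:
                heapq.heappush(open_list, (h(nbr), nbr, path + [nbr]))
    return None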
f(n) = g(n) + h(n)
where f(n) = evaluation function,
g(n) = cost (or distance) of the current node from the start node,
h(n) = estimated cost (or distance) of the current node from the goal node.
In the A* algorithm the most promising node is chosen for expansion. The promising node is
decided based on the value of the heuristic function.
Normally the node having lowest value of f (n) is chosen for expansion. We must note that the
goodness of a move depends upon the nature of problem, in some problems the node having least
value of heuristic function would be most promising node, where in some situation, the node
having maximum value of heuristic function is chosen for expansion.
The A* algorithm maintains two lists. One stores the open nodes and the other maintains the list of
already expanded nodes. A* is an example of an optimal search algorithm.
A search algorithm is optimal if it has an admissible heuristic. An algorithm has an admissible
heuristic if its heuristic function h(n) never overestimates the cost to reach the goal. Admissible
heuristics are always optimistic: they estimate the cost of solving the problem to be less than it
actually is. The A* algorithm works as follows:
A* algorithm:
1. Place the starting node s on the OPEN list.
2. If OPEN is empty, stop and return failure.
3. Remove from OPEN the node n that has the smallest value of f*(n). If node n is a goal
node, return success and stop; otherwise continue.
4. Expand n, generating all of its successors n', and place n on CLOSED. For every
successor n', if n' is not already on OPEN, attach a back pointer to n, compute f*(n') and
place it on OPEN.
5. Each n' that is already on OPEN or CLOSED should have its back pointer adjusted to
reflect the lowest-f*(n') path. If an n' on CLOSED has its pointer changed, remove it from
CLOSED and place it on OPEN.
6. Return to step 2.
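The same machinery with f(n) = g(n) + h(n) gives a compact A* sketch in Python. This version assumes a consistent heuristic, so a node never needs to move back off CLOSED and step 5's reopening is omitted:

import heapq

def a_star(graph, h, start, goal):
    """graph: node -> [(neighbor, step_cost), ...]; h: admissible heuristic."""
    open_list = [(h(start), 0, start, [start])]    # entries: (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        if node in closed:
            continue                               # stale duplicate entry
        closed.add(node)
        for nbr, cost in graph.get(node, []):
            if nbr not in closed:
                g2 = g + cost
                heapq.heappush(open_list, (g2 + h(nbr), g2, nbr, path + [nbr]))
    return None, float("inf")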
Problem Reduction
Problem Reduction with AO* Algorithm
• When a problem can be divided into a set of sub problems, where each sub problem can be
solved separately and a combination of these will be a solution, AND-OR graphs or AND –
OR trees are used for representing the solution.
• The decomposition of the problem (problem reduction) generates AND arcs. One AND arc may
point to any number of successor nodes, all of which must be solved for the arc to point to a
solution. Several arcs may emerge from a single node, indicating several possible ways of
solving the problem.
• Hence the graph is known as an AND-OR graph rather than simply an AND graph. The figure
shows an AND-OR graph.
AO* Algorithm:
1. Let G consist only of the node representing the initial state; call this node INIT.
Compute h'(INIT).
2. Until INIT is labelled SOLVED or h'(INIT) becomes greater than FUTILITY,
repeat the following procedure:
(I) Trace the marked arcs from INIT and select an unexpanded node NODE.
(II) Generate the successors of NODE. If there are no successors, then assign
FUTILITY as h'(NODE); this means that NODE is not solvable. If there are
successors, then for each one, called SUCCESSOR, that is not also an ancestor of
NODE, do the following:
(a) Add SUCCESSOR to graph G
(b) If SUCCESSOR is a terminal node, mark it SOLVED and assign zero to its h' value.
(c) If SUCCESSOR is not a terminal node, compute its h' value.
(III) Propagate the newly discovered information up the graph as follows.
Let S be the set of nodes that have been marked SOLVED or whose h' values have changed.
Initialize S to NODE. Until S is empty, repeat the following procedure:
(a) Select a node from S, call it CURRENT, and remove it from S.
(b) Compute the cost of each of the arcs emerging from CURRENT. Assign the minimum
cost as CURRENT's new h' value.
(c) Mark the minimum-cost path as the best path out of CURRENT.
(d) Mark CURRENT SOLVED if all of the nodes connected to it through the
newly marked arc have been labelled SOLVED.
(e) If CURRENT has been marked SOLVED or its h' has just changed, its new
status must be propagated back up the graph. Hence all the ancestors of
CURRENT are added to S.
AO* Search Procedure
Constraint Satisfaction
The general problem is to find a solution that satisfies a set of constraints. Here the heuristics
are used to decide which node to expand next, not to estimate the distance to the
goal.
Examples of this technique are design problems, graph labelling, robot path planning and
cryptarithmetic puzzles.
In constraint satisfaction problems a set of constraints is available; this defines the search space.
The initial state is the set of constraints given originally in the problem description. A goal state
is any state that has been constrained 'enough'.
Constraint satisfaction is a two-step process:
1. First, constraints are discovered and propagated throughout the system.
2. Then, if there is still no solution, search begins: a guess is made and added as a new constraint,
and propagation then occurs with this new constraint.
Algorithm
1. Propagate available constraints:
Open all objects that must be assigned values in a complete solution.
Repeat until inconsistency or all objects are assigned valid values:
Select an object and strengthen as much as possible the set of constraints that apply to that object.
If this set of constraints differs from the previous set, then reopen all objects that share any of
these constraints. Remove the selected object.
If union of constraints discovered above defines a solution return solution.
If union of constraints discovered above defines a contradiction return failure.
Make a guess in order to proceed.
Repeat until a solution is found or all possible solutions exhausted:
Select an object with no assigned value and try to strengthen its constraints.
Recursively invoke constraint satisfaction with the current set of constraints plus the selected
strengthening constraint.
Cryptarithmetic puzzles are examples of constraint satisfaction problems, in which the goal is to
discover some problem state that satisfies a given set of constraints. Some cryptarithmetic
problems are shown below.
Here each decimal digit is to be assigned to each of the letters in such a way that the answer
to the problem is correct. If the same letter occurs more than once it must be assigned the
same digit each time. No two different letters may be assigned the same digit.
The puzzle SEND + MORE = MONEY, after solving, will appear like this:
Ans. 9567 + 1085 = 10652 (S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2)
The heuristics and production rules are specific to the following example:
Heuristic Rules
1. If the sum of two n-digit operands yields an (n+1)-digit result, then the (n+1)th digit has to
be 1.
2. The sum of two digits may or may not generate a carry.
3. Whatever the operands, the carry can only be 0 or 1.
4. No two distinct letters may be assigned the same numeric code.
5. Whenever more than one solution appears to exist, the choice is governed by the
fact that no two letters can have the same number code.
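For a puzzle of this size the constraints can even be checked by brute force in Python (illustrative only; constraint propagation as described above is far more efficient):

from itertools import permutations

def solve_send_more_money():
    letters = "SENDMORY"                     # the eight distinct letters
    for digits in permutations(range(10), len(letters)):
        d = dict(zip(letters, digits))
        if d["S"] == 0 or d["M"] == 0:       # leading digits cannot be zero
            continue
        send = int("".join(str(d[c]) for c in "SEND"))
        more = int("".join(str(d[c]) for c in "MORE"))
        money = int("".join(str(d[c]) for c in "MONEY"))
        if send + more == money:
            return send, more, money
    return None

print(solve_send_more_money())               # -> (9567, 1085, 10652)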
Means – end Analysis
Means-ends analysis allows both backward and forward searching. This means we can solve the
major parts of a problem first and then return to the smaller problems when assembling the final
solution.
The means-ends analysis algorithm can be stated as follows:
1. Until the goal is reached or no more procedures are available:
Describe the current state, the goal state and the differences between the two.
Use the difference to select a procedure that will hopefully bring the state nearer to the goal.
Apply the procedure and update the current state.
2. If the goal is reached, announce success; otherwise announce failure.
For using means-ends analysis to solve a given problem, a mixture of the two directions,
forward and backward, is appropriate. Such a mixed strategy solves the major parts of a
problem first and then goes back and solves the small problems that arise by putting the big
pieces together.
The means-end analysis process detects the differences between the current state and goal
state. Once such difference is isolated an operator that can reduce the difference has to be
found.
The operator may or may not be applicable to the current state, so a sub-problem is set up of
getting to a state in which this operator can be applied.
Operator subgoaling is a kind of backward chaining in which operators are selected first and
then subgoals are set up to establish the preconditions of the
operators.
If the operator does not produce the goal state we want, then we have second sub problem of
getting from the state it does produce to the goal. The two sub problems could be easier to
solve than the original problem, if the difference was chosen correctly and if the operator
applied is really effective at reducing the difference. The means-end analysis process can
then be applied recursively.
This method depends on a set of rules that can transform one problem state into another.
These rules are usually not represented with complete state descriptions on each side.
Instead they are represented as a left side that describes the conditions that must be met for
the rules to be applicable and a right side that describes those aspects of the problem state
that will be changed by the application of the rule. A separate data structure called a
difference table indexes the rules by the differences that they can be used to reduce.
Means-Ends Analysis (MEA)
We have presented collection of strategies that can reason either forward or backward, but
for a given problem, one direction or the other must be chosen.
A mixture of the two directions is appropriate. Such a mixed strategy would make it possible
to solve the major parts of a problem first and then go back and solve the small problems
that arise in “gluing” the big pieces together.
The technique of Means-Ends Analysis allows us to do that.
Algorithm: Means-Ends Analysis
1. Compare CURRENT to GOAL. If there is no difference between them then return.
2. Otherwise, select the most important difference and reduce it by doing the following until
success or failure is signalled:
a) Select an as yet untried operator O that is applicable to the current difference. If there are no
such operators, then signal failure.
b) Attempt to apply O to CURRENT. Generate descriptions of two states: O-START, a state in
which O's preconditions are satisfied, and O-RESULT, the state that would result if O were
applied in O-START.
c) If (FIRST-PART <- MEA(CURRENT, O-START)) and (LAST-PART <- MEA(O-
RESULT, GOAL)) are successful, then signal success and return the result of concatenating
FIRST-PART, O, and LAST-PART.
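A schematic Python recursion for this algorithm. The problem supplies difference(), ops_for(), pre() and result(); there is no loop detection, so this is a sketch, not a robust planner:

def mea(current, goal, difference, ops_for, pre, result):
    """Return a list of operators transforming current into goal, or None."""
    diff = difference(current, goal)
    if diff is None:
        return []                              # states match: empty plan
    for op in ops_for(diff):                   # operators relevant to diff
        first = mea(current, pre(op), difference, ops_for, pre, result)
        if first is None:
            continue                           # cannot reach O-START
        last = mea(result(op), goal, difference, ops_for, pre, result)
        if last is None:
            continue                           # cannot bridge O-RESULT to goal
        return first + [op] + last             # FIRST-PART + O + LAST-PART
    return None                                # signal failure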
MEA: Operator Subgoaling
The MEA process centers around the detection of differences between the current state and the
goal state.
Once such a difference is isolated, an operator that can reduce the difference must be found.
If the operator cannot be applied to the current state, we set up a sub problem of getting to a
state in which it can be applied.
The kind of backward chaining in which operators are selected and then sub goals are set up
to establish the preconditions of the operators.
MEA: Household Robot Application
Operator          Preconditions                                 Results
PUSH(obj, loc)    At(robot, obj) ^ Large(obj) ^                 At(obj, loc) ^ At(robot, loc)
                  Clear(obj) ^ ArmEmpty
CARRY(obj, loc)   At(robot, obj) ^ Small(obj)                   At(obj, loc) ^ At(robot, loc)
GAME PLAYING
Introduction
Game playing is one of the oldest sub-fields of AI. It involves an abstract and pure
form of competition that seems to require intelligence. It is easy to represent the states and
actions, and very little world knowledge is required to implement game playing.
The most commonly used AI technique in games is search. Game-playing research has
contributed ideas on how to make the best use of time to reach good decisions.
Game playing is a search problem defined by:
Initial state of the game
Operators defining legal moves
Successor function
Terminal test defining end of game states
Goal test
Path cost/utility/payoff function
Most popular games are too complex to solve exactly, requiring the program to take its best guess.
For example, in chess the search tree has about 10^40 nodes (with a branching factor of about 35).
Uncertainty arises because of the opponent.
Characteristics of game playing
1. There are always “unpredictable” opponents:
The opponent introduces uncertainty
The opponent also wants to win
The solution for this problem is a strategy, which specifies a move for every possible opponent
reply.
2. Time limits:
Games are often played under strict time constraints (e.g. chess), and time must therefore be
handled very effectively.
There are special games where two players have exactly opposite goals. There are also
perfect information games (such as chess and go) where both the players have access to
the same information about the game in progress (e.g. tic-tac-toe).
In imperfect-information games (such as bridge, certain other card games, and games where
dice are used), not all information is available to each player. Given sufficient time and space, an
optimum solution can usually be obtained for the former by exhaustive search, though not for the latter.
Types of games
There are basically two types of games
Deterministic games
Chance games
Games like chess and checkers are perfect-information deterministic games, whereas games like
Scrabble and bridge are imperfect-information games. We will consider only two-player,
discrete, perfect-information games, such as tic-tac-toe, chess and checkers.
Two-player games are easier to imagine and reason about, and more common to play.
Minimax search procedure
Typical characteristic of the games is to look ahead at future position in order to succeed. There is a
natural correspondence between such games and state space problems.
In a game like tic-tac-toe
States-legal board positions
Operators-legal moves
Goal-winning position
The game starts from a specified initial state and ends in a position that can be declared a win
for one player and a loss for the other, or possibly a draw. A game tree is an explicit
representation of all possible plays of the game. We start with a 3-by-3 grid.
Then the two players take it in turns to place their marker on the board (one player uses the
'X' marker, the other uses the 'O' marker). The winner is the player who first gets three of
their markers in a row, e.g. if X wins.
Two-ply search
To play an entire game we need to combine search-oriented and non-search-oriented
techniques. The ideal way to use a search procedure to find a solution to a problem is to
generate moves through the problem space until a goal state is reached.
Unfortunately, for games like chess, even with a good plausible-move generator it is not
possible to search until a goal state is reached: in the amount of time available it is possible
to generate a tree at most 10 to 20 ply deep.
Then in order to choose the best move, the resulting board positions must be compared to
discover which is most advantageous. This is done using the static evaluation function.
The static evaluation function evaluates individual board positions by estimating how likely
they are eventually to lead to a win.
The minimax procedure is a depth-first, depth-limited search procedure.
If the limit of search has been reached, compute the static value of the current position
relative to the appropriate player (maximizing or minimizing), and report the result
(value and path).
If the level is a minimizing level (the minimizer's turn): generate the successors of the
current position, apply MINIMAX to each of them, and return the minimum of the results.
If the level is a maximizing level: generate the successors of the current position, apply
MINIMAX to each of them, and return the maximum of the results. The MINIMAX
algorithm uses the following procedures:
1. MOVEGEN(Pos)
It is the plausible-move generator; it returns a list of successors of 'Pos'.
2. STATIC(Pos, Depth)
The static evaluation function, which returns a number representing the goodness of 'Pos' from
the current point of view.
3. DEEP-ENOUGH
It returns TRUE if the search should be stopped at the current level; otherwise it returns FALSE.
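A minimal Python sketch of this procedure follows. MOVEGEN, STATIC and DEEP-ENOUGH are passed
in as functions (movegen, static, deep_enough), since their definitions are game-specific
assumptions of the sketch.

# Depth-first, depth-limited MINIMAX (illustrative sketch).
def minimax(pos, depth, maximizing, movegen, static, deep_enough):
    if deep_enough(pos, depth):              # search limit reached
        return static(pos, depth), [pos]     # report value and path
    successors = movegen(pos)
    if not successors:                       # no moves: treat as a leaf
        return static(pos, depth), [pos]
    best_value, best_path = None, None
    for succ in successors:
        value, path = minimax(succ, depth + 1, not maximizing,
                              movegen, static, deep_enough)
        better = (best_value is None
                  or (maximizing and value > best_value)
                  or (not maximizing and value < best_value))
        if better:
            best_value, best_path = value, [pos] + path
    return best_value, best_path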
A MINIMAX example
For example, the value for D is 6, which is the maximum value of its children, while the value for
C is 4 which is the minimum value of F and G. In this example the best sequence of moves found by
the maximizing/minimizing procedure is the path through nodes A, B, D and H, which is called the
principal continuation. The nodes on the path are denoted as PC (principal continuation) nodes.
For simplicity we can modify the game-tree values slightly and use only maximization
operations. The trick is to maximize the scores by negating the returned values from the
children instead of searching for minimum scores, and to estimate the values at the leaves from
the player's own viewpoint.
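A sketch of the same search in this negated, maximization-only (negamax) form, under the same
assumptions as the minimax sketch above, with static now evaluating from the point of view of
the player to move:

# Negamax: one maximization with child values negated (illustrative sketch).
def negamax(pos, depth, movegen, static, deep_enough):
    if deep_enough(pos, depth):
        return static(pos, depth)            # value for the player to move
    successors = movegen(pos)
    if not successors:
        return static(pos, depth)
    return max(-negamax(s, depth + 1, movegen, static, deep_enough)
               for s in successors)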
Alpha-beta cut-offs
The basic idea of alpha-beta cut-offs is that it is possible to compute the correct minimax
decision without looking at every node in the search tree. This is called pruning (allowing us
to ignore portions of the search tree that make no difference to the final choice).
The general principle of alpha-beta pruning is
Consider a node n somewhere in the tree, such that a player has a chance to move to that node.
If the player has a better choice m either at the parent node of n (or at any choice point
further up), then n will never be reached in actual play.
When we are doing a search with alpha-beta cut-offs, if a node's value is too high, the minimizer
will make sure it is never reached (by steering play down a path that yields a lower value).
Conversely, if a node's value is too low, the maximizer will make sure it is never reached.
This gives us the following definitions
o Alpha: the highest value that the maximizer can guarantee himself by making some
move at the current node OR at some node earlier on the path to this node.
o Beta: the lowest value that the minimizer can guarantee by making some move at the
current node OR at some node earlier on the path to this node.
The maximizer is constantly trying to push the alpha value up by finding better moves; the
minimizer is trying to push the beta value down. If a node’s value is between alpha and beta,
then the players might reach it.
At the beginning, at the root of the tree, we don’t have any guarantees yet about what values
the maximizer and minimizer can achieve. So we set beta to ∞ and alpha to -∞. Then as we
move down the tree, each node starts with beta and alpha values passed down from its parent.
Consider a situation in which the MIN children of a MAX node have been partially inspected.
Alpha-beta for a max node
At this point the "tentative" value backed up so far for F is 8. MAX is not interested in any
move that has a value of less than 8, since it is already known that 8 is the worst that MAX can
do so far. Thus node D and all its descendants can be pruned, i.e. excluded from further
exploration, since MIN would certainly go for a value of 3 rather than 8.
Fig: Partial Inspections of MIN Children
MIN is trying to minimize the game value. So far, the value 2 is the best available from
MIN's point of view. MIN will immediately reject node D, which can therefore be stopped from
further exploration.
In a game tree, each node represents a board position where one of the players gets to
choose a move. For example, in the figure below look at node C. As soon as we look at its
left child, we realize that if the players reach node C, the minimizer can hold the maximizer
below the utility 6 that the maximizer can already get by going to node B instead, so the
maximizer would never let the game reach C. Therefore we don't even have to look at C's
other children.
At a maximizer node, alpha is increased whenever a child's value is greater than the current
alpha value. Similarly, at a minimizer node, beta may be decreased. This is shown in the fig.
Fig: Tree with alpha-beta cut-offs
At each node, the alpha and beta values may be updated as we iterate over the node’s children.
At node E, when alpha is updated to a value of 8, it ends up exceeding beta. This is a point
where alpha-beta pruning applies: we know the minimizer would never let the game reach
this node, so we do not have to look at its remaining children. In fact, pruning happens exactly
when the alpha and beta bounds meet at the node's value.
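A minimal Python sketch of minimax with alpha-beta cut-offs, matching the description above;
movegen and static are assumed game-specific helpers, and the root call uses alpha = -infinity
and beta = +infinity:

# Minimax with alpha-beta pruning (illustrative sketch).
def alphabeta(pos, depth, alpha, beta, maximizing, movegen, static, limit):
    successors = movegen(pos)
    if depth == limit or not successors:
        return static(pos)
    if maximizing:
        value = float("-inf")
        for succ in successors:
            value = max(value, alphabeta(succ, depth + 1, alpha, beta,
                                         False, movegen, static, limit))
            alpha = max(alpha, value)    # maximizer pushes alpha up
            if alpha >= beta:            # cut-off: MIN avoids this node
                break
        return value
    value = float("inf")
    for succ in successors:
        value = min(value, alphabeta(succ, depth + 1, alpha, beta,
                                     True, movegen, static, limit))
        beta = min(beta, value)          # minimizer pushes beta down
        if beta <= alpha:                # cut-off: MAX avoids this node
            break
    return value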
Problem solving requires a large amount of knowledge and some mechanism for manipulating
that knowledge.
Knowledge and representation are distinct; they play central but distinguishable roles
in an intelligent system.
Knowledge is a description of the world; it determines a system's competence
by what it knows.
Representation is the way knowledge is encoded; it defines the
system's performance in doing something.
Facts are truths about the real world, i.e. what we represent. This can be regarded as
the knowledge level.
In simple words, we:
Need to know about things we want to represent, and
Need some means by which we can manipulate them.
The Mutilated Checkerboard Problem: “Consider a normal checker board from which two
squares, in opposite corners, have been removed. The task is to cover all the remaining
squares exactly with dominoes, each of which covers two squares. No overlapping, either of
dominoes on top of each other or of dominoes over the boundary of the mutilated board is
allowed. Can this task be done?”
Fig: Three representations of a Mutilated Checker Board
The dotted line on top indicates the abstract reasoning process that a program is
intended to model.
The solid lines on bottom indicate the concrete reasoning process that the program
performs.
Forward and Backward Representation
The forward and backward representations are elaborated below
KR System Requirements
A good knowledge representation enables fast and accurate access to knowledge and
understanding of the content.
A knowledge representation system should have the following properties.
Representational Adequacy – the ability to represent all kinds of knowledge that are
needed in that domain.
Inferential Adequacy – the ability to manipulate the representational structures to derive
new structures corresponding to new knowledge inferred from old.
Inferential Efficiency – the ability to incorporate additional information into the
knowledge structure that can be used to focus the attention of the inference mechanisms in
the most promising direction.
Acquisitional Efficiency – the ability to acquire new knowledge using automatic methods
wherever possible rather than relying on human intervention.
Note: To date, no single system optimizes all of the above properties.
2.2 KNOWLEDGE REPRESENTATION SCHEMES
There are four types of Knowledge representation:
Relational, Inheritable, Inferential, and Declarative/Procedural.
Relational Knowledge:
Provides a framework to compare two objects based on equivalent attributes.
Any instance in which two different objects are compared is a relational type of knowledge.
Inheritable Knowledge
Obtained from associated objects.
It prescribes a structure in which new objects are created which may inherit all or a subset
of attributes from existing objects.
Inferential Knowledge
Is inferred from objects through relations among objects.
e.g., a word alone is simple syntax, but with the help of the other words in a phrase the reader
may infer more from the word; this inference within linguistics is called semantics.
Declarative Knowledge
Statement in which knowledge is specified, but the use to which that knowledge is to be put
is not given.
e.g. laws, people's names; these are facts which can stand alone, not dependent on other
knowledge.
Procedural Knowledge
A representation in which the control information, to use the knowledge, is embedded in
the knowledge itself.
e.g. Computer programs, directions, and recipes; these indicate specific use or
implementation;
Relational Knowledge
This knowledge associates elements of one domain with another domain.
Relational knowledge is made up of objects consisting of attributes and their corresponding
associated values.
The results of this knowledge type are a mapping of elements among different domains.
The facts about a set of objects are put systematically in columns.
This representation provides little opportunity for inference.
Given only the facts, it is not possible to answer a simple question such as
"Who is the heaviest player?".
If a procedure for finding the heaviest player is provided, then these facts will enable that
procedure to compute an answer (a minimal sketch follows the table below).
We can ask things like who "bats left" and "throws right".
Player     Height   Weight   Bats – Throws
Aaron      6-0      180      Right – Right
Mays       5-10     170      Right – Right
Ruth       6-2      215      Left – Left
Williams   6-3      205      Left – Right
Table: Simple Relational Knowledge
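A minimal Python sketch of such a procedure over the relational facts in the table; the dict
encoding of the table is an assumption of this sketch:

# Relational facts from the table, as attribute-value pairs per player.
players = {
    "Aaron":    {"height": "6-0",  "weight": 180, "bats": "Right", "throws": "Right"},
    "Mays":     {"height": "5-10", "weight": 170, "bats": "Right", "throws": "Right"},
    "Ruth":     {"height": "6-2",  "weight": 215, "bats": "Left",  "throws": "Left"},
    "Williams": {"height": "6-3",  "weight": 205, "bats": "Left",  "throws": "Right"},
}

# With a procedure supplied, the facts can answer the question.
print(max(players, key=lambda p: players[p]["weight"]))   # Ruth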
Inheritable Knowledge
Here the knowledge elements inherit attributes from their parents.
The knowledge is embodied in the design hierarchies found in the functional, physical and
process domains. Within the hierarchy, elements inherit attributes from their parents, but in
many cases not all attributes of the parent elements need be prescribed to the child elements.
Inheritance is a powerful form of inference, but it is not adequate on its own; the basic KR
needs to be augmented with an inference mechanism.
KR in a hierarchical structure, shown below, is called a "semantic network" or a
collection of "frames" or a "slot-and-filler structure". The structure shows property
inheritance and a way for the insertion of additional knowledge.
Property inheritance: The objects or elements of specific classes inherit attributes
and values from more general classes. The classes are organized in a generalized
hierarchy.
Fig: Inferential Knowledge
From these three statements we can infer that:
“Wonder lives either on land or on water."
Note: If more information is made available about these objects and their relations, then more knowledge
can be inferred.
Declarative/Procedural Knowledge
The difference between declarative and procedural knowledge is not always clear-cut.
Declarative knowledge:
Here, the knowledge is based on declarative facts about axioms and domains.
Axioms are assumed to be true unless a counter example is found to invalidate them.
Domains represent the physical world and the perceived functionality.
Axioms and domains thus simply exist and serve as declarative statements that can stand
alone.
Procedural knowledge:
Here, the knowledge is a mapping process between domains that specifies "what to do when",
and the representation is of "how to make it" rather than "what it is". It may have inferential
efficiency, but no inferential adequacy or acquisitional efficiency.
Example: A parser for a natural language has the knowledge that a noun phrase may contain
articles, adjectives and nouns. It accordingly calls routines that know how to process
articles, adjectives and nouns.
Issues in Knowledge Representation
The fundamental goal of knowledge representation is to facilitate inference
(drawing conclusions) from knowledge.
The issues that arise while using KR techniques are many.
Some of these are explained below.
Important Attributes:
Are there any attributes of objects so basic that they occur in almost every problem
domain?
Relationship among attributes:
Are there any important relationships that exist among object attributes?
Choosing Granularity:
At what level of detail should the knowledge be represented? Suppose, for example, that we
represented the fact "Socrates is a man" by the single symbol SOCRATESMAN and "Plato is a man"
by PLATOMAN. Each would be a totally separate assertion, and we would not be able to draw any
conclusions about similarities between Socrates and Plato. It would be much better to
represent these facts as:
MAN(SOCRATES) MAN(PLATO)
since now the structure of the representation reflects the structure of the knowledge itself.
But to do that, we need to be able to use predicates applied to arguments. We are in even
more difficulty if we try to represent the equally classic sentence
All men are mortal.
We could represent this as:
MORTALMAN
But that fails to capture the relationship between any individual being a man and that
individual being a mortal. To do that, we really need variables and quantification unless we
are willing to write separate statements about the mortality of every known man.
Let's now explore the use of predicate logic as a way of representing knowledge by looking
at a specific example. Consider the following set of sentences:
1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. All Romans were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. People only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar.
The facts described by these sentences can be represented as a set of wff's in predicate
logic as follows:
1. Marcus was a man.
man(Marcus)
Although this representation fails to represent the notion of past tense (which is clear in the
English sentence), it captures the critical fact of Marcus being a man. Whether this omission
is acceptable or not depends on the use to which we intend to put the knowledge.
2. Marcus was a Pompeian.
Pompeian(Marcus)
3. All Pompeians were Romans.
∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler.
ruler(Caesar)
Since many people share the same name, proper names are often not references to unique
individuals; this difficulty is overlooked here. Occasionally, deciding which of several
people of the same name is being referred to in a particular statement may require a fair
amount of additional knowledge and reasoning.
5. All Romans were either loyal to Caesar or hated him.
∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
Here we have used the inclusive-or interpretation of the two types of or supported by the
English language. Some people will argue, however, that this English sentence is really
stating an exclusive-or. To express that we would have to write:
∀x: Roman(x) → [(loyalto(x, Caesar) ∨ hate(x, Caesar)) ∧
¬(loyalto(x, Caesar) ∧ hate(x, Caesar))]
6. Everyone is loyal to someone.
∀x: ∃y: loyalto(x, y)
The scope of quantifiers is a major problem that arises when trying to convert English sentences
into logical statements. Does this sentence say, as we have assumed in writing the logical formula
above, that for each person there exists someone to whom he or she is loyal, possibly a different
someone for everyone? Or does it say that there is someone to whom everyone is loyal?
Fig: An attempt to prove ¬loyalto(Marcus, Caesar)
Thus since each clause is a separate conjunct and since all the variables are universally
quantified, there need be no relationship between the variables of two clauses, even if they were
generated from the same wff.
Performing this final step of standardization is important because during the resolution
procedure it is sometimes necessary to instantiate a universally quantified variable (i.e., substitute
for it a particular value). But, in general, we want to keep clauses in their most general form as long
as possible. So when a variable is instantiated, we want to know the minimum number of
substitutions that must be made to preserve the truth value of the system.
After applying this entire procedure to a set of wff's, we will have a set of clauses, each of which is
a disjunction of literals. These clauses can now be exploited by the resolution procedure to generate
proofs
One way of viewing the resolution process is that it takes a set of clauses that are all
assumed to be true and based on information provided by the others, it generates new
clauses
that represent restrictions on the way each of those original clauses can be made true. A
contradiction occurs when a clause becomes so restricted that there is no way it can be true.
This is indicated by the generation of the empty clause.
Unification Algorithm
In propositional logic it is easy to determine that two literals cannot both be true at the same
time. Simply look for L and ~L. In predicate logic, this matching process is more complicated,
since bindings of variables must be considered.
For example, man(John) and ¬man(John) is a contradiction, while man(John) and ¬man(Himalayas)
is not. Thus in order to detect contradictions we need a matching procedure that compares
two literals and discovers whether there exists a set of substitutions that makes them identical.
There is a recursive procedure that does this matching. It is called the Unification algorithm.
In the Unification algorithm each literal is represented as a list, where the first element is
the name of the predicate and the remaining elements are arguments. An argument may be a single
element (atom) or may be another list. For example we can have literals such as
( tryassassinate Marcus Caesar)
( tryassassinate Marcus (ruler of Rome))
To unify two literals, first check whether their first elements are the same. If so, proceed;
otherwise they cannot be unified. For example, the literals
(tryassassinate Marcus Caesar)
(hate Marcus Caesar)
cannot be unified. The unification algorithm recursively matches pairs of elements, one pair at
a time. The matching rules are:
1. Different constants, functions or predicates cannot match, whereas identical ones can.
2. A variable can match another variable, any constant, or a function or predicate expression,
subject to the condition that the function or predicate expression must not contain any
instance of the variable being matched (otherwise it will lead to infinite recursion).
3. The substitution must be consistent. Substituting y for x now and then z for x later is
inconsistent. (A substitution of y for x is written y/x.)
The Unification algorithm is listed below as a procedure UNIFY(L1, L2). It returns a list
representing the composition of the substitutions that were performed during the match.
The empty list, NIL, indicates that a match was found without any substitutions; a list
consisting of the single value FAIL indicates that the unification procedure failed.
Algorithm: Unify(L1, L2)
1. If L1 or L2 are variables or constants, then:
(a) If L1 and L2 are identical, then return NIL.
(b) Else if L1 is a variable, then if L1 occurs in L2 then return {FAIL}, else return (L2/L1).
(c) Else if L2 is a variable, then if L2 occurs in L1 then return {FAIL}, else return (L1/L2).
(d) Else return {FAIL}.
2. If the initial predicate symbols in L1 and L2 are not identical, then return {FAIL}.
3. If L1 and L2 have a different number of arguments, then return {FAIL}.
4. Set SUBST to NIL. (At the end of this procedure, SUBST will contain all the substitutions used
to unify L1 and L2.)
5. For i ← 1 to number of arguments in L1:
(a) Call Unify with the ith argument of L1 and the ith argument of L2, putting the result in S.
(b) If S contains FAIL then return {FAIL}.
(c) If S is not equal to NIL then:
(i) Apply S to the remainder of both L1 and L2.
(ii) SUBST := APPEND(S, SUBST).
6. Return SUBST.
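A minimal Python rendering of this algorithm, with literals as lists whose first element is the
predicate name. The convention that variables are lower-case strings and constants are
capitalised is an assumption of this sketch, as is representing a substitution y/x as the
tuple (y, "x").

FAIL = ["FAIL"]                    # sentinel for a failed unification

def is_variable(term):
    return isinstance(term, str) and term[:1].islower()

def occurs_in(var, term):
    if var == term:
        return True
    return isinstance(term, list) and any(occurs_in(var, t) for t in term)

def substitute(term, binding):     # binding is a (value, variable) pair
    value, var = binding
    if term == var:
        return value
    if isinstance(term, list):
        return [substitute(t, binding) for t in term]
    return term

def unify(l1, l2):
    # Step 1: either argument is a variable or a constant.
    if not isinstance(l1, list) or not isinstance(l2, list):
        if l1 == l2:
            return []                                         # NIL
        if is_variable(l1):
            return FAIL if occurs_in(l1, l2) else [(l2, l1)]  # L2/L1
        if is_variable(l2):
            return FAIL if occurs_in(l2, l1) else [(l1, l2)]  # L1/L2
        return FAIL
    if l1[0] != l2[0]:             # Step 2: predicate symbols differ.
        return FAIL
    if len(l1) != len(l2):         # Step 3: arity differs.
        return FAIL
    args1, args2 = list(l1[1:]), list(l2[1:])
    subst = []                     # Step 4.
    while args1:                   # Step 5: match argument pairs.
        s = unify(args1.pop(0), args2.pop(0))
        if s is FAIL:
            return FAIL
        for b in s:                # apply S to the remainder of both
            args1 = [substitute(t, b) for t in args1]
            args2 = [substitute(t, b) for t in args2]
        subst = s + subst          # SUBST := APPEND(S, SUBST)
    return subst                   # Step 6.

For example, unify(["hate", "x", "Caesar"], ["hate", "Marcus", "y"]) returns
[("Caesar", "y"), ("Marcus", "x")].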
9. ∀x, y: persecute(x, y) → hate(y, x)
10. ∀x, y: hate(x, y) → persecute(y, x)
Converting to clause form, we get:
12. ¬persecute(x5, y2) ∨ hate(y2, x5)
13. ¬hate(x6, y3) ∨ persecute(y3, x6)
Procedural v/s Declarative Knowledge
A declarative representation is one in which knowledge is specified, but the use to which
that knowledge is to be put is not given.
A Procedural representation is one in which the control information that is necessary to
use the knowledge is considered to be embedded in the knowledge itself.
To use a procedural representation, we need to augment it with an interpreter that follows
the instructions given in the knowledge.
The difference between the declarative and the procedural views of knowledge lies in where
control information resides.
man(Marcus)
man(Caesar)
person(Cleopatra)
∀x: man(x) → person(x)
Now we want to extract from this knowledge base the answer to the question:
∃y: person(y)
Marcus, Caesar and Cleopatra can all be the answers.
Fig: Unsuccessful attempts at resolution
As there is more than one value that satisfies the predicate, but only one value is needed,
the answer depends on the order in which the assertions are examined during the search for
a response.
If we view the assertions as declarative, then we cannot say how they will be examined; if
we view them as procedural, then we can.
If we view these assertions as a non-deterministic program whose output is simply not
defined, then there is no difference between procedural and declarative statements. But most
machines do not behave this way; they stick to whatever method they have, whether sequential
or parallel.
The focus is on working on the control model.
man(Marcus)
man(Caesar)
∀x: man(x) → person(x)
person(Cleopatra)
If we view this as declarative then there is no difference from the previous statement. But
viewed procedurally, using the same control model that previously gave Cleopatra as the answer,
the answer is now Marcus.
The answer can thus vary by changing the way the interpreter works.
The distinction between the two forms is often very fuzzy. Rather than trying to prove which
technique is better, we should figure out the ways in which rule formalisms and interpreters
can be combined to solve problems.
Logic Programming
Logic programming is a programming-language paradigm in which logical assertions are
viewed as programs, e.g. PROLOG.
A PROLOG program is described as a series of logical assertions, each of which is a Horn Clause.
A Horn Clause is a clause that has at most one positive literal.
E.g. p and ¬p ∨ q are Horn Clauses.
The fact that PROLOG programs are composed only of Horn Clauses and not of arbitrary
logical expressions has two important consequences.
Because of uniform representation a simple & effective interpreter can be written.
The logic of Horn Clause systems is decidable.
PROLOG works by backward reasoning.
The program is read top to bottom, left to right and search is performed depth-first with
backtracking.
There are some syntactic differences between the logic and the PROLOG representations.
The key difference between the logic and PROLOG representations is that the PROLOG
interpreter has a fixed control strategy, so the assertions in a PROLOG program define a
particular search path to the answer to any question, whereas logical assertions define only
the set of answers that they justify; there can be more than one answer, and reasoning may
run forward or backward.
Control Strategy for PROLOG states that we begin with a problem statement, which is
viewed as a goal to be proved.
Look for the assertions that can prove the goal.
To decide whether a fact or a rule can be applied to the current problem,
invoke a standard unification procedure.
Reason backward from that goal until a path is found that terminates with assertions in the
program.
Consider paths using a depth-first search strategy and use backtracking.
Propagate to the answer by satisfying the conditions.
Forward v/s Backward Reasoning
The objective of any search is to find a path through a problem space from the initial state
to the final state.
There are two directions in which to search for the answer:
Forward
Backward
8-square problem
Reason forward from the initial states: Begin building a tree of move sequences that might
be solutions by starting with the initial configuration(s) at the root of the tree. Generate
the next level of the tree by finding all the rules whose left sides match the root node and
using the right sides to create the new configurations. Generate each node of the following
level by taking each node generated at the previous level and applying to it all of the rules
whose left sides match it. Continue.
Reason backward from the goal states: Begin building a tree of move sequences that might
be solutions by starting with the goal configuration(s) at the root of the tree. Generate the
next level of the tree by finding all the rules whose right sides match the root node and
using the left sides to create the new configurations. Generate each node of the following
level by taking each node generated at the previous level and applying to it all of the rules
whose right sides match it. Continue. This is also called Goal-Directed Reasoning.
To summarize, to reason forward, the left sides (pre-conditions) are matched against the
current state and the right sides (the results) are used to generate new nodes until the goal is
reached.
To reason backwards, the right sides are matched against the current node and the
left sides are used to generate new nodes.
A good system for the representation of structured knowledge in a particular domain should
possess the following four properties:
(i) Representational Adequacy:- The ability to represent all kinds of knowledge that are needed in that
domain.
(ii) Inferential Adequacy :- The ability to manipulate the represented structure and infer new
structures.
(iii) Inferential Efficiency:- The ability to incorporate additional information into the knowledge
structure that will aid the inference mechanisms.
(iv) Acquisitional Efficiency :- The ability to acquire new information easily, either by direct insertion or
by program control.
The techniques that have been developed in AI systems to accomplish these objectives fall under
two categories: declarative methods and procedural methods.
In practice most knowledge representations employ a combination of both. Most knowledge
representation structures have been developed to support programs that handle natural
language input. One of the reasons that knowledge structures are so important is that they
provide a way to represent information about commonly occurring patterns of things. Such
descriptions are sometimes called schemas. One definition of schema is:
“Schema refers to an active organization of the past reactions, or of past experience, which
must always be supposed to be operating in any well adapted organic response”.
By using schemas, people as well as programs can exploit the fact that the real world is not
random. There are several types of schemas that have proved useful in AI programs.
They include
(i) Frames:- Used to describe a collection of attributes that a given object possesses (eg:
description of a chair).
(ii) Scripts:- Used to describe common sequences of events (e.g. a restaurant scene).
(iv) Rule models:- Used to describe common features shared among a set of rules in a
production system.
Frames and scripts are used very extensively in a variety of AI programs. Before selecting
any specific knowledge representation structure, the following issues have to be considered.
(i) The basic properties of objects, if any, which are common to every problem domain must be
identified and handled appropriately.
A frame is a collection of attributes and associated values that describe some entity in the
world. Frames are general record like structures which consist of a collection of slots and
slot values. The slots may be of any size and type.
Slots typically have names and values or subfields called facets. Facets may also have names
and any number of values. A frame may have any number of slots; a slot may have any
number of facets, each with any number of values.
A slot contains information such as attribute value pairs, default values, condition for filling
a slot, pointers to other related frames and procedures that are activated when needed for
different purposes.
Sometimes a frame describes an entity in some absolute sense, sometimes it represents the
entity from a particular point of view. A single frame taken alone is rarely useful.
We build frame systems out of collections of frames that are connected to each other by
virtue of the fact that the value of an attribute of one frame may be another frame. Each
frame should start with an open parenthesis and end with a close parenthesis.
The object of a knowledge representation is to express knowledge in a computer tractable form, so that
it can be used to enable our AI agents to perform well.
1. Syntax The syntax of a language defines which configurations of the components of the language
constitute valid sentences.
2. Semantics The semantics defines which facts in the world the sentences refer to, and hence the
statement about the world that each sentence makes.
A good knowledge representation system for any particular domain should possess the following
properties:
1. Representational Adequacy – the ability to represent all the different kinds of knowledge that might
be needed in that domain.
2. Inferential Adequacy – the ability to manipulate the representational structures to derive new
structures (corresponding to new knowledge) from existing structures.
3. Inferential Efficiency – the ability to incorporate additional information into the knowledge structure
which can be used to focus the attention of the inference mechanisms in the most promising directions.
4. Acquisitional Efficiency – the ability to acquire new information easily. Ideally the agent
should be able to control its own knowledge acquisition, but direct insertion of information
by a 'knowledge engineer' would be acceptable.
In practice, the theoretical requirements for good knowledge representations can usually be achieved
by dealing appropriately with a number of practical requirements:
1. The representations need to be complete – so that everything that could possibly need to be
represented, can easily be represented.
3. They should make the important objects and relations explicit and accessible – so that it is easy to see
what is going on, and how the various components interact.
4. They should suppress irrelevant detail – so that rarely used details don't introduce
unnecessary complications, but are still available when needed.
5. They should expose any natural constraints – so that it is easy to express how one object or relation
influences another.
6. They should be transparent – so you can easily understand what is being said.
7. The implementation needs to be concise and fast – so that information can be stored, retrieved and
manipulated rapidly.
1. A set of rules of the form Ci → Ai, where Ci is the condition part and Ai is the action part.
The condition determines when a given rule is applied, and the action determines what happens
when it is applied.
2. One or more knowledge databases that contain whatever information is relevant for the given
problem. Some parts of the database may be permanent, while others may be temporary and only
exist during the solution of the current problem. The information in the databases may be
structured in any appropriate manner.
3. A control strategy that determines the order in which the rules are applied to the database, and
provides a way of resolving any conflicts that can arise when several rules match at once.
4. A rule applier which is the computational system that implements the control strategy and applies the
rules.
2. Production Systems are highly modular because the individual rules can be added, removed or
modified independently.
3. The production rules are expressed in a natural form, so the statements contained in the
knowledge base should be a recording of an expert thinking out loud.
One important disadvantage is the fact that it may be very difficult to analyse the flow of control
within a production system because the individual rules don’t call each other.
Production systems describe the operations that can be performed in a search for a solution to the
problem. They can be classified as follows.
A monotonic production system is a system in which the application of a rule never prevents the
later application of another rule that could also have been applied at the time the first rule
was selected.
A partially commutative production system is one in which, if the application of a particular
sequence of rules transforms state X into state Y, then any allowable permutation of those
rules also transforms state X into state Y.
Theorem proving falls under monotonic, partially commutative systems. Blocks world and 8-puzzle
problems are non-monotonic, partially commutative systems, while problems like chemical
analysis and synthesis come under monotonic, not partially commutative systems. Playing the
game of bridge comes under non-monotonic, not partially commutative systems.
For any problem, several production systems exist. Some will be more efficient than others.
Though it may seem that there is no relationship between kinds of problems and kinds of
production systems, in practice there is a definite relationship.
Partially commutative, monotonic production systems are useful for solving ignorable problems.
These systems are important from an implementation standpoint because they can be implemented
without the ability to backtrack to previous states when it is discovered that an incorrect
path has been followed. Such systems increase efficiency, since it is not necessary to keep
track of the changes made in the search process.
Non-monotonic, partially commutative systems are useful for problems in which changes occur but
can be reversed and in which the order of operations is not critical (e.g. the 8-puzzle).
Production systems that are not partially commutative are useful for many problems in which
irreversible changes occur, such as chemical analysis. When dealing with such systems, the
order in which operations are performed is very important, and hence correct decisions have to
be made the first time.
A frame is a data structure with typical knowledge about a particular object or concept. Frames
were first proposed by Marvin Minsky in the 1970s.
Each frame has its own name and a set of attributes associated with it. Name, weight, height
and age are slots in the frame Person. Model, processor, memory and price are slots in the
frame Computer. Each attribute or slot has a value attached to it.
Frames provide a natural way for the structured and concise representation of knowledge.
A frame provides a means of organising knowledge in slots to describe various attributes and
characteristics of the object.
Frames are an application of object-oriented programming for expert systems.
Object-oriented programming is a programming method that uses objects as a basis for analysis,
design and implementation.
In object-oriented programming, an object is defined as a concept, abstraction or thing with
crisp boundaries and meaning for the problem at hand. All objects have identity and are clearly
distinguishable. Michael Black, Audi 5000 Turbo, IBM Aptiva S35 are examples of objects.
An object combines both data structure and its behavior in a single entity. This is in sharp
contrast to conventional programming, in which data structure and the program behavior have
concealed or vague connections.
When an object is created in an object-oriented programming language, we first assign a name
to the object, then determine a set of attributes to describe the object’s characteristics, and at
last write procedures to specify the object’s behavior.
A knowledge engineer refers to such an object as a frame (a term which has become AI
jargon).
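A minimal Python sketch of a frame system: frames as dicts of slots, with an is_a slot providing
property inheritance. The particular frames and slot names are illustrative assumptions:

frames = {
    "Person": {"is_a": None,     "legs": 2},
    "Player": {"is_a": "Person", "team": None},
    "Aaron":  {"is_a": "Player", "height": "6-0", "weight": 180,
               "bats": "Right", "throws": "Right"},
}

def get_slot(name, slot):
    # Look the slot up in the frame itself, then follow is_a links upward.
    while name is not None:
        frame = frames[name]
        if frame.get(slot) is not None:
            return frame[slot]
        name = frame["is_a"]
    return None

print(get_slot("Aaron", "weight"))   # 180, stored on the frame itself
print(get_slot("Aaron", "legs"))     # 2, inherited from Person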
Inference
Two control strategies: forward chaining and backward chaining
Forward chaining:
Working from the facts to a conclusion. Sometimes called the data driven approach. To chain
forward, match data in working memory against 'conditions' of rules in the rule-base. When one of them
fires, this is liable to produce more data. So the cycle continues
Backward chaining:
Working from the conclusion to the facts. Sometimes called the goal-driven approach.
To chain backward, match a goal in working memory against 'conclusions' of rules in the rule-
base. When one of them fires, this is liable to produce more goals. So the cycle continues.
The choice of strategy depends on the nature of the problem. Assume the problem is to get
from facts to a goal (e.g. symptoms to a diagnosis).
Backward chaining is the best choice if:
The goal is given in the problem statement, or can sensibly be guessed at the beginning of the
consultation; or:
The system has been built so that it sometimes asks for pieces of data (e.g. "please now do the
gram test on the patient's blood, and tell me the result"), rather than expecting all the facts to
be presented to it.
This is because (especially in the medical domain) the test may be expensive, or unpleasant, or
dangerous for the human participant so one would want to avoid doing such a test unless there
was a good reason for it.
Forward chaining is the best choice if:
All the facts are provided with the problem statement; or:
There are many possible goals, and a smaller number of patterns of data; or:
There isn't any sensible way to guess what the goal is at the beginning of the consultation.
Note also that a backward-chaining system tends to produce a sequence of questions which seems
focused and logical to the user, while a forward-chaining system tends to produce a sequence
which seems random and unconnected. If it is important that the system should seem to behave
like a human expert, backward chaining is probably the best choice.
■ Given: A rule base contains the following rule set:
Rule 1: If A and C Then F
Rule 2: If A and E Then G
Rule 3: If B Then E
Rule 4: If G Then D
■ Problem: Prove that if A and B are true, then D is true.
Solution:
(i) Start with the inputs given: A and B are true. Begin at Rule 1 and go forward/down until a
rule that fires is found.
First iteration:
(ii) Rule 3 fires: conclusion E is true; new knowledge found.
(iii) No other rule fires; end of first iteration.
(iv) Goal not found; new knowledge found at (ii); go for a second iteration.
Second iteration:
(v) Rule 2 fires: conclusion G is true; new knowledge found.
(vi) Rule 4 fires: conclusion D is true; goal found.
Proved.
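A minimal Python sketch of this forward-chaining walk-through over the given rule set; the
tuple encoding of the rules is an assumption of the sketch:

rules = [
    ({"A", "C"}, "F"),    # Rule 1: If A and C Then F
    ({"A", "E"}, "G"),    # Rule 2: If A and E Then G
    ({"B"}, "E"),         # Rule 3: If B Then E
    ({"G"}, "D"),         # Rule 4: If G Then D
]

def forward_chain(facts, goal):
    facts = set(facts)
    while goal not in facts:
        fired = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)    # rule fires: new knowledge found
                fired = True
        if not fired:                    # no rule fired: goal unprovable
            return False
    return True

print(forward_chain({"A", "B"}, "D"))    # True, as derived above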
Backward chaining is a technique for drawing inferences from a rule base. Backward-chaining
inference is often called goal driven.
The algorithm proceeds from the desired goal, adding new assertions as they are found.
A backward-chaining system looks for the action in the THEN clause of the rules that matches
the specified goal.
Goal Driven
Example: Backward Chaining (using the same rule base as above, e.g. Rule 4: If G Then D)
■ Problem: Prove that if A and B are true, then D is true.
(i) Start with the goal, i.e. D is true, and go backward/up until a rule that concludes it is
found.
First iteration:
(ii) Rule 4 fires: new sub-goal, prove G is true; go backward.
(iii) Rule 2 fires: A is true (1st input ascertained); new sub-goal, prove E is true; go
backward.
(iv) No other rule fires; end of first iteration; new sub-goal found at (iii); go for a second
iteration.
Second iteration:
(v) Rule 3 fires: conclusion B is true (2nd input ascertained); both inputs A and B ascertained.
Proved.
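The same rule base queried by backward chaining can be sketched as below, reusing the rules
list from the forward-chaining sketch; note this naive version does no loop checking, so it
assumes the rule base is acyclic:

def backward_chain(goal, facts):
    if goal in facts:                    # the goal is a known input
        return True
    for conditions, conclusion in rules:
        if conclusion == goal and all(
                backward_chain(c, facts) for c in conditions):
            return True
    return False

print(backward_chain("D", {"A", "B"}))   # True: D <- G <- (A, E) <- B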
FUZZY LOGIC
Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. The approach of FL
imitates the way of decision making in humans that involves all intermediate possibilities between
digital values YES and NO.
The conventional logic block that a computer can understand takes precise input and produces a
definite output as TRUE or FALSE, which is equivalent to a human's YES or NO.
The inventor of fuzzy logic, Lotfi Zadeh, observed that unlike computers, the human decision
making includes a range of possibilities between YES and NO, such as –
The fuzzy logic works on the levels of possibilities of input to achieve the definite output.
Implementation
It can be implemented in systems with various sizes and capabilities ranging from small
micro-controllers to large, networked, workstation-based control systems.
It has four main parts, as shown below.
Fuzzification Module − It transforms the system inputs, which are crisp numbers, into fuzzy
sets. It splits the input signal into five levels:
LP – x is Large Positive
MP – x is Medium Positive
S – x is Small
MN – x is Medium Negative
LN – x is Large Negative
Knowledge Base − It stores IF-THEN rules provided by experts.
Inference Engine − It simulates the human reasoning process by making fuzzy inference on the
inputs and IF-THEN rules.
Defuzzification Module − It transforms the fuzzy set obtained by the inference engine into a crisp
value.
The triangular membership function shapes are most common among various other
membership function shapes such as trapezoidal, singleton, and Gaussian.
Here, the input to 5-level fuzzifier varies from -10 volts to +10 volts. Hence the corresponding
output also changes.
Example of a Fuzzy Logic System
Let us consider an air conditioning system with a 5-level fuzzy logic system. This system
adjusts the temperature of the air conditioner by comparing the room temperature and the target
temperature value.
Algorithm
Define linguistic variables and terms.
Construct membership functions for them.
Construct knowledge base of rules.
Convert crisp data into fuzzy data sets using membership functions. (fuzzification)
Evaluate rules in the rule base. (inference engine)
Combine results from each rule. (inference engine)
Convert output data into non-fuzzy values. (defuzzification)
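A minimal Python sketch of the fuzzification step with triangular membership functions for the
five levels above; the break-points chosen for the -10 V to +10 V input range are illustrative
assumptions:

def triangle(x, a, b, c):
    # Triangular membership: 0 at a and c, rising to 1 at the peak b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

levels = {
    "LN": (-15, -10, -5),   # Large Negative
    "MN": (-10, -5, 0),     # Medium Negative
    "S":  (-5, 0, 5),       # Small
    "MP": (0, 5, 10),       # Medium Positive
    "LP": (5, 10, 15),      # Large Positive
}

def fuzzify(x):
    return {name: triangle(x, *abc) for name, abc in levels.items()}

print(fuzzify(3.0))   # S = 0.4 and MP = 0.6; all other levels 0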
Logic Development
Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures.
The key application areas of fuzzy logic are as given below.
Automotive Systems
Automatic Gearboxes
Four-Wheel Steering
Vehicle environment control
Consumer Electronic Goods
Hi-Fi Systems
Photocopiers
Still and Video Cameras
Television
Domestic Goods
Microwave Ovens
Refrigerators
Toasters
Vacuum Cleaners
Washing Machines
Environment Control
Air Conditioners/Dryers/Heaters
Humidifiers
Advantages of FLSs
Disadvantages of FLSs
Certainty Factor
A certainty factor (CF) is a numerical value that expresses a degree of subjective belief that a
particular item is true. The item may be a fact or a rule. When probabilities are used attention must be
paid to the underlying assumptions and probability distributions in order to show validity.
Bayes' rule can
be used to combine probability measures.
Suppose that a certainty is defined to be a real number between -1.0 and +1.0, where 1.0
represents complete certainty that an item is true and -1.0 represents complete certainty that an item is
false. Here a CF of 0.0 indicates that no information is available about either the truth or the falsity of an
item. Hence positive values indicate a degree of belief or evidence that an item is true, and negative
values indicate the opposite belief. Moreover it is common to select a positive number that represents a
minimum threshold of belief in the truth of an item. For example, 0.2 is a commonly chosen threshold
value.
Form of certainty factors in an ES:
IF <evidence>
THEN <hypothesis> {cf}
cf represents belief in hypothesis H given that evidence E has occurred. It is based on two
functions:
i) Measure of belief, MB(H, E)
ii) Measure of disbelief, MD(H, E)
These indicate the degree to which belief or disbelief in hypothesis H is increased if evidence
E is observed.
Total strength of belief and disbelief in a hypothesis:
CF(H, E) = MB(H, E) − MD(H, E)
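A minimal Python sketch of certainty-factor bookkeeping: the net certainty CF = MB − MD, plus
the commonly used MYCIN-style rule for combining two independent certainty factors for the
same hypothesis (the combination rule is a standard one, not stated in the text above):

def certainty(mb, md):
    # Total strength of belief and disbelief: CF(H, E) = MB(H, E) - MD(H, E).
    return mb - md

def combine(cf1, cf2):
    # Combine two independent CFs for the same hypothesis.
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

print(combine(0.6, 0.4))   # 0.76: two supporting pieces of evidence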
Bayesian networks
Represent dependencies among random variables
Give a short specification of conditional probability distribution
Many random variables are conditionally independent
Simplifies computations
Graphical representation
DAG – causal relationships among random variables
Allows inferences based on the network structure
Definition of Bayesian networks
A BN is a DAG in which each node is annotated with quantitative probability information,
namely:
Nodes represent random variables (discrete or continuous)
Directed links X → Y: X has a direct influence on Y; X is said to be a parent of Y
Each node Xi has an associated conditional probability table, P(Xi | Parents(Xi)), that
quantifies the effects of the parents on the node
We can see that P(Xi | Xi−1, …, X1) = P(Xi | Parents(Xi)) provided Parents(Xi) ⊆ {X1, …, Xi−1}
The condition may be satisfied by labeling the nodes in an order consistent with the DAG
Intuitively, the parents of a node Xi must be all those nodes among X1, …, Xi−1 which have a
direct influence on Xi.
Pick a set of random variables that describe the problem
Pick an ordering of those variables
while there are still variables repeat
(a) choose a variable Xi and add a node associated to X i
(b) assign Parents(Xi) a minimal set of nodes that already exists in the network such that the
conditional independence property is satisfied
(c) define the conditional probability table for Xi
Because each node is linked only to previous nodes, the resulting network is a DAG.
P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) = P(MaryCalls | Alarm)
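A minimal Python sketch of how this factorization answers joint-probability queries for the
burglary network: each node contributes its CPT entry given its parents. The numbers are the
usual textbook values for this example, quoted here only for illustration:

P_B, P_E = 0.001, 0.002                      # P(Burglary), P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}              # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}              # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    # P(B, E, A, J, M) as a product of local conditional probabilities.
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

print(joint(False, False, True, True, True))   # about 0.00063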
Dempster-Shafer Theory
Dempster-Shafer theory is an approach to combining evidence
Dempster (1967) developed means for combining degrees of belief derived from independent
items of evidence.
His student, Glenn Shafer (1976), developed a method for obtaining degrees of belief for one
question from subjective probabilities for a related question.
People working on Expert Systems in the 1980s saw this approach as ideally suited to such
systems.
Each fact has a degree of support, between 0 and 1:
0 – no support for the fact
1 – full support for the fact
Differs from Bayesian approach in that:
Belief in a fact and its negation need not sum to 1.
Both values can be 0 (meaning no evidence for or against the fact)
Belief in A:
The belief in an element A of the Power set is the sum of the masses of elements which are subsets of A
(including A itself). Given A={q1, q2, q3}
Bel(A) = m(q1)+m(q2)+m(q3)+ m({q1, q2})+m({q2, q3})+m({q1, q3})+m({q1, q2, q3})
Example
Given the mass assignments as assigned by the detectives:
Result:
Plausibility of A: pl(A)
The plausibility of an element A, pl(A), is the sum of all the masses of the sets that intersect
with the set A:
E.g. pl({B, J}) = m(B) + m(J) + m(B,J) + m(B,S) + m(J,S) + m(B,J,S) = 0.9
All results:
pl(A) = 1 − dis(A)
Belief Interval of A:
The certainty associated with a given subset A is defined by the belief interval
[bel(A), pl(A)].
E.g. the belief interval of {B, S} is [0.1, 0.8].
Belief Intervals:
Belief intervals allow Dempster-Shafer theory to reason about the degree of certainty or
uncertainty of our beliefs.
A small difference between belief and plausibility shows that we are certain about our
belief.
A large difference shows that we are uncertain about our belief.
However, even a zero-width interval does not mean we know which conclusion is right, just
how probable it is!
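A minimal Python sketch of belief and plausibility over the frame {B, J, S}. The mass values
below are illustrative assumptions (they reproduce pl({B, J}) = 0.9 but are not the detectives'
actual assignments, which are not reproduced in the text above):

masses = {
    frozenset("B"): 0.1, frozenset("J"): 0.2, frozenset("S"): 0.1,
    frozenset("BJ"): 0.1, frozenset("BS"): 0.1, frozenset("JS"): 0.3,
    frozenset("BJS"): 0.1,
}

def bel(a):
    # Sum the masses of all subsets of A (including A itself).
    a = frozenset(a)
    return sum(m for s, m in masses.items() if s <= a)

def pl(a):
    # Sum the masses of all sets that intersect A.
    a = frozenset(a)
    return sum(m for s, m in masses.items() if s & a)

print(bel("BJ"), pl("BJ"))   # belief interval of {B, J}: [0.4, 0.9] here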
UNIT –III MACHINE LEARNING BASICS
LEARNING
Learning is what gives us flexibility in our life; the fact that we can adjust and adapt to
new circumstances, and learn new tricks.
The important parts of animal learning are remembering, adapting, and generalising:
recognising that last time we were in this situation (saw this data) we tried out some particular
action (gave this output) and it worked (was correct), so we’ll try it again, or it didn’t work, so
we’ll try something different.
The last word, generalising, is about recognising similarity between different situations,
so that things that applied in one place can be used in another. This is what makes learning
useful, because we can use our knowledge in lots of different places.
MACHINE LEARNING
Machine learning is about making computers modify or adapt their actions (whether these
actions are making predictions, or controlling a robot) so that these actions get more accurate,
where accuracy is measured by how well the chosen actions reflect the correct ones.
Imagine that you are playing a game against a computer. You might beat it every time in
the beginning, but after lots of games it starts beating you, until finally you never win. Either you
are getting worse, or the computer is learning how to win
Several algorithms are known for finding the weights of a linear function that minimize E
defined in this way. In our case, we require an algorithm that will incrementally refine the
weights as new training examples become available and that will be robust to errors in these
estimated training values. One such algorithm is called the least mean squares, or LMS, training
rule. The LMS algorithm adjusts the weights a small amount for each observed training example
⟨b, Vtrain(b)⟩ in the direction that reduces the error on that example:
Use the current weights to calculate Ṽ(b).
For each weight wi, update it as wi ← wi + η (Vtrain(b) − Ṽ(b)) xi
where η is a small constant that moderates the size of the weight update and xi is the value of
the ith board feature.
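A minimal Python sketch of one LMS update, assuming each board state is encoded as a feature
vector x with x[0] = 1 for the constant weight w0, and eta is a small learning rate:

def lms_update(weights, x, v_train, eta=0.1):
    v_hat = sum(w * xi for w, xi in zip(weights, x))     # current estimate
    error = v_train - v_hat
    return [w + eta * error * xi for w, xi in zip(weights, x)]

weights = [0.0] * 7                                      # w0 .. w6
weights = lms_update(weights, [1, 3, 0, 1, 0, 0, 0], v_train=100)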
The Final Design
The final design of our checkers learning system can be naturally described by four distinct
program modules that represent the central components in many learning systems. These four
modules are as follows:
Fig: Summary of choices in designing the checkers learning program
The Performance System is the module that must solve the given performance task, in
this case playing checkers, by using the learned target function(s). It takes an instance of a new
problem (new game) as input and produces a trace of its solution (game history) as output. In our
case, the strategy used by the Performance System to select its next move at each step is
determined by the learned Ṽ evaluation function. Therefore, we expect its performance to
improve as this evaluation function becomes increasingly accurate.
The Critic takes as input the history or trace of the game and produces as output a set of
training examples of the target function. As shown in the diagram, each training example in this
case corresponds to some game state in the trace, along with an estimate Vtrain of the target
function value for this example.
The Generalizer takes as input the training examples and produces an output hypothesis
that is its estimate of the target function. It generalizes from the specific training examples,
hypothesizing a general function that covers these examples and other cases beyond the training
examples. In our example, the Generalizer corresponds to the LMS algorithm, and the output
hypothesis is the function Ṽ described by the learned weights w0, …, w6.
The Experiment Generator takes as input the current hypothesis (currently learned
function) and outputs a new problem (i.e., initial board state) for the Performance System to
explore. Its role is to pick new practice problems that will maximize the learning rate of the
overall system. In our example, the Experiment Generator follows a very simple strategy: It
always proposes the same initial game board to begin a new game. More sophisticated strategies
could involve creating board positions designed to explore particular regions of the state space.
The sequence of design choices made for the checkers program is summarized in the
following Figure. These design choices have constrained the learning task in a number of ways.
We have restricted the type of knowledge that can be acquired to a single linear evaluation
function.
We have constrained this evaluation function to depend on only the six specific board
features provided. If the true target function V can indeed be represented by a linear combination
of these particular features, then our program has a good chance to learn it. If not, then the best
we can hope for is that it will learn a good approximation, since a program can certainly never
learn anything that it cannot at least represent.
The problem of automatically inferring the general definition of some concept, given examples
labeled as members or nonmembers of the concept, is commonly referred to as concept learning,
or approximating a boolean-valued function from examples.
Table: Positive and negative training examples for the target concept Enjoy Sport.
The most general hypothesis-that every day is a positive example-is represented by
(?, ?, ?, ?, ?, ?)
and the most specific possible hypothesis-that no day is a positive example-is represented by
(Ø,Ø,Ø,Ø,Ø,Ø)
To summarize, the Enjoy Sport concept learning task requires learning the set of days for which
Enjoy Sport = yes, describing this set by a conjunction of constraints over the instance attributes.
Notation
The set of items over which the concept is defined is called the set of instances, which we
denote by X. In the current example, X is the set of all possible days, each represented by the
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. The concept or function to be
learned is called the target concept, which we denote by c. In general, c can be any
boolean-valued function defined over the instances X; that is, c : X → {0, 1}. In the current
example, the target concept corresponds to the value of the attribute EnjoySport (i.e., c(x) = 1
if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).
Now consider the sets of instances that are classified positive by h1 and by h2. Because
h2 imposes fewer constraints on the instance, it classifies more instances as positive. In fact,
any instance classified positive by h1 will also be classified positive by h2. Therefore, we say
that h2 is more general than h1. This intuitive "more general than" relationship between
hypotheses can be defined more precisely as follows.
First, for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if
h(x) = 1. We now define the more_general_than_or_equal_to relation in terms of the sets of
instances that satisfy the two hypotheses: given hypotheses hj and hk, hj is
more_general_than_or_equal_to hk if and only if any instance that satisfies hk also satisfies hj.
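A minimal Python sketch of this relation for attribute-vector hypotheses like those of the
EnjoySport task, where "?" matches anything and "0" (the empty constraint) matches nothing; the
tuple encoding is an assumption of the sketch:

def satisfies(x, h):
    # x satisfies h iff every constraint in h is "?" or equals x's value.
    return all(hc == "?" or hc == xc for xc, hc in zip(x, h))

def more_general_or_equal(hj, hk):
    # hj >=g hk: any instance that satisfies hk also satisfies hj.
    return all(cj == "?" or cj == ck or ck == "0"
               for cj, ck in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))   # True: h2 imposes fewer constraints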
Supervised learning
A training set of examples with the correct responses (targets) is provided and, based on
this training set, the algorithm generalises to respond correctly to all possible inputs. This is also
called learning from exemplars.
Unsupervised learning
Correct responses are not provided, but instead the algorithm tries to identify similarities
between the inputs so that inputs that have something in common are categorised together. The
statistical approach to unsupervised learning is known as density estimation.
Reinforcement learning
This is somewhere between supervised and unsupervised learning. The algorithm gets
told when the answer is wrong, but does not get told how to correct it. It has to explore and try
out different possibilities until it works out how to get the answer right. Reinforcement learning
is sometimes called learning with a critic because of this monitor that scores the answer, but does
not suggest improvements.
Evolutionary learning
Biological evolution can be seen as a learning process: biological organisms adapt to
improve their survival rates and chance of having offspring in their environment. We’ll look at
how we can model this in a computer, using an idea of fitness, which corresponds to a score for
how good the current solution is.
SUPERVISED LEARNING
The webpage example is a typical problem for supervised learning. There is a set of data (the
training data) that consists of a set of input data that has target data, which is the answer that the
algorithm should produce, attached. This is usually written as a set of data (xi, ti), where the
inputs are xi, the targets are ti, and the i index suggests that we have lots of pieces of data,
indexed by i running from 1 to some upper limit N.
REGRESSION
Suppose that you were given the following datapoints and asked to tell me the value of the
output (which we will call y, since it is not a target datapoint) when x = 0.44.
Fig :Top left: A few datapoints from a sample problem. Bottom left: Two possible ways
to predict the values between the known datapoints: connecting the points with straight lines, or
using a cubic approximation (which in this case misses all of the points). Top and bottom right:
Two more complex approximators (see the text for details) that pass through the points, although
the lower one is rather better than the top.
Since the value x = 0.44 isn’t in the examples given, you need to find some way to predict
what value it has. You assume that the values come from some sort of function, and try to find
out what the function is. Then you’ll be able to give the output value y for any given value of x.
This is known as a regression problem in statistics: fit a mathematical function describing a
curve, so that the curve passes as close as possible to all of the datapoints. It is generally a
problem of function approximation or interpolation, working out the value between values that
we know.
The top-left plot shows a plot of the 7 values of x and y in the table, while the other plots
show different attempts to fit a curve through the datapoints. The bottom-left plot shows two
possible answers found by using straight lines to connect up the points, and also what happens if
we try to use a cubic function (something that can be written as y = ax^3 + bx^2 + cx + d). The top-
right plot shows what happens when we try to match the function using a different polynomial,
this time of the form
and finally the bottom-right plot shows the function y = 3 sin(5x). Which of these
functions would you choose? What our machine learning algorithms do is interpolate between
datapoints. This might not seem to be intelligent behaviour, or even very difficult in two
dimensions, but it is rather harder in higher dimensional spaces.
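To make the interpolation idea concrete, here is a minimal sketch (not the author's code) that fits
a cubic to a handful of datapoints with NumPy and predicts y at x = 0.44; the datapoints are
invented for illustration, since the original table is not reproduced:

    import numpy as np

    # Hypothetical sample datapoints standing in for the table of 7 values.
    x = np.array([0.0, 0.15, 0.3, 0.5, 0.65, 0.8, 1.0])
    y = np.array([0.0, 0.95, 1.0, -0.6, -1.0, -0.3, 0.9])

    # Fit a cubic polynomial y = ax^3 + bx^2 + cx + d by least squares.
    coeffs = np.polyfit(x, y, deg=3)

    # Evaluate the fitted curve at the unseen point x = 0.44.
    prediction = np.polyval(coeffs, 0.44)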
CLASSIFICATION
The classification problem consists of taking input vectors and deciding which of N
classes they belong to, based on training from exemplars of each class. The most important point
about the classification problem is that it is discrete—each example belongs to precisely one
class, and the set of classes covers the whole possible output space.
Example: Coin classifier
When the coin is pushed into the slot, the machine takes a few measurements of it. These
could include the diameter, the weight, and possibly the shape, and are the features that will
generate our input vector.
Our input vector will have three elements, each of which will be a number showing
the measurement of that feature (choosing a number to represent the shape would involve
an encoding, for example that 1=circle, 2=hexagon, etc.).
There are many other features that we could measure. If our vending machine included an
atomic absorption spectroscope, then we could estimate the density of the material and its
composition, or if it had a camera, we could take a photograph of the coin and feed that image
into the classifier.
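As a sketch of how such measurements might be turned into an input vector (the feature values
and shape encoding below are assumptions for illustration, not from the original):

    import numpy as np

    SHAPE_CODES = {'circle': 1, 'hexagon': 2}  # hypothetical encoding of the shape feature

    def coin_features(diameter_mm, weight_g, shape):
        # The three-element input vector described in the text.
        return np.array([diameter_mm, weight_g, SHAPE_CODES[shape]])

    x = coin_features(24.75, 5.0, 'circle')  # one input vector for the classifier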
Fig: The New Zealand coins.
Fig: Left: A set of straight line decision boundaries for a classification problem. Right: An
alternative set of decision boundaries that separate the plusses from the lightning strikes better,
but requires a line that isn’t straight.
For example,
If we tried to separate coins based only on colour, we wouldn’t get very far, because the
20 ¢ and 50 ¢ coins are both silver and the $1 and $2 coins both bronze. If we use colour and
diameter, we can do a pretty good job of the coin classification problem for NZ coins. There are
some features that are entirely useless.
For example,
Knowing that the coin is circular doesn’t tell us anything about NZ coins, which are all
circular. The above Figure shows a set of 2D inputs with three different classes shown, and two
different decision boundaries; on the left they are straight lines, and are therefore simple, but
don’t categorise as well as the non-linear curve on the right.
Overfitting
Training, Testing and Validation Sets
The Confusion Matrix
Accuracy Metrics
ROC Curve
Unbalanced Datasets
Measurement Precision
Overfitting
The number of degrees of variability in most machine learning algorithms is huge — for
a neural network there are lots of weights, and each of them can vary. This is undoubtedly more
variation than there is in the function we are learning, so we need to be careful: if we train for too
long, then we will overfit the data, which means that we have learnt about the noise and
inaccuracies in the data as well as the actual function. The following figure shows this by
plotting the predictions of some algorithm (as the curve) at two different points in the learning
process.
Fig: The effect of overfitting is that rather than finding the generating function (as shown on
the left), the neural network matches the inputs perfectly, including the noise in them (on the
right). This reduces the generalisation capabilities of the network.
On the left of the figure the curve fits the overall trend of the data well (it has generalised to
the underlying general function), but the training error would still not be that close to zero since
it passes near, but not through, the training data.
As the network continues to learn, it will eventually produce a much more complex model
that has a lower training error (close to zero), meaning that it has memorised the training
examples, including any noise component of them, so that it has overfitted the training data.
We want to stop the learning process before the algorithm overfits, which means that we
need to know how well it is generalising at each timestep. We can’t use the training data for this,
because we wouldn’t detect overfitting, but we can’t use the testing data either, because we’re
saving that for the final tests.
So we need a third set of data to use for this purpose, which is called the validation set because
we’re using it to validate the learning so far. This is known as cross-validation in statistics. It is
part of model selection: choosing the right parameters for the model so that it generalises as well
as possible.
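A minimal sketch of such a split (the 50:25:25 ratio and array names are assumptions for
illustration):

    import numpy as np

    def split_data(inputs, targets, seed=0):
        # Shuffle the datapoints so that the ordering in the file doesn't matter.
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(inputs))
        inputs, targets = inputs[order], targets[order]

        # Half for training, a quarter for validation, a quarter for testing.
        n = len(inputs)
        train, valid = int(0.5 * n), int(0.75 * n)
        return (inputs[:train], targets[:train],
                inputs[train:valid], targets[train:valid],
                inputs[valid:], targets[valid:])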
Training, Testing, and Validation Sets
The area of semi-supervised learning attempts to deal with this need for large amounts of
labelled data.
Fig: The dataset is split into different sets, some for training, some for validation, and some for
testing.
If you are really short of training data, so that holding out a separate validation set raises
the worry that the algorithm won't be sufficiently trained, then it is possible to perform leave-
some-out, multi-fold cross-validation.
The idea is shown in following figure The dataset is randomly partitioned into K subsets,
and one subset is used as a validation set, while the algorithm is trained on all of the others.
Fig: Leave-some-out, multi-fold cross-validation gets around the problem of data shortage by
training many models. It works by splitting the data into sets, training a model on most sets and
holding one out for validation (and another for testing). Different models are trained with different
sets being held out.
A different subset is then left out and a new model is trained on that subset, repeating the
same process for all of the different subsets. Finally, the model that produced the lowest
validation error is tested and used. We’ve traded off data for computation time, since we’ve had
to train K different models instead of just one. In the most extreme case of this there is leave-
one-out cross-validation, where the algorithm is validated on just one piece of data, training on
all of the rest.
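A minimal sketch of leave-some-out, multi-fold cross-validation (the train_fn and error_fn
callables and the K value are placeholders, not from the original):

    import numpy as np

    def kfold_indices(n_data, k, seed=0):
        # Randomly partition the datapoint indices into K roughly equal subsets.
        rng = np.random.default_rng(seed)
        return np.array_split(rng.permutation(n_data), k)

    def cross_validate(inputs, targets, k, train_fn, error_fn):
        folds = kfold_indices(len(inputs), k)
        best_model, best_error = None, np.inf
        for i, valid_idx in enumerate(folds):
            # Train on all folds except the held-out validation fold.
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            model = train_fn(inputs[train_idx], targets[train_idx])
            error = error_fn(model, inputs[valid_idx], targets[valid_idx])
            if error < best_error:  # keep the model with the lowest validation error
                best_model, best_error = model, error
        return best_model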
The confusion matrix is a nice simple idea: make a square matrix that contains all the
possible classes in both the horizontal and vertical directions and list the classes along the top of
a table as the predicted outputs, and then down the left-hand side as the targets.
For example, the element of the matrix at (i, j) tells us how many input patterns were put
into class i in the targets, but class j by the algorithm. Anything on the leading diagonal (the
diagonal that starts at the top left of the matrix and runs down to the bottom right) is a correct
answer. Suppose that we have three classes: C1,C2, and C3. Now we count the number of times
that the output was class C1 when the target was C1, then when the target was C2, and so on
until we’ve filled in the table:
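An illustrative table consistent with the description (these counts are invented for illustration,
with the targets down the side and the predicted classes along the top):

            C1   C2   C3
    C1       5    1    0
    C2       1    4    1
    C3       2    0    4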
This table tells us that, for the three classes, most examples were classified correctly, but
two examples of class C3 were misclassified as C1, and so on. For a small number of classes this
is a nice way to look at the outputs. If you just want one number, then it is possible to divide the
sum of the elements on the leading diagonal by the sum of all of the elements in the matrix,
which gives the fraction of correct responses. This is known as the accuracy, and we are about to
see that it is not the last word in evaluating the results of a machine learning algorithm.
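A minimal sketch of building a confusion matrix and the accuracy from predicted and target
class labels (the array names and values are assumptions):

    import numpy as np

    def confusion_matrix(targets, outputs, n_classes):
        # cm[i, j] counts patterns whose target is class i but output is class j.
        cm = np.zeros((n_classes, n_classes), dtype=int)
        for t, o in zip(targets, outputs):
            cm[t, o] += 1
        return cm

    targets = np.array([0, 0, 1, 1, 2, 2, 2])
    outputs = np.array([0, 0, 1, 2, 2, 2, 0])
    cm = confusion_matrix(targets, outputs, 3)

    # Accuracy: sum of the leading diagonal over the sum of all elements.
    accuracy = np.trace(cm) / np.sum(cm)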
Accuracy Metrics
We can do more to analyse the results than just measuring the accuracy. If you consider
the possible outputs of the classes, then they can be arranged in a simple chart like this (where a
true positive is an observation correctly put into class 1, while a false positive is an observation
incorrectly put into class 1, while negative examples (both true and false) are those put into class
2):
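The chart has the standard form (reconstructed here, with the true class along the top and the
predicted class down the side):

                          Actually class 1      Actually class 2
    Classified class 1    True Positive (TP)    False Positive (FP)
    Classified class 2    False Negative (FN)   True Negative (TN)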
The entries on the leading diagonal of this chart are correct and those off the diagonal are
wrong, just as with the confusion matrix. Note, however, that this chart and the concepts of false
positives, etc., are based on binary classification.
Accuracy is then defined as the sum of the number of true positives and true negatives divided
by the total number of examples (where # means ‘number of’, and TP stands for True Positive,
etc.):

Accuracy = (#TP + #TN) / (#TP + #FP + #TN + #FN)
The problem with accuracy is that it doesn’t tell us everything about the results, since it
turns four numbers into just one. There are two complementary pairs of measurements that can
help us to interpret the performance of a classifier, namely sensitivity and specificity, and
precision and recall. Their definitions are shown next, followed by some explanation.
Sensitivity (also known as the true positive rate) is the ratio of the number of correctly
classified positive examples to the total number of actual positive examples, while specificity is
the same ratio for negative examples. Precision is the ratio of correctly classified positive
examples to the total number of examples classified as positive, while recall is the ratio of
correctly classified positive examples to the number of actual positive examples, which is the
same as sensitivity:

Sensitivity = #TP / (#TP + #FN)
Specificity = #TN / (#TN + #FP)
Precision = #TP / (#TP + #FP)
Recall = #TP / (#TP + #FN) = Sensitivity
Sensitivity and specificity sum the columns for the denominator, while precision and
recall sum the first row and the first column, and so miss out some information about how well
the learner does on the negative examples.
If you consider precision and recall, then you can see that they are to some extent
inversely related, in that if the number of false positives increases (meaning that the algorithm is
using a broader definition of that class), then the number of false negatives often decreases, and
vice versa. They can be combined to give a single measure, the F1 measure, which can be
written in terms of precision and recall as:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

and in terms of the numbers of false positives, etc. (from which it can be seen that it computes
the mean of the false examples) as:

F1 = #TP / (#TP + (#FN + #FP)/2)
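A sketch computing these measures from the four counts (a minimal illustration, not the
author's code):

    def classification_metrics(tp, fp, tn, fn):
        sensitivity = tp / (tp + fn)   # true positive rate; identical to recall
        specificity = tn / (tn + fp)   # the same ratio for the negative examples
        precision = tp / (tp + fp)
        f1 = tp / (tp + (fn + fp) / 2)
        return sensitivity, specificity, precision, f1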
We can also compare classifiers – either the same classifier with different learning
parameters, or completely different classifiers. For this, the Receiver Operator Characteristic
curve (almost always known just as the ROC curve) is useful. This is a plot of the percentage of
true positives on the y axis against false positives on the x axis.
Fig: An example of an ROC curve. The diagonal line represents exactly chance, so
anything above the line is better than chance, and the further from the line, the better. Of
the two curves shown, the one that is further away from the diagonal line would represent
a more accurate method.
An example is shown in the above figure. A single run of a classifier produces a single
point on the ROC plot, and a perfect classifier would be a point at (0, 1) (100% true positives,
0% false positives), while the anti-classifier that got everything wrong would be at (1,0); so the
closer to the top-left-hand corner the result of a classifier is, the better the classifier has
performed. Any classifier that sits on the diagonal line from (0,0) to (1,1) behaves exactly at the
chance level (assuming that the positive and negative classes are equally common) and so
presumably a lot of learning effort is wasted since a fair coin would do just as well.
In order to compare classifiers, or choices of parameter settings for the same classifier,
you could just compute the point that is furthest from the ‘chance’ line along the diagonal.
However, it is normal to compute the area under the curve (AUC) instead. If you only have one
point for each classifier, the curve is the trapezoid that runs from (0,0) up to the point and then
from there to (1,1). If there are more points (based on more runs of the classifier, such as trained
and/or tested on different datasets), then they are just included in order along the diagonal line.
The key to getting a curve rather than a point on the ROC curve is to use cross validation.
If you use 10-fold cross-validation, then you have 10 classifiers, with 10 different test sets, and
you also have the ‘ground truth’ labels. The true labels can be used to produce a ranked list of the
different cross-validation-trained results, which can be used to specify a curve through the 10
data points on the ROC curve that correspond to the results of this classifier. By producing an
ROC curve for each classifier it is possible to compare their results.
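A sketch of the AUC computation from a set of ROC points using the trapezoid rule (the points
below are invented for illustration):

    import numpy as np

    # (false positive rate, true positive rate) points, ordered along the
    # x axis, with the endpoints (0,0) and (1,1) included.
    fpr = np.array([0.0, 0.1, 0.3, 0.5, 1.0])
    tpr = np.array([0.0, 0.5, 0.75, 0.9, 1.0])

    auc = np.trapz(tpr, fpr)   # 0.5 is chance level, 1.0 is a perfect classifier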
Unbalanced Datasets
For the accuracy we have implicitly assumed that there are the same number of positive
and negative examples in the dataset (which is known as a balanced dataset).
When the dataset is unbalanced, we can compute the balanced accuracy as the sum of
sensitivity and specificity divided by 2. A more correct measure is Matthews’ Correlation
Coefficient, which is computed as:

MCC = (#TP × #TN − #FP × #FN) / sqrt((#TP + #FP)(#TP + #FN)(#TN + #FP)(#TN + #FN))

If any of the brackets in the denominator are 0, then the whole of the denominator is set to 1.
This provides a balanced accuracy computation.
Measurement Precision
The concept here is to treat the machine learning algorithm as a measurement system. We
feed in inputs and look at the outputs that we get. Even before comparing them to the target
values, we can measure something about the algorithm: if we feed in a set of similar inputs, then
we would expect to get similar outputs for them. This measure of the variability of the algorithm
is also known as precision.
The point is that just because an algorithm is precise it does not mean that it is accurate –
it can be precisely wrong if it always gives the wrong prediction. One measure of how well the
algorithm’s predictions match reality is known as trueness.
It can be defined as the average distance between the correct output and the prediction.
Trueness doesn’t usually make much sense for classification problems unless there is some
concept of certain classes being similar to each other.
Fig: Assuming that the player was aiming for the highest-scoring triple 20 in darts (the segments
each score the number they are labelled with, the narrow band on the outside of the circle scores
double and the narrow band halfway in scores triple; the outer and inner ‘bullseye’ at the centre
score 25 and 50, respectively), these four pictures show different outcomes. Top left: very
accurate: high precision and trueness, top right: low precision, but good trueness, bottom left:
high precision, but low trueness, and bottom right: reasonable trueness and precision, but the
actual outputs are not very good.
The above figure illustrates the idea of trueness and precision in the traditional way: as a
darts game, with four examples with varying trueness and precision for the three darts thrown by
a player.
UNIT – IV NEURAL NETWORKS
The processing units of the brain are nerve cells called neurons. There are lots of
them (100 billion = 10^11 is the figure that is often given) and they come in lots of different types,
depending upon their particular task.
Their general operation is similar in all cases: transmitter chemicals within the fluid of
the brain raise or lower the electrical potential inside the body of the neuron. If this membrane
potential reaches some threshold, the neuron spikes or fires, and a pulse of fixed strength and
duration is sent down the axon. The axons divide (arborise) into connections to many other
neurons, connecting to each of these neurons in a synapse.
Each neuron is typically connected to thousands of other neurons, so that it is estimated
that there are about 100 trillion (= 10^14) synapses within the brain. After firing, the neuron must
wait for some time to recover its energy (the refractory period) before it can fire again.
Each neuron can be viewed as a separate processor, performing a very simple
computation: deciding whether or not to fire. This makes the brain a massively parallel computer
made up of 10^11 processing elements. If that is all there is to the brain, then we should be able to
model it inside a computer and end up with animal or human intelligence inside a computer.
Hebb’s Rule
Hebb’s rule says that the changes in the strength of synaptic connections are proportional
to the correlation in the firing of the two connecting neurons.
So if two neurons consistently fire simultaneously, then any connection between them will
change in strength, becoming stronger.
If the two neurons never fire simultaneously, the connection between them will die away.
The idea is that if two neurons both respond to something, then they should be connected.
Example:
Suppose that you have a neuron somewhere that recognises your grandmother (this will
probably get input from lots of visual processing neurons, but don’t worry about that). Now if
your grandmother always gives you a chocolate bar when she comes to visit, then some neurons,
which are happy because you like the taste of chocolate, will also be stimulated.
Since these neurons fire at the same time, they will be connected together, and the
connection will get stronger over time. So eventually, the sight of your grandmother, even in a
photo, will be enough to make you think of chocolate. Sound familiar? Pavlov used this idea,
called classical conditioning, to train his dogs: when food was shown to the dogs and the bell
was rung at the same time, the neurons for salivating over the food and hearing the bell fired
simultaneously, and
so became strongly connected. Over time, the strength of the synapse between the neurons that
responded to hearing the bell and those that caused the salivation reflex was enough that just
hearing the bell caused the salivation neurons to fire in sympathy.
There are other names for this idea that synaptic connections between neurons and
assemblies of neurons can be formed when they fire together and can become stronger. It is also
known as long-term potentiation and neural plasticity, and it does appear to have correlates in
real brains.
Fig: A picture of McCulloch and Pitts’ mathematical model of a neuron. The inputs xi are
multiplied by the weights wi, and the neurons sum their values. If this sum is greater than
the threshold θ then the neuron fires; otherwise it does not.
We will use the picture to write down a mathematical description. On the left of the
picture are a set of input nodes (labeled x1, x2, . . . xm). These are given some values, and as an
example we’ll assume that there are three inputs, with x1 = 1, x2 = 0, x3 = 0.5. In real neurons
those inputs come from the outputs of other neurons. So the 0 means that a neuron didn’t fire, the
1 means it did, and the 0.5 has no biological meaning, but never mind. (Actually, this isn’t quite
fair, but it’s a long story and not very relevant.) Each of these other neuronal firings flowed
along a synapse to arrive at our neuron, and those synapses have strengths, called weights. The
strength of the synapse affects the strength of the signal, so we multiply the input by the weight
of the synapse (so we get x1 × w1 and x2 × w2, etc.). Now when all of these signals arrive into
our neuron, it adds them up to see if there is enough strength to make it fire. We’ll write that as

h = Σ (i = 1 to m) wi xi,

which just means sum (add up) all the inputs multiplied by their synaptic weights. I have
assumed that there are m of them, where m = 3 in the example. If the synaptic weights are w1 =
1,w2 = −0.5,w3 = −1, then the inputs to our model neuron are h = 1 × 1 + 0 × −0.5 + 0.5 × −1 = 1
+ 0 + −0.5 = 0.5. Now the neuron needs to decide if it is going to fire.
For a real neuron, this is a question of whether the membrane potential is above some
threshold. We’ll pick a threshold value (labelled θ), say θ = 0 as an example. Now, does our
neuron fire? Well, h = 0.5 in the example, and 0.5 > 0, so the neuron does fire, and produces
output 1. If the neuron did not fire, it would produce output 0.
The McCulloch and Pitts neuron is a binary threshold device. It sums up the inputs
(multiplied by the synaptic strengths or weights) and either fires (produces output 1) or does not
fire (produces output 0) depending on whether the input is above some threshold. We can write
the second half of the work of the neuron, the decision about whether or not to fire (which is
known as an activation function), as:

o = g(h) = 1 if h > θ, and o = g(h) = 0 if h ≤ θ.
This is a very simple model, but we are going to use these neurons, or very simple
variations on them using slightly different activation functions (that is, we’ll replace the
threshold function with something else) for most of our study of neural networks.
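A minimal sketch of the McCulloch and Pitts neuron as described above (the function name is
an assumption):

    import numpy as np

    def mcculloch_pitts(inputs, weights, theta=0.0):
        # Weighted sum of the inputs, h = sum_i w_i * x_i.
        h = np.sum(weights * inputs)
        # Threshold activation: fire (output 1) if h is above theta, otherwise 0.
        return 1 if h > theta else 0

    # The worked example from the text: h = 0.5 > 0, so the neuron fires.
    print(mcculloch_pitts(np.array([1, 0, 0.5]), np.array([1, -0.5, -1])))  # -> 1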
NEURAL NETWORKS
One thing that is probably fairly obvious is that one neuron isn’t that interesting. It
doesn’t do very much, except fire or not fire when we give it inputs. In fact, it doesn’t even learn.
If we feed in the same set of inputs over and over again, the output of the neuron never varies—it
either fires or does not. So to make the neuron a little more interesting we need to work out how
to make it learn, and then we need to put sets of neurons together into neural networks so that
they can do something useful.
In order to make a neuron learn, the question that we need to ask is:
How should we change the weights and thresholds of the neurons so that the network gets the
right answer more often?
We now introduce our very first neural network, the space-age-sounding Perceptron, and
see how we can use it to solve the problem. Once we have worked out the algorithm and how it
works, we’ll look at what it can and cannot do, and then see how statistics can give us insights
into learning as well.
THE PERCEPTRON
The Perceptron is nothing more than a collection of McCulloch and Pitts neurons
together with a set of inputs and some weights to fasten the inputs to the neurons. The network is
shown in following figure. On the left of the figure, shaded in light grey, are the input nodes.
Fig: The Perceptron network, consisting of a set of input nodes (left) connected
to McCulloch and Pitts neurons using weighted connections.
These are not neurons, they are just a nice schematic way of showing how values are fed
into the network, and how many of these input values there are (which is the dimension (number
of elements) in the input vector). They are almost always drawn as circles, just like neurons,
which is rather confusing, so I’ve shaded them a different colour. The neurons are shown on the
right, and you can see both the additive part (shown as a circle) and the thresholder. In practice
nobody bothers to draw the thresholder separately; you just need to remember that it is part of the
neuron.
The neurons in the Perceptron are completely independent of each other: it doesn’t matter
to any neuron what the others are doing, it works out whether or not to fire by multiplying
together its own weights and the input, adding them together, and comparing the result to its own
threshold, regardless of what the other neurons are doing.
Even the weights that go into each neuron are separate for each one, so the only thing
they share is the inputs, since every neuron sees all of the inputs to the network.
In the above figure the number of inputs is the same as the number of neurons, but this
does not have to be the case — in general there will be m inputs and n neurons. The number of
inputs is determined for us by the data, and so is the number of outputs, since we are doing
supervised learning, so we want the Perceptron to learn to reproduce a particular target, that is, a
pattern of firing and non-firing neurons for the given input. We set the values of the input nodes
to match the elements of an input vector and then use the equations above to decide whether or
not each neuron fires.
We can do this for all of the neurons, and the result is a pattern of firing and non-firing
neurons, which looks like a vector of 0s and 1s; so if there are 5 neurons, then a typical output
pattern could be (0, 1, 0, 0, 1), which means that the second and fifth neurons fired and the others
did not. We compare that pattern to the target, which is our known correct answer for this input,
to identify which neurons got the answer right, and which did not.
There are m weights that are connected to that neuron, one for each of the input nodes. If
we label the neuron that is wrong as k, then the weights that we are interested in are wik, where i
runs from 1 to m. So we know which weights to change, but we still need to work out how to
change the values of those weights.
We compute yk − tk (the difference between the output yk, which is what the neuron did,
and the target for that neuron, tk, which is what the neuron should have done; this is a possible
error function). If it is negative then the neuron should have fired and didn’t, so we make the
weights bigger, and vice versa if it is positive, which we can do by subtracting the error value.
That element of the input could be negative, which would switch the values over; so if we
wanted the neuron to fire we’d need to make the value of the weight negative as well. To get
around this we’ll multiply those two things together to see how we should change the weight:
∆wik = −(yk − tk) × xi, and the new value of the weight is the old value plus this value.
We need to decide how much to change the weight by. This is done by multiplying the
value above by a parameter called the learning rate, usually labelled as η. The value of the
learning rate decides how fast the network learns. The rule for updating a weight wij is:

wij ← wij − η(yj − tj) × xi
Computing the computational complexity of this algorithm is very easy. The recall phase
loops over the neurons, and within that loops over the inputs, so its complexity is O(mn). The
training part does this same thing, but does it for T iterations, so costs O(Tmn).
There are two input nodes (plus the bias input) and there will be one output. The inputs
and the target are given in the table on the left. The right of the figure shows a plot of the
function with the circles as the true outputs, and a cross as the false one. The corresponding
neural network is shown in above Figure.
There are three weights. The algorithm tells us to initialize the weights to small random
numbers, so we’ll pick w0 = −0.05,w1 = −0.02,w2 = 0.02.
Now we feed in the first input, where both inputs are 0: (0, 0). Remember that the input
to the bias weight is always −1, so the value that reaches the neuron is −0.05 × −1 +−0.02 × 0 +
0.02 × 0 = 0.05. This value is above 0, so the neuron fires and the output is 1, which is incorrect
according to the target. The update rule tells us that we need to apply the weight update
to each of the weights separately (we’ll pick a value of η = 0.25 for the example):
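Filling in that arithmetic (a worked sketch applying wij ← wij − η(yj − tj) × xi with y = 1 and
t = 0; recall that the bias input is −1):

w0: −0.05 − 0.25 × (1 − 0) × (−1) = −0.05 + 0.25 = 0.2
w1: −0.02 − 0.25 × (1 − 0) × 0 = −0.02
w2: 0.02 − 0.25 × (1 − 0) × 0 = 0.02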
Now we feed in the next input (0, 1) and compute the output (check that you agree that
the neuron does not fire, but that it should) and then apply the learning rule again:
For the (1, 0) input the answer is already correct (you should check that you agree with
this), so we don’t have to update the weights at all, and the same is true for the (1, 1) input. So
now we’ve been through all of the inputs once. Unfortunately, that doesn’t mean we’ve
finished—not all the answers are correct yet.
Implementation
Written this way in Python syntax, the recall code that is used after training, for a set of
nData datapoints arranged in the array inputs, is:
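A plain-loop sketch of that computation (reconstructed, since the original listing is not
reproduced; it assumes the inputs array already includes the bias column):

    import numpy as np

    def pcnfwd_loops(inputs, weights):
        # inputs is (nData, m+1) including the bias inputs; weights is (m+1, n).
        nData, nInputs = np.shape(inputs)
        nNeurons = np.shape(weights)[1]
        activations = np.zeros((nData, nNeurons))
        for data in range(nData):
            for n in range(nNeurons):
                # Sum the inputs multiplied by their synaptic weights.
                for m in range(nInputs):
                    activations[data][n] += weights[m][n] * inputs[data][m]
                # Threshold activation function: fire if above 0.
                activations[data][n] = 1 if activations[data][n] > 0 else 0
        return activations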
Python’s numerical library NumPy provides an alternative method, because it can easily
multiply arrays and matrices together.
In computer terms, matrices are just two-dimensional arrays. We can write the set of
weights for the network in a matrix by making an np.array that has m + 1 rows (the number of
input nodes + 1 for the bias) and n columns (the number of neurons). Now, the element of the
matrix at location (i, j) contains the weight connecting input i to neuron j, which is what we had
in the code above.
If we have matrices A and B where A is size m × n, then the size of B needs to be n×p,
where p can be any number. The n is called the inner dimension since when we write out the size
of the matrices in the multiplication we get (m × n) × (n × p).
NumPy can do this multiplication for us, using the np.dot() function. So to reproduce the
calculation above, we use (where >>> denotes the Python command line, and so this is code to
be typed in, with the answers provided by the Python interpreter shown afterwards):
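For instance (an illustrative pair of matrices, since the original values are not reproduced):

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6], [7, 8]])
>>> np.dot(a, b)
array([[19, 22],
       [43, 50]])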
The np.array() function makes the NumPy array, which is actually a matrix here, made up
of an array of arrays: each row is a separate array, as you can see from the square brackets within
square brackets.
The entire section of code for the recall function of the Perceptron can be rewritten in two lines
of code as:
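A sketch of those two lines (reconstructed; it assumes the weights matrix and bias-augmented
inputs array described above):

    # Matrix multiply inputs (N x m+1) by weights (m+1 x n), then threshold.
    activations = np.dot(inputs, weights)
    outputs = np.where(activations > 0, 1, 0)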
Using the np.transpose() function, which swaps the rows and columns over (so using matrix a
above again) we get:
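Continuing the illustrative matrix a from above:

>>> np.transpose(a)
array([[1, 3],
       [2, 4]])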
The weight update for the entire network can be done in one line (where eta is the learning
rate, η):
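A sketch of that line (reconstructed under the same assumptions):

    # Move each weight against the error (outputs - targets), scaled by eta.
    weights -= eta * np.dot(np.transpose(inputs), outputs - targets)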
The np.shape() function tells you the number of elements in each dimension of the
array. The only things that are needed are to add those extra −1s onto the input vectors for the
bias node, and to decide what values we should put into the weights to start with. The first of
these can be done using the np.concatenate() function, making a one-dimensional array that
contains −1 as all of its elements, and adding it on to the inputs array (note that nData in the code
is equivalent to N in the text):
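A sketch of that call (reconstructed under the same assumptions):

    inputs = np.concatenate((inputs, -np.ones((nData, 1))), axis=1)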
We now return to the OR example that was used in the hand-worked demonstration. Making the
OR data is easy, and then running the code requires importing it using its filename (pcn) and then
calling the pcntrain function. The print-out below shows the instructions to set up the arrays and
call the function, and the output of the weights for 5 iterations of a particular run of the program,
starting from random initial points (note that the weights stop changing after the 1st iteration in
this case, and that different runs will produce different values).
The following Figure shows the decision boundary, which shows when the decision
about which class to categorise the input as changes from crosses to circles.
Before returning the weights, the Perceptron algorithm above prints out the outputs for
the trained inputs. You can also use the network to predict the outputs for other values by using
the pcnfwd function. However, you need to manually add the −1s on in this case, using:
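For example (same assumption about the bias inputs; testin is a placeholder name for the test
array, and the pcnfwd call signature is an assumption):

    testin = np.concatenate((testin, -np.ones((np.shape(testin)[0], 1))), axis=1)
    outputs = pcn.pcnfwd(testin, weights)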
The results on this test data are what you can use in order to compute the accuracy of the
training algorithm, using the methods described earlier.
LINEAR SEPARABILITY
What the Perceptron does: it tries to find a straight line (in 2D, a plane in 3D, and a
hyperplane in higher dimensions) where the neuron fires on one side of the line, and doesn’t on
the other. This line is called the decision boundary or discriminant function. An example of one
is given in the following Figure.
Consider the matrix notation we used in the implementation, but with just one input vector x.
The neuron fires if x · wT ≥ 0 (where w is the row of W that connects the inputs to one particular
neuron; they are the same for the OR example, since there is only one neuron, and wT denotes the
transpose of w and is used to make both of the vectors into column vectors). The a · b notation
describes the inner or scalar product between two vectors. It is computed by multiplying each
element of the first vector by the matching element of the second and adding them all together.
As you might remember from high school, a · b = ||a|| ||b|| cos θ, where θ is the angle between a
and b and ||a|| is the length of the vector a. So the inner product computes a function of the angle
between the two vectors, scaled by their lengths. It can be computed in NumPy using the
np.inner() function. Getting back to the Perceptron, the boundary case is where we find an input
vector x1 that has x1 · wT = 0. Now suppose that we find another input vector x2 that satisfies
x2 · wT = 0. Putting these two equations together we get:

x1 · wT = x2 · wT = 0  ⇒  (x1 − x2) · wT = 0
What does this last equation mean? In order for the inner product to be 0, either ||a|| or ||b||
or cos θ needs to be zero. There is no reason to believe that ||a|| or ||b|| should be 0, so cos θ = 0.
This means that θ = π/2 (or −π/2), which means that the two vectors are at right angles to each
other. Now x1 − x2 is a straight line between two points that lie on the decision boundary, and
the weight vector wT must be perpendicular to it.
Given the input vectors and the associated target outputs, the Perceptron simply tries to find a
straight line that divides the examples where each neuron fires from those where it does not.
This is great if that straight line exists, but is a bit of a problem otherwise. The cases where there
is a straight line are called linearly separable cases.
The following Figure shows an example of decision boundaries computed by a
Perceptron with four neurons; by putting them together we can get good separation of the
classes.
A Useful Insight
Writing the problem in 3D means including a third input dimension that does not change
the data when it is looked at in the (x, y) plane, but moves the point at (0, 0) along a third
dimension. So the truth table for the function is the one shown on the left side of the following
Figure (where ‘In3’ has been added, and only affects the point at (0, 0)).
Fig: A decision boundary (the shaded plane) solving the XOR problem in 3D with
the crosses below the surface and the circles above it.
To demonstrate this, the following listing uses the same Perceptron code:
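A sketch of that listing (reconstructed; the third input is 1 only for the (0, 0) point, as described
above, and the iteration count is arbitrary):

>>> inputs = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0]])
>>> targets = np.array([[0], [1], [1], [0]])
>>> weights = pcn.pcntrain(inputs, targets, 0.25, 15)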
The following Figure shows two versions of the same dataset. On the left side, the
coordinates are x1 and x2, while on the right side the coordinates are x1, x2 and x1 ×x2. It is now
easy to fit a plane (the 2D equivalent of a straight line) that separates the data.
Fig: Left: Non-separable 2D dataset. Right: The same dataset with third coordinate
x1 × x2, which makes it separable.
Statistics has been dealing with problems of classification and regression for a long time,
before we had computers in order to do difficult arithmetic for us, and so straight line methods
have been around in statistics for many years. They provide a different (and useful) way to
understand what is happening in learning, and by using both statistical and computer science
methods we can get a good understanding of the whole area.
LINEAR REGRESSION
For regression we are making a prediction about an unknown value y (such as the
indicator variable for classes or a future value of some data) by computing some function of
known values xi. We are thinking about straight lines, so the output y is going to be a sum of the
xi values, each multiplied by a constant parameter:

y = Σi βi xi
The βi define a straight line (plane in 3D, hyperplane in higher dimensions) that goes
through (or at least near) the datapoints. The following Figure shows this in two and three
dimensions.
The most common solution is to try to minimise the distance between each datapoint and
the line that we fit. We can measure the distance between a point and a line by defining another
line that goes through the point and hits the line.
We can use Pythagoras’ theorem to compute the distance. Now, we can try to minimise an
error function that measures the sum of all these distances. If we ignore the square roots and just
minimise the sum-of-squares of the errors, then we get the most common minimisation, which is
known as least-squares optimisation.
We want to minimise the squared difference between the prediction and the actual data value,
summed over all of the datapoints. That is, we have:

minimise over β:  Σj (tj − Σi βi xij)²
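This minimisation has the well-known closed-form solution β = (XᵀX)⁻¹Xᵀt, which can be
sketched in NumPy as follows (X holds one datapoint per row and t the targets; the names are
assumptions):

    beta = np.dot(np.linalg.inv(np.dot(np.transpose(X), X)), np.dot(np.transpose(X), t))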
It might not be clear what this means, but if we threshold the outputs by setting every
value less than 0.5 to 0 and every value above 0.5 to 1, then we get the correct answer. Using it
on the XOR function shows that this is still a linear method.
The linear regressor can’t do much with the names of the cars either, but since they
appear in quotes (") we will tell np.loadtxt that they are comments, using:
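A sketch of that call (the filename is a placeholder):

    auto = np.loadtxt('auto-mpg.data', comments='"')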
Separate the data into training and testing sets, and then use the training set to recover the β
vector. Then you use that to get the predicted values on the test set. However, the confusion
matrix isn’t much use now, since there are no classes to enable us to analyse the results. Instead,
we will use the sum-of-squares error, which consists of computing the difference between the
prediction and the true value, squaring them so that they are all positive, and then adding them
up, as is used in the definition of the linear regressor. Obviously, small values of this measure are
good. It can be computed using:
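A sketch of that computation (the array names are assumptions):

    error = np.sum((testtargets - testout)**2)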
UNSUPERVISED LEARNING
The aim of unsupervised learning is to find clusters of similar inputs in the data without being
explicitly told that some datapoints belong to one class and others to a different class.
A distance measure: in order to talk about distances between points, we need some way
to measure distances. It is often the normal Euclidean distance, but there are other alternatives.
The mean average: we can compute the central point of a set of datapoints, which is the
mean average (the mean of two numbers is the point halfway along the line between them).
Actually, this is only true in Euclidean space, which is the one you are used to, where everything
is nice and flat.
A suitable way of positioning the cluster centres: we compute the mean point of each
cluster, μc(i), and put the cluster centre there. This is equivalent to minimising the Euclidean
distance (which is the sum-of-squares error again) from each datapoint to its cluster centre.
For all of the points that are assigned to a cluster, we then compute the mean of them, and
move the cluster centre to that place. We iterate the algorithm until the cluster centres stop
moving.
The NumPy implementation follows these steps almost exactly, and we can take
advantage of the np.argmin() function, which returns the index of the minimum value, to find the
closest cluster. The code that computes the distances, finds the nearest cluster centre, and updates
them can then be written as:
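A sketch of that step (reconstructed, not the author's listing; data is (nData, nDim) and centres
is (k, nDim)):

    import numpy as np

    def kmeans_step(data, centres):
        # Squared Euclidean distance from every datapoint to every centre.
        distances = np.sum((data[:, np.newaxis, :] - centres[np.newaxis, :, :])**2, axis=2)
        # np.argmin finds the index of the closest centre for each datapoint.
        cluster = np.argmin(distances, axis=1)
        # Move each centre to the mean of the datapoints assigned to it.
        for c in range(centres.shape[0]):
            if np.any(cluster == c):
                centres[c, :] = np.mean(data[cluster == c, :], axis=0)
        return centres, cluster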
The following Figures show some data and some different ways to cluster that data
computed by the k-means algorithm.
The above figure shows examples of what happens when you choose the number of
centres wrongly. There are certainly cases where we don’t know in advance how many clusters
we will see in the data, but the k-means algorithm doesn’t deal with this at all well.
To find a good local optimum (or even the global one) we use many different initial
centre locations, and the solution that minimises the overall sum-of-squares error is likely to be
the best one.
By running the algorithm with lots of different values of k, we can see which values give
us the best solution.
If we still just measure the sum-of-squares error between each datapoint and its nearest
cluster centre, then when we set k to be equal to the number of datapoints, we can position one
centre on every datapoint, and the sum-of-squares error will be zero. There is no generalisation
in this solution: it is a case of serious overfitting.
By computing the error on a validation set and multiplying the error by k we can see
something about the benefit of adding each extra cluster centre.
We will choose k neurons (for hopefully obvious reasons) and fully connect the inputs to
the neurons, as usual. We will use neurons with a linear transfer function, computing the
activation of the neurons as simply the product of the weights and inputs:

hi = Σj wij xj
Normalisation
Computing this normalisation in NumPy takes a little bit of care because we are
normalizing the total Euclidean distance from the origin, and the sum and division are row-wise
rather than column-wise, which means that the matrix has to be transposed before and after the
division:
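A sketch of that normalisation (reconstructed; each row of data is divided by its Euclidean
length):

    normalisers = np.sqrt(np.sum(data**2, axis=1))
    data = np.transpose(np.transpose(data) / normalisers)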
A Better Weight Update Rule
If we normalise the inputs as well, which certainly seems reasonable, then we can use the
following weight update rule, which moves the weights of the winning neuron directly towards
the current input:

Δwij = η(xj − wij)
VECTOR QUANTISATION
What happens when I want to send a datapoint and it isn’t in the codebook? In that case
we need to accept that our data will not look exactly the same, and I send you the index of the
prototype vector that is closest to it (this is known as vector quantisation, and is the way that
lossy compression works).
The dots at the centre of each cell are the prototype vectors, and any datapoint that lies
within a cell is represented by the dot. The name for each cell is the Voronoi set of a particular
prototype. Together, they produce the Voronoi tessellation of the space. If you connect together
every pair of points that share an edge, as is shown by the dotted lines, then you get the Delaunay
triangulation, which is the optimal way to organise the space to perform function approximation.
We need to choose prototype vectors that are as close as possible to all of the possible
inputs that we might see. This application is called learning vector quantization because we are
learning an efficient vector quantisation. The k-means algorithm can be used to solve the
problem if we know how large we want our codebook to be. Another algorithm turns out to be
more useful, the Self-Organising Feature Map.
The most commonly used competitive learning algorithm is the Self-Organising Feature
Map (often abbreviated to SOM). It was developed by Teuvo Kohonen, who was considering the
question of how sensory signals get mapped into the cerebral cortex of the brain with an order.
For example, in the auditory cortex, which deals with the sounds that we hear, neurons
that are excited (i.e., that are caused to fire) by similar sounds are positioned closely together,
whereas two neurons that are excited by very different sounds will be far apart.
There are two novel departures in the SOM:
1. The relative locations of the neurons in the network matters (this property is known as
feature mapping—nearby neurons correspond to similar input patterns)
2. The neurons are arranged in a grid with connections between the neurons, rather than
in layers with connections only between the different layers.
In the auditory cortex there appear to be sheets of neurons arranged in 2D, and that is
the typical arrangement of neurons for the SOM: a grid of neurons arranged in 2D, as can
be seen in the following figure.
Neighbourhood Connections
The size of the neighbourhood is thus another parameter that we need to control. Once
the network has been learning for a while, the rough ordering has already been created, and the
algorithm starts to fine-tune the individual local regions of the network. At this stage, the
neighbourhoods should be small, as is shown in the following Figure.
It therefore makes sense to reduce the size of the neighbourhood as the network adapts.
These two phases of learning are also known as ordering and convergence. Typically, we reduce
the neighbourhood size by a small amount at each iteration of the algorithm. We control the
learning rate η in exactly the same way, so that it starts off large and decreases over time.
Self-Organisation
A particularly interesting aspect of feature mapping is that we get a global ordering of the
neurons in the network, despite the fact that the interactions are all local, since neurons that are
very far apart do not interact with each other. We thus get a global ordering of the space using
only a set of local interactions, which is amazing. This is known as self-organisation, and it
appears everywhere.
Example: a flock of birds flying in formation. The birds cannot possibly know exactly
where each other are, so how do they keep in formation?
If each bird just tries to stay diagonally behind the bird to its right, and fly at the same
speed, then they form perfect flocks, no matter how they start off and what objects are placed in
their way. So the global ordering of the whole flock can arise from the local interactions of each
bird looking to the one on its right (or left).
We have been applying the SOM algorithm to a 2D rectangular array of neurons, but there is
nothing in the algorithm to force this. There are cases where a line of neurons (1D) works better,
or where three dimensions are needed. It depends on the dimensionality of the inputs (actually
on the intrinsic dimensionality, the number of dimensions that you actually need to represent the
data), not the number that it is embedded in.
Example:
Consider a set of inputs spread through the room you are in, but all on the plane that
connects the bottom of the wall to your left with the top of the wall to your right. These points
have intrinsic dimensionality two since they are all on the plane, but they are embedded in your
three-dimensional room. Noise and other inaccuracies in data often lead to it being represented in
more dimensions than are actually required, and so finding the intrinsic dimensionality can help
to reduce the noise.
We also need to consider the boundaries of the network. Sometimes it makes sense for the
edges of the map of neurons to be strictly defined.
Example:
If we are arranging sounds from low pitch to high pitch, then the lowest and highest pitches we
can hear are obvious endpoints. However, it is not always the case that such boundaries are
clearly defined. In this case we might want to remove the boundary conditions. We can do this
by removing the boundary by tying the ends together. In 1D this means that we turn a line into a
circle, while in 2D we turn a rectangle into a torus. To see this, try taking a piece of paper and
bend it so that the top and bottom edges line up. You’ve now got a tube. If you bend the tube
round so that the two open ends meet up you have a circle of tube known as a torus. Pictures of
these effects are shown in the following Figure
The map distances get more complicated to calculate, since we now need to calculate the
distances allowing for the wrap around. This can be done using modulo arithmetic, but it is easier
to think about taking copies of the map and putting them around the map, so that the original
map has copies of itself all around: one above, one below, to the right and left, and also
diagonally above and below, as is shown in above Figure
Now we keep one of the points in the original map, and the distance to the second node is
the smallest of the distances between the first node and the copies of the second node in the
different maps (including the original). By treating the distances in x and y separately, the
number of distances that has to be computed can be reduced.
As with the competitive learning algorithm that we considered earlier, the size of the SOM is
defined before we start learning. The size of the network (that is, the number of neurons that we
put into it) decides how fine-grained the learning is. If there are very few neurons, then the best
that the network can do is to find gross generalisations that link the data. However, if there are
very large numbers of neurons, then the network can represent every input without ever needing
to generalise at all.
This is yet another example of overfitting. Clearly, then, choosing the correct size of
network is important. The common approach is to test out several different sizes of network,
such as 5 × 5 and 10 × 10 and see how well the network learns.
UNIT V
EXPERT SYSTEMS
• An expert system is a computer program that is designed to solve complex problems and to provide decision-making ability like a human expert.
• It performs this by extracting knowledge from its knowledge base, using reasoning and inference rules according to the user queries.
• The expert system is a part of AI; the first ES was developed in 1970 and was one of the first successful applications of artificial intelligence.
• It solves the most complex issues as an expert would, by extracting the knowledge stored in its knowledge base.
• The system helps in decision making for complex problems using both facts and heuristics, like a human expert.
• It is called an expert system because it contains the expert knowledge of a specific domain and can solve any complex problem of that particular domain.
• These systems are designed for a specific domain, such as medicine, science, etc.
MYCIN
XCON
PART-B (5 × 13 = 65 Marks)
11. a) Explain the following types of Hill Climbing search techniques:
i) Simple Hill Climbing (4)
ii) Steepest-Ascent Hill Climbing (5)
iii) Simulated Annealing (4)
(OR)
b) Trace the operation of the unification algorithm on each of the following pairs of literals: (13)
i) f(Marcus) and f(Caesar)
ii) f(x) and f(g(y))
iii) f(Marcus, g(x, y)) and f(x, g(Caesar, Marcus))
13. a) Explain the production based knowledge representation technique. (13)
(OR)
ON(A, B, S0) ∧
ONTABLE(B, S0) ∧
CLEAR(A, S0)
ii) Write short notes on Nonlinear Planning using Constraint Posting. (5)
i) MYCIN. (7)
ii) DART. (6)
(OR)
16. a) Design an expert system for Travel recommendation and discuss its roles.
(OR)
Unit -1
PART - A
1. What is AI?
Artificial Intelligence is the branch of computer science concerned with making
computers behave like humans.
Systems that think like humans
Systems that act like humans
Systems that think rationally
Systems that act rationally
2. Define an agent.
An agent is anything that can be viewed as perceiving its environment through sensors
and acting upon that environment through actuators.
6. List the properties of task environments.
Fully observable vs. partially observable.
Deterministic vs. stochastic.
Episodic vs. sequential.
Static vs. dynamic.
Discrete vs. continuous.
Single agent vs. multiagent.
7. What are the four different kinds of agent programs?
Simple reflex agents;
Model-based reflex agents;
Goal-based agents; and
Utility-based agents.
8. Explain the goal-based agent.
Knowing about the current state of the environment is not always enough to
decide what to do. For example, at a road junction, the taxi can turn left, turn
right, or go straight on. The correct decision depends on where the taxi is trying
to get to.
In other words, as well as a current state description, the agent needs some sort
of goal information that describes situations that are desirable – for example,
being at the passenger's destination.
9. What are utility based agents?
Goals alone are not really enough to generate high-quality behavior in most
environments.
For example, there are many action sequences that will get the taxi to its
destination (thereby achieving the goal) but some are quicker, safer, more
reliable, or cheaper than others.
A utility function maps a state (or a sequence of states) onto a real number,
which describes the associated degree of happiness.
10. What are learning agents?
A learning agent can be divided into four conceptual components. The most
important distinction is between the learning element, which is responsible
for making improvements, and the performance element, which is
responsible for selecting external actions.
The performance element is what we have previously considered to be the entire
agent: it takes in percepts and decides on actions. The learning element uses
feedback from the critic on how the agent is doing and determines how
the performance element should be modified to do better in the future.
11. Define the problem solving agent.
A problem-solving agent is a goal-based agent. It decides what to do by finding
sequences of actions that lead to desirable states.
The agent can adopt a goal and aim at satisfying it. Goal formulation is the first
step in problem solving.
12. List the steps involved in simple problem solving agent.
Goal formulation
Problem formulation
Search
Execution phase
13. Define search and search algorithm.
The process of looking for a sequence of actions from the current state to reach the
goal state is called search. The search algorithm takes a problem as input
and returns a solution in the form of an action sequence.
Once a solution is found, the execution phase consists of carrying out the
recommended actions.
14. What are the components of well-defined problems?
The initial state that the agent starts in. The initial state for our agent in the
example problem is described by In(Arad).
A successor function returns the possible actions available to the agent.
Given a state x, SUCCESSOR-FN(x) returns a set of {action, successor} ordered
pairs, where each action is one of the legal actions in state x, and each successor
is a state that can be reached from x by applying the action.
For example, from the state In(Arad), the successor function for the Romania
problem would return
{ [Go(Sibiu), In(Sibiu)], [Go(Timisoara), In(Timisoara)], [Go(Zerind), In(Zerind)] }
The goal test determines whether the given state is a goal state.
A path cost function assigns a numeric cost to each action. For the Romania
problem the cost of a path might be its length in kilometers.
15. Give examples of real world problems.
a) Touring problems
b) Travelling Salesperson Problem(TSP)
c) VLSI layout
d) Robot navigation
e) Automatic assembly sequencing
f) Internet searching
16. List the criteria to measure the performance of different search strategies.
Completeness: Is the algorithm guaranteed to find a solution when there is one?
Optimality: Does the strategy find the optimal solution?
Time complexity: How long does it take to find a solution?
Space complexity: How much memory is needed to perform the search?
17. Define Best-first-search.
Best-first search is an instance of the general TREE-SEARCH or GRAPH-SEARCH
algorithm in which a node is selected for expansion based on an evaluation
function f(n). Traditionally, the node with the lowest evaluation function value is
selected for expansion.
18. What is a heuristic function? (Nov/Dec 2016)
A heuristic function, or simply a heuristic, is a function that ranks alternatives
at each branching step of a search algorithm, based on the available information,
in order to decide which branch to follow.
For example, for shortest-path problems, a heuristic is a function h(n) defined
on the nodes of a search tree, which serves as an estimate of the cost of the
cheapest path from that node to the goal node. Heuristics are used by informed
search algorithms such as greedy best-first search and A* to choose the best
node to explore.
19. What are relaxed problems?
A problem with fewer restrictions on the actions is called a relaxed problem.
The cost of an optimal solution to a relaxed problem is an admissible heuristic
for the original problem.
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then
h_oop(n), the number of out-of-place (misplaced) tiles, gives the length of the
shortest solution.
If the rules are relaxed so that a tile can move to any adjacent square, then
h_md(n), the total Manhattan distance of the tiles from their goal positions,
gives the length of the shortest solution. Both heuristics are sketched below.
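A minimal sketch of the two relaxed-problem heuristics for the 8-puzzle, assuming states are 9-tuples read row by row with 0 as the blank (this representation is an illustrative choice):

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 is the blank

def h_oop(state, goal=GOAL):
    """Out-of-place tile count: admissible because each misplaced tile needs >= 1 move."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h_md(state, goal=GOAL):
    """Total Manhattan distance: admissible because a tile moves one square per step."""
    pos = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        r, c = divmod(i, 3)
        gr, gc = pos[tile]
        total += abs(r - gr) + abs(c - gc)
    return total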
20. What are categories of production system? (Nov/Dec 2016)
Characteristic                 Monotonic             Non-monotonic
Partially commutative          Theorem proving       Robot navigation
Not partially commutative      Chemical synthesis    Bridge game
21. What is A* search?
A* search is the most widely known form of best-first search. It evaluates nodes
by combining g(n), the cost to reach the node, and h(n), the cost to get from the
node to the goal:
f(n) = g(n) + h(n)
where f(n) is the estimated cost of the cheapest solution through n, g(n) is the
path cost from the start node to node n, and h(n) is the heuristic estimate of
the cost from n to the goal.
A* search is both complete and optimal, provided h(n) is an admissible heuristic
(and consistent, for graph search).
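A minimal A* sketch on an explicit weighted graph, assuming graph[u] maps each neighbor to its step cost and h is an admissible heuristic (all names are illustrative):

import heapq

def a_star(graph, start, goal, h):
    """A*: repeatedly expand the node with lowest f(n) = g(n) + h(n)."""
    frontier = [(h(start), 0, start, [start])]  # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        for succ, step_cost in graph.get(state, {}).items():
            g2 = g + step_cost
            if g2 < best_g.get(succ, float("inf")):  # keep only the cheapest route found
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None, float("inf")  # no path exists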
22. What is Recursive best-first search?
Recursive best-first search is a simple recursive algorithm that attempts to mimic
the operation of standard best-first search, but using only linear space.
23. What are local search algorithms?
Local search algorithms operate using a single current state (rather than
multiple paths) and generally move only to neighbors of that state. Local
search algorithms are not systematic.
Their two key advantages are: (i) they use very little memory, usually a constant
amount, and (ii) they can often find reasonable solutions in large or infinite
(continuous) state spaces for which systematic algorithms are unsuitable.
24. What are the advantages of local search?
They use very little memory, usually a constant amount.
They can often find reasonable solutions in large or infinite (e.g., continuous)
state spaces for which systematic search is unsuitable.
They are useful for pure optimization problems, where the aim is to find the best
state according to an objective function (e.g., the traveling salesman problem).
25. What are optimization problems?
In optimization problems, the aim is to find the best state according to an
objective function. The optimization problem is then: find values of the variables
that minimize or maximize the objective function while satisfying the constraints.
26. What is Hill-climbing search?
The hill-climbing algorithm is simply a loop that continually moves in the
direction of increasing value, that is, uphill. It terminates when it reaches a
"peak" where no neighbor has a higher value.
The algorithm does not maintain a search tree, so the current-node data structure
need only record the state and its objective function value. Hill climbing does
not look ahead beyond the immediate neighbors of the current state.
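A minimal hill-climbing loop, assuming neighbors(state) enumerates successor states and value is the objective function to maximize (both names are illustrative):

def hill_climbing(start, neighbors, value):
    """Steepest-ascent hill climbing: move to the best neighbor until none is better."""
    current = start
    while True:
        candidates = list(neighbors(current))
        if not candidates:
            return current
        best = max(candidates, key=value)
        if value(best) <= value(current):
            return current  # a peak, possibly only a local maximum
        current = best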
27. What is the problem faced by hill-climbing search? (May 2010)
Hill climbing often gets stuck for the following reasons:
Local maxima – a local maximum is a peak that is higher than each of its
neighboring states but lower than the global maximum. Hill-climbing algorithms
that reach the vicinity of a local maximum will be drawn upwards towards the
peak, but will then be stuck with nowhere else to go.
Ridges – ridges result in a sequence of local maxima that is very difficult
for greedy algorithms to navigate.
Plateaux – a plateau is an area of the state-space landscape where the
evaluation function is flat. A hill-climbing search might be unable to find its
way off the plateau.
28. What is local beam search?
The local beam search algorithm keeps track of k states rather than just one. It
begins with k randomly generated states. At each step, all the successors of all k
states are generated. If any one of them is a goal, the algorithm halts. Otherwise,
it selects the k best successors from the complete list and repeats.
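A minimal local beam search sketch, assuming successors, value (to maximize), and goal_test are supplied by the caller (all names are illustrative):

def local_beam_search(k, initial_states, successors, value, goal_test, iters=100):
    """Keep the k best states each step, pooling the successors of all current states."""
    states = list(initial_states)[:k]   # begin with k (randomly generated) states
    for _ in range(iters):
        pool = [s2 for s in states for s2 in successors(s)]
        for s in pool:
            if goal_test(s):
                return s                # halt as soon as any successor is a goal
        if not pool:
            break
        states = sorted(pool, key=value, reverse=True)[:k]
    return max(states, key=value)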
29. What are the variants of hill-climbing?
Stochastic hill climbing: random selection among the uphill moves; the selection
probability can vary with the steepness of the uphill move.
First-choice hill climbing: implements stochastic hill climbing by generating
successors randomly until one better than the current state is found.
Random-restart hill climbing: conducts a series of hill-climbing searches from
randomly generated initial states, and thereby tries to avoid getting stuck in
local maxima.
30. Define constraint satisfaction problem. (Nov/Dec 2015)
A constraint satisfaction problem (or CSP) is defined by a set of variables
X1, X2, ..., Xn and a set of constraints C1, C2, ..., Cm. Each variable Xi has a
nonempty domain Di of possible values.
Each constraint Ci involves some subset of the variables and specifies the
allowable combinations of values for that subset. A state of the problem is
defined by an assignment of values to some or all of the variables,
{Xi = vi, Xj = vj, ...}. A solution to a CSP is a complete assignment that
satisfies all the constraints.
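A minimal CSP sketch using map coloring as the example, solved by backtracking over one variable at a time; the problem instance and function names are illustrative:

# Map-coloring CSP: variables are regions, domains are colors,
# constraints require neighboring regions to receive different colors.
NEIGHBORS = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q"], "Q": ["NT", "SA"]}
COLORS = ["red", "green", "blue"]

def consistent(var, val, assignment):
    """A value is allowed if it differs from every already-assigned neighbor."""
    return all(assignment.get(n) != val for n in NEIGHBORS[var])

def backtrack(assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(NEIGHBORS):
        return assignment          # complete assignment satisfying all constraints
    var = next(v for v in NEIGHBORS if v not in assignment)
    for val in COLORS:
        if consistent(var, val, assignment):
            result = backtrack({**assignment, var: val})
            if result:
                return result
    return None                    # dead end: triggers backtracking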
31. What is a constraint graph?
It is helpful to visualize a constraint satisfaction problem as a constraint
graph: a graph whose nodes correspond to the variables of the problem and whose
arcs correspond to the constraints.
For example, in a cryptarithmetic problem each letter stands for a distinct
digit, and the aim is to find a substitution of digits for letters such that the
resulting sum is arithmetically correct, with the added restriction that no
leading zeroes are allowed. The constraint hypergraph for such a problem shows
the Alldiff constraint as well as the column addition constraints; each
constraint is drawn as a square box connected to the variables it constrains.
33. Define a game.
Formal Definition of Game
We will consider games with two players, whom we will call MAX and MIN. MAX
moves first, and then they take turns moving until the game is over. At the end
of the game, points are awarded to the winning player and penalties are given to
the loser. A game can be formally defined as a search problem with the following
components:
The initial state includes the board position and identifies the player to move.
A successor function returns a list of (move, state) pairs, each indicating a
legal move and the resulting state.
A terminal test describes when the game is over. States where the game has
ended are called terminal states.
A utility function (also called an objective function or payoff function) gives
a numeric value for the terminal states. In chess, the outcome is a win, loss, or
draw, with values +1, -1, or 0. The payoffs in backgammon range from +192 to -192.
34. Explain briefly the minimax algorithm.
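A minimal sketch of minimax for the two-player formulation above, assuming successors, terminal_test, and utility follow the game components just defined (all names are illustrative):

def minimax(state, max_turn, successors, terminal_test, utility):
    """Return the backed-up minimax value of `state`."""
    if terminal_test(state):
        return utility(state)
    values = [minimax(s, not max_turn, successors, terminal_test, utility)
              for _, s in successors(state)]
    return max(values) if max_turn else min(values)

def minimax_decision(state, successors, terminal_test, utility):
    """MAX chooses the move whose resulting state has the highest minimax value."""
    return max(successors(state),
               key=lambda ms: minimax(ms[1], False, successors,
                                      terminal_test, utility))[0]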
UNIT II
PART – A
1. What is game playing?
The term game means a sort of conflict in which n individuals or groups (known as players)
participate. Game theory denotes games of strategy: it allows decision-makers (players)
to cope with other decision-makers (players) who have different purposes in mind. In other
words, players determine their own strategies in terms of the strategies and goals of their
opponents.
8. Specify the syntax of First-order logic in BNF form
27. Give the Bayes' rule equation. [APRIL/MAY 2017, APR/MAY 2018]
We know that
P(A ^ B) = P(A|B) P(B)   ...(1)
P(A ^ B) = P(B|A) P(A)   ...(2)
Equating (1) and (2) and dividing both sides by P(A), we get
P(B|A) = P(A|B) P(B) / P(A)
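A small worked example of the rule (the numbers are chosen purely for illustration): suppose a disease B has prior probability P(B) = 0.01, a test A detects it with P(A|B) = 0.9, and the test comes out positive overall with P(A) = 0.05. Then P(B|A) = (0.9 x 0.01) / 0.05 = 0.18, i.e., a positive test raises the probability of the disease from 1% to 18%.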
35. Define Fuzzy reasoning. [NOV/DEC 2017].
Human reasoning means the action of thinking about something in a logical, sensible way.
Fuzzy logic (FL) is a method of reasoning that resembles human reasoning: it imitates the
way humans make decisions, which involves all the intermediate possibilities between the
digital values YES and NO.
36. Compare production based system with frame based system. [NOV/DEC 2017]
PART – B & C
UNIT – IV
PART – A
1. What is learning?
Learning covers a wide range of phenomena. At one end of the spectrum is skill refinement:
people get better at many tasks simply by practicing. At the other end of the spectrum lies
knowledge acquisition: knowledge is generally acquired through experience.
5. What is planning?
Planning refers to the process of computing several steps of a problem solving procedure before
executing any of them.
9. List out successful applications of machine learning?
Adaptable software systems
Bioinformatics
Natural language processing
Speech recognition
Pattern recognition
Intelligent control
Trend prediction
PART – B & C
UNIT – IV
PART – A
1. What is neuron?
The processing units of the brain are nerve cells called neurons. There are lots of them
(100 billion, i.e. 10^11, is the figure that is often given) and they come in many different
types, depending upon their particular task. Each neuron can be viewed as a separate
processor, performing a very simple computation: deciding whether or not to fire. This
makes the brain a massively parallel computer made up of 10^11 processing elements.
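A minimal sketch of that "decide whether or not to fire" computation as an artificial neuron, i.e. a weighted sum followed by a threshold (the weights and threshold below are illustrative):

def neuron(inputs, weights, threshold):
    """Fire (output 1) iff the weighted sum of the inputs reaches the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Example: a 2-input neuron wired to behave like logical AND.
print(neuron([1, 1], [0.6, 0.6], threshold=1.0))  # prints 1
print(neuron([1, 0], [0.6, 0.6], threshold=1.0))  # prints 0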
Choosing manually.
Being dependent on initial values.
9. What are Neural Networks? What are the types of Neural networks?
In simple words, a neural network is a connection of many very tiny processing elements
called neurons. There are two types of neural network:
Biological neural networks – these are made of real neurons, the tiny processing units
inside the brain; in fact, neurons make up the whole nervous system, not only the brain.
Artificial neural networks – an imitation of biological neural networks, built from
artificially designed small processing elements instead of digital computing systems that
use only binary digits. Artificial neural networks are designed to give machines
human-quality performance on their tasks.
10. Why use Artificial Neural Networks? What are its advantages?
Artificial neural networks are designed to give machines human-quality thinking, so that
they can weigh "what if" and "what if not" decisions with precision. Some of the other
advantages are:
Adaptive learning: the ability to learn how to do tasks based on the data given for
training or initial experience.
Self-organization: an artificial neural network can create its own organization or
representation of the information it receives during learning time.
Real-time operation: artificial neural network computations may be carried out in parallel,
and special hardware devices are being designed and manufactured to take advantage of
this capability.
13. What are the disadvantages of Artificial Neural Networks?
The major disadvantage is that they require a large diversity of training data to
work in a real environment. Moreover, they are often not robust enough for
real-world use.
17. Overfitting is one of the most common problems every Machine Learning practitioner
faces. Explain some methods to avoid overfitting in Neural Networks.
Dropout: a regularization technique that prevents the neural network from overfitting. It
randomly drops neurons from the network during training, which is equivalent to training
many different neural networks. The different networks overfit in different ways, so the
net effect of dropout is to reduce overfitting and leave a model that generalizes better
for predictive analysis.
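A minimal sketch of dropout as a layer in a network, here using the Keras API for illustration (the layer sizes and the 0.5 rate are arbitrary choices, not recommendations):

import tensorflow as tf

# Dropout randomly zeroes activations during training only;
# at inference time the full network is used.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.5),   # drop half the activations each training step
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")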
PART – B & C
UNIT - V
PART – A
11. What are the players in expert system?
The players in an expert system are: the domain expert, the knowledge engineer, and the user.
12. What are the advantages of Expert system? [ MAY / JUNE 2016 ]
– Availability: Expert systems are easily available due to the mass production of software.
– Cheaper: The cost of providing expertise is relatively low.
– Reduced danger: They can be used in risky environments where humans cannot work.
– Permanence: The knowledge lasts indefinitely.
– Multiple expertise: They can be designed to hold the knowledge of many experts.
13. List out the limitations of expert systems.
Not widely used or tested
Limited to relatively narrow problems
Cannot readily deal with “mixed” knowledge
Possibility of error
Cannot refine own knowledge base
Difficult to maintain
May have high development costs
Raise legal and ethical concerns
27. What are the properties of Expert system? [ MAY / JUNE 2016 ]
– Availability: Expert systems are easily available due to the mass production of software.
– Cheaper: The cost of providing expertise is relatively low.
– Reduced danger: They can be used in risky environments where humans cannot work.
– Permanence: The knowledge lasts indefinitely.
– Multiple expertise: They can be designed to hold the knowledge of many experts.
– Explanation: They are capable of explaining in detail the reasoning that led to a
conclusion.
– Fast response: They can respond at great speed due to the inherent advantages of
computers over humans.
– Unemotional, steady response at all times: Unlike humans, they do not get tense,
fatigued, or panicked, and they work steadily during emergency situations.
PART – B & C
1. With neat sketch explain the architecture, characteristic features and roles of expert system.
[ MAY / JUNE 2016 , APR/MAY 2018] Refer Page 422 in Kevin Knight
2. Discuss about the Knowledge Acquisition process in expert systems [ MAY / JUNE 2016 ]
Refer Page 427 in Kevin Knight
3. Write notes on Meta Knowledge and Heuristics in Knowledge Acquisition Refer Page 427 in
Kevin Knight
4. Explain in detail about the expert system shell.[ NOV/DEC 2018 ] Refer Page 424 in Kevin Knight
5. Write notes on the expert systems MYCIN, DART and XCON and explain how they work. [NOV/DEC
2017, APR/MAY 2018] Refer Page 422 in Kevin Knight
6. Explain the basic components and applications of expert system. [ MAY / JUNE 2016 ]
Refer Page 424 in Kevin Knight
7. Define Expert system. Explain the architecture of an expert system in detail with a neat diagram
and an example. [APRIL/MAY 2017] Refer Page 422 in Kevin Knight
8. Write the applications of expert systems. [MAY / JUNE 2016 ] Refer Page 425 in Kevin Knight
9. Explain the need, significance and evolution of XCON expert system. [APRIL/MAY 2017]
Refer Page 425 in Kevin Knight
10. Explain the expert system architectures: [NOV/DEC 2017]
1. Rule-based system architecture 2. Associative or semantic network architecture 3.
Network architecture 4. Blackboard system architecture. Refer Page 422 in Kevin Knight
11. Design an expert system for travel recommendation and discuss its roles. [NOV/DEC 2017]
Refer Page 422 in Kevin Knight
12. Explain the architecture of an expert system in detail with a neat diagram and an example.
[APRIL/MAY 2017] Refer Page 422 in Kevin Knight
13. Explain the XCON expert system. [APRIL/MAY 2017] Refer Page 425 in Kevin Knight
14. Explain the applications of expert system. [ MAY / JUNE 2016 ] Refer Page 424 in Kevin Knight
15. Explain the architecture of expert system. [ MAY / JUNE 2016 , APR/MAY 2018] Refer Page
422 in Kevin Knight