AINOTES MODULE1 Merged Compressed
Intelligence: It is knowledge in operation towards a solution: how to do something, and how to apply
the solution.
Artificial Intelligence: Artificial intelligence is the study of how to make computers do things
which, at the moment, people do better. It refers to intelligence exhibited by a computer
machine.
One view of AI is that it is concerned with:
Designing systems that are as intelligent as humans
Giving computers abilities nearly equal to human intelligence
How a system arrives at a conclusion, or the reasoning behind its selection of actions
How a system acts and performs, not so much its reasoning process
The AI Problem
The following are some of the problems contained within AI.
1. Game playing and theorem proving share the property that people who do them well
are considered to be displaying intelligence.
2. Another important foray into AI is focused on commonsense reasoning. It includes
reasoning about physical objects and their relationships to each other, as well as reasoning
about actions and their consequences.
3. To investigate this sort of reasoning, Newell, Shaw and Simon built the General Problem
Solver (GPS), which they applied to several commonsense tasks as well as to the problem of
performing symbolic manipulations of logical expressions. However, no attempt was made to
create a program with a large amount of knowledge about a particular problem domain;
only quite simple tasks were selected.
4. The tasks that are the targets of work in AI can be grouped into mundane tasks (perception,
natural language, commonsense reasoning), formal tasks (games, mathematics) and expert tasks
(engineering, scientific analysis, medical diagnosis, financial analysis).
Perception of the world around us is crucial to our survival. Animals with much less intelligence
than people are capable of more sophisticated visual perception than current machines. Perception
tasks are difficult because they involve analog signals, which are typically noisy. A person who
knows how to perform tasks from several of these categories learns the necessary skills in a
standard order.
First, perceptual, linguistic and commonsense skills are learned. Later, expert skills such as
engineering, medicine or finance are acquired.
What is an AI Technique?
Artificial intelligence problems span a very broad spectrum. They appear to have very little in
common except that they are hard. There are, however, techniques that are appropriate for the
solution of a variety of these problems. One of the clearest results of AI research is that
intelligence requires knowledge.
Important AI Techniques:
Search: Provides a way of solving problems for which no more direct approach is
available as well as a framework into which any direct techniques that are
available can be embedded.
Use of Knowledge: Provides a way of solving complex problems by exploiting the
structures of the objects that are involved.
Abstraction: Provides a way of separating important features and variations from
the many unimportant ones that would otherwise overwhelm any process.
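Turing Test: An interrogator communicates with a person and a machine, each in a separate room, by
typing questions and receiving typed responses. The interrogator knows them only as Z and X
and aims to determine which is the person and which is the machine.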
The goal of the machine is to fool the interrogator into believing that it is the person. If the machine
succeeds, we conclude that the machine can think. The machine is allowed to do whatever it can
do to fool the interrogator.
For example, if asked the question “How much is 12,324 times 73,981?”, the machine
could wait several minutes and then respond with a wrong answer, as a person might.
The interrogator receives two sets of responses, but does not know which set comes from the human
and which from the computer. After careful examination of the responses, if the interrogator cannot
definitely tell which set has come from the computer and which from the human, then the computer
has passed the Turing Test. The more serious issue is the amount of knowledge that a machine
would need to pass the Turing test.
We will see the introduction of systems that equal or exceed human abilities, and see them
become an important part of most business and government operations, as well as of our daily
activities.
Definition of AI: Artificial Intelligence is a branch of computer science concerned with the study
and creation of computer systems that exhibit some form of intelligence such as systems that learn
new concepts and tasks, systems that can understand a natural language or perceive and
comprehend a visual scene, or systems that perform other types of feats that require human-like
intelligence.
AI is not the study and creation of conventional computer systems. Nor is it the study of the mind,
the body, and languages as customarily found in the fields of psychology, physiology, cognitive
science, or linguistics.
In AI, the goal is to develop working computer systems that are truly capable of performing tasks
that require high levels of intelligence.
A state space is a set of legal positions. Problem solving starts at the initial state, uses a set of
rules to move from one state to another, and attempts to end up in a goal state.
For chess, we must make explicit the previously implicit goal of not only playing a legal game of
chess but also winning the game, if possible.
Production System
The entire procedure for getting a solution to an AI problem can be viewed as a “production
system”, which leads to the desired goal. It is a basic building block that describes the AI problem
and also describes the method of searching for the goal. Its main components are:
A Set of Rules, each consisting of a left side (a pattern) that determines the applicability of
the rule and a right side that describes the operation to be performed if the rule is applied.
Knowledge Base – It contains whatever information is appropriate for a particular task.
Some parts of the database may be permanent, while other parts may pertain only to
the solution of the current problem.
Control Strategy – It specifies the order in which the rules will be compared to the
database and a way of resolving the conflicts that arise when several rules match at once.
o The first requirement of a good control strategy is that it causes motion; a control
strategy that does not cause motion will never lead to a solution.
o The second requirement of a good control strategy is that it should be systematic.
A Rule Applier – It applies the selected rule. A production rule has the form:
if(condition) then
consequence or action
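The components above can be illustrated with a minimal sketch. The rule set, the integer "database" and the names `RULES` and `run_production_system` are hypothetical; conflicts are resolved here by firing the first matching rule in list order, which is only one possible control strategy.

```python
from typing import Callable, List, Tuple

# A rule is (name, left side, right side). The left side tests the database;
# the right side computes the new database contents.
Rule = Tuple[str, Callable[[int], bool], Callable[[int], int]]

# Hypothetical toy rule set: the "database" is a single integer to be driven to 0.
RULES: List[Rule] = [
    ("halve", lambda x: x > 0 and x % 2 == 0, lambda x: x // 2),
    ("decrement", lambda x: x % 2 == 1, lambda x: x - 1),
]


def run_production_system(state: int, rules: List[Rule],
                          is_goal: Callable[[int], bool],
                          max_steps: int = 100) -> List[str]:
    """Fire rules until the goal holds; return the names of the rules fired.

    Control strategy: compare the rules with the database in order and fire
    the first one whose left side matches (conflict resolution by rule order).
    """
    fired: List[str] = []
    for _ in range(max_steps):
        if is_goal(state):
            return fired
        for name, left_side, right_side in rules:
            if left_side(state):
                state = right_side(state)   # the rule applier changes the database
                fired.append(name)
                break
        else:
            # No rule matched: a strategy that cannot cause motion never reaches a solution.
            raise RuntimeError("no rule applies")
    raise RuntimeError("step limit reached")


print(run_production_system(6, RULES, lambda x: x == 0))
# → ['halve', 'decrement', 'halve', 'decrement']
```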
A partially commutative production system is a production system with the property that
if the application of a particular sequence of rules transforms state X into state Y, then any
permutation of those rules that is allowable also transforms state X into state Y.
In a formal sense, there is no relationship between kinds of problems and kinds of production
systems, since all problems can be solved by all kinds of systems. But in a practical sense, there
definitely is such a relationship between kinds of problems and the kinds of systems that lend
themselves naturally to describing those problems.
The following table shows the four categories of production systems produced by the two
dichotomies, monotonic versus non-monotonic and partially commutative versus non-partially
commutative, along with some problems that can naturally be solved by each type of system.
                            Monotonic             Non-monotonic
Partially commutative       Theorem proving       Robot navigation
Not partially commutative   Chemical synthesis    Bridge
Partially commutative, monotonic production systems are useful for solving ignorable
problems. Problems that involve creating new things rather than changing old ones are generally
ignorable; theorem proving is one example of such a creative process. Partially
commutative, monotonic production systems are important from an implementation
standpoint because they can be implemented without the ability to backtrack to previous states
when it is discovered that an incorrect path has been followed.
Non-monotonic, partially commutative production systems are useful for problems in which
changes occur but can be reversed, and in which the order of operations is not critical.
This is usually the case in physical manipulation problems such as “Robot navigation on a
flat plane”. The 8-puzzle and blocks world problem can be considered partially
commutative. Partially commutative production systems are significant from an implementation
point of view because they tend to lead to much duplication of individual states during the search
process.
Production systems that are not partially commutative are useful for many problems in
which irreversible changes occur, for example “Chemical Synthesis”.
Non-partially commutative production systems are less likely to produce the same node many
times in the search process.
Problem Characteristics
In order to choose the most appropriate method (or a combination of methods) for a particular
problem, it is necessary to analyze the problem along several key dimensions:
• Is the problem decomposable?
• Can solution steps be ignored or undone?
• Is the universe predictable?
• Is a good solution absolute or relative?
• Is the solution a state or a path?
• What is the role of knowledge?
• Does the task require human interaction?
• Problem Classification
Is the problem decomposable? Consider a symbolic integration program. At each step, the
integrator checks whether the problem it is working on is immediately solvable. If so, the
answer is returned directly. If not, the integrator checks whether it can decompose the problem
into smaller problems. If it can, it creates those problems and calls itself recursively on them.
Using this technique of problem decomposition, we can often solve very large problems easily.
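A toy version of this decomposition, assuming the problem is restricted to integrating polynomials with non-negative integer exponents (real symbolic integrators handle far more). The function name `integrate` and the term representation are illustrative assumptions.

```python
from fractions import Fraction
from typing import List, Tuple

# A term (c, n) stands for c * x**n; a polynomial is a list of terms.
Term = Tuple[Fraction, int]


def integrate(terms: List[Term]) -> List[Term]:
    """Toy integrator for polynomials that illustrates problem decomposition.

    The constant of integration is omitted.
    """
    if not terms:
        return []
    if len(terms) == 1:
        # Immediately solvable: power rule, integral of c*x^n is c/(n+1) * x^(n+1).
        coef, n = terms[0]
        return [(Fraction(coef) / (n + 1), n + 1)]
    # Not immediately solvable: the integral of a sum is the sum of the integrals,
    # so split into smaller, independent problems and call integrate recursively.
    mid = len(terms) // 2
    return integrate(terms[:mid]) + integrate(terms[mid:])


# 3x^2 + 2x + 5  integrates to  x^3 + x^2 + 5x
print(integrate([(Fraction(3), 2), (Fraction(2), 1), (Fraction(5), 0)]))
```

Splitting the sum in half is an arbitrary choice; any split works because the subproblems are independent.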
Can solution steps be ignored or undone? Now consider the 8-puzzle game.
In attempting to solve the 8-puzzle, we might make a stupid move; for example, we slide tile
5 into the empty space when we actually wanted to slide tile 6 into it. We can backtrack
and undo the first move, sliding tile 5 back to where it was, and then move tile 6. So mistakes
can still be recovered from, but not quite as easily as in the theorem-proving problem: an
additional step must be performed to undo each incorrect step.
Now consider the problem of playing chess. Suppose a chess-playing program makes a stupid
move and realizes it a couple of moves later. Here the solution steps cannot be undone.
The above three problems illustrate the difference between three important classes of problems:
1) Ignorable: in which solution steps can be ignored.
Example: Theorem Proving
2) Recoverable: in which solution steps can be undone.
Example: 8-Puzzle
3) Irrecoverable: in which solution steps cannot be undone.
Example: Chess
The recoverability of a problem plays an important role in determining the complexity of the
control structure necessary for problem solution.
Ignorable problems can be solved using a simple control structure that never backtracks.
Recoverable problems can be solved by a slightly more complicated control strategy that can
recover from mistakes by backtracking. Irrecoverable problems need to be solved by a system that
expends a great deal of effort making each decision, since each decision is final. Some of them
can be solved by recoverable-style methods used in a planning process, in which a sequence of
steps is analysed in advance before the first step is actually taken.
Is the universe predictable? For uncertain-outcome problems, this planning process may not be
possible. Example: playing the card game bridge. We cannot know exactly where all the cards are
or what the other players will do on their turns.
We can still do fairly well, since accurate estimates of the probabilities of each of the possible
outcomes are available. A few examples of such problems are:
Controlling a robot arm: The outcome is uncertain for a variety of reasons. Someone
might move something into the path of the arm. The gears of the arm might stick.
Helping a lawyer decide how to defend his client against a murder charge. Here we
probably cannot even list all the possible outcomes, so the outcome is uncertain.
For certain-outcome problems, planning can be used to generate a sequence of operators that is
guaranteed to lead to a solution.
For uncertain-outcome problems, a sequence of generated operators can only have a good
probability of leading to a solution.
Plan revision is made as the plan is carried out and the necessary feedback becomes available.
Is a good solution absolute or relative? In some problems we are interested only in the answer
to a question, so it does not matter which path we follow. If we do follow one path successfully
to the answer, there is no reason to go back and see whether some other path might also lead to
a solution. These types of problems are called “Any-path problems”.
Now consider the Travelling Salesman Problem. Our goal is to find the shortest route that
visits each city exactly once.
A route we find may not be a solution to the problem, because it may not be the shortest; we
must also try the other paths. Only the shortest path (best path) is a solution to the problem.
These types of problems are known as “Best-path problems”. Best-path problems are
computationally harder than any-path problems.
Is the solution a state or a path? In a natural language understanding problem, such as finding
a consistent interpretation of a sentence, we need to produce only the interpretation itself. No
record of the processing by which the interpretation was found is necessary. But with the
“water-jug” problem it is not sufficient to report the final state; we have to show the “path” also.
So the solution of a natural language understanding problem is a state of the world, and the
solution of the “water-jug” problem is a path to a state.
What is the role of knowledge? Some problems require a lot of knowledge only to constrain the
search for a solution, while others require a lot of knowledge even to be able to recognize a
solution.
Problem Classification
When actual problems are examined from the point of view of all of these questions, it becomes
apparent that there are several broad classes into which the problems fall. Each class can be
associated with a generic control strategy that is appropriate for solving the problem. There is a
variety of problem-solving methods, but there is no single way of solving all problems. Not
all new problems should be considered as totally new; solutions of similar problems can be
exploited.
PROBLEMS
Water-Jug Problem
Problem is “You are given two jugs, a 4-litre one and a 3-litre one. Neither has any
measuring markers on it. There is a pump that can be used to fill the jugs with water. How can
you get exactly 2 litres of water into the 4-litre jug?”
Solution:
The state space for the problem can be described as a set of states, where each state represents
the number of litres of water in each jug. Each state is an ordered pair of integers:
• State: (x, y)
– x = number of litres in the 4-litre jug
– y = number of litres in the 3-litre jug
x = 0, 1, 2, 3, or 4; y = 0, 1, 2, or 3
• Start state: (0, 0), i.e., the 4-litre and 3-litre jugs are both empty initially.
• Goal state: (2, n) for any n, that is, the 4-litre jug has 2 litres of water and the 3-litre jug
has any value from 0 to 3, since it is not specified.
• Production rules are applied to move from one state to another, attempting to end up in a goal
state.
Production Rules: These rules are used as operators to solve the problem. Each rule has a left side
that must match the current state for the rule to apply, and a right side that describes the new state
that results from applying the rule.
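The notes do not list the individual rules, so the following sketch assumes the usual fill, empty and pour operators, with each pour continuing until the source jug is empty or the target jug is full. It applies the rules breadth-first from (0, 0) and returns the whole path, since for this problem the path is the solution. The names `successors` and `solve` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]  # (x, y): litres in the 4-litre jug and in the 3-litre jug


def successors(state: State) -> List[State]:
    """Apply every production rule whose left side matches (x, y)."""
    x, y = state
    result = []
    if x < 4:
        result.append((4, y))                  # fill the 4-litre jug
    if y < 3:
        result.append((x, 3))                  # fill the 3-litre jug
    if x > 0:
        result.append((0, y))                  # empty the 4-litre jug
    if y > 0:
        result.append((x, 0))                  # empty the 3-litre jug
    if y > 0 and x < 4:                        # pour the 3-litre jug into the 4-litre jug
        pour = min(y, 4 - x)
        result.append((x + pour, y - pour))
    if x > 0 and y < 3:                        # pour the 4-litre jug into the 3-litre jug
        pour = min(x, 3 - y)
        result.append((x - pour, y + pour))
    return result


def solve(start: State = (0, 0)) -> Optional[List[State]]:
    parent: Dict[State, Optional[State]] = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state[0] == 2:                      # goal state (2, n) for any n
            path: List[State] = []
            node: Optional[State] = state
            while node is not None:            # follow parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:              # do not revisit states
                parent[nxt] = state
                queue.append(nxt)
    return None


print(solve())
```

Because the search is breadth-first, the first goal reached uses the fewest rule applications; with this rule set that is six.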
Chess Problem
The problem of playing chess can be defined as a problem of moving around in a state space where
each state represents a legal position of the chess board.
The initial state is an 8x8 array in which each position contains a symbol standing for the
appropriate piece in the official chess opening position. A set of rules is used to move
from one state to another, attempting to end up in one of a set of final states, described as
any board position in which the opponent does not have a legal move and his or her
king is under attack.
The state space representation is natural for chess, since each state corresponds to a board
position; the state space is artificial and well organized.
Production Rules:
These rules are used to move around the state space. They can be described easily as a set of rules
consisting of two parts:
1. Left side serves as a pattern to be matched against the current board position.
2. Right side describes the change to be made to the board position to reflect the
move.
To describe these rules, it is convenient to introduce a notation for patterns and substitutions.
E.g.:
1. White pawn at Square(file i, rank 2)
AND Square(file i, rank 3) is empty
AND Square(file i, rank 4) is empty
→ Move pawn from Square(file i, rank 2) to Square(file i, rank 4)
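The pawn rule can be sketched as a function whose guard is the left side and whose body performs the substitution. The board representation (a dictionary from `(file, rank)` to a piece code such as `"WP"` for a white pawn) and the function name are assumptions for illustration; this is not a full move generator.

```python
from typing import Dict, Tuple

Square = Tuple[str, int]   # (file, rank), e.g. ("e", 2)
Board = Dict[Square, str]  # occupied squares only, e.g. {("e", 2): "WP"}


def pawn_double_step(board: Board, file: str) -> Board:
    """Production rule: white pawn advances two squares on the given file.

    Left side:  white pawn at (file, 2) AND (file, 3) empty AND (file, 4) empty
    Right side: move the pawn from (file, 2) to (file, 4)
    """
    matches = (board.get((file, 2)) == "WP"
               and (file, 3) not in board
               and (file, 4) not in board)
    if not matches:
        raise ValueError("left side does not match the current position")
    new_board = dict(board)     # each state is a separate board position
    del new_board[(file, 2)]
    new_board[(file, 4)] = "WP"
    return new_board


print(pawn_double_step({("e", 2): "WP"}, "e"))
```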
8-Puzzle Problem
The 8-puzzle is a square tray in which 8 square tiles are placed. The remaining 9th
square is uncovered. Each tile has a number on it. A tile that is adjacent to the blank space can be
slid into that space. The goal is to transform the starting position into the goal position by
sliding the tiles around.
Solution:
State Space: The state space for the problem can be written as a set of states where each state is
position of the tiles on the tray.
Initial State: A square tray having 3x3 cells and 8 numbered tiles that are shuffled (_ marks the blank)
2 8 3
1 6 4
7 _ 5
Goal State
1 2 3
8 _ 4
7 6 5
Production Rules: These rules are used to move from the initial state to the goal state. They are also
defined in two parts: the left side pattern should match the current position, and the right side will be
the resulting position after applying the rule.
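A sketch of the 8-puzzle production rules as moves of the blank (up, down, left, right), followed by breadth-first search from the initial state to the goal state above. Encoding the tray as a 9-tuple read row by row, with 0 for the blank, is an assumption; `moves` and `solve` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Board = Tuple[int, ...]   # 9 cells read row by row; 0 marks the blank

START: Board = (2, 8, 3,
                1, 6, 4,
                7, 0, 5)
GOAL: Board = (1, 2, 3,
               8, 0, 4,
               7, 6, 5)


def moves(board: Board) -> List[Board]:
    """Production rules: move the blank up, down, left or right when possible."""
    i = board.index(0)
    row, col = divmod(i, 3)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:          # left side: the move stays on the tray
            j = 3 * r + c
            cells = list(board)
            cells[i], cells[j] = cells[j], cells[i]   # right side: slide the tile
            result.append(tuple(cells))
    return result


def solve(start: Board, goal: Board) -> Optional[List[Board]]:
    """Breadth-first search over positions; returns the sequence of positions."""
    parent: Dict[Board, Optional[Board]] = {start: None}
    queue = deque([start])
    while queue:
        board = queue.popleft()
        if board == goal:
            path: List[Board] = []
            node: Optional[Board] = board
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in moves(board):
            if nxt not in parent:
                parent[nxt] = board
                queue.append(nxt)
    return None


path = solve(START, GOAL)
print(len(path) - 1, "moves")
```

For this start position the search finds a five-move solution.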
Travelling Salesman Problem
Solution:
State Space: The state space for this problem represents the cities traversed by the
salesman; the start state is the salesman at any city in the given list of cities. A set of
rules is applied such that the salesman does not traverse a city that has already been traversed.
Applying these rules leads to states in which the salesman completes the round trip and returns
to his starting position.
Initial State
Salesman starting at any arbitrary city in the given list of cities
Goal State
Visiting every city exactly once and returning to the starting city
Production rules:
These rules are used as operators to move from one state to another. Since there is a path
between any pair of cities in the city list, we write the production rules for this problem as
• Visited(city[i]) AND Not Visited(city[j])
– Traverse(city[i],city[j])
• Visited(city[i],city[j]) AND Not Visited(city[k])
– Traverse(city[j],city[k])
• Visited(city[j],city[i]) AND Not Visited(city[k])
– Traverse(city[i],city[k])
• Visited(city[i],city[j],city[k]) AND Not Visited(Nil)
– Traverse(city[k],city[i])
Tower of Hanoi Problem
Initial State:
Full(T1) | Empty(T2) | Empty(T3)
Goal State:
Empty(T1) | Full(T2) | Empty (T3)
Production Rules:
These are the rules used to reach the goal state. They use the following operations:
POP(x) Remove the top element from stack x and update top
PUSH(x,y) Push element x onto stack y and update top
Now to solve the problem the production rules can be described as follows:
1. Top(T1)<Top(T2) PUSH(POP(T1),T2)
2. Top(T2)<Top(T1) PUSH(POP(T2),T1)
3. Top(T1)<Top(T3) PUSH(POP(T1),T3)
4. Top(T3)<Top(T1) PUSH(POP(T3),T1)
5. Top(T2)<Top(T3) PUSH(POP(T2),T3)
6. Top(T3)<Top(T2) PUSH(POP(T3),T2)
7. Empty(T1) PUSH(POP(T2),T1)
8. Empty(T1) PUSH(POP(T3),T1)
9. Empty(T2) PUSH(POP(T1),T2)
10. Empty(T3) PUSH(POP(T1),T3)
11. Empty(T2) PUSH(POP(T3),T2)
12. Empty(T3) PUSH(POP(T2),T3)
Solution: Example: 3 Disks, 3 Towers
1) T1 → T2
2) T1 → T3
3) T2 → T3
4) T1 → T2
5) T3 → T1
6) T3 → T2
7) T1 → T2
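The seven moves listed can be reproduced with the standard recursive formulation of the Tower of Hanoi; this recursion is a swapped-in technique, not the production-rule search described above. `apply_moves` replays a move list with PUSH/POP on three stacks and rejects any move that would put a larger disk on a smaller one (the Top(x) < Top(y) condition). Representing disk sizes as integers is an assumption.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str]   # (source tower, target tower)


def hanoi(n: int, source: str, target: str, spare: str) -> List[Move]:
    """Recursive Tower of Hanoi: the moves that carry n disks from source to target."""
    if n == 0:
        return []
    return (hanoi(n - 1, source, spare, target)     # clear the way
            + [(source, target)]                    # move the largest disk
            + hanoi(n - 1, spare, target, source))  # restack on top of it


def apply_moves(n: int, moves: List[Move]) -> Dict[str, List[int]]:
    """Replay moves with PUSH/POP on stacks; each list runs bottom to top."""
    towers: Dict[str, List[int]] = {"T1": list(range(n, 0, -1)), "T2": [], "T3": []}
    for src, dst in moves:
        disk = towers[src][-1]                      # Top(src)
        if towers[dst] and towers[dst][-1] < disk:  # would violate Top(src) < Top(dst)
            raise ValueError(f"illegal move {src} -> {dst}")
        towers[dst].append(towers[src].pop())       # PUSH(POP(src), dst)
    return towers


moves = hanoi(3, "T1", "T2", "T3")
print(moves)
print(apply_moves(3, moves))
```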
Solution: The state space for this problem is a set of states representing the position of the
monkey, the position of the chair, the position of the stick, and two flags recording whether the
monkey is on the chair and whether the monkey holds the stick, so there is a 5-tuple representation.
(M, C, S, F1, F2)
– M: position of the monkey
– C: position of the chair
– S: position of the stick
– F1: 1 if the monkey is on the chair, otherwise 0
– F2: 1 if the monkey is holding the stick, otherwise 0
Production Rules:
These are the rules that give a path for searching for the goal state. Here we assume that when the
monkey holds the stick it will swing it; this assumption is necessary to simplify the
representation.
Some of the production rules are:
Solution:
1) (M,C,S,0,0)
2) (C,C,S,0,0)
3) (G,G,S,0,0)
4) (S,G,S,0,0)
5) (G,G,G,0,0)
6) (G,G,G,0,1)
7) (G,G,G,1,1)
Solution:
The state space for the problem contains a set of states which represent the number of
cannibals and missionaries on either bank of the river.
(C,M,C1,M1,B)
– C and M are the numbers of cannibals and missionaries on the starting bank
– C1 and M1 are the numbers of cannibals and missionaries on the destination bank
– B is the position of the boat: either the left bank (L) or the right bank (R)
Production System: These are the operations used to move from one state to another. On either
bank, the number of cannibals must be less than or equal to the number of missionaries whenever
missionaries are present; the rules move one or two people across in the boat while respecting
this condition. One solution is:
C  M  BOAT  C1  M1
3  3  L     0   0
1  3  R     2   0
2  3  L     1   0
0  3  R     3   0
1  3  L     2   0
1  1  R     2   2
2  2  L     1   1
2  0  R     1   3
3  0  L     0   3
1  0  R     2   3
2  0  L     1   3
0  0  R     3   3
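A breadth-first search over (C, M, B) states produces a solution of the same length as this table. It assumes a boat that carries one or two people and the usual safety condition (missionaries may not be outnumbered on a bank where any are present); C1 and M1 are implied as 3 - C and 3 - M. Function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# State (C, M, B): cannibals and missionaries on the starting bank, and the boat's bank.
State = Tuple[int, int, str]
# Possible boat loads (cannibals, missionaries): one or two people per crossing.
LOADS = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]


def safe(c: int, m: int) -> bool:
    """Missionaries on a bank must not be outnumbered, if any are present."""
    return m == 0 or m >= c


def successors(state: State) -> List[State]:
    c, m, boat = state
    sign = -1 if boat == "L" else 1      # the boat carries people away from its own bank
    result = []
    for dc, dm in LOADS:
        nc, nm = c + sign * dc, m + sign * dm
        if (0 <= nc <= 3 and 0 <= nm <= 3
                and safe(nc, nm) and safe(3 - nc, 3 - nm)):   # C1 = 3 - C, M1 = 3 - M
            result.append((nc, nm, "R" if boat == "L" else "L"))
    return result


def solve() -> Optional[List[State]]:
    start: State = (3, 3, "L")
    goal: State = (0, 0, "R")
    parent: Dict[State, Optional[State]] = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path: List[State] = []
            node: Optional[State] = state
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


for c, m, boat in solve():
    print(c, m, boat, 3 - c, 3 - m)
```

The search needs 11 crossings, matching the 12 rows of the table.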
Breadth-First Search
Algorithm:
1) Create a variable called NODE_LIST and set it to the initial state.
2) Until a goal state is found or NODE_LIST is empty do:
a. Remove the first element from NODE_LIST and call it E. If NODE_LIST was empty
quit.
b. For each way that each rule can match the state described in E do:
i. Apply the rule to generate a new state
ii. If the new state is goal state, quit and return this state
iii. Otherwise add the new state to the end of NODE_LIST
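The breadth-first procedure above, sketched in Python. The toy graph `GRAPH` and the goal test (state names starting with "G") are hypothetical. Two additions not in the listed steps: the initial state is tested first, and a `seen` set ensures a state reached twice is queued only once.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional

# Hypothetical toy state space: states whose names start with "G" are goals.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["G2"],
    "D": ["G1"],
}


def breadth_first_search(initial: Hashable,
                         successors: Callable[[Hashable], Iterable[Hashable]],
                         is_goal: Callable[[Hashable], bool]) -> Optional[Hashable]:
    if is_goal(initial):
        return initial
    node_list = deque([initial])           # 1) NODE_LIST, used first-in first-out
    seen = {initial}                       # addition: never queue the same state twice
    while node_list:                       # 2) until NODE_LIST is empty
        e = node_list.popleft()            # a. remove the first element, E
        for new_state in successors(e):    # b-i. apply each matching rule
            if is_goal(new_state):         # b-ii. goal found: quit and return it
                return new_state
            if new_state not in seen:      # b-iii. add to the end of NODE_LIST
                seen.add(new_state)
                node_list.append(new_state)
    return None


print(breadth_first_search("A", lambda s: GRAPH.get(s, []),
                           lambda s: s.startswith("G")))
```

On this graph the search returns `G2`, the goal nearest the root, even though `G1` lies on the first branch.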
Depth-First Search
Algorithm:
1) If the initial state is the goal state, quit and return success.
2) Otherwise, do the following until success or failure is signaled
a. Generate a successor E of the initial state, if there are no more successors, signal
failure
b. Call Depth-First Search with E as the initial state
c. If success is returned, signal success. Otherwise continue in this loop.
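The recursive depth-first procedure, on the same hypothetical toy graph. Returning the `path` (so the route to the goal is available) and skipping states already on the current path (to avoid cycles) are additions to the listed steps.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical toy state space: states whose names start with "G" are goals.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["G2"],
    "D": ["G1"],
}


def depth_first_search(state: str,
                       successors: Callable[[str], Iterable[str]],
                       is_goal: Callable[[str], bool],
                       path: Optional[List[str]] = None) -> Optional[List[str]]:
    path = (path or []) + [state]
    if is_goal(state):                      # 1) the initial state is a goal: success
        return path
    for e in successors(state):             # 2a) generate the next successor E
        if e in path:                       # addition: skip states already on this path
            continue
        result = depth_first_search(e, successors, is_goal, path)   # 2b) recurse on E
        if result is not None:              # 2c) success is returned: signal success
            return result
    return None                             # no more successors: failure


print(depth_first_search("A", lambda s: GRAPH.get(s, []), lambda s: s.startswith("G")))
```

Here the search returns the path A, B, D, G1: it follows the first branch all the way down before trying C.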
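For the Travelling Salesman Problem with N cities, a simple strategy is to explore every possible
tour; there are (N-1)! of them. The time to examine a single path is proportional to N, so the total
time required to perform this search is proportional to N!.
Another strategy is to begin generating complete paths, keeping track of the shortest path found so
far, and to abandon any path whose partial length is already greater than the shortest found. This
method (branch and bound) is better than the first, but it is still inadequate for large N.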
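Both strategies can be sketched for a tiny instance. The 4-city distance matrix `DIST` is hypothetical; `exhaustive_tsp` examines all (N-1)! tours starting at city 0, and `branch_and_bound_tsp` extends partial tours depth-first and abandons any whose length already reaches that of the best complete tour found.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

# Hypothetical symmetric distance matrix for 4 cities (0..3).
DIST: List[List[int]] = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]


def tour_length(dist: List[List[int]], tour: Sequence[int]) -> int:
    """Length of the closed tour, including the return to the starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def exhaustive_tsp(dist: List[List[int]]) -> Tuple[int, Tuple[int, ...]]:
    """Examine all (N-1)! tours that start at city 0 and keep the shortest."""
    best: Optional[Tuple[int, Tuple[int, ...]]] = None
    for rest in permutations(range(1, len(dist))):
        tour = (0,) + rest
        length = tour_length(dist, tour)
        if best is None or length < best[0]:
            best = (length, tour)
    assert best is not None
    return best


def branch_and_bound_tsp(dist: List[List[int]]) -> Tuple[float, Tuple[int, ...]]:
    """Generate tours depth-first, abandoning partial tours that are already too long."""
    n = len(dist)
    best_len: float = float("inf")
    best_tour: Tuple[int, ...] = ()

    def extend(tour: List[int], length: int) -> None:
        nonlocal best_len, best_tour
        if length >= best_len:                       # cannot beat the best tour so far
            return
        if len(tour) == n:
            total = length + dist[tour[-1]][tour[0]]  # close the tour
            if total < best_len:
                best_len, best_tour = total, tuple(tour)
            return
        for city in range(n):
            if city not in tour:
                extend(tour + [city], length + dist[tour[-1]][city])

    extend([0], 0)
    return best_len, best_tour


print(exhaustive_tsp(DIST), branch_and_bound_tsp(DIST))
```

Both return a tour of length 18 here; the pruning only changes how many partial tours are examined.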
HEURISTIC SEARCH
Heuristic:
– It is a "rule of thumb" used to help guide search
– It is a technique that improves the efficiency of search process, possibly by
sacrificing claims of completeness.
– It serves as an aid to learning, discovery, or problem solving by
experimental and especially trial-and-error methods.
Heuristic Function:
– It is a function applied to a state in a search space to indicate a likelihood of
success if that state is selected
– It is a function that maps from problem state descriptions to measures of
desirability usually represented by numbers
– Heuristic function is problem specific.
The purpose of heuristic function is to guide the search process in the most profitable direction by
suggesting which path to follow first when more than one is available (the most promising way).
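Two commonly used heuristic functions for the 8-puzzle, shown as an illustration (neither is defined in these notes): the number of misplaced tiles and the sum of Manhattan distances. Both map a state to a number where smaller means closer to the goal. The boards are the 8-puzzle start and goal positions from these notes, encoded as 9-tuples with 0 for the blank (an assumption).

```python
from typing import Tuple

Board = Tuple[int, ...]   # 9 cells read row by row; 0 marks the blank
START: Board = (2, 8, 3, 1, 6, 4, 7, 0, 5)
GOAL: Board = (1, 2, 3, 8, 0, 4, 7, 6, 5)


def misplaced_tiles(board: Board, goal: Board = GOAL) -> int:
    """Number of tiles (ignoring the blank) that are not on their goal square."""
    return sum(1 for b, g in zip(board, goal) if b != 0 and b != g)


def manhattan(board: Board, goal: Board = GOAL) -> int:
    """Sum over tiles of the horizontal plus vertical distance to the goal square."""
    total = 0
    for tile in range(1, 9):
        r1, c1 = divmod(board.index(tile), 3)
        r2, c2 = divmod(goal.index(tile), 3)
        total += abs(r1 - r2) + abs(c1 - c2)
    return total


print(misplaced_tiles(START), manhattan(START))   # 4 5
```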
Using a heuristic, the TSP can be solved in less than exponential time, although not necessarily
optimally. On average, heuristics improve the quality of the paths that are explored. The following
(nearest-neighbour) procedure solves the TSP approximately, in time proportional to N^2:
– Select an arbitrary city as the starting city
– To select the next city, look at all cities not yet visited, and select the one closest to
the current city
– Repeat the previous step until all cities have been visited
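The nearest-neighbour procedure in Python, using a hypothetical 4-city distance matrix. Each step scans the unvisited cities, so the whole tour takes time proportional to N^2; the result is usually a good tour but is not guaranteed to be the shortest.

```python
from typing import List, Sequence

# Hypothetical symmetric distance matrix for 4 cities (0..3).
DIST: List[List[int]] = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]


def nearest_neighbour_tour(dist: List[List[int]], start: int = 0) -> List[int]:
    tour = [start]                                   # arbitrary starting city
    unvisited = set(range(len(dist))) - {start}
    while unvisited:                                 # repeat until every city is visited
        current = tour[-1]
        nxt = min(unvisited, key=lambda city: dist[current][city])   # closest unvisited
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(dist: List[List[int]], tour: Sequence[int]) -> int:
    """Length of the closed tour, including the return to the starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


tour = nearest_neighbour_tour(DIST)
print(tour, tour_length(DIST, tour))   # [0, 1, 3, 2] 18
```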
Heuristic search methods, which are general-purpose control strategies for controlling search, are
often known as "weak methods" because of their generality and because they do not apply a
great deal of knowledge.
Weak Methods
a) Generate and Test
b) Hill Climbing
c) Best First Search
d) Problem Reduction
e) Constraint Satisfaction
f) Means-ends analysis
Generate-and-Test
Algorithm:
1. Generate a possible solution. For some problems, this means generating a particular point
in the problem space. For others, it means generating a path from a start state.
2. Test to see if this is actually a solution by comparing the chosen point or the endpoint of
the chosen path to the set of acceptable goal states.
If the generation is systematic, this strategy will definitely find a solution if one exists.
Because complete solutions must be generated before they can be tested, the systematic
generate-and-test algorithm is a depth-first search procedure. It can take a very long time if the
problem space is very large. The strategy can also operate by generating solutions randomly instead
of systematically, but then there is no guarantee that a solution will ever be found.
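A generic generate-and-test loop in Python. The 4-queens test problem (place four queens on a 4x4 board so that none attacks another) is a hypothetical example chosen only because it is small; generating permutations means every queen already has its own row and column, so the test checks only diagonals. `random_candidates` shows the random variant, which can repeat candidates and gives no guarantee of success.

```python
import random
from itertools import permutations
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def generate_and_test(candidates: Iterable[T], test: Callable[[T], bool]) -> Optional[T]:
    for candidate in candidates:     # 1. generate a possible solution
        if test(candidate):          # 2. test it against the goal condition
            return candidate
    return None                      # generator exhausted without success


def no_attacks(cols: Tuple[int, ...]) -> bool:
    """cols[r] is the column of the queen in row r; no two queens may share a diagonal."""
    return all(abs(cols[i] - cols[j]) != j - i
               for i in range(len(cols)) for j in range(i + 1, len(cols)))


def random_candidates(n: int, tries: int, seed: int = 0) -> Iterator[Tuple[int, ...]]:
    """Random generator: candidates may repeat, and success is not guaranteed."""
    rng = random.Random(seed)
    for _ in range(tries):
        cols = list(range(n))
        rng.shuffle(cols)
        yield tuple(cols)


# Systematic generation: every permutation is tried, in lexicographic order.
print(generate_and_test(permutations(range(4)), no_attacks))   # (1, 3, 0, 2)
```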
To implement generate-and-test, we usually use a depth-first search tree. If there are cycles,
then we use graphs rather than a tree. Generate-and-test is not an efficient technique for harder
problems, but it is acceptable for simple ones. When it is combined with other techniques
that restrict the space to be searched, it can be effective.
For example, one of the most successful AI programs is DENDRAL, which infers the structure of
organic compounds using mass spectrogram and nuclear magnetic resonance (NMR) data. It uses a
strategy called plan-generate-test, in which a planning process that uses constraint-satisfaction
techniques creates lists of recommended structures. The generate-and-test procedure then
uses those lists so that it needs to explore only a limited set of structures, which has proved
highly effective.
Examples:
- Searching a ball in a bowl (Pick a green ball) - State
- Water Jug Problem – State and Path
Hill Climbing
Hill climbing is a variant of generate-and-test in which feedback from the test procedure is used
to help the generator decide which direction to move in the search space. In a pure generate-and-test
procedure, the test function responds only with yes or no; in hill climbing, the test function is
augmented with a heuristic function that provides an estimate of how close a given state is to a goal
state.
Simple Hill Climbing
Algorithm:
1. Evaluate the initial state. If it is also a goal state then return it, otherwise continue with the
initial state as the current state.
2. Loop until the solution is found or until there are no new operators to be applied in the
current state
a) Select an operator that has not yet been applied to the current state and apply it
to produce new state
b) Evaluate the new state
i. If it is a goal state then return it and quit
ii. If it is not a goal state but it is better than the current state, then make it as
current state
iii. If it is not better than the current state, then continue in loop.
The key difference between this algorithm and generate and test algorithm is the use of an
evaluation function as a way to inject task-specific knowledge into the control process.
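Simple hill climbing, sketched on a hypothetical one-dimensional problem: states are the integers 0 to 20, the two operators step up or down by one, and the evaluation function is -(x - 7)^2, so higher is better. An operator returns `None` when it does not apply; all names are illustrative.

```python
from typing import Callable, List, Optional


def value(x: int) -> int:
    """Hypothetical evaluation function; its single peak is at x = 7."""
    return -(x - 7) ** 2


def step_up(x: int) -> Optional[int]:
    return x + 1 if x < 20 else None     # None: the operator does not apply


def step_down(x: int) -> Optional[int]:
    return x - 1 if x > 0 else None


def simple_hill_climbing(initial: int,
                         operators: List[Callable[[int], Optional[int]]],
                         evaluate: Callable[[int], int],
                         is_goal: Callable[[int], bool]) -> int:
    current = initial
    if is_goal(current):                              # step 1
        return current
    while True:                                       # step 2
        for operator in operators:                    # 2a: apply an untried operator
            new = operator(current)
            if new is None:
                continue
            if is_goal(new):                          # 2b-i: goal found
                return new
            if evaluate(new) > evaluate(current):     # 2b-ii: first better state wins
                current = new
                break
        else:
            return current    # no operator gives a better state: stop here


print(simple_hill_climbing(15, [step_up, step_down], value, lambda x: False))   # 7
```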
Steepest-Ascent Hill Climbing
Algorithm:
1. Evaluate the initial state. If it is also a goal state then return it and quit. Otherwise
continue with the initial state as the current state.
2. Loop until a solution is found or until a complete iteration produces no change to current
state:
a. Let SUCC be a state such that any possible successor of the current state will be
better than SUCC.
b. For each operator that applies to the current state do:
i. Apply the operator and generate a new state.
ii. Evaluate the new state. If it is a goal state, then return it and quit. If not
compare it to SUCC. If it is better, then set SUCC to this state. If it is not
better, leave SUCC alone.
c. If SUCC is better than the current state, then set the current state to SUCC.
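Steepest-ascent hill climbing on a hypothetical landscape `LANDSCAPE`, where state i has value `LANDSCAPE[i]` and the successors of i are i - 1 and i + 1. The landscape is deliberately chosen to contain a local maximum, the failure mode discussed next.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical landscape: state i has value LANDSCAPE[i]; index 2 is a local maximum.
LANDSCAPE: List[int] = [1, 3, 5, 4, 2, 6, 9, 7]


def neighbours(i: int) -> List[int]:
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


def steepest_ascent(initial: int,
                    successors: Callable[[int], Iterable[int]],
                    evaluate: Callable[[int], int],
                    is_goal: Callable[[int], bool]) -> int:
    current = initial
    if is_goal(current):                                  # step 1
        return current
    while True:                                           # step 2
        succ: Optional[int] = None                        # 2a: "worse than any successor"
        for new in successors(current):                   # 2b: apply every operator
            if is_goal(new):
                return new
            if succ is None or evaluate(new) > evaluate(succ):
                succ = new                                # keep the best successor
        if succ is not None and evaluate(succ) > evaluate(current):
            current = succ                                # 2c: move to the best successor
        else:
            return current            # a complete iteration produced no change


value = lambda i: LANDSCAPE[i]
print(steepest_ascent(0, neighbours, value, lambda i: False))   # 2
print(steepest_ascent(4, neighbours, value, lambda i: False))   # 6
```

Started at state 0, the search stops at state 2 (value 5), a local maximum; started at state 4, it reaches the global maximum at state 6 (value 9).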
Both basic and steepest-ascent hill climbing may fail to find a solution. Either algorithm may
terminate not by finding a goal state but by getting to a state from which no better states can be
generated. This will happen if the program has reached a local maximum, a plateau or a ridge.
A ridge is a special kind of local maximum. It is an area of the search space that is higher than
surrounding areas and that itself has a slope.
There are some ways of dealing with these problems, although these methods are by no means
guaranteed:
Backtrack to some earlier node and try going in a different direction. This is particularly
reasonable if at that node there was another direction that looked as promising or almost
as promising as the one that was chosen earlier. This is a fairly good way to deal with
local maxima.
Make a big jump in some direction to try to get to a new section of the search space. This
is a good way of dealing with plateaus.
Apply two or more rules before doing the test. This corresponds to moving in several
directions at once. This is a good strategy for dealing with ridges.
Simulated Annealing:
A variation of hill climbing in which, at the beginning of the process, some downhill moves may
be made.
The idea is to do enough exploration of the whole space early on that the final solution is
relatively insensitive to the starting state. By doing so, we can lower the chances of getting
caught at a local maximum, a plateau or a ridge.
In simulated annealing we attempt to minimize rather than maximize the value of the objective
function. Thus the process is one of valley descending, in which the objective function is the
energy level.
Physical Annealing
• Physical substances are melted and then gradually cooled until some solid state is reached.
• The goal is to produce a minimal-energy state.
• Annealing schedule: if the temperature is lowered sufficiently slowly, then the goal will be
attained.
• Nevertheless, there is some probability of a transition to a higher energy state.
The probability that a transition to a higher energy state will occur is given by the function:
$p = e^{-\Delta E / kT}$
where
ΔE is the positive change in the energy level
T is the temperature
k is Boltzmann’s constant
The rate at which the system is cooled is called the annealing schedule. In the analogous search
process, the units for both ΔE and T are artificial, so it makes sense to incorporate k into T, giving
$p' = e^{-\Delta E / T}$
Algorithm:
1. Evaluate the initial state. If it is also a goal state then return and quit. Otherwise continue
with the initial state as a current state.
2. Initialize Best-So-Far to the current state.
3. Initialize T according to the annealing schedule.
4. Loop until a solution is found or until there are no new operators left to be applied in the
current state:
a. Select an operator that has not yet been applied to the current state and apply it
to produce a new state.
b. Evaluate the new state. Compute
$\Delta E = (\text{value of current}) - (\text{value of new state})$
(i) If the new state is goal state then return it and quit
(ii) If it is not a goal state but is better than the current state then make it the
current state. Also set BEST-SO-FAR to this new state.
(iii) If it is not better than the current state, then make it the current state with
probability $p'$ as defined above. This step is usually implemented by
invoking a random number generator to produce a number in the range
[0,1]. If that number is less than $p'$ then the move is accepted. Otherwise
do nothing.
c. Revise T as necessary according to the annealing schedule.
5. Return BEST-SO-FAR as the answer.
Note:
At each step the successor is compared with the current state. If it is better than the current
state, the move is accepted. Otherwise the move is accepted only with probability $p'$; if it is
rejected, the search tries another direction.
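A minimal simulated-annealing sketch for a maximisation problem, so ΔE = value(current) - value(new) is positive for a worse move, and such a move is accepted with probability e^(-ΔE/T). The landscape, the geometric cooling schedule (T multiplied by 0.95 each step), the fixed step count in place of a goal test, and the seeded random generator are all assumptions. Unlike step 4b(ii) as written, BEST-SO-FAR is replaced only when the new state beats it.

```python
import math
import random
from typing import Callable, List

# Hypothetical landscape to maximise: state i has value LANDSCAPE[i].
LANDSCAPE: List[int] = [1, 3, 5, 4, 2, 6, 9, 7]


def random_neighbour(i: int, rng: random.Random) -> int:
    return rng.choice([j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)])


def simulated_annealing(initial: int,
                        random_successor: Callable[[int, random.Random], int],
                        evaluate: Callable[[int], float],
                        t0: float = 10.0, cooling: float = 0.95,
                        steps: int = 500, seed: int = 0) -> int:
    rng = random.Random(seed)
    current = best_so_far = initial                  # step 2
    t = t0                                           # step 3: initial temperature
    for _ in range(steps):                           # step 4, bounded (no goal test here)
        new = random_successor(current, rng)         # 4a
        delta_e = evaluate(current) - evaluate(new)  # 4b: delta E, value(current) - value(new)
        if delta_e < 0:                              # 4b-ii: better state, always accept
            current = new
            if evaluate(current) > evaluate(best_so_far):
                best_so_far = current
        elif rng.random() < math.exp(-delta_e / t):  # 4b-iii: accept with p' = e^(-dE/T)
            current = new
        t = max(t * cooling, 1e-6)                   # 4c: geometric annealing schedule
    return best_so_far                               # step 5


best = simulated_annealing(0, random_neighbour, lambda i: LANDSCAPE[i])
print(best, LANDSCAPE[best])
```

With a high starting temperature the walk can cross the dip after state 2, so it will often escape the local maximum that stops steepest ascent, though this is not guaranteed.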
Best-First Search
Best-First Search (BFS) is a way of combining the advantages of both depth-first search and
breadth-first search into a single method, i.e., to follow a single path at a time but switch paths
whenever some competing path looks more promising than the current one does.
The process is to select the most promising of the nodes we have generated so far. We then
expand the chosen node by using the rules to generate its successors. If one of them is a solution,
then we can quit; otherwise we repeat the process until we reach a goal.
In best-first search, one move is selected, but the others are kept around so that they can be
revisited later if the selected path becomes less promising. This is not the case in steepest-ascent
hill climbing.
OR Graphs
A graph is called an OR graph when each of its branches represents an alternative problem-solving
path.
Algorithm:
1. Start with OPEN containing just the initial state
2. Until a goal is found or there are no nodes left on OPEN do:
a. Pick the best node on OPEN
b. Generate its successors
c. For each successor do:
i. If it has not been generated before, evaluate it, add it to OPEN and record
its parent.
ii. If it has been generated before, change the parent if this new path is better
than the previous one. In that case update the cost of getting to this node
and to any successors that this node may already have.
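Trace of the search (each entry is Node(Parent)):
Step 1: A(NIL)
Step 2: A(NIL), B(A), C(A), D(A)
Step 3: A(NIL), B(A), C(A), D(A), E(D), F(D)
Step 4: A(NIL), B(A), C(A), D(A), E(D), F(D), G(B), H(B)
Step 5: A(NIL), B(A), C(A), D(A), E(D), F(D), G(B), H(B), I(E), J(E)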
The element with the lowest cost is the first element; new states are inserted according to
their cost value.
A* Algorithm:
A* algorithm is a best-first graph search algorithm that finds a least-cost path from a given initial node to a goal node. It is a refinement of best-first search in which OPEN is ordered by f'.
This algorithm uses the f', g and h' functions as well as the lists OPEN and CLOSED.
For many applications, it is convenient to define the function f' as the sum of two components that we call g and h'.
• g:
– A measure of the cost of getting from the initial state to the current node.
– It is not an estimate; it is known to be the exact sum of the costs of the steps taken along the path to the node.
• h':
– An estimate of the additional cost of getting from the current node to a goal state.
Algorithm:
1) Start with OPEN containing only the initial node. Set that node's g value to 0, its h' value to whatever it is, and its f' value to h' + 0, or h'. Set CLOSED to the empty list.
2) Until a goal node is found, repeat the following procedure: If there are no nodes on OPEN, report failure. Otherwise, pick the node on OPEN with the lowest f' value. Call it BESTNODE. Remove it from OPEN and place it on CLOSED. If BESTNODE is a goal node, exit and report a solution. Otherwise, generate the successors of BESTNODE. For each SUCCESSOR, do the following:
a) Set SUCCESSOR to point back to BESTNODE. These backward links will make it possible to recover the path once a solution is found.
b) Compute g(SUCCESSOR) = g(BESTNODE) + the cost of getting from BESTNODE to SUCCESSOR.
c) If SUCCESSOR is already on OPEN, call that node OLD. We must decide whether OLD's parent link should be reset to point to BESTNODE (this can happen because the search space is a graph rather than a tree).
If OLD is cheaper, we need do nothing. If SUCCESSOR is cheaper, reset OLD's parent link to point to BESTNODE, record the new cheaper path in g(OLD) and update f'(OLD).
d) If SUCCESSOR was not on OPEN, see if it is on CLOSED. If so, call the node on CLOSED OLD and add OLD to the list of BESTNODE's successors. As in step (c), compare the new path with the old one; if the new path is better, reset OLD's parent link and update its g and f' values.
To propagate the new cost downward, do a depth-first traversal of the tree starting at OLD, changing each node's g value (and thus also its f' value), terminating each branch when you reach either a node with no successors or a node to which an equivalent or better path has already been found.
e) If SUCCESSOR was not already on either OPEN or CLOSED, put it on OPEN and add it to the list of BESTNODE's successors. Compute f'(SUCCESSOR) = g(SUCCESSOR) + h'(SUCCESSOR).
A* is often used to search for the lowest-cost path from a start location to a goal location in a visibility graph or a quadtree. It can also be applied to problems such as the 8-puzzle and the Missionaries and Cannibals problem.
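The following Python sketch follows the same OPEN/CLOSED scheme with f' = g + h'. As a common simplification, instead of propagating an improved g value down through OLD's descendants (step d), it re-inserts any node whose g value improves and skips stale OPEN entries when they surface. The graph, arc costs and h' values are hypothetical; here h' never overestimates the true remaining cost.

```python
import heapq

def a_star(start, goal, edges, h):
    """edges: node -> list of (successor, arc cost); h: node -> h' estimate."""
    g = {start: 0}                           # exact cost of the best path found so far
    parent = {start: None}                   # backward links for recovering the path
    open_list = [(h[start], start)]          # OPEN ordered by f' = g + h'
    closed = set()
    while open_list:
        _, node = heapq.heappop(open_list)   # BESTNODE
        if node in closed:
            continue                         # stale entry superseded by a cheaper one
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g[goal]
        closed.add(node)
        for succ, cost in edges.get(node, []):
            new_g = g[node] + cost           # g(SUCCESSOR) = g(BESTNODE) + arc cost
            if succ not in g or new_g < g[succ]:
                g[succ] = new_g              # new or cheaper path: reset the parent link
                parent[succ] = node
                closed.discard(succ)         # reopen the node if it was on CLOSED
                heapq.heappush(open_list, (new_g + h[succ], succ))
    return None, float("inf")

EDGES = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 12)], "B": [("G", 5)]}
H = {"S": 7, "A": 6, "B": 2, "G": 0}

path, cost = a_star("S", "G", EDGES, H)
print(path, cost)
```

In this example B is first reached directly from S with g = 4 and expanded; the cheaper path through A (g = 3) is found afterwards, so B is reopened and the final path is S, A, B, G with cost 8.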
Problem Reduction:
Planning how best to solve a problem that can be recursively decomposed into sub-problems in multiple ways.
There can be more than one decomposition of the same problem. We have to decide which is the best way to decompose the problem so that the total solution, or the cost of the solution, is good.
Examples:
o Matrix Multiplication
o Towers of Hanoi
o Blocks World Problem
o Theorem Proving
Formulations: (AND/OR Graphs)
o An OR node represents a choice between possible decompositions.
o An AND node represents a given decomposition.
The AND-OR graph (or tree) is useful for representing the solution of problems that can be
solved by decomposing them into a set of smaller problems, all of which must then be solved.
This decomposition, or reduction, generates arcs that we call AND arcs.
One AND arc may point to any number of successor nodes, all of which must be solved in order for the arc to point to a solution. Just as in an OR graph, several arcs may emerge from a single node, indicating a variety of ways in which the original problem might be solved.
In order to find solutions in an AND-OR graph, we need an algorithm similar to best-first search
but with the ability to handle the AND arcs appropriately.
To see why our Best-First search is not adequate for searching AND-OR graphs, consider Fig (a).
– The top node A has been expanded, producing 2 arcs, one leading to B and one leading
to C and D. The numbers at each node represent the value of f' at that node.
– We assume for simplicity that every operation has a uniform cost, so each arc with a
single successor has a cost of 1 and each AND arc with multiple successors has a cost of 1
for each of its components.
– If we look just at the nodes and choose for expansion the one with the lowest f' value,
we must select C. It would be better to explore the path going through B since to use C
we must also use D, for a total cost of 9 (C+D+2) compared to the cost of 6 that we get
through B.
– The choice of which node to expand next must depend not only on the f' value of that
node but also on whether that node is part of the current best path from the initial node.
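The arithmetic behind this example can be checked with a few lines of Python. The individual values f'(B) = 5, f'(C) = 3 and f'(D) = 4 are assumptions, since the figure is not reproduced here; they were chosen to agree with the totals of 6 and 9 quoted above. Each successor contributes its f' value plus a uniform arc cost of 1.

```python
def arc_cost(arc, f, edge_cost=1):
    """Cost of one arc: each successor's f' plus one edge cost per successor."""
    return sum(f[node] + edge_cost for node in arc)

f = {"B": 5, "C": 3, "D": 4}          # assumed f' values
arcs_from_A = [("B",), ("C", "D")]    # a single arc to B, or an AND arc to C and D

greedy_pick = min(f, key=f.get)       # looking only at individual nodes picks C
costs = {arc: arc_cost(arc, f) for arc in arcs_from_A}
best_arc = min(arcs_from_A, key=costs.get)
print(greedy_pick, costs, best_arc)
```

Comparing whole arcs rather than single nodes is what makes the path through B preferable.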
In order to describe an algorithm for searching an AND-OR graph, we need to exploit a value that we call FUTILITY. If the estimated cost of a solution becomes greater than the value of FUTILITY, we abandon the search. FUTILITY should be chosen to correspond to a threshold such that any solution with a cost above it is too expensive to be practical, even if it could ever be found.
Algorithm:
1. Initialize the graph to the starting node.
2. Loop until the starting node is labeled SOLVED or until its cost goes above FUTILITY:
a. Traverse the graph, starting at the initial node and following the current best path, and accumulate the set of nodes that are on that path and have not yet been expanded or labeled SOLVED.
b. Pick one of those unexpanded nodes and expand it. If there are no successors, assign FUTILITY as the value of this node. Otherwise, add its successors to the graph and for each of them compute f' (use only h' and ignore g). If f' of any node is 0, mark that node as SOLVED.
c. Change the f' estimate of the newly expanded node to reflect the new information provided by its successors. Propagate this change backward through the graph. If any node contains a successor arc whose descendants are all solved, label the node itself as SOLVED. At each node that is visited while going up the graph, decide which of its successor arcs is the most promising and mark it as part of the current best path. This may cause the current best path to change. This propagation of revised cost estimates back up the tree was not necessary in the best-first search algorithm because only unexpanded nodes were examined. But now expanded nodes must be re-examined so that the best current path can be selected. Thus it is important that their f' values be the best estimates available.
At Step 1, A is the only node, so it is at the end of the current best path. It is expanded,
yielding nodes B, C and D. The arc to D is labeled as the most promising one emerging from A, since it costs 6, compared to the AND arc to B and C, which costs 9.
In Step 2, node D is chosen for expansion. This process produces one new arc, the AND arc to E and F, with a combined cost estimate of 10. So we update the f' value of D to 10.
We see that the AND arc B-C is now better than the arc to D, so it is labeled as the current best path. At Step 3, we traverse that arc from A and discover the unexpanded nodes B and C. If we are going to find a solution along this path, we will have to expand both B and C eventually, so we explore B first.
This generates two new arcs, the ones to G and to H. Propagating their f' values backward, we update the f' value of B to 6. This requires updating the cost of the AND arc B-C to 12 (6+4+2). Now the arc to D is again the better path from A, so we record that as the current best path, and either node E or F will be chosen for expansion at Step 4.
This process continues until either a solution is found or all paths have led to dead ends,
indicating that there is no solution.
Limitations
1. A longer path may be better
Consider the nodes generated in Fig (a). Now suppose that node J is expanded at the next step and that one of its successors is node E, producing the graph shown in Fig (b). The new path to E is longer than the previous path to E going through C. But the path through C will only lead to a solution if there is also a solution to D, which there is not. So the path through J is better.
While solving a problem, do not revisit nodes that are already labeled SOLVED; otherwise an implementation may get stuck in a loop.
2. Interacting Sub-goals
Another limitation is that the algorithm fails to take into account any interaction between sub-goals. Assume in the figure that both node C and node E ultimately lead to a solution; our algorithm will report a complete solution that includes both of them. The AND-OR graph states that for A to be solved, both C and D must be solved. But the algorithm considers the solution of D as a completely separate process from the solution of C.
While moving toward the goal state, keep track of all the sub-goals we try, in order to determine which combination gives the optimal cost.
AO* Algorithm:
AO* is a generalized algorithm that will always find a minimum-cost solution, provided h' never overestimates the true cost. It is used for solving acyclic AND-OR graphs. AO* uses a single structure, GRAPH, representing the part of the search graph that has been explicitly generated so far. Each node in the graph points both down to its immediate successors and up to its immediate predecessors. A top-down traversal of the best-known path guarantees that only nodes on the best path will ever be considered for expansion. Each node's h' value serves as the estimate of its goodness.
Algorithm (1):
1) Initialize: Set G* = {s}, f(s) = h(s).
If s ∈ T, label s as SOLVED, where T is the set of terminal nodes.
3) Select: Select a non-terminal leaf node n from the marked sub-tree.
4) Expand: Make explicit the successors of n.
For each new successor m: set f(m) = h(m).
If m is terminal, label m as SOLVED.
If m is an AND node with successors r1, r2, …, rk:
Set f(m) = ∑ [f(ri) + c(m, ri)], summed over i = 1, …, k.
Mark the edge to each successor of m. If each successor is labeled SOLVED, then label m as SOLVED.
5) If m is an OR node with successors r1, r2, …, rk:
Set f(m) = min over i of {f(ri) + c(m, ri)}.
Mark the edge to the best successor of m (the one giving the minimum). If that marked successor is labeled SOLVED, then label m as SOLVED.
6) If the cost or label of m has changed, then insert into Z those parents of m for which m is a marked successor.
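The AND and OR cost-revision formulas can be sketched in Python as a bottom-up evaluation of a fully explicit acyclic AND-OR graph. This covers only the revision part, not the full incremental AO* loop. It assumes c(m, ri) = 1 for every edge and treats terminal nodes as SOLVED with cost h (0 here). The small graph is hypothetical: OR node A can use leaf B or AND node K, whose successors C and D are terminal.

```python
def revise(node, succ, kind, h, terminals, marked, cost=1, memo=None):
    """Return (f, solved) for node in an explicit acyclic AND-OR graph.

    succ: node -> successors; kind: node -> "AND" or "OR"; h: leaf estimates;
    terminals: SOLVED leaves; marked: records the best edge of each OR node.
    """
    memo = {} if memo is None else memo
    if node in memo:
        return memo[node]
    children = succ.get(node, [])
    if node in terminals:
        result = (h.get(node, 0), True)              # terminal: labeled SOLVED
    elif not children:
        result = (h[node], False)                    # unexpanded leaf: estimate only
    else:
        parts = [revise(r, succ, kind, h, terminals, marked, cost, memo) for r in children]
        if kind[node] == "AND":
            f = sum(fr + cost for fr, _ in parts)    # f(m) = sum of f(ri) + c(m, ri)
            solved = all(ok for _, ok in parts)      # SOLVED if every successor is
        else:
            best = min(range(len(children)), key=lambda i: parts[i][0] + cost)
            marked[node] = children[best]            # mark the best OR edge
            f = parts[best][0] + cost                # f(m) = min of f(ri) + c(m, ri)
            solved = parts[best][1]                  # SOLVED if the marked successor is
        result = (f, solved)
    memo[node] = result
    return result

SUCC = {"A": ["B", "K"], "K": ["C", "D"]}
KIND = {"A": "OR", "K": "AND"}
H = {"B": 5, "C": 0, "D": 0}
TERMINALS = {"C", "D"}
marked = {}
f_a, solved_a = revise("A", SUCC, KIND, H, TERMINALS, marked)
print(f_a, solved_a, marked)
```

Here f(K) = (0 + 1) + (0 + 1) = 2, so the edge from A to K (cost 3) beats the edge to B (cost 6), and A is SOLVED because its marked successor is.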
Algorithm (2):
Means-Ends Analysis:
One general-purpose technique used in AI is means-ends analysis, a step-by-step, or incremental,
reduction of the difference between the current state and the final goal. The program selects
actions from a list of means—in the case of a simple robot this might consist of PICKUP,
PUTDOWN, MOVEFORWARD, MOVEBACK, MOVELEFT, and MOVERIGHT—until the goal is
reached. This means we could solve major parts of a problem first and then return to smaller
problems when assembling the final solution.
Usually, we use search strategies that reason either forward or backward. Often, however, a mixture of the two directions is appropriate. Such a mixed strategy would make it possible to solve the major parts of a problem first and then solve the smaller problems that arise when combining them together. Such a technique is called "Means-Ends Analysis".
This process centers on the detection of differences between the current state and the goal state. Once such a difference has been found, we must find an operator that can reduce it. But perhaps that operator cannot be applied to the current state. Then we set up a sub-problem of getting to a state in which it can be applied. If the operator does not produce exactly the goal state we want, we set up a second sub-problem of getting from the state it does produce to the goal. If the difference was chosen correctly and the operator is really effective at reducing it, then the two sub-problems should be easier to solve than the original problem.
The means-ends analysis process can then be applied recursively to them. In order to focus the system's attention on the big problems first, the differences can be assigned priority levels, so that higher-priority differences are considered before lower-priority ones.
Like the other problem-solving techniques, it also relies on a set of rules that can transform one state into another. These rules are usually not represented with complete state descriptions. Instead, each rule is represented as a left side that describes the conditions that must be met for the rule to be applicable and a right side that describes those aspects of the problem state that will be changed by the application of the rule.
Consider the simple household robot domain. The available operators are as follows:
OPERATOR | PRECONDITIONS | RESULTS
PUSH(obj, loc) | At(robot, obj) ∧ Large(obj) ∧ Clear(obj) ∧ ArmEmpty | At(obj, loc) ∧ At(robot, loc)
CARRY(obj, loc) | At(robot, obj) ∧ Small(obj) | At(obj, loc) ∧ At(robot, loc)
Difference Table
PUSH has four preconditions. Two of them produce differences between the start and goal states: the robot must be at the desk, and the desk must be clear. The other two create no difference, since the desk is already large and the robot's arm is empty. The robot can be brought to the location by using WALK, and the surface can be cleared by two uses of PICKUP. But after one PICKUP, the second results in another difference: the arm must be empty. PUTDOWN can be used to reduce that difference.
Once PUSH is performed, the problem state is close to the goal state, but not quite. The objects must be placed back on the desk. PLACE will put them there, but it cannot be applied immediately: another difference must be eliminated first, since the robot must be holding the objects. The resulting progress is shown above. The final difference between C and E can be reduced by using WALK to get the robot back to the objects, followed by PICKUP and CARRY.
Algorithm:
1. Until the goal is reached or no more procedures are available:
– Describe the current state, the goal state and the differences between the two.
– Use the difference to select a procedure that will hopefully get nearer to the goal.
– Use the procedure and update the current state.
2. If the goal is reached, then succeed; otherwise fail.
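A small recursive Python sketch of this loop, under stated assumptions: states are sets of facts, each operator has a precondition set (its left side) plus add and delete sets (its right side), and the first unmet goal is treated as the highest-priority difference. The domain (WALK, PICKUP, PUTDOWN and PUSH, with fact names such as desk_clear) is a hypothetical simplification of the robot example, not the original difference table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    pre: frozenset      # conditions that must hold (left side)
    add: frozenset      # facts made true (right side)
    delete: frozenset   # facts made false (right side)

def means_ends(state, goals, ops, depth=0, max_depth=6, max_rounds=10):
    """Return (plan, final_state) in which every goal holds, or None."""
    plan = []
    for _ in range(max_rounds):
        differences = [g for g in goals if g not in state]
        if not differences:
            return plan, state
        if depth >= max_depth:
            return None
        target = differences[0]                   # reduce the first difference
        for op in ops:
            if target not in op.add:
                continue                          # op does not reduce this difference
            # Sub-problem: reach a state in which op can be applied.
            sub = means_ends(state, sorted(op.pre), ops, depth + 1, max_depth, max_rounds)
            if sub is None:
                continue
            subplan, reached = sub
            state = (reached - op.delete) | op.add
            plan = plan + subplan + [op.name]
            break
        else:
            return None                           # no operator reduces the difference
    return None

def make_op(name, pre, add, delete=()):
    return Operator(name, frozenset(pre), frozenset(add), frozenset(delete))

OPS = [
    make_op("WALK(desk)", [], ["robot_at_desk"], ["robot_at_start"]),
    make_op("PICKUP(book)", ["robot_at_desk", "arm_empty", "book_on_desk"],
            ["holding_book", "desk_clear"], ["arm_empty", "book_on_desk"]),
    make_op("PUTDOWN(book)", ["holding_book"], ["arm_empty", "book_on_floor"],
            ["holding_book"]),
    make_op("PUSH(desk, door)", ["robot_at_desk", "desk_clear", "arm_empty"],
            ["desk_at_door", "robot_at_door"], ["robot_at_desk"]),
]
start = frozenset({"robot_at_start", "arm_empty", "book_on_desk"})
plan, final = means_ends(start, ["desk_at_door"], OPS)
print(plan)
```

The PUTDOWN step appears because PICKUP clears the desk but leaves the arm full, which is the same kind of extra difference described for PUSH above.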
Algorithm:
Constraint Satisfaction
The search procedure operates in a space of constraint sets. The initial state contains the original constraints given in the problem description.
A goal state is any state that has been constrained "enough". In cryptarithmetic, "enough" means that each letter has been assigned a unique numeric value.
Constraint satisfaction is a 2-step process:
o Constraints are discovered and propagated as far as possible.
o If there is still not a solution, then search begins. A guess about the value of some variable is made and added as a new constraint.
Applying constraint satisfaction in a particular problem domain requires the use of two kinds of rules:
o Rules that define valid constraint propagation
o Rules that suggest guesses when necessary
Goal State:
We have to assign a unique digit to each of the letters specified above.
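The puzzle itself is not reproduced here, so as an assumption the Python sketch below uses the classic SEND + MORE = MONEY. It guesses digits for the addend letters of one column at a time, right to left. Once a column's addend letters are known, the column sum fixes the result letter and the carry, which plays the role of constraint propagation. This is a simplified backtracking search, not the full propagation rule set described above.

```python
def solve_cryptarithm(addends, result):
    """Column-by-column backtracking search; returns letter -> digit or None."""
    width = len(result)
    if any(len(w) > width for w in addends):
        return None
    leading = {w[0] for w in addends + [result]}     # leading letters cannot be 0
    columns = [([w[-1 - i] for w in addends if i < len(w)], result[-1 - i])
               for i in range(width)]
    value = {}                                       # current letter -> digit assignment

    def column(i, carry):
        if i == width:
            return carry == 0                        # no carry may be left over
        letters, res = columns[i]
        return guess(i, carry, letters, 0, res)

    def guess(i, carry, letters, k, res):
        if k < len(letters):                         # guess the next addend letter
            letter = letters[k]
            if letter in value:
                return guess(i, carry, letters, k + 1, res)
            used = set(value.values())
            for d in range(1 if letter in leading else 0, 10):
                if d in used:
                    continue
                value[letter] = d
                if guess(i, carry, letters, k + 1, res):
                    return True
                del value[letter]                    # undo the guess and try the next digit
            return False
        # Propagation: the column sum now fixes the result digit and the carry.
        total = carry + sum(value[l] for l in letters)
        digit, carry_out = total % 10, total // 10
        if res in value:
            return value[res] == digit and column(i + 1, carry_out)
        if digit in value.values() or (digit == 0 and res in leading):
            return False
        value[res] = digit
        if column(i + 1, carry_out):
            return True
        del value[res]
        return False

    return dict(value) if column(0, 0) else None

solution = solve_cryptarithm(["SEND", "MORE"], "MONEY")
print(solution)
```

Each column check prunes a guess as soon as it becomes inconsistent, instead of waiting until every letter has been assigned.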
5. Predicate Logic
Introduction
Predicate logic is used to represent knowledge. It will be met in knowledge representation schemes and reasoning methods. There are other ways, but this form is popular.
Propositional Logic
It is simple to deal with, and a decision procedure for it exists. We can represent real-world facts as logical propositions written as well-formed formulas (wffs).
To explore the use of predicate logic as a way of representing knowledge, let us look at a specific example.
If "Socrates is a man" and "Plato is a man" are represented as two unrelated propositions, they become totally separate assertions, and we would not be able to draw any conclusions about similarities between Socrates and Plato. It is better to write:
MAN(SOCRATES)
MAN(PLATO)
These representations reflect the structure of the knowledge itself; they use predicates applied to arguments.
Propositional logic also fails to capture the relationship between any individual being a man and that individual being mortal.
We need variables and quantification unless we are willing to write separate statements for every individual.
Predicate:
A predicate is a statement about one or more objects that is either true or false for any particular choice of those objects. Predicate logic is used to solve commonsense problems with a computer system.
Predicate Logic
Terms represent specific objects in the world and can be constants, variables or functions.
Predicate Symbols refer to a particular relation among objects.
Sentences represent facts, and are made of terms, quantifiers and predicate symbols.
Functions allow us to refer to objects indirectly (via some relationship).
Quantifiers and variables allow us to refer to a collection of objects without explicitly
naming each object.
Some Examples
o Predicates: Brother, Sister, Mother, Father
o Objects: Bill, Hillary, Chelsea, Roger
o Facts expressed as atomic sentences a.k.a. literals:
Father(Bill,Chelsea)
Mother(Hillary,Chelsea)
Brother(Bill,Roger)
Nested Quantification
x,y Parent(x,y) Child(y,x)
x y Loves(x,y)
x [Passtest(x) (x ShootDave(x))]
Functions
• Functions are terms - they refer to a specific object.
• We can use functions to symbolically refer to objects without naming them.
• Examples:
fatherof(x) age(x) times(x,y) succ(x)
• Using functions
o ∀x Equal(x,x)
o Equal(factorial(0),1)
o ∀x Equal(factorial(s(x)), times(s(x),factorial(x)))
If we use logical statements as a way of representing knowledge, then we have available a good
way of reasoning with that knowledge.
Instance Relationship
Isa Relationship
Resolution:
Resolution is a procedure for proving a statement: it attempts to show that the negation of the statement produces a contradiction with the known statements. It simplifies the proof procedure by first converting the statements into a canonical form. It is a simple iterative process: at each step, two clauses, called the parent clauses, are compared, yielding a new clause that has been inferred from them.
Resolution refutation:
Convert all sentences to CNF (conjunctive normal form)
Negate the desired conclusion (converted to CNF)
Apply the resolution rule until either
– Derive false (a contradiction)
– Can’t apply any more
Resolution refutation is sound and complete
• If we derive a contradiction, then the conclusion follows from the axioms
• If we can’t apply any more, then the conclusion cannot be proved from the axioms.
Sometimes, from the collection of statements we have, we want to know the answer to this question: "Is it possible to prove some other statement from what we actually know?" To do this, we need to make some inferences, and those other statements can be shown to be true using the refutation proof method, i.e. proof by contradiction using resolution. So for the asked goal, we negate the goal and add it to the given statements to derive a contradiction.
So resolution refutation for propositional logic is a complete proof procedure: if the thing you are trying to prove is in fact entailed by the things you have assumed, then you can prove it using resolution refutation.
Clauses:
Resolution can be applied to a certain class of wffs called clauses.
A clause is defined as a wff consisting of a disjunction of literals.
All of the following formulas in the variables A, B, C, D, and E are in conjunctive normal form:
Clause Form:
Algorithm:
1. Eliminate the implication relation (→), using a → b ≡ ¬a ∨ b.
5. Eliminate existential quantifiers. We can eliminate the quantifier by substituting for the variable a reference to a function that produces the desired value.
∃y: President(y) => President(S1)
∀x ∃y: Fatherof(y, x) => ∀x: Fatherof(S2(x), x)
These generated functions are called Skolem functions; one with no arguments, such as S1, is also called a Skolem constant.
In general, the function must have the same number of arguments as the number of universal quantifiers in the current scope.
Skolemize to remove existential quantifiers. This step replaces existentially quantified variables by Skolem functions. For example, convert (∃x)P(x) to P(c), where c is a brand-new constant symbol that is not used in any other sentence (c is called a Skolem constant). More generally, if the existential quantifier is within the scope of a universally quantified variable, then introduce a Skolem function that depends on the universally quantified variable. For example, ∀x ∃y P(x,y) is converted to ∀x P(x, f(x)). f is called a Skolem function, and must be a brand-new function name that does not occur in any other part of the logic sentence.
6. Drop the prefix. At this point, all remaining variables are universally quantified.
8. Create a separate clause corresponding to each conjunct. In order for a well-formed formula to be true, all the clauses that are generated from it must be true.
9. Standardize apart the variables in the set of clauses generated in step 8: rename the variables so that no two clauses make reference to the same variable.
Basis of Resolution:
The resolution process is applied to a pair of parent clauses to produce a derived clause. The resolution procedure operates by taking two clauses that each contain the same literal. The literal must occur in positive form in one clause and in negative form in the other. The resolvent is obtained by combining all of the literals of the two parent clauses except the ones that cancel. If the clause that is produced is the empty clause, then a contradiction has been found.
⇁ . Hence, R is true.
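A minimal Python sketch of propositional resolution refutation. Clauses are frozensets of literals, and a leading "~" marks negation (a notational assumption). The knowledge base is illustrative: P, (P ∧ Q) → R, (S ∨ T) → Q and T, already converted to clauses, with R as the goal.

```python
from itertools import combinations

def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolvents(c1, c2):
    """All clauses obtained by cancelling one complementary pair of literals."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]

def refutes(clauses, goal):
    """True if the clauses plus the negated goal (one literal) derive the empty clause."""
    known = set(clauses) | {frozenset({negate(goal)})}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True        # empty clause: contradiction found
                new.add(r)
        if new <= known:
            return False               # nothing new can be derived
        known |= new

KB = [
    frozenset({"P"}),
    frozenset({"~P", "~Q", "R"}),      # (P and Q) -> R
    frozenset({"~S", "Q"}),            # S -> Q
    frozenset({"~T", "Q"}),            # T -> Q
    frozenset({"T"}),
]
print(refutes(KB, "R"), refutes(KB, "S"))
```

R is proved because ~R leads to the empty clause, while S is not: once no new clauses appear, the procedure stops without a contradiction.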
Unification Algorithm
In propositional logic, it is easy to determine that two literals cannot both be true at the same time: simply look for L and ~L. In predicate logic, this matching process is more complicated, since bindings of variables must be considered.
In order to determine contradictions, we need a matching procedure that compares two literals and discovers whether there exists a set of substitutions that makes them identical.
There is a recursive procedure that does this matching, called the unification algorithm.
The process of finding a substitution for predicate parameters is called unification.
We need to know:
– that two literals can be matched.
– the substitution that makes the literals identical.
There is a simple algorithm called the unification algorithm that does this.
( , ) ( , ): ( / )( / ) , ℎ
( ( )) ( )∶ ’ !
( ) ( ) ( ) ( ): ( / , / )
ℎ ( , ℎ ) ℎ ( , ) ℎ ( , )
( ) ( ) ( )
( ℎ ( ), ) ( , ) ( , )
The object of the unification procedure is to discover at least one substitution that causes two literals to match. Usually, if there is one such substitution, there are many. For example,
ℎ ( , )
ℎ ( , )
could be unified with any of the following substitutions:
( / , / )
( / , / )
( / , / , / )
( / , / , / )
In the unification algorithm, each literal is represented as a list whose first element is the name of a predicate and whose remaining elements are arguments. An argument may be a single element (an atom) or may be another list.
The unification algorithm recursively matches pairs of elements, one pair at a time. The matching
rules are:
• Different constants, functions or predicates cannot match, whereas identical ones can.
• A variable can match another variable, any constant, or a function or predicate expression, subject to the condition that the function or predicate expression must not contain any instance of the variable being matched (otherwise it would lead to infinite recursion).
• The substitution must be consistent. Substituting y for x now and then z for x later is inconsistent. (A substitution of y for x is written y/x.)
Example:
Suppose we want to unify p(X,Y,Y) with p(a,Z,b).
Initially E is {p(X,Y,Y)=p(a,Z,b)}.
The first time through the while loop, E becomes {X=a,Y=Z,Y=b}.
Suppose X=a is selected next.
Then S becomes {X/a} and E becomes {Y=Z, Y=b}.
Suppose Y=Z is selected.
Then Y is replaced by Z in S and E.
S becomes {X/a, Y/Z} and E becomes {Z=b}.
Finally Z=b is selected, Z is replaced by b, S becomes {X/a,Y/b,Z/b},
and E becomes empty.
The substitution {X/a,Y/b,Z/b} is returned as an MGU.
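The matching rules above can be sketched in Python as a recursive unifier with an occurs check. As an assumption borrowed from Prolog, strings beginning with an uppercase letter are variables and other strings are constants, so predicate and function names must be lowercase. Compound expressions are tuples whose first element is the predicate or function name. The returned dictionary maps each variable to a term (X ↦ a, written X/a in the example above); resolving its chains reproduces the MGU of that example.

```python
def is_var(term):
    """Assumed convention: strings starting with an uppercase letter are variables."""
    return isinstance(term, str) and term[:1].isupper()

def substitute(term, s):
    """Apply substitution s (variable -> term) to a term, following chains."""
    if is_var(term):
        return substitute(s[term], s) if term in s else term
    if isinstance(term, tuple):
        return tuple(substitute(t, s) for t in term)
    return term

def occurs(var, term, s):
    """Occurs check: does var appear inside term once s is applied?"""
    term = substitute(term, s)
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, t, s) for t in term)

def unify(x, y, s=None):
    """Return a substitution that makes x and y identical, or None."""
    s = {} if s is None else s
    x, y = substitute(x, s), substitute(y, s)
    if x == y:
        return s                              # identical expressions match
    if is_var(x):
        return None if occurs(x, y, s) else {**s, x: y}
    if is_var(y):
        return None if occurs(y, x, s) else {**s, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):                # match pairs one at a time
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None                               # different constants or arities

s = unify(("p", "X", "Y", "Y"), ("p", "a", "Z", "b"))
mgu = {v: substitute(v, s) for v in s}
print(mgu)
```

Passing the growing substitution into each recursive call is what keeps the bindings consistent, as the third matching rule requires.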
Unification:
∀ : ( ℎ , ) → ℎ ( ℎ , )
( ℎ , )
∀ : ( , )
∀ : ( , ℎ ( ))
∀ : ( , ℎ)
( ( ℎ , ), ( ℎ , )) = { / }
( ( ℎ , ), ( , )) = { / , ℎ / }
( ( ℎ , ), ( , ℎ ( ))) = { ℎ / , ℎ ( ℎ )/ }
( ( ℎ , ), ( , ℎ)) =
Example:
John likes all kinds of food.
Apples are food.
Chicken is food.
Anything anyone eats and is not killed by is food.
Bill eats peanuts and is still alive.
Sue eats everything Bill eats.
(a) Convert all the above statements into predicate logic.
(b) Show that John likes peanuts using backward chaining.
(c) Convert the statements into clause form.
(d) Using resolution, show that "John likes peanuts".
Answer:
(a) Predicate Logic:
1. ∀x: food(x) → likes(John, x)
2. food(Apple)
3. food(Chicken)
4. ∀x, ∀y: eats(x, y) ∧ ¬killed(x) → food(y)
5. eats(Bill, Peanuts) ∧ alive(Bill)
6. ∀x: eats(Bill, x) → eats(Sue, x)
(b) Backward Chaining Proof:
( ℎ , )
↑
( )
↑
( , )⋀ ( )
↑
Answering Questions
We can also use the proof procedure to answer questions such as “who tried to assassinate
Caesar” by proving:
– Tryassassinate(y,Caesar).
– Once the proof is complete, we need to find out what substitution was made for y.
We show how resolution can be used to answer fill-in-the-blank questions, such as "When did
Marcus die?" or "Who tried to assassinate a ruler?” Answering these questions involves finding a
known statement that matches the terms given in the question and then responding with another
piece of the same statement that fills the slot demanded by the question.
A variety of ways of representing knowledge have been exploited in AI problems. In this regard
we deal with two different kinds of entities:
Facts: truths about the real world and these are the things we want to represent.
Representation of the facts in some chosen formalism. These are the things which we will
actually be able to manipulate.
The model in the above figure focuses on facts, representations and on the 2-way mappings that
must exist between them. These links are called Representation Mappings.
- Forward representation mappings map from facts to representations.
- Backward representation mappings map from representations to facts.
English or natural language is an obvious way of representing and handling facts. Regardless of the representation for facts that we use in a program, we need to be concerned with an English representation of those facts in order to facilitate getting information into and out of the system.
The first representation does not directly suggest the answer to the problem. The second may suggest it. The third representation does, when combined with the single additional fact that each domino must cover exactly one white square and one black square.
The puzzle is impossible to complete. A domino placed on the chessboard will always cover one
white square and one black square. Therefore a collection of dominoes placed on the board will
cover an equal number of squares of each color. If the two white corners are removed from the
board then 30 white squares and 32 black squares remain to be covered by dominoes, so this is
impossible. If the two black corners are removed instead, then 32 white squares and 30 black
squares remain, so it is again impossible.
In the above figure, the dotted line across the top represents the abstract reasoning process that a
program is intended to model. The solid line across the bottom represents the concrete reasoning
process that a particular program performs. This program successfully models the abstract process
to the extent that, when the backward representation mapping is applied to the program’s
output, the appropriate final facts are actually generated.
If no good mapping can be defined for a problem, then no matter how good the program to
solve the problem is, it will not be able to produce answers that correspond to real answers to
the problem.
Using Knowledge
Let us consider what knowledge may be used for, and how.
Learning: acquiring knowledge. This is more than simply adding new facts to a knowledge base. New data may have to be classified prior to storage for easy retrieval, etc.
Interaction and inference with existing facts to avoid redundancy and replication in the knowledge, and also so that facts can be updated.
Retrieval: The representation scheme used can have a critical effect on the efficiency of the method. Humans are very good at it. Many AI methods have tried to model how humans do it.
Reasoning: Infer facts from existing data.
No single system that optimizes all of the capabilities for all kinds of knowledge has yet been
found. As a result, multiple techniques for knowledge representation exist.
Knowledge Representation Schemes
There are five types of knowledge representation:
Relational Knowledge:
– provides a framework to compare two objects based on equivalent attributes
– any instance in which two different objects are compared is a relational type of
knowledge
Inheritable Knowledge:
– is obtained from associated objects
– it prescribes a structure in which new objects are created which may inherit all or a
subset of attributes from existing objects.
Inferential Knowledge
– is inferred from objects through relations among objects
– Example: a word alone is simple syntax, but with the help of other words in a phrase the reader may infer more from a word; this inference within linguistics is called semantics.
Declarative Knowledge
– a statement in which knowledge is specified, but the use to which that knowledge
is to be put is not given.
– Example: laws, people's names; these are facts which can stand alone, not dependent on other knowledge
Procedural Knowledge
– a representation in which the control information needed to use the knowledge is embedded in the knowledge itself.
– Example: computer programs, directions and recipes; these indicate specific use or
implementation
Simple relational knowledge
The simplest way of storing facts is to use a relational method where each fact about a set of
objects is set out systematically in columns. This representation gives little opportunity for
inference, but it can be used as the knowledge basis for inference engines.
• Simple way to store facts.
• Each fact about a set of objects is set out systematically in columns.
• Little opportunity for inference.
• Knowledge basis for inference engines.
Given the facts, it is not possible to answer a simple question such as "Who is the heaviest player?" But if a procedure for finding the heaviest player is provided, then these facts will enable that procedure to compute an answer. We can also ask things like who "bats - left" and "throws - right".
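To illustrate the point, here is a Python sketch of a small table of player facts plus two procedures over it. The rows are hypothetical, since the original table is not reproduced here. The heaviest-player procedure is exactly the extra knowledge that the bare facts do not contain.

```python
from typing import NamedTuple

class Player(NamedTuple):
    name: str
    height: str    # feet-inches, e.g. "6-1"
    weight: int    # pounds
    bats: str
    throws: str

# Hypothetical rows, one fact per column.
PLAYERS = [
    Player("Player-A", "6-0", 185, "Right", "Right"),
    Player("Player-B", "5-11", 170, "Right", "Right"),
    Player("Player-C", "6-2", 210, "Left", "Left"),
    Player("Player-D", "6-3", 200, "Left", "Right"),
]

def heaviest(players):
    """Procedure the table alone cannot express: scan for the maximum weight."""
    return max(players, key=lambda p: p.weight).name

def select(players, **conditions):
    """Simple relational query: names of rows matching every attribute=value."""
    return [p.name for p in players
            if all(getattr(p, attr) == value for attr, value in conditions.items())]

print(heaviest(PLAYERS))
print(select(PLAYERS, bats="Left", throws="Right"))
```

The table supports direct look-ups such as the bats/throws query, but any inference beyond that has to come from procedures like heaviest.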
Inheritable Knowledge
Here the knowledge elements inherit attributes from their parents. The knowledge is embodied
in the design hierarchies found in the functional, physical and process domains. Within the
hierarchy, elements inherit attributes from their parents, but in many cases not all attributes of the parent elements are prescribed to the child elements.
Inheritance is a powerful form of inference, but it is not adequate on its own. The basic KR needs to be augmented with an inference mechanism.
Baseball Knowledge
- isa: shows class inclusion
- instance: shows class membership
The directed arrows represent attributes (isa, instance, team); each originates at the object being described and terminates at the object or its value.
The box nodes represent objects and values of the attributes.
Viewing a node as a frame
Example: Baseball-player
Isa: Adult-Male
Bats: EQUAL handed
Height: 6-1
Batting-average: 0.252
This algorithm is simple. It describes the basic mechanism of inheritance. It does not say what to
do if there is more than one value of the instance or “isa” attribute.
This can be applied to the example knowledge base to derive answers to the following queries:
team (Pee-Wee-Reese) = Brooklyn-Dodger
batting-average (Three-Finger-Brown) = 0.106
height (Pee-Wee-Reese) = 6-1
bats (Three-Finger-Brown) = right
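The basic inheritance mechanism can be sketched in Python: look for the attribute on the object itself; otherwise follow its instance link and then isa links upward. The knowledge base below is partly an assumption. The Baseball-Player frame follows the one shown above, while the Person, Adult-Male, Pitcher and Fielder frames are hypothetical entries chosen to reproduce the four answers listed. A value of the form EQUAL handed is evaluated by looking up handed for the original object.

```python
# Partly hypothetical frames: object -> {attribute: value}.
KB = {
    "Person": {"handed": "right"},
    "Adult-Male": {"isa": "Person"},
    "Baseball-Player": {"isa": "Adult-Male", "height": "6-1",
                        "bats": ("EQUAL", "handed"), "batting-average": 0.252},
    "Pitcher": {"isa": "Baseball-Player", "batting-average": 0.106},
    "Fielder": {"isa": "Baseball-Player"},
    "Pee-Wee-Reese": {"instance": "Fielder", "team": "Brooklyn-Dodger"},
    "Three-Finger-Brown": {"instance": "Pitcher"},
}

def get_value(obj, attr, kb=KB):
    """Find attr for obj, inheriting through instance and then isa links."""
    node = obj
    while node is not None:
        frame = kb.get(node, {})
        if attr in frame:
            value = frame[attr]
            if isinstance(value, tuple) and value[0] == "EQUAL":
                return get_value(obj, value[1], kb)   # "EQUAL handed": use obj's own value
            return value
        node = frame.get("instance") or frame.get("isa")  # move one level up
    return None

for query in [("Pee-Wee-Reese", "team"), ("Three-Finger-Brown", "batting-average"),
              ("Pee-Wee-Reese", "height"), ("Three-Finger-Brown", "bats")]:
    print(query, get_value(*query))
```

Like the algorithm described above, this sketch assumes each frame has at most one instance or isa value; it does not handle multiple parents.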
Inferential Knowledge:
This knowledge generates new information from the given information. This new information does not require further data gathering from the source, but does require analysis of the given information to generate new knowledge. Here, knowledge is represented as formal logic.
Example:
- given a set of relations and values, one may infer other values or relations
- a predicate logic (a mathematical deduction) is used to infer from a set of attributes.
- inference through predicate logic uses a set of logical operations to relate individual data.
- the symbols used for the logic operations are:
Procedural Knowledge
Procedural knowledge can be represented in programs in many ways. The most common way is simply as code for doing something. The machine uses the knowledge when it executes the code to perform a task. Procedural knowledge is the knowledge encoded in some procedures.
Unfortunately, this way of representing procedural knowledge gets low scores with respect to the
properties of inferential adequacy (because it is very difficult to write a program that can reason
about another program’s behavior) and acquisitional efficiency (because the process of updating
and debugging large pieces of code becomes unwieldy).
The most commonly used technique for representing procedural knowledge in AI programs is the
use of production rules.
Production rules, particularly ones that are augmented with information on how they are to be
used, are more procedural than are the other representation methods. But making a clean
distinction between declarative and procedural knowledge is difficult. The important difference is
in how the knowledge is used by the procedures that manipulate it.
Heuristic or Domain Specific knowledge can be represented using Procedural Knowledge.
Below are listed issues that should be raised when using knowledge representation techniques:
The attributes instance and isa are called a variety of things in AI systems, but the names do not matter. What
does matter is that they represent class membership and class inclusion and that class inclusion is
transitive. In logic-based systems, these relations are represented as predicates.
The second way can be realized using semantic nets and frame-based systems. Inverses are
used in knowledge acquisition tools.
This also provides information about constraints on the values that the attribute can have and
mechanisms for computing those values.
Introduce an explicit notation for temporal intervals. If two different values are ever
asserted for the same temporal interval, signal a contradiction automatically.
Assume that the only temporal interval that is of interest is now. So if a new value is
asserted, replace the old value.
Provide no explicit support. Logic-based systems are in this category. But in these systems,
knowledge base builders can add axioms that state that if an attribute has one value then
it is known not to have any other value.
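The first two approaches above can be contrasted in a short Python sketch. The class names TemporalKB and CurrentKB, the Contradiction exception, the interval labels and the attribute values are all hypothetical; a real system would also have to relate overlapping intervals, which this sketch ignores by treating each interval label as distinct.

```python
class Contradiction(Exception):
    """Signalled when two different values are asserted for one single-valued attribute."""


class TemporalKB:
    """Approach 1: every value of a single-valued attribute is tagged with a temporal interval."""

    def __init__(self):
        self.values = {}  # (object, attribute, interval) -> value

    def assert_value(self, obj, attr, value, interval):
        key = (obj, attr, interval)
        old = self.values.get(key)
        if old is not None and old != value:
            raise Contradiction(f"{attr}({obj}) is both {old} and {value} during {interval}")
        self.values[key] = value


class CurrentKB:
    """Approach 2: only 'now' matters, so a newly asserted value replaces the old one."""

    def __init__(self):
        self.values = {}  # (object, attribute) -> value

    def assert_value(self, obj, attr, value):
        self.values[(obj, attr)] = value


history = TemporalKB()
history.assert_value("player-1", "team", "Team-A", "interval-1")
history.assert_value("player-1", "team", "Team-B", "interval-2")  # different interval: fine
try:
    history.assert_value("player-1", "team", "Team-B", "interval-1")
except Contradiction as err:
    print("contradiction:", err)

now = CurrentKB()
now.assert_value("player-1", "team", "Team-A")
now.assert_value("player-1", "team", "Team-B")
print(now.values[("player-1", "team")])  # Team-B
```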
Choosing the Granularity of Representation
Primitives are fundamental concepts such as holding, seeing and playing. As English is a very rich
language with over half a million words, it is clear that we will find difficulty in deciding which words
to choose as our primitives in a series of situations. Separate levels of understanding require different
levels of primitives, and these need many rules to link together similar primitives.
{x : sun-planet(x) ∧ human-inhabited(x)}: an intensional definition of the set of the sun's planets
that are inhabited by humans.
The declarative representation is one in which the knowledge is specified, but the use to
which that knowledge is to be put is not given.
Declarative knowledge answers the question 'What do you know?'
It is your understanding of things, ideas, or concepts.
In other words, declarative knowledge can be thought of as the who, what, when,
and where of information.
Declarative knowledge is normally discussed using nouns, like the names of people,
places or things, or the dates on which events occurred.
The procedural representation is one in which the control information necessary to use the
knowledge is considered to be embedded in the knowledge itself.
Procedural knowledge answers the question 'What can you do?'
While declarative knowledge is demonstrated using nouns,
procedural knowledge relies on action words, or verbs.
It is a person's ability to carry out actions to complete a task.
The real difference between the declarative and procedural views of knowledge lies in where the
control information resides.
Example:
1. man(Marcus)
2. man(Caesar)
3. ∀x: man(x) → person(x)
4. person(Cleopatra)
The statements 1, 2 and 3 are procedural knowledge and 4 is a declarative knowledge.
In both cases, the control strategy must cause motion and must be systematic. The production
system model of the search process provides an easy way of viewing forward and backward
reasoning as symmetric processes.
Consider the problem of solving a particular instance of the 8-puzzle problem. The rules to be
used for solving the puzzle can be written as:
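To reason backward, begin with the goal configuration at the root of the tree. Generate the next
level of the tree by finding all the rules whose right sides match the root node; these are the rules
that, if applied, would generate the state we want. Use the left sides of the rules to generate the
nodes at this second level of the tree.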
Generate the next level of the tree by taking each node at the previous level and finding
all the rules whose right sides match it. Then use the corresponding left sides to generate
the new nodes.
Continue until a node that matches the initial state is generated.
This method of reasoning backward from the desired final state is often called goal-
directed reasoning.
To reason forward, the left sides (preconditions) are matched against the current state and the
right sides (results) are used to generate new nodes until the goal is reached. To reason
backward, the right sides are matched against the current node and the left sides are used to
generate new nodes representing new goal states to be achieved.
Now suppose that at some point, the left side of a rule was nearly satisfied – nine out of ten of its
preconditions were met. It might be efficient to apply backward reasoning to satisfy the tenth
precondition in a directed manner, rather than wait for forward chaining to supply the fact by
accident.
Whether it is possible to use the same rules for both forward and backward reasoning also
depends on the form of the rules themselves. If both left sides and right sides contain pure
assertions, then forward chaining can match assertions on the left side of a rule and add to the
state description the assertions on the right side. But if arbitrary procedures are allowed as the
right sides of rules then the rules will not be reversible.
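Forward and backward reasoning over the same rule set can be sketched in Python, assuming rules whose left and right sides are pure assertions (here, plain propositional symbols), which is exactly the case in which the rules are reversible. The function names and the example rules are hypothetical.

```python
# Each rule is (left side: frozenset of preconditions, right side: one result).


def forward_chain(rules, facts, goal):
    """Match left sides against the known facts and add right sides until goal appears."""
    known = set(facts)
    changed = True
    while changed and goal not in known:
        changed = False
        for lhs, rhs in rules:
            if lhs <= known and rhs not in known:
                known.add(rhs)
                changed = True
    return goal in known


def backward_chain(rules, facts, goal, pending=frozenset()):
    """Match the goal against right sides; the left sides become new subgoals."""
    if goal in facts:
        return True
    if goal in pending:  # avoid looping on a goal we are already trying to prove
        return False
    return any(rhs == goal and all(backward_chain(rules, facts, g, pending | {goal}) for g in lhs)
               for lhs, rhs in rules)


rules = [(frozenset({"a", "b"}), "c"), (frozenset({"c"}), "d"), (frozenset({"e"}), "f")]
print(forward_chain(rules, {"a", "b"}, "d"))   # True
print(backward_chain(rules, {"a", "b"}, "d"))  # True
print(backward_chain(rules, {"a"}, "d"))       # False: b cannot be established
```

Both directions use the same rule list; only the side that is matched differs, which is the symmetry the production system model makes visible.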
Logic Programming
Logic Programming is a programming language paradigm in which logical assertions
are viewed as programs.
There are several logic programming systems in use today, the most popular of which
is PROLOG.
A PROLOG program is described as a series of logical assertions, each of which is a
Horn clause.
A Horn clause is a clause that has at most one positive literal. Thus p, ¬p ∨ q and p → q
are all Horn clauses.
Syntactic differences between the logic and the PROLOG representations include:
In logic, variables are explicitly quantified. In PROLOG, quantification is provided
implicitly by the way the variables are interpreted.
o The distinction between variables and constants is made in PROLOG by having all
variables begin with uppercase letters and all constants begin with lowercase
letters.
In logic, there are explicit symbols for and (∧) and or (∨). In PROLOG, there is an explicit
symbol for and (,), but there is none for or.
In logic, implications of the form "p implies q" are written as p → q. In PROLOG, the same
implication is written "backward" as q :- p.
Example:
The first two of these differences arise naturally from the fact that PROLOG programs are actually
sets of Horn Clauses that have been transformed as follows:
1. If the Horn Clause contains no negative literals (i.e., it contains a single literal which is
positive), then leave it as it is.
2. Otherwise, rewrite the Horn clause as an implication, combining all of the negative literals
into the antecedent of the implication and leaving the single positive literal (if there is
one) as the consequent.
This procedure causes a clause, which originally consisted of a disjunction of literals (all but one
of which were negative), to be transformed into a single implication whose antecedent is a
conjunction of (what are now positive) literals.
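The two-step transformation can be sketched in Python. Clauses are assumed to be lists of literal strings with "~" marking negation, with variables in uppercase and constants in lowercase as PROLOG requires; the function name horn_to_prolog is hypothetical. A clause with no positive literal (a query) is written with an empty head.

```python
def horn_to_prolog(clause):
    """Rewrite a Horn clause (a disjunction of literals) in PROLOG syntax."""
    negatives = [lit[1:] for lit in clause if lit.startswith("~")]
    positives = [lit for lit in clause if not lit.startswith("~")]
    if not clause or len(positives) > 1:
        raise ValueError("not a non-empty Horn clause")
    if not negatives:
        # Step 1: a single positive literal is left as it is (a PROLOG fact).
        return f"{positives[0]}."
    # Step 2: the negative literals become the antecedent, joined by PROLOG's "and" (,).
    body = ", ".join(negatives)
    return f"{positives[0]} :- {body}." if positives else f":- {body}."


print(horn_to_prolog(["man(marcus)"]))                        # man(marcus).
print(horn_to_prolog(["~man(X)", "person(X)"]))               # person(X) :- man(X).
print(horn_to_prolog(["~cat(X)", "~hungry(X)", "meows(X)"]))  # meows(X) :- cat(X), hungry(X).
```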
Matching
We described the process of using search to solve problems as the application of appropriate
rules to individual problem states to generate new states to which the rules can then be applied
and so forth until a solution is found.
How do we extract, from the entire collection of rules, those that can be applied at a given point? To
do so requires some kind of matching between the current state and the preconditions of the
rules. How should this be done? The answer to this question can be critical to the success of a
rule-based system.
A more complex matching is required when the preconditions of rule specify required properties
that are not stated explicitly in the description of the current state. In this case, a separate set of
rules must be used to describe how some properties can be inferred from others. An even more
complex matching process is required if rules should be applied when their preconditions only
approximately match the current situation. This is often the case in situations involving physical
descriptions of the world.
Indexing
One way to select applicable rules is to do a simple search through all the rules, comparing each
one's preconditions to the current state and extracting all the ones that match. There are two
problems with this simple solution:
i. A large number of rules will be necessary, and scanning through all of them at every step
would be inefficient.
ii. It's not always obvious whether a rule's preconditions are satisfied by a particular state.
Solution: Instead of searching through the rules, use the current state as an index into the rules and
select the matching ones immediately.
The matching process is then easy, but at the price of a complete lack of generality in the statement of the
rules. Despite some limitations of this approach, indexing in some form is very important in the
efficient operation of rule-based systems.
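The trade-off can be illustrated with a small Python sketch that contrasts a linear scan with a dictionary index keyed on the exact state description. The rule names, state strings and helper functions are hypothetical; because lookup requires an exact key, each rule can only describe one specific state, which is the loss of generality mentioned above.

```python
from collections import defaultdict

# Each hypothetical rule: (name, precondition state, action).
RULES = [
    ("r1", "blank-in-centre", "move tile 2 down"),
    ("r2", "blank-in-corner", "move tile 1 left"),
    ("r3", "blank-in-centre", "move tile 4 right"),
]


def scan(rules, state):
    """Simple search: compare every rule's precondition with the current state."""
    return [name for name, pre, _ in rules if pre == state]


def build_index(rules):
    """Index the rules once, using the precondition itself as the key."""
    index = defaultdict(list)
    for name, pre, _ in rules:
        index[pre].append(name)
    return index


INDEX = build_index(RULES)
print(scan(RULES, "blank-in-centre"))    # ['r1', 'r3']
print(INDEX.get("blank-in-centre", []))  # ['r1', 'r3'], without scanning every rule
```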
Backward-chaining systems usually use depth-first backtracking to select individual rules, but
forward-chaining systems generally employ sophisticated conflict resolution strategies to choose
among the applicable rules.
While it is possible to apply unification repeatedly over the cross product of preconditions and
state description elements, it is more efficient to consider the many-many match problem, in
which many rules are matched against many elements in the state description simultaneously.
One efficient many-many match algorithm is RETE.
[Figure: INFERENCE ENGINE cycle of Match → Select → Execute]
The above cycle is repeated until no rules are put in the conflict set or until a stopping condition is
reached. Verifying many conditions on every cycle is a time-consuming process. RETE is an efficient
matching algorithm that eliminates the need to perform thousands of matches on every cycle.
RETE is a many-many match algorithm (in which many rules are matched against many
elements). RETE is used in forward-chaining systems, which generally employ sophisticated conflict
resolution strategies to choose among applicable rules. RETE gains efficiency from 3 major
sources.
1. Temporal nature of data: RETE maintains a network of rule conditions, and it uses changes in the state
description to determine which new rules might apply. Full matching is only
pursued for candidates that could be affected by incoming/outgoing data.
2. Structural similarity in rules: RETE stores the rules so that they share structures in
memory; sets of conditions that appear in several rules are matched once per cycle.
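3. Persistence of variable binding consistency: while all the individual preconditions
of a rule might be met, there may be variable binding conflicts that prevent the
rule from firing. For example, given the facts son(Mary, John) and son(Bill, Bob), each
precondition of the rule son(x, y) ∧ son(y, z) → grandparent(x, z) can be matched individually,
but not with a consistent binding for y. The cost of checking binding consistency
can be minimized: RETE remembers its previous calculations and is able to merge
new binding information efficiently.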
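The variable binding check in point 3 can be illustrated with a naive Python matcher that joins the preconditions of son(x, y) ∧ son(y, z) → grandparent(x, z) against the facts. This is not RETE: it recomputes every partial match from scratch on each call, which is exactly the work RETE saves by storing partial matches in its network. The tuple encoding, the "?" prefix for variables and the extra fact son(John, Ann) are assumptions made for the example.

```python
def match(pattern, fact, bindings):
    """Match a pattern like ('son', '?x', '?y') against a ground fact, extending bindings."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    result = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if p.startswith("?"):
            if result.setdefault(p, f) != f:  # variable already bound to something else
                return None
        elif p != f:
            return None
    return result


def fire(conditions, conclusion, facts):
    """Return every instantiated conclusion whose conditions match with consistent bindings."""
    partial = [{}]
    for cond in conditions:
        extended = []
        for bindings in partial:
            for fact in facts:
                new = match(cond, fact, bindings)
                if new is not None:
                    extended.append(new)
        partial = extended
    return [tuple(b.get(term, term) for term in conclusion) for b in partial]


conditions = [("son", "?x", "?y"), ("son", "?y", "?z")]
conclusion = ("grandparent", "?x", "?z")
facts = [("son", "Mary", "John"), ("son", "Bill", "Bob")]
print(fire(conditions, conclusion, facts))  # []: no consistent binding for ?y
print(fire(conditions, conclusion, facts + [("son", "John", "Ann")]))
# [('grandparent', 'Mary', 'Ann')]
```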
Approximate Matching:
Rules should be applied if their preconditions approximately match the current situation.
Eg: Speech understanding program
Rules: map a description of a physical waveform to phones.
Physical signal: varies because of differences in the way individuals speak and because of background noise.
Conflict Resolution:
When several rules match at once, choosing among them is called conflict resolution. There are 3
approaches to the problem of conflict resolution in a production system.
1. Preference based on rule match:
a. Physical order of rules in which they are presented to the system
b. Priority is given to rules in the order in which they appear
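Preference by physical rule order can be sketched as a match-select-execute loop in Python. The rule format (name, condition, action), the integer working memory and the cycle limit are hypothetical; the point is only that, when several rules are in the conflict set, the one that appears first wins, so reordering the same rules changes which ones fire.

```python
def select(rules, state):
    """Build the conflict set, then resolve it by the physical order of the rules."""
    conflict_set = [rule for rule in rules if rule[1](state)]
    return conflict_set[0] if conflict_set else None


def run(rules, state, max_cycles=100):
    """Match, select and execute until the conflict set is empty."""
    fired = []
    for _ in range(max_cycles):
        rule = select(rules, state)
        if rule is None:
            break
        name, _, action = rule
        fired.append(name)
        state = action(state)
    return state, fired


RULES = [
    ("add-five", lambda n: n < 10 and n % 5 == 0, lambda n: n + 5),
    ("add-one", lambda n: n < 10, lambda n: n + 1),
]
print(run(RULES, 0))                 # (10, ['add-five', 'add-five'])
print(len(run(RULES[::-1], 0)[1]))   # 10: with add-one first, it wins every cycle
```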
What is reasoning?
When we require a knowledge system to do something it has not been explicitly told
how to do, it must reason.
The system must figure out what it needs to know from what it already knows.
◦ Reasoning is the act of deriving a conclusion from certain premises using a given
methodology. (Process of thinking/ Drawing inference)
How can we Reason?
To a certain extent this will depend on the knowledge representation chosen.
However, a good knowledge representation scheme has to allow easy, natural and
plausible reasoning.
Monotonic Reasoning
Predicate logic and the inferences we perform on it are an example of monotonic reasoning. In
monotonic reasoning, if we enlarge a set of axioms, we cannot retract any existing assertions or
axioms.
A monotonic logic cannot handle
Reasoning by default
◦ Because consequences may be derived only because of lack of evidence to the
contrary
Abductive Reasoning
◦ Because consequences are only deduced as most likely explanations.
Belief Revision
◦ Because new knowledge may contradict old beliefs.
Non-Monotonic Reasoning
Non monotonic reasoning is one in which the axioms and/or the rules of inference are
extended to make it possible to reason with incomplete information. These systems
preserve, however, the property that, at any given moment, a statement is either
o believed to be true,
o believed to be false, or
o not believed to be either.
Statistical Reasoning: in which the representation is extended to allow some kind of
numeric measure of certainty (rather than true or false) to be associated with each
statement.
In a system doing non-monotonic reasoning the set of conclusions may either grow or
shrink when new information is obtained.
Non-monotonic logics are used to formalize plausible reasoning, such as the following
inference step:
Birds typically fly.
Tweety is a bird.
--------------------------
Tweety (presumably) flies.
Such reasoning is characteristic of commonsense reasoning, where default rules are
applied when case-specific information is not available. The conclusion of a non-monotonic
argument may turn out to be wrong. For example, if Tweety is a penguin, it is
incorrect to conclude that Tweety flies.
Non-monotonic reasoning often requires jumping to a conclusion and subsequently
retracting that conclusion as further information becomes available.
All systems of non-monotonic reasoning are concerned with the issue of consistency.
Inconsistency is resolved by removing the relevant conclusion(s) derived previously by
default rules.
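The Tweety inference can be sketched in Python by recomputing the conclusions from the current facts, so that adding a fact can remove a previous conclusion. The predicate names and the single exception rule (penguins do not fly) are assumptions for the example; practical systems record justifications instead of recomputing everything.

```python
def conclusions(facts):
    """Facts are (predicate, individual) pairs; returns the facts plus derived beliefs."""
    derived = set(facts)
    # Monotonic rule: penguins do not fly.
    for pred, x in facts:
        if pred == "penguin":
            derived.add(("not-flies", x))
    # Default rule: a bird flies unless there is evidence to the contrary.
    for pred, x in list(derived):
        if pred == "bird" and ("not-flies", x) not in derived:
            derived.add(("flies", x))
    return derived


kb = {("bird", "tweety")}
print(("flies", "tweety") in conclusions(kb))  # True
kb.add(("penguin", "tweety"))
print(("flies", "tweety") in conclusions(kb))  # False: the earlier conclusion is retracted
```

The knowledge base only grew, yet the set of conclusions shrank, which is exactly what a monotonic logic cannot express.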
Simply speaking, the truth value of propositions in a nonmonotonic logic can be classified
into the following types:
o facts that are definitely true, such as "Tweety is a bird"
Properties of FOPL
It is complete with respect to the domain of interest.
It is consistent.
The only way it can change is that new facts can be added as they become available.
◦ If these new facts are consistent with all the other facts that have already
been asserted, then nothing will ever be retracted from the set of facts that are
known to be true.
◦ This is known as “monotonicity”.
If any of these properties is not satisfied, conventional logic based reasoning systems become
inadequate.
Non-monotonic reasoning systems are designed to be able to solve problems in which all of
these properties may be missing.
Issues to be addressed:
How can the knowledge base be extended to allow inferences to be made on the basis of
lack of knowledge as well as on the presence of it?
o We need to make clear the distinction between
It is known that P.
It is not known whether P.
o First-order predicate logic allows reasoning to be based on the first of these.
o In our new system, we call any inference that depends on the lack of some piece
of knowledge a non-monotonic inference.
o Traditional systems based on predicate logic are monotonic. Here the number of
statements known to be true increases with time.
o New statements are added and new theorems are proved, but the previously
known statements never become invalid.
How can the knowledge base be updated properly when a new fact is added to the
system (or when an old one is removed)?
o In Non-Monotonic systems, since addition of a fact can cause previously
discovered proofs to become invalid,
how can those proofs, and all the conclusions that depend on them be
found?
Solution: keep track of proofs, which are often called justifications.
o Such a recording mechanism also makes it possible to support
monotonic reasoning in the case where axioms must occasionally be
retracted to reflect changes in the world that is being modeled.
How can knowledge be used to help resolve conflicts when there are several inconsistent
non-monotonic inferences that could be drawn?
o It turns out that when inferences can be based
Default Reasoning
Non monotonic reasoning is based on default reasoning or “most probabilistic choice”.
◦ S is assumed to be true as long as there is no evidence to the contrary.
Default reasoning ( or most probabilistic choice) is defined as follows:
◦ Definition 1 : If X is not known, then conclude Y.
◦ Definition 2: If X cannot be proved, then conclude Y.
◦ Definition 3: If X cannot be proved in some allocated amount of time then
conclude Y.
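The three definitions differ only in how "X is not known" is tested. In the Python sketch below, Definition 1 checks whether X is stored explicitly, Definition 2 asks a backward chainer whether X can be proved, and Definition 3 stands in for the allocated time with a hypothetical depth limit. The rule format, the propositions A, B, X and Y, and the function names are assumptions for the example.

```python
def provable(goal, facts, rules, depth):
    """Backward chaining over (preconditions, conclusion) rules, cut off after `depth` levels."""
    if goal in facts:
        return True
    if depth == 0:
        return False
    return any(all(provable(p, facts, rules, depth - 1) for p in pre)
               for pre, concl in rules if concl == goal)


def default_1(x, y, facts):
    """Definition 1: if X is not known, conclude Y."""
    return {y} if x not in facts else set()


def default_2(x, y, facts, rules, depth=100):
    """Definition 2: if X cannot be proved, conclude Y (depth 100 stands in for no limit)."""
    return {y} if not provable(x, facts, rules, depth) else set()


def default_3(x, y, facts, rules, limit):
    """Definition 3: if X cannot be proved within the allocated effort, conclude Y."""
    return {y} if not provable(x, facts, rules, limit) else set()


facts = {"A"}
rules = [({"A"}, "B"), ({"B"}, "X")]
print(default_1("X", "Y", facts))            # {'Y'}: X is not stored explicitly
print(default_2("X", "Y", facts, rules))     # set(): X can be proved from A
print(default_3("X", "Y", facts, rules, 1))  # {'Y'}: one step is not enough to prove X
```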
Default Reasoning
This is a very common form of non-monotonic reasoning.
Here we want to draw conclusions based on what is most likely to be true.
Two Approaches to do this
◦ Non-Monotonic Logic
◦ Default Logic
Non-monotonic reasoning is a generic description of a class of reasoning.
Non-Monotonic logic is a specific theory.
The same goes for Default reasoning and Default logic.
Non-monotonic Logic
One system that provides a basis for default reasoning is Non-monotonic Logic (NML).
This is basically an extension of first-order predicate logic to include a modal operator, M.
Default Logic
An alternative logic for performing default based reasoning is Reiter’s Default Logic (DL).
Default logic introduces a new inference rule of the form:
A:B
C
which states if A is provable and it is consistent to assume B then conclude C.
This is similar to Non-monotonic logic, but there are some distinctions:
New inference rules are used for computing the set of plausible extensions. So in the
Nixon example above, Default logic can support both assertions since it does not say
anything about how to choose between them; it will depend on the inference being made.
In Default logic any nonmonotonic expressions are rules of inference rather than
expressions. They cannot be manipulated by the other rules of inference. This leads to
some unexpected results.
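For the special case of normal defaults (A : C / C) over literals, the extensions of a default theory can be enumerated by trying the defaults in every possible order, as in the Python sketch below. The Nixon facts and defaults used here (Quakers are typically pacifists, Republicans typically are not) are the usual form of that example, assumed because the example itself is not reproduced in this section; the function names are hypothetical, and trying every order is only feasible for a few defaults.

```python
from itertools import permutations


def neg(lit):
    """Complement of a literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def extensions(facts, defaults):
    """Extensions of a theory whose defaults are normal: (prerequisite, consequent) pairs."""
    found = set()
    for order in permutations(defaults):
        ext = set(facts)
        changed = True
        while changed:
            changed = False
            for pre, concl in order:
                # Apply A : C / C if A is believed and C is consistent with what is believed.
                if pre in ext and concl not in ext and neg(concl) not in ext:
                    ext.add(concl)
                    changed = True
        found.add(frozenset(ext))
    return found


exts = extensions({"quaker", "republican"},
                  [("quaker", "pacifist"), ("republican", "~pacifist")])
print(sorted(sorted(e) for e in exts))
# [['pacifist', 'quaker', 'republican'], ['quaker', 'republican', '~pacifist']]
```

Default logic itself offers no preference between the two extensions; choosing one is left to the inference being made.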
Inheritance:
One very common use of nonmonotonic reasoning is as a basis for inheriting attribute values
from a prototype description of a class to the individual entities that belong to the class.
Consider the baseball example in Inheritable Knowledge, and try to write its inheritable
knowledge as rules in DL.
We can write a rule to account for the inheritance of a default value for the height of a baseball
player as:
Baseball-Player(x) : height(x, 6-1) / height(x, 6-1)
If we had asserted a conflicting value for a player's height, and had an axiom such as
∀x, y, z: height(x, y) ∧ height(x, z) → y = z,
which prohibits someone from having more than one height, then we would not be able to
apply the default rule. Thus an explicitly stated value will block the inheritance of a default value,
which is exactly what we want.
Let's encode the default rule for the height of adult males in general. If we pattern it after the one
for baseball players, we get
Adult-Male(x) : height(x, 5-10) / height(x, 5-10)
Unfortunately, this rule does not work as we would like. In particular, if we again assert
Pitcher(Three-Finger-Brown), then the resulting theory contains 2 extensions: one in which our
first rule fires and Brown's height is 6-1, and one in which this rule applies and Brown's height is
5-10. Neither of these extensions is preferred. In order to state that we prefer to get a value from
the more specific category, baseball player, we could rewrite the default rule for adult males in
general as:
Adult-Male(x) : ¬Baseball-Player(x) ∧ height(x, 5-10) / height(x, 5-10)
This effectively blocks the application of the default knowledge about adult males in cases where
more specific information from the class of baseball players is available. Unfortunately, this
approach can become unwieldy as the set of exceptions to the general rule increases. We would
end up with a rule whose justification lists ¬C(x) for every exceptional class C.
A clearer approach is to say something like: adult males typically have a height of 5-10 unless
they are abnormal in some way. We can then associate with other classes the information that
they are abnormal in one or another way. So we could write, for example:
∀x: Adult-Male(x) ∧ ¬AB(x, aspect1) → height(x, 5-10)
∀x: Baseball-Player(x) → AB(x, aspect1)
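The abnormality approach can be sketched in Python. Class membership is read off a small hypothetical isa table, ABNORMAL_ASPECT1 lists the classes whose members are AB(x, aspect1), and an explicitly stated height blocks both defaults. With the abnormality axiom at most one default applies, so a single height comes back; if the axiom is removed, both defaults apply to a pitcher, mirroring the two competing extensions described above. The individual Joe and all function names are assumptions.

```python
ISA = {"Pitcher": "Baseball-Player", "Baseball-Player": "Adult-Male"}
ABNORMAL_ASPECT1 = {"Baseball-Player"}  # Baseball-Player(x) -> AB(x, aspect1)


def classes_of(cls):
    """cls followed by every class above it in the (assumed) isa chain."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = ISA.get(cls)
    return chain


def default_heights(cls, abnormal=ABNORMAL_ASPECT1):
    """Heights supported by the defaults for a member of cls."""
    classes = classes_of(cls)
    heights = set()
    if "Adult-Male" in classes and not abnormal.intersection(classes):
        heights.add("5-10")  # Adult-Male(x) and not AB(x, aspect1)
    if "Baseball-Player" in classes:
        heights.add("6-1")   # default for baseball players
    return heights


def height(individual, cls, explicit):
    """An explicitly stated height blocks the defaults; otherwise use the single default."""
    if individual in explicit:
        return explicit[individual]
    candidates = default_heights(cls)
    return candidates.pop() if len(candidates) == 1 else None


print(height("Three-Finger-Brown", "Pitcher", {}))       # 6-1
print(height("Joe", "Adult-Male", {}))                   # 5-10
print(sorted(default_heights("Pitcher", abnormal=set())))  # ['5-10', '6-1']: two extensions
```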
Abduction
Abductive reasoning is to abduce (or take away) a logical assumption, explanation, inference,
conclusion, hypothesis, or best guess from an observation or set of observations. Because the
conclusion is merely a best guess, the conclusion that is drawn may or may not be true. Daily
decision-making is also an example of abductive reasoning.
If we notice spots, we might like to conclude measles. That conclusion may be wrong, but it may be
the best guess we can make about what is going on. Deriving conclusions in this way is abductive
reasoning (a form of default reasoning).
Given 2 wff's (A → B) and B, for any expressions A and B, if it is consistent to assume A, do
so.
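The abduction rule can be sketched in Python: for every implication A → B whose consequent B has been observed, A is proposed as an explanation unless ¬A is already known. Ranking candidates by how many observations they explain is an added assumption, as are the rule "allergy → spots" and the function name.

```python
def abduce(rules, observations, known):
    """rules: (cause, effect) pairs meaning cause -> effect; '~p' in known means p is false."""
    explained = {}
    for cause, effect in rules:
        if effect in observations and "~" + cause not in known:  # consistent to assume cause
            explained[cause] = explained.get(cause, 0) + 1
    # Best guess first: the cause that explains the most observations.
    return sorted(explained, key=lambda c: (-explained[c], c))


RULES = [("measles", "spots"), ("measles", "fever"), ("allergy", "spots")]
print(abduce(RULES, {"spots", "fever"}, set()))  # ['measles', 'allergy']
print(abduce(RULES, {"spots"}, {"~measles"}))    # ['allergy']
```

As the text warns, the top-ranked hypothesis is only a best guess and may later have to be withdrawn.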
Minimalist Reasoning
We describe methods for saying a very specific and highly useful class of things that are generally
true. These methods are based on some variant of the idea of a minimal model. We will define a
model to be minimal if there are no other models in which fewer things are true. The idea
behind using minimal models as a basis for non-monotonic reasoning about the world is the
following –
There are many fewer true statements than false ones.
If something is true and relevant it makes sense to assume that it has been entered into
our knowledge base.
Therefore, assume that the only true statements are those that necessarily must be true in
order to maintain consistency.
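For example, consider a KB that consists of just the single statement A(Joe) ∨ B(Joe). The closed
world assumption (CWA) allows us to conclude both ¬A(Joe) and ¬B(Joe), since neither A nor B must
necessarily be true of Joe. The extended KB
A(Joe) ∨ B(Joe)
¬A(Joe)
¬B(Joe)
is inconsistent.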
The problem is that we have assigned a special status to the positive instances of predicates as
opposed to negative ones. The CWA forces completion of the KB by adding a negative assertion whenever
it is consistent to do so.
Although the CWA captures part of the idea that anything that need not necessarily be true should be
assumed to be false, it does not capture all of it.
It has two essential limitations:
It operates on individual predicates without considering interactions among predicates
that are defined in the KB.
It assumes that all predicates have all their instances listed. Although in many database
applications this is true, in many KB systems it is not.
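The CWA can be checked mechanically with a brute-force propositional sketch in Python, shown below for a KB containing only A(Joe) ∨ B(Joe): neither atom is entailed, so the CWA adds both ¬A(Joe) and ¬B(Joe), and the completed KB is unsatisfiable. Clauses are sets of literal strings with "~" for negation; the function names are hypothetical, and truth-table enumeration is only practical for a handful of atoms.

```python
from itertools import product


def holds(lit, model):
    """Truth value of a literal such as 'A(Joe)' or '~A(Joe)' in a model."""
    return not model[lit[1:]] if lit.startswith("~") else model[lit]


def satisfiable(clauses, atoms):
    """Brute force: does some truth assignment make every clause true?"""
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(holds(lit, model) for lit in clause) for clause in clauses):
            return True
    return False


def entails(clauses, atoms, atom):
    """KB entails atom iff KB together with ~atom is unsatisfiable."""
    return not satisfiable(clauses + [{"~" + atom}], atoms)


def closed_world(clauses, atoms):
    """CWA: add ~p for every atom p that the KB does not entail."""
    return clauses + [{"~" + p} for p in atoms if not entails(clauses, atoms, p)]


atoms = ["A(Joe)", "B(Joe)"]
kb = [{"A(Joe)", "B(Joe)"}]                         # A(Joe) ∨ B(Joe)
print(satisfiable(kb, atoms))                       # True
print(satisfiable(closed_world(kb, atoms), atoms))  # False: the completed KB is inconsistent
```

The failure comes from the first limitation listed above: each predicate instance is completed separately, ignoring the disjunction that links them.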
Circumscription
Circumscription is a rule of conjecture (conclusion formed on the basis of incomplete
information) that allows you
◦ to jump to the conclusion that the objects you can show possess a certain
property, p, are in fact all the objects that possess that property.
Circumscription can also cope with default reasoning. Several theories of circumscription
have been proposed to deal with the problems of CWA.
Circumscription together with first order logic allows a form of Non-monotonic
Reasoning.
Suppose we know:
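bird(tweety)
∀x: penguin(x) → bird(x)
∀x: penguin(x) → ¬flies(x)
and we wish to add the fact that typically, birds fly.
In circumscription this phrase would be stated as: a bird will fly if it is not abnormal, which can
be represented by
∀x: bird(x) ∧ ¬abnormal(x) → flies(x)
However, this is not sufficient on its own: we cannot conclude flies(tweety), since we cannot prove
¬abnormal(tweety).
This is where we apply circumscription and, in this case, we will assume that those things that are
shown to be abnormal are the only things to be abnormal. Thus we can rewrite our default
rule as:
∀x: bird(x) ∧ ¬flies(x) → abnormal(x)
and add the following
∀x: ¬abnormal(x)
since nothing can yet be shown to be abnormal.
If we now add the fact penguin(tweety), we can prove abnormal(tweety). If we circumscribe abnormal
now, we would add the sentence that a penguin (tweety) is the abnormal thing:
∀x: abnormal(x) → penguin(x).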
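Circumscribing abnormal can be approximated over a finite domain by enumerating models and keeping only those whose set of abnormal individuals is minimal, as in the Python sketch below. Here bird and penguin are held fixed at the stated facts (penguins are added to the birds, since every penguin is a bird), while flies and abnormal may vary; this choice of fixed and varying predicates, the second individual opus and the function names are assumptions for illustration.

```python
from itertools import product


def models(individuals, birds, penguins):
    """Yield (flies, abnormal) sets that satisfy the theory for the given facts."""
    birds = set(birds) | set(penguins)  # penguin(x) -> bird(x)
    for fly_bits, ab_bits in product(product([False, True], repeat=len(individuals)),
                                     repeat=2):
        flies = {x for x, bit in zip(individuals, fly_bits) if bit}
        abnormal = {x for x, bit in zip(individuals, ab_bits) if bit}
        if all((x not in penguins or x not in flies)            # penguin(x) -> ~flies(x)
               and (x not in birds or x in abnormal or x in flies)  # bird & ~abnormal -> flies
               for x in individuals):
            yield flies, abnormal


def circumscribe(individuals, birds, penguins):
    """Keep the models with a minimal abnormal set; a bird flies if it flies in all of them."""
    all_models = list(models(individuals, birds, penguins))
    minimal = [(f, a) for f, a in all_models if not any(a2 < a for _, a2 in all_models)]
    fliers = {x for x in set(birds) | set(penguins) if all(x in f for f, _ in minimal)}
    return fliers, [a for _, a in minimal]


print(circumscribe(["tweety"], {"tweety"}, set()))       # ({'tweety'}, [set()])
print(circumscribe(["tweety"], {"tweety"}, {"tweety"}))  # (set(), [{'tweety'}])
```

In the second call the only minimal abnormal set is exactly the set of penguins, which corresponds to the sentence ∀x: abnormal(x) → penguin(x).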
A Justification-based truth maintenance system (JTMS) is a simple TMS where one can examine
the consequences of the current set of assumptions. In JTMS labels are attached to arcs from
sentence nodes to justification nodes. This label is either "+" or "-". Then, for a justification node
we can talk of its IN-LIST, the list of its inputs with "+" label, and of its OUT-LIST, the list of its
inputs with "-" label.
The JTMS does not know the meaning of sentences. We can have a node representing a sentence p and one
representing ~p, and the two will be totally unrelated unless relations are established between
them by justifications. For example, we can write:
~p^p Contradiction Node
o
|
x 'x' denotes a justification node
/ \ 'o' denotes a sentence node
+/ \+
o o
p ~p
which says that if both p and ~p are IN we have a contradiction.
The association of IN or OUT labels with the nodes in a dependency network defines an in-out-
labeling function. This function is consistent if:
The label of a justification node is IN iff the labels of all the sentence nodes in its in-list
are all IN and the labels of all the sentence nodes in its out-list are OUT.
The label of a sentence node is IN iff it is a premise, or an enabled assumption node, or it
has an input from a justification node with label IN.
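The two conditions can be turned into a labeling checker, sketched in Python below. Justification nodes are not stored separately: each is a hypothetical (in_list, out_list, consequent) triple whose label is computed from the first condition, so only the second condition has to be checked for each sentence node. Premises and enabled assumptions are passed in as sets; all names are assumptions.

```python
def justification_in(in_list, out_list, labels):
    """Condition 1: IN iff every in-list node is IN and every out-list node is OUT."""
    return (all(labels[n] == "IN" for n in in_list)
            and all(labels[n] == "OUT" for n in out_list))


def consistent(labels, premises, assumptions, justifications):
    """Condition 2: a sentence node is IN iff it is a premise, an enabled assumption,
    or the consequent of some justification whose label is IN."""
    for node, label in labels.items():
        supported = (node in premises or node in assumptions or
                     any(justification_in(i, o, labels)
                         for i, o, consequent in justifications if consequent == node))
        if (label == "IN") != supported:
            return False
    return True


# The contradiction network from the figure: p and ~p both IN force the contradiction IN.
J = [(["p", "~p"], [], "contradiction")]
print(consistent({"p": "IN", "~p": "IN", "contradiction": "IN"}, {"p"}, {"~p"}, J))   # True
print(consistent({"p": "IN", "~p": "IN", "contradiction": "OUT"}, {"p"}, {"~p"}, J))  # False
```

Detecting that the contradiction node is IN and deciding what to retract remain the job of the problem solver, as the list that follows explains.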
There is a set of important reasoning operations that a JTMS does not perform, including:
Applying rules to derive conclusions
Creating justifications for the results of applying rules
Choosing among alternative ways of resolving a contradiction
Detecting contradictions
All of these operations must be performed by the problem-solving program that is using the
JTMS.
However, as reasoning proceeds, contradictions arise and the ATMS can be pruned:
o Simply find assertions with no valid justification.
The ATMS like the JTMS is designed to be used in conjunction with a separate problem solver.
The problem solver’s job is to:
Create nodes that correspond to assertions (both those that are given as axioms and those
that are derived by the problem solver).
Associate with each such node one or more justifications, each of which describes a
reasoning chain that led to the node.
Inform the ATMS of inconsistent contexts.
This is identical to the role of the problem solver that uses a JTMS, except that no explicit choices
among paths to follow need to be made as reasoning proceeds. Some decision may be necessary
at the end, though, if more than one possible solution still has a consistent context.
The role of the ATMS system is then to:
Propagate inconsistencies, thus ruling out contexts that include subcontexts (sets of
assertions) that are known to be inconsistent.
Label each problem solver node with the contexts in which it has a valid justification. This
is done by combining contexts that correspond to the components of a justification. In
particular, given a justification of the form
A1 ∧ A2 ∧ … ∧ An → C
assign as a context for the node corresponding to C the intersection of the contexts
corresponding to the nodes A1 through An.
Contexts get eliminated as a result of the problem-solver asserting inconsistencies and the ATMS
propagating them. Nodes get created by the problem-solver to represent possible components of
a problem solution. They may then get pruned from consideration if all their context labels get
pruned.
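The labeling and pruning described above can be sketched in Python. Following the simplified description in this section, a node's label is the set of named contexts in which it holds, a justification A1 ∧ … ∧ An → C gives C the intersection of its antecedents' contexts, and contexts reported as inconsistent are removed everywhere. Context and node names are hypothetical, every justification is assumed to have at least one antecedent, and a full ATMS (which records minimal sets of assumptions, called environments) is more involved.

```python
def atms_labels(labels, justifications, inconsistent):
    """labels: node -> set of contexts; justifications: (antecedents, consequent) pairs."""
    # Propagate inconsistencies: rule those contexts out of every label.
    labels = {node: set(ctxs) - inconsistent for node, ctxs in labels.items()}
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in justifications:
            ctxs = set.intersection(*(labels.get(a, set()) for a in antecedents))
            new = ctxs - labels.setdefault(consequent, set())
            if new:
                labels[consequent] |= new
                changed = True
    # Prune nodes whose context labels have all been ruled out.
    return {node: ctxs for node, ctxs in labels.items() if ctxs}


base = {"A1": {"c1", "c2"}, "A2": {"c2", "c3"}}
J = [(["A1", "A2"], "C")]
print(atms_labels(base, J, set())["C"])       # {'c2'}
print("C" in atms_labels(base, J, {"c2"}))    # False: C is pruned once c2 is inconsistent
```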
8. Statistical Reasoning
Introduction:
Statistical Reasoning: The reasoning in which the representation is extended to allow some kind
of numeric measure of certainty (rather than true or false) to be associated with each statement.
So far, a fact has been believed to be either true or false. For some kinds of problems, it is useful to
describe beliefs that are not certain but for which there is supporting evidence.
There are 2 classes of problems:
First class contains problems in which there is genuine randomness in the world.
o Example: Cards Playing
Second class contains problems that could in principle be modeled using the techniques we
described (i.e. resolution from predicate logic)
o Example: Medical Diagnosis
The fundamental notion is the conditional probability P(H|E). Read this expression as the probability
of hypothesis H given that we have observed evidence E. To compute this, we need to take into account
the prior probability of H and the extent to which E provides evidence of H.
Suppose, for example, that we are interested in examining the geological evidence at a particular
location to determine whether that would be a good place to dig to find a desired mineral. If we
know the prior probabilities of finding each of the various minerals and we know the
probabilities that if a mineral is present then certain physical characteristics will be observed, then
we can use Bayes' formula to compute, from the evidence we collect, how likely it is that the
various minerals are present.
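For hypotheses H1 … Hk that are mutually exclusive and exhaustive, Bayes' theorem takes the form P(Hi|E) = P(E|Hi)·P(Hi) / Σn P(E|Hn)·P(Hn). The Python sketch below applies it to the mineral example; the three hypotheses and all of the probabilities are invented for illustration, not taken from any survey data, and the function name is hypothetical.

```python
def posterior(priors, likelihoods):
    """P(Hi|E) = P(E|Hi) P(Hi) / sum over n of P(E|Hn) P(Hn)."""
    joint = {h: likelihoods[h] * p for h, p in priors.items()}
    evidence = sum(joint.values())  # P(E)
    return {h: j / evidence for h, j in joint.items()}


priors = {"copper": 0.2, "gold": 0.1, "none": 0.7}       # P(Hi), hypothetical
likelihoods = {"copper": 0.8, "gold": 0.5, "none": 0.1}  # P(E|Hi), hypothetical
for mineral, p in posterior(priors, likelihoods).items():
    print(f"P({mineral} | E) = {p:.3f}")
```

Evidence that is much more likely under one hypothesis raises that hypothesis's probability sharply: with these numbers copper goes from a prior of 0.2 to a posterior of 4/7, about 0.571.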
The key to using Bayes' theorem as a basis for uncertain reasoning is to recognize exactly what it
says.
Suppose we are solving a medical diagnosis problem. Consider the following assertions:
S: patient has spots
M: patient has measles
F: patient has high fever.
Without any additional evidence, the presence of spots serves as evidence in favor of
measles. It also serves as evidence of fever since measles would cause fever.
Suppose we already know that the patient has measles. Then the additional evidence that
he has spots actually tells us nothing about fever.
Either spots alone or fever alone would constitute evidence in favor of measles.
If both are present, we need to take both into account in determining the total weight of
evidence.
We describe one practical way of compromising on a pure Bayesian system. MYCIN system is an
example of an expert system, since it performs a task normally done by a human expert. MYCIN
system attempts to recommend appropriate therapies for patients with bacterial infections. It
interacts with the physician to acquire the clinical data it needs. We concentrate on the use of
probabilistic reasoning.
MYCIN represents most of its diagnostic knowledge as a set of rules. Each rule has associated with
it a certainty factor, which is a measure of the extent to which the evidence described by the
antecedent of the rule supports the conclusion that is given in the rule's consequent. It uses
backward reasoning from its goal of finding significant disease-causing organisms to the clinical
data available.
CF[h, e] = MB[h, e] − MD[h, e]
where MB[h, e] is the measure of belief in hypothesis h given evidence e, and MD[h, e] is the
measure of disbelief in h given e.
Since any particular piece of evidence either supports or denies a hypothesis (but not both), a single
number suffices for each rule to define both the MB and MD and thus the CF. CFs reflect
assessments of the strength of the evidence in support of the hypothesis.
We first need to describe some properties that we would like the combining functions to satisfy:
◦ Combining functions should be commutative and associative.
◦ Until certainty is reached, additional confirming evidence should increase MB.
◦ If uncertain inferences are chained together, then the result should be less certain than
either of the inferences alone.
From MB and MD, CF can be computed. If several sources of corroborating evidence are pooled,
the absolute value of CF will increase. If conflicting evidence is introduced, the absolute value of
CF will decrease.
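The combining functions can be sketched in Python. The incremental rule MB[h, s1 ∧ s2] = MB[h, s1] + MB[h, s2] · (1 − MB[h, s1]) (and the same form for MD) is MYCIN's combination formula for two pieces of evidence, and scaling a rule's MB by max(0, CF) of an uncertain antecedent is its chaining rule. The function names are hypothetical, and the special case in which MD reaches 1 (which forces MB to 0) is omitted.

```python
from functools import reduce


def combine(m1, m2):
    """Combine two MBs (or two MDs) from independent pieces of evidence."""
    return m1 + m2 * (1 - m1)


def cf(mb, md):
    """CF[h, e] = MB[h, e] - MD[h, e]."""
    return mb - md


def chain(mb_rule, cf_antecedent):
    """Scale a rule's MB by the certainty of its (uncertain) antecedent."""
    return mb_rule * max(0.0, cf_antecedent)


mb = reduce(combine, [0.6, 0.6, 0.6], 0.0)  # three independent rules, each 0.6
print(round(mb, 3))                         # 0.936
print(round(chain(0.8, 0.6), 2))            # 0.48, less certain than either step
```

The sketch satisfies the listed properties: combine is symmetric in its arguments, each new piece of confirming evidence raises MB without exceeding 1, and chaining multiplies two values below 1.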
that led us to believe in s (for example, the actual readings of the laboratory instruments or
results of applying other rules). Then:
It turns out that these definitions are incompatible with a Bayesian view of conditional
probability. Small changes to them however make them compatible. We can redefine MB as
MYCIN uses CFs. The CF can be used to rank hypotheses in order of importance. For example, if a
patient has symptoms that suggest several possible diseases, then the disease with the higher
CF would be investigated first. If E then H: CF(rule) = level of belief in H given E.
Example: CF(E) = CF(it will probably rain today) = 0.6
Positive CF means evidence supports hypothesis.
In the first scenario (a), our example rule has three antecedents with a single CF rather than three
separate rules; this makes the combination rules unnecessary. The rule writer did this because the
three antecedents are not independent.
To see how much difference MYCIN's independence assumption can make, suppose for the
moment that we had instead three separate rules and that the CF of each was 0.6. This could
happen and still be consistent with the combined CF of 0.7 if the three conditions overlap
substantially. If we apply the MYCIN combination formula to the three separate rules, we get
0.6 + 0.6 × (1 − 0.6) = 0.84 after two rules and 0.84 + 0.6 × (1 − 0.84) = 0.936 after all three.
Let’s consider what happens when independence assumptions are violated in the scenario of (c):
BAYESIAN NETWORKS
CFs are a mechanism for reducing the complexity of a Bayesian reasoning system by making some
approximations to the formalism. In Bayesian networks, we preserve the formalism and
rely instead on the modularity of the world we are trying to model. Bayesian networks are also
called belief networks.
The basic idea of a Bayesian network is that knowledge in the world is modular: most events are
conditionally independent of most other events. We can therefore adopt a model that uses local
representations to allow interactions only between events that actually affect each other. To
describe the real world it is not necessary to use a huge joint probability table listing the
probabilities of all conceivable combinations of events. Some influences may be unidirectional,
others bidirectional; events may be causal and thus get chained together in the network.
Implementation:
A Bayesian network is a directed acyclic graph whose directed links indicate the dependencies
that exist between nodes. Nodes represent propositions about events or the events themselves.
Conditional probabilities quantify the strength of the dependencies.
E.g.: Consider the following facts:
S: The sprinkler was on last night
W: The grass is wet
R: It rained last night
From the diagram, Sprinkler suggests Wet and Wet suggests Rain; (a) shows the flow of
constraints.
There are two different ways that propositions can influence the likelihood of each other.
The first is that causes influence the likelihood of their symptoms.
The second is that a symptom affects the likelihood of all of its possible causes.
Rules:
(i) If the sprinkler was ON last night then the grass will be wet this morning
(ii) If grass is wet this morning then it rained last night
(iii) By chaining (applying the two rules together) we believe that it rained because we
believe that the sprinkler was ON.
The idea behind the Bayesian network structure is to make a clear distinction between these two
kinds of influence.
Each row of a conditional probability table (CPT) must sum to one. Since the C (Cloudy) node has
no parents, its CPT specifies the prior probability that it is cloudy (in this case, 0.5).
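The four-node network referred to above (Cloudy, Sprinkler, Rain, WetGrass) can be queried by brute-force enumeration, as in the Python sketch below. Only P(Cloudy) = 0.5 comes from the text; every other CPT entry is an illustrative assumption. The last two queries show both kinds of influence: wet grass raises belief in rain (symptom to cause), and once rain is known it "explains away" some of the evidence for the sprinkler.

```python
from itertools import product

# Only P(C) = 0.5 is from the notes; the remaining numbers are assumed for illustration.
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}                 # P(S = true | C)
P_RAIN = {True: 0.8, False: 0.2}                      # P(R = true | C)
P_WET = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.0}     # P(W = true | S, R)

def bernoulli(p_true: float, value: bool) -> float:
    return p_true if value else 1.0 - p_true

def joint(c: bool, s: bool, r: bool, w: bool) -> float:
    """Chain rule over the DAG: P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (bernoulli(P_CLOUDY, c) * bernoulli(P_SPRINKLER[c], s)
            * bernoulli(P_RAIN[c], r) * bernoulli(P_WET[(s, r)], w))

def query(target: str, evidence: dict) -> float:
    """P(target = true | evidence), summing the joint over all 16 worlds."""
    num = den = 0.0
    for c, s, r, w in product([True, False], repeat=4):
        world = {"C": c, "S": s, "R": r, "W": w}
        if any(world[k] != v for k, v in evidence.items()):
            continue                      # world contradicts the evidence
        p = joint(c, s, r, w)
        den += p
        if world[target]:
            num += p
    return num / den

print(round(query("R", {}), 4))                       # prior belief in rain
print(round(query("R", {"W": True}), 4))              # wet grass raises it
print(round(query("S", {"W": True}), 4))
print(round(query("S", {"W": True, "R": True}), 4))   # rain explains away the sprinkler
```

Enumeration is exponential in the number of nodes; it is used here only because it makes the factorisation easy to read.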
Dempster-Shafer Theory
So far we have considered individual propositions and assigned each of them a point estimate of
the degree of belief that is warranted by the given evidence. The Dempster-Shafer theory
approach considers sets of propositions and assigns each of them an interval
[Belief, Plausibility]
in which the degree of belief must lie.
Belief (Bel) measures the strength of the evidence in favour of a set of propositions. It ranges
from 0 to 1, where 0 indicates no evidence and 1 denotes certainty.
Plausibility (Pl) is defined as
$Pl(s) = 1 - Bel(\neg s)$
It also ranges from 0 to 1 and measures the extent to which evidence in favour of ¬s leaves
room for belief in s.
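The confidence interval is then defined as [Bel(E), Pl(E)], where m is the basic probability
assignment (mass) given by the evidence and
$Bel(E) = \sum_{A \subseteq E} m(A)$, i.e. all the evidence that makes us believe in the
correctness of E, and
$Pl(E) = 1 - Bel(\neg E) = \sum_{A \cap E \neq \emptyset} m(A)$, i.e. all the evidence that
does not rule E out.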
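Suppose we are given two belief functions m1 and m2. Let X range over the subsets of Θ to which
m1 assigns a non-zero value, and let Y range over the corresponding sets for m2. We define the
combination m3 of m1 and m2 as
$m_3(Z) = \dfrac{\sum_{X \cap Y = Z} m_1(X)\, m_2(Y)}{1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)}$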
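Dempster's rule of combination, together with Bel and Pl, can be sketched in Python with subsets of Θ as frozensets. The frame of discernment (allergy, flu, cold, pneumonia) and the two mass assignments are hypothetical values chosen for illustration; the denominator 1 − K redistributes the mass K that would otherwise fall on the empty set.

```python
from itertools import product

def dempster_combine(m1: dict, m2: dict) -> dict:
    """Multiply masses of intersecting focal sets, then renormalise by 1 - K,
    where K is the mass that would fall on the empty set (the conflict)."""
    raw, conflict = {}, 0.0
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            raw[z] = raw.get(z, 0.0) + a * b
        else:
            conflict += a * b
    if conflict >= 1.0:
        raise ValueError("the two bodies of evidence are totally conflicting")
    return {z: v / (1.0 - conflict) for z, v in raw.items()}

def belief(m: dict, s: frozenset) -> float:
    """Bel(s): total mass committed to subsets of s."""
    return sum(v for a, v in m.items() if a <= s)

def plausibility(m: dict, s: frozenset) -> float:
    """Pl(s): total mass of focal sets that overlap s, i.e. do not rule it out."""
    return sum(v for a, v in m.items() if a & s)

# Hypothetical frame of discernment and mass assignments
THETA = frozenset({"All", "Flu", "Cold", "Pneu"})
m1 = {frozenset({"Flu", "Cold", "Pneu"}): 0.6, THETA: 0.4}
m2 = {frozenset({"All", "Flu", "Cold"}): 0.8, THETA: 0.2}
m3 = dempster_combine(m1, m2)
flu_or_cold = frozenset({"Flu", "Cold"})
print(round(belief(m3, flu_or_cold), 2), round(plausibility(m3, flu_or_cold), 2))  # 0.48 1.0
```

Mass left on Θ itself represents ignorance, which is why Bel and Pl can differ.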
Fuzzy Logic
Fuzzy logic is an alternative for representing some kinds of uncertain knowledge. Fuzzy logic is a
form of many-valued logic; it deals with reasoning that is approximate rather than fixed and
exact. Compared to traditional binary sets (where variables may take on true or false values),
fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy logic
has been extended to handle the concept of partial truth, where the truth value may range
between completely true and completely false. Fuzzy set theory defines set membership as a
possibility distribution.
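Degrees of membership and the usual fuzzy connectives can be sketched in a few lines of Python. The membership functions for "tall" and "heavy" and their thresholds are hypothetical; AND = min, OR = max and NOT = 1 − x are Zadeh's standard operators, one common choice among several.

```python
def tall(height_cm: float) -> float:
    """Hypothetical membership function for the fuzzy set 'tall'."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30        # linear ramp between 160 cm and 190 cm

def heavy(weight_kg: float) -> float:
    """Hypothetical membership function for 'heavy': ramp from 60 kg to 100 kg."""
    return min(1.0, max(0.0, (weight_kg - 60) / 40))

# Zadeh's standard connectives on degrees of truth in [0, 1]
def f_and(a: float, b: float) -> float:
    return min(a, b)

def f_or(a: float, b: float) -> float:
    return max(a, b)

def f_not(a: float) -> float:
    return 1.0 - a

t, h = tall(178), heavy(85)
print(round(t, 2), round(h, 3))                               # 0.6 0.625
print(round(f_and(t, h), 3), round(f_or(t, f_not(h)), 3))     # 0.6 0.6
```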
Weak slot and filler structures turn out to be useful for reasons besides the support of
inheritance, including:
It enables attribute values to be retrieved quickly
assertions are indexed by the entities
binary predicates are indexed by first argument.
Properties of relations are easy to describe.
It allows ease of consideration as it embraces aspects of object-oriented programming,
including modularity and ease of viewing by people.
Weak slot and filler structures describe two views: semantic nets and frames. These talk about
the representations themselves and about techniques for reasoning with them. They do not say
much about the specific knowledge that the structures should contain. We call these
“knowledge-poor” structures.
A slot is an attribute value pair in its simplest form. A filler is a value that a slot can take
-- it could be a numeric, string (or any data type) value or a pointer to another slot. A weak
slot and filler structure does not consider the content of the representation.
Semantic Nets were originally designed as a way to represent the meaning of English words. The
main idea is that the meaning of a concept comes from the ways in which it is connected to other
concepts. The information is stored by interconnecting nodes with labeled arcs. Semantic nets
were initially used to represent labeled connections between nodes.
The Semantic Nets can be represented in different ways by using relationships. Semantic nets have
been used to represent a variety of knowledge in a variety of different programs.
One of the early ways that semantic nets were used was to find relationships among objects by
spreading activation out from each of two nodes and seeing where the activation met. This
process is called intersection search.
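Intersection search can be sketched as two breadth-first searches run in lock step, one from each starting node, stopping at the first node reached by both. The small net below (Marcus, Caesar and their arcs) is a hypothetical example, and treating arcs as undirected while ignoring their labels is a simplifying assumption.

```python
# Hypothetical semantic net as (node, arc label, node) triples
ARCS = [
    ("Marcus", "instance", "Man"), ("Man", "isa", "Person"),
    ("Caesar", "instance", "Ruler"), ("Ruler", "isa", "Person"),
    ("Marcus", "born-in", "Pompeii"), ("Pompeii", "part-of", "Italy"),
]

def neighbours(node):
    """Activation spreads along arcs in both directions, ignoring labels."""
    for a, _label, b in ARCS:
        if a == node:
            yield b
        elif b == node:
            yield a

def path_to_origin(parents, node):
    out = []
    while node is not None:
        out.append(node)
        node = parents[node]
    return out

def intersection_search(a, b):
    """Spread activation from a and b one level at a time; stop when the two meet."""
    seen = {a: {a: None}, b: {b: None}}      # parent pointers for each origin
    frontier = {a: [a], b: [b]}
    while frontier[a] or frontier[b]:
        for origin in (a, b):
            next_level = []
            for node in frontier[origin]:
                for n in neighbours(node):
                    if n not in seen[origin]:
                        seen[origin][n] = node
                        next_level.append(n)
            frontier[origin] = next_level
            met = seen[a].keys() & seen[b].keys()
            if met:
                m = min(met)                 # deterministic choice if several nodes meet
                return m, path_to_origin(seen[a], m)[::-1] + path_to_origin(seen[b], m)[1:]
    return None                              # the activations never meet

print(intersection_search("Marcus", "Caesar"))
```

The meeting node (Person) is the relationship the search discovers: Marcus and Caesar are both persons.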
Semantic nets are a natural way to represent relationships that would appear as ground instances
of binary predicates in predicate logic. Arcs from the figure could be represented in logic as
binary predicates.
Many unary predicates in logic can be thought of as binary predicates using isa and instance.
For example, man(Marcus) can be rewritten as instance(Marcus, man).
Three or more place predicates can also be converted to a binary form by creating one new
object representing the entire predicate statement and then introducing binary predicates to
describe the relationship to this new object of each of the original arguments.
score(Cubs, Dodgers, 5-3) can be represented in semantic net by creating a
node to represent the specific game & then relating each of the three pieces of
information to it.
This technique is particularly useful for representing the contents of a typical declarative sentence
that describes several aspects of a particular event.
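The conversion described above for score(Cubs, Dodgers, 5-3) can be sketched as a function that creates a new node for the whole statement and emits one binary triple per argument. The role names (visiting-team, home-team, score), the class name Game and the node-naming scheme G1, G2, ... are hypothetical choices for illustration.

```python
from itertools import count

_node_ids = count(1)

def reify(cls: str, roles: tuple, args: tuple) -> list:
    """Replace one n-place assertion by a new node plus binary (node, arc, value) triples."""
    if len(roles) != len(args):
        raise ValueError("need exactly one role name per argument")
    node = f"G{next(_node_ids)}"                # hypothetical node-naming scheme
    return [(node, "instance", cls)] + [(node, r, a) for r, a in zip(roles, args)]

# score(Cubs, Dodgers, 5-3) with hypothetical role names for the three arguments
for triple in reify("Game", ("visiting-team", "home-team", "score"), ("Cubs", "Dodgers", "5-3")):
    print(triple)
```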
In these networks, some distinctions that are important in reasoning are glossed over. For
example, there should be a difference between a link that defines a new entity and one that
relates two existing entities.
Both nodes represent objects that exist independently of their relationship to each other.
H1 and H2 are new concepts representing John’s height and Bill’s height. They are defined by
their relationships to the nodes John and Bill.
The procedures that operate on nets such as this can exploit the fact that some arcs, such as
height, define new entities, while others, such as greater-than and value, merely describe
relationships among existing entities.
One way to represent simple quantified expressions in semantic nets is to partition the semantic
net into a hierarchical set of spaces, each of which corresponds to the scope of one or more
variables.
The statement: The dog bit the mail carrier.
The nodes Dogs, Bite and Mail-Carrier represent the classes of dogs, bitings and mail carriers
respectively, while the nodes d, b and m represent a particular dog, a particular biting and a
particular mail carrier. This fact can easily be represented by a single net with no partitioning.
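The statement: Every dog has bitten a mail carrier.
$\forall x: Dog(x) \rightarrow \exists y: Mail\text{-}Carrier(y) \wedge Bite(x, y)$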
The node g stands for the assertion given above. Node g is an instance of the special class GS of
general statements about the world (i.e., those with universal quantifiers). Every element of GS
has at least two attributes: a form, which states the relation that is being asserted, and one or
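more connections, one for each of the universally quantified variables. In this example, there
is only one such variable, d, which can stand for any element of the class Dogs; the other
variable in the form, m, is understood to be existentially quantified.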
The statement: Every dog in town has bitten the constable.
In this net, the node c representing the victim lies outside the form of the general statement. Thus
it is not viewed as an existentially quantified variable whose value may depend on the value of
d.
The statement: Every dog has bitten every mail carrier.
In this case, g has two links, one pointing to d, which represents any dog and one pointing to
m, representing any mail carrier.
As we expand the range of problem solving tasks that the representation must
support, the representation necessarily begins to become more complex.
It becomes useful to assign more structure to nodes as well as to links.
The more structure the system has, the more likely it is to be termed a frame system.
Slot and Filler Structures are a device to support property inheritance along isa and instance
links.
◦ Knowledge in these is structured as a set of entities and their attributes.
This structure turns out to be useful for the following reasons:
◦ It enables attribute values to be retrieved quickly
assertions are indexed by the entities
binary predicates are indexed by first argument. E.g. team(Mike-Hall, Cardiff).
◦ Properties of relations are easy to describe.
◦ It allows ease of consideration as it embraces aspects of object-oriented programming.
Modularity
Ease of viewing by people.
Inheritance
-- the isa and instance representation provides a mechanism to implement this.
Inheritance also provides a means of dealing with default reasoning. E.g. we could represent:
Emus are birds.
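Default reasoning through inheritance can be sketched by looking a slot up locally and then climbing instance and isa links until some frame supplies a value, so the nearest value wins. "Emus are birds" comes from the text; the frames Robin, Tweety and Edna and the defaults "birds fly" and "emus do not fly" are hypothetical additions for illustration.

```python
# Hypothetical frames: slot -> value, with "isa"/"instance" pointing one level up.
FRAMES = {
    "Bird": {"flies": True, "has-wings": True},
    "Emu": {"isa": "Bird", "flies": False, "runs": True},   # overrides the Bird default
    "Robin": {"isa": "Bird"},
    "Tweety": {"instance": "Robin"},
    "Edna": {"instance": "Emu"},
}

def get_value(entity: str, slot: str):
    """Look the slot up locally, then climb instance/isa links; the nearest value wins."""
    frame = FRAMES.get(entity)
    while frame is not None:
        if slot in frame:
            return frame[slot]
        parent = frame.get("instance") or frame.get("isa")
        frame = FRAMES.get(parent) if parent else None
    return None                      # no frame on the chain supplies a value

print(get_value("Tweety", "flies"))      # True, inherited default from Bird
print(get_value("Edna", "flies"))        # False, Emu's own value overrides the default
print(get_value("Edna", "has-wings"))    # True, still inherited from Bird
```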
Partitioned Networks
Partitioned Semantic Networks allow for:
propositions to be made without commitment to truth.
expressions to be quantified.
Basic idea: Break network into spaces which consist of groups of nodes and arcs and regard
each space as a node.
Consider the following: Andrew believes that the earth is flat. We can encode the proposition
"the earth is flat" in a space and within it have nodes and arcs that represent the fact. We can
then have nodes and arcs to link this space to the rest of the network to represent Andrew's belief.
NEED OF FRAMES
A frame is a type of schema used in many AI applications, including vision and natural language
processing. Frames provide a convenient structure for representing objects that are typical of
stereotypical situations. The situations to represent may be visual scenes, the structure of
complex physical objects, etc. Frames are also useful for representing commonsense knowledge.
As frames allow nodes to have structures, they can be regarded as three-dimensional
representations of knowledge.
A frame is similar to a record structure: corresponding to its fields and values are slots and
slot fillers. Basically, it is a group of slots and fillers that defines a stereotypical object. A
single frame is not very useful; frame systems usually have a collection of frames connected to
each other. The value of an attribute of one frame may be another frame.
A frame for a book is given below.
Slots Fillers
publisher Thomson
title Expert Systems
author Giarratano
edition Third
year 1998
pages 600
The above example is a simple one, but most frames are complex. Moreover, with slots, fillers and
the inheritance provided by frames, powerful knowledge representation systems can be built.
Frames can be either generic or specific. The following is an example of a generic frame.
Slot Fillers
name computer
specialization_of a_kind_of machine
types (desktop, laptop, mainframe, super)
if-added: Procedure ADD_COMPUTER
speed default: faster
if-needed: Procedure FIND_SPEED
location (home,office,mobile)
under_warranty (yes, no)
The fillers may be values, such as computer in the name slot, or a range of values, as in the
types slot. The procedures attached to the slots are called procedural attachments. There are
mainly three types of procedural attachments: if-needed, default and if-added. As the name
implies, if-needed procedures are executed when a filler value is needed. A default value is
taken if no other value exists. Defaults are used to represent commonsense knowledge;
commonsense is generally used when no more situation-specific knowledge is available.
The if-added type is required if any value is to be added to a slot. In the above example, if a new
type of computer is invented ADD_COMPUTER procedure should be executed to add that
information. An if-removed type is used to remove a value from the slot.
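The procedural attachments above can be sketched as a small Python class in which each slot carries its values plus default, if-needed and if-added facets. The class, the lookup order (explicit value, then if-needed, then default) and the stand-ins for ADD_COMPUTER and FIND_SPEED are all hypothetical; real frame systems differ in these details.

```python
from typing import Callable, Optional

class Frame:
    """A sketch of a frame whose slots carry values plus default / if-needed / if-added facets."""

    def __init__(self, name: str):
        self.name = name
        self.slots: dict = {}

    def define(self, slot: str, values=(), default=None,
               if_needed: Optional[Callable] = None,
               if_added: Optional[Callable] = None) -> None:
        self.slots[slot] = {"values": list(values), "default": default,
                            "if_needed": if_needed, "if_added": if_added}

    def add(self, slot: str, value) -> None:
        facets = self.slots[slot]
        facets["values"].append(value)
        if facets["if_added"] is not None:     # if-added procedure runs after a value is stored
            facets["if_added"](self, value)

    def get(self, slot: str) -> list:
        facets = self.slots[slot]
        if facets["values"]:                   # an explicit filler always wins
            return list(facets["values"])
        if facets["if_needed"] is not None:    # try to compute the filler on demand
            computed = facets["if_needed"](self)
            if computed is not None:
                return [computed]
        if facets["default"] is not None:      # otherwise fall back on the default
            return [facets["default"]]
        return []

# Hypothetical stand-ins for the procedures named in the generic frame above.
added_log = []

def add_computer(frame, value):                # plays the role of ADD_COMPUTER
    added_log.append(value)

def find_speed(frame):                         # plays the role of FIND_SPEED; no data here
    return None

computer = Frame("computer")
computer.define("types", values=["desktop", "laptop", "mainframe", "super"],
                if_added=add_computer)
computer.define("speed", default="faster", if_needed=find_speed)

computer.add("types", "tablet")                # a new type of computer fires if-added
print(added_log)                               # ['tablet']
print(computer.get("speed"))                   # ['faster']: if-needed found nothing
```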
Person
  isa: Mammal
  Cardinality:
Adult-Male
  isa: Person
  Cardinality:
Rugby-Player
  isa: Adult-Male
  Cardinality:
  Height:
  Weight:
  Position:
  Team:
  Team-Colours:
Back
  isa: Rugby-Player
  Cardinality:
  Tries:
Mike-Hall
  instance: Back
  Height: 6-0
  Position: Centre
  Team: Cardiff-RFC
  Team-Colours: Black/Blue
Rugby-Team
  isa: Team
  Cardinality:
  Team-size: 15
  Coach:
Note
The isa relation is in fact the subset relation.
The instance relation is in fact element of.
The isa attribute possesses a transitivity property. This implies: Robert-Howley is
a Back and a Back is a Rugby-Player who in turn is an Adult-Male and also a Person.
Both isa and instance have inverses which are called subclasses or all instances.
There are attributes that are associated with the class or set, such as cardinality, and on the
other hand there are attributes that are possessed by each member of the class or set.
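Because isa is the subset relation it is transitive, while instance (element-of) only links an individual to its direct class. The Python sketch below encodes the rugby frames above as parent links and walks them; only the link structure is taken from the frames, and the function names are illustrative.

```python
# isa links between classes and instance links for individuals, from the frames above
ISA = {"Person": "Mammal", "Adult-Male": "Person", "Rugby-Player": "Adult-Male",
       "Back": "Rugby-Player", "Rugby-Team": "Team"}
INSTANCE = {"Mike-Hall": "Back"}

def superclasses(cls: str) -> list:
    """Follow isa links transitively (subset of a subset is a subset)."""
    chain = []
    while cls in ISA:
        cls = ISA[cls]
        chain.append(cls)
    return chain

def classes_of(individual: str) -> list:
    """An element of a class is also an element of every superclass."""
    direct = INSTANCE[individual]
    return [direct] + superclasses(direct)

def subclasses(cls: str) -> list:
    """The inverse of isa."""
    return [c for c, parent in ISA.items() if parent == cls]

print(classes_of("Mike-Hall"))   # ['Back', 'Rugby-Player', 'Adult-Male', 'Person', 'Mammal']
print(subclasses("Person"))      # ['Adult-Male']
```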
Solution: MetaClasses
A metaclass is a special class whose elements are themselves classes.
Now consider our rugby teams as:
The basic metaclass is Class, and this allows us to
define classes which are instances of other classes, and (thus)
inherit properties from this class.
Inheritance of default values occurs when one element or class is an instance of a class.
Slots as Objects
How can we represent the following properties in frames?
Attributes, such as weight or age, that can be attached and make sense.
Constraints on values such as age being less than a hundred
Default values
Rules for inheritance of values such as children inheriting parent's names
Rules for computing values
Many values for a slot.
A slot is a relation that maps from its domain of classes to its range of values.
A relation is a set of ordered pairs, so one relation can be a subset of another.
Since a slot is a set, the set of all slots can be represented by a metaclass, called Slot, say.
The range is split into two parts: one is the class of the elements and the other is a
constraint, which is a logical expression; if absent, it is taken to be true.
If there is a value for default then it must be passed on unless an instance has its own
value.
The to-compute attribute involves a procedure to compute its value, e.g. in Position, where we
use the dot notation to assign values to the slot of a frame.
The transfers-through attribute lists other slots from which values can be derived by inheritance.
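Treating a slot as an object in its own right can be sketched with a Python dataclass that records the slot's domain, the two parts of its range (element class and constraint), a default and a to-compute procedure. The class and field names are hypothetical renderings of the attributes listed above, and the age and team-size slots are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Slot:
    """A slot as an object (a sketch; attribute names mirror the list above)."""
    name: str
    domain: str                                          # class whose members carry the slot
    range_class: type                                    # range part 1: class of the values
    constraint: Callable[[Any], bool] = lambda v: True   # range part 2: absent means true
    default: Any = None
    to_compute: Optional[Callable[[dict], Any]] = None

    def value_for(self, frame: dict):
        if self.name in frame:                 # an instance's own value overrides the default
            value = frame[self.name]
        elif self.to_compute is not None:      # otherwise compute it if a procedure exists
            value = self.to_compute(frame)
        else:
            value = self.default               # otherwise the default is passed on
        if value is not None and not (isinstance(value, self.range_class)
                                      and self.constraint(value)):
            raise ValueError(f"{value!r} is outside the range of {self.name}")
        return value

age = Slot("age", domain="Person", range_class=int, constraint=lambda v: 0 <= v < 100)
team_size = Slot("team-size", domain="Rugby-Team", range_class=int, default=15)
print(team_size.value_for({}))          # 15
print(age.value_for({"age": 34}))       # 34
```

Calling age.value_for({"age": 120}) raises ValueError, because the constraint part of the range (age less than a hundred) is checked on every lookup.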
Individual semantic networks and frame systems may have specialized links and inference
procedures, but there are no hard and fast rules about what kinds of objects and links are good
in general for knowledge representation.
Conceptual Dependency (CD) is a theory of how to represent the kind of knowledge about events
that is usually contained in natural language sentences. The goal is to represent the knowledge
in a way that
Facilitates drawing inferences from the sentences.
Is independent of the language in which the sentences are originally stated.
CD provides a structure into which nodes representing information can be placed, and a specific
set of primitives at a given level of granularity.
A second set of CD building blocks is the set of allowable dependencies among the
conceptualizations described in a sentence. There are 4 primitive conceptual categories from
which dependency structures can be built.
Rule 3 describes the relationship between two PPs, one of which belongs to the set
defined by the other.
Rule 4 describes the relationship between a PP and an attribute that has already been
predicated of it. The direction of the arrow is toward the PP being described.
Rule 5 describes the relationship between two PPs, one of which provides a particular
kind of information about the other. The three most common types of information to be
provided in this way are
o possession (shown as POSS-BY),
o location (shown as LOC) and
o physical containment (shown as CONT).
The direction of the arrow is again toward the concept being described.
Rule 6 describes the relationship between an ACT and the PP that is the object of that
ACT. The direction of the arrow is toward the ACT since the context of the specific ACT
determines the meaning of the object relation.
Rule 7 describes the relationship between an ACT and the source and the recipient of the
ACT.
Rule 8 describes the relationship between an ACT and the instrument with which it is
performed. The instrument must always be a full conceptualization (i.e., it must contain
an ACT), not just a single physical object.
Rule 9 describes the relationship between an ACT and its physical source and destination.
Rule 10 represents the relationship between a PP and a state in which it started and
another in which it ended.
Rule 11 describes the relationship between one conceptualization and another that causes
it. Notice that the arrows indicate dependency of one conceptualization on another and
so point in the opposite direction of the implication arrows. The two forms of the rule
describe the cause of an action and the cause of a state change.
Rule 12 describes the relationship between a conceptualization and the time at which the
event it describes occurred.
Rule 13 describes the relationship between one conceptualization and another that is the
time of the first. The example for this rule also shows how CD exploits a model of the
human information processing system.
Rule 14 describes the relationship between a conceptualization and the place at which it
occurred.
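MARGIE (Meaning Analysis, Response Generation and Inference on English) -- models natural
language understanding.
SAM (Script Applier Mechanism) -- Scripts to understand stories. See next section.
PAM (Plan Applier Mechanism) -- Scripts to understand stories.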
A script is a structure that prescribes a set of circumstances which could be expected to follow on
from one another. It is similar to a thought sequence or a chain of situations which could be
anticipated. It could be considered to consist of a number of slots or frames but with more
specialized roles.
Scripts provide an ability for default reasoning when information is not available that directly
states that an action occurred. So we may assume, unless otherwise stated, that a diner at a
restaurant was served food, that the diner paid for the food, and that the dinner was served by a
waiter/waitress.
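The default reasoning that scripts support can be sketched by reducing a restaurant script to an ordered chain of events and assuming that every event lying between the first and last events a story mentions also happened. The event names and this "fill the gap" policy are simplifying assumptions; real scripts also carry props, roles, entry conditions, results and alternative paths through the scenes.

```python
# Hypothetical restaurant script reduced to its ordered chain of events.
RESTAURANT_EVENTS = ["enter", "be seated", "order", "be served food",
                     "eat", "pay", "leave"]

def fill_in(events: list, mentioned: list) -> list:
    """Every script event between the first and last mentioned ones is assumed
    to have happened unless the story says otherwise."""
    positions = [events.index(e) for e in mentioned]
    first, last = min(positions), max(positions)
    return [(e, "stated" if e in mentioned else "assumed")
            for e in events[first:last + 1]]

# "John went to a restaurant, ordered a steak, and left."
for event, status in fill_in(RESTAURANT_EVENTS, ["enter", "order", "leave"]):
    print(f"{event:15}{status}")
```

The story never says that John was served or paid, but the script licenses both inferences by default.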
Scripts are useful in describing certain situations such as robbing a bank. This might involve:
Getting a gun.
Holding up a bank.
The script does not contain typical actions although there are options such as whether the
customer was pleased or not. There are multiple paths through the scenes to make for a robust
script. What would a “going to the movies” script look like? Would it have similar props,
actors, scenes? How about “going to class”?
CYC is a very large knowledge base project aimed at capturing human commonsense knowledge.
The goal of CYC is to encode the large body of knowledge that is so obvious that it is easy to
forget to state it explicitly. Such a knowledge base could then be combined with specialized
knowledge bases to produce systems that are less brittle than most of the ones available today.
Building an immense knowledge base is a staggering task. There are two possibilities for acquiring
this knowledge automatically:
1. Machine learning: In order for a system to learn a great deal, it must already know a
great deal. In particular, systems with a lot of knowledge will be able to employ
powerful analogical reasoning.
2. Natural language understanding: Humans extend their own knowledge by reading
books and talking with other humans.
CYC's strategy is to build the knowledge base by hand.
Special CYCL language:
LISP-like.
Frame-based.
Multiple inheritance.
Slots are fully fledged objects.
Generalized inheritance -- along any link, not just isa and instance.
The CD representation of the information contained in the sentence is shown above. It says that
Bill informed John that he (Bill) will do something to break John’s nose. Bill did this so that John
will believe that if he (John) does something (different from what Bill will do to break his nose),
then Bill will break John’s nose. In this representation, the word believe has been used to simplify
the example. But the idea behind believe can be represented in CD as MTRANS of a fact into
John’s memory. The actions do1 and do2 are dummy placeholders that refer to some as yet
unspecified actions.
CYCL
CYC's knowledge is encoded in a representation language called CYCL.
CYCL is a frame-based system that incorporates most of the slot-and-filler techniques described
above, such as multiple inheritance and slots as full-fledged objects.
Generalizes the notion of inheritance so that properties can be inherited
along any link, not just isa and instance.
CYCL contains a constraint language that allows the expression of arbitrary
first-order logical expressions.
11. Learning
What is Learning?
Learning is an important area in AI, perhaps more so than planning.
Problems are hard -- harder than planning.
Recognised solutions are not as common as in planning.
A goal of AI is to build computers that can be taught rather than programmed.
Why is it hard?
Intelligence implies that an organism or machine must be able to adapt to new situations.
It must be able to learn to do new things.
This requires knowledge acquisition, inference, updating/refinement of knowledge base,
acquisition of heuristics, applying faster searches, etc.
Rote Learning
Rote Learning is basically memorisation.
Saving knowledge so it can be used again.
Retrieval is the only problem.
No repeated computation, inference or query is necessary.
Samuel's Checkers program employed rote learning (it also used parameter adjustment which will
be discussed shortly).
A minimax search was used to explore the game tree.
Time constraints do not permit complete searches.
It records board positions and scores at search ends.
Now if the same board position arises later in the game, the stored value can be recalled,
and the net effect is that deeper searches have effectively occurred.
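Rote learning in this sense amounts to caching the backed-up values of positions that minimax has already searched. The Python sketch below uses a toy take-away game (remove 1 to 3 stones; taking the last stone wins) as an assumed stand-in for checkers, and searches to the end of the game rather than using Samuel's depth-limited search with a static evaluation; the `calls` counter shows how much searching the stored values save.

```python
# Toy game standing in for checkers: remove 1-3 stones; taking the last stone wins.
stored = {}   # rote memory: position -> backed-up value for the player to move
calls = 0     # positions that actually had to be searched

def negamax(stones: int) -> int:
    """Minimax in negamax form: +1 means the player to move wins, -1 means they lose."""
    global calls
    if stones in stored:              # recall the stored value instead of searching again
        return stored[stones]
    calls += 1
    if stones == 0:
        value = -1                    # the opponent has just taken the last stone
    else:
        value = max(-negamax(stones - k) for k in (1, 2, 3) if k <= stones)
    stored[stones] = value            # record the position and its score
    return value

print(negamax(10), calls)   # 1 11  (first game: positions 0..10 are all searched)
calls = 0
print(negamax(12), calls)   # -1 2  (only 12 and 11 are new; everything else is recalled)
```

A dictionary keyed on the position plays the role of the fast-access organisation discussed below; Samuel additionally indexed positions and merged symmetric ones.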
Rote learning is basically a simple process. However it does illustrate some issues that are relevant
to more complex learning issues.
Organisation
o Access to the stored value must be faster than it would be to recompute it.
Methods such as hashing, indexing and sorting can be employed to enable this.
o E.g. Samuel's program indexed board positions by noting the number of pieces.
Generalisation
o The number of potentially stored objects can be very large. We may need to
generalise some information to make the problem manageable.
o E.g. Samuel's program stored game positions only for white to move. Also
rotations along diagonals are combined.
Stability of the Environment
o Rote learning is not very effective in a rapidly changing environment. If the
environment does change then we must detect and record exactly what has
changed -- the frame problem.
Store v Compute
o When knowledge is added to the knowledge base care must be taken so that bad
side-effects are avoided.
o E.g. Introduction of redundancy and contradictions.
Evaluate
o The system must assess the new knowledge for errors, contradictions etc.
Many programs rely on an evaluation procedure to summarize the state of search etc. Game
playing programs provide many examples of this.
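However, many programs have a static evaluation function.
In learning, a slight modification of the formulation of the evaluation of the problem is required.
Here the problem has an evaluation function that is represented as a polynomial of the form
$c_1 t_1 + c_2 t_2 + \cdots + c_n t_n$
where the t terms are the values of the features and the c terms are their weights (coefficients).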
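Learning by parameter adjustment then means tuning the coefficients c so that the evaluation agrees better with some more reliable estimate of a position's worth. The Python sketch below uses a least-mean-squares (delta rule) update, a swapped-in standard technique rather than Samuel's exact procedure; the three features and the training pairs are invented for illustration.

```python
def evaluate(weights, features):
    """Static evaluation as a weighted sum: c1*t1 + c2*t2 + ... + cn*tn."""
    return sum(c * t for c, t in zip(weights, features))

def adjust(weights, features, target, rate=0.1):
    """Move each coefficient so the evaluation gets closer to a better estimate of
    the position's worth (e.g. a score backed up from deeper search). LMS / delta rule."""
    error = target - evaluate(weights, features)
    return [c + rate * error * t for c, t in zip(weights, features)]

# Invented training data: (feature values t1..t3, better estimate of the position's value)
samples = [([1.0, 0.0, 2.0], 3.0), ([0.0, 1.0, 1.0], 1.0), ([2.0, 1.0, 0.0], 1.0)]
weights = [0.0, 0.0, 0.0]
for _ in range(500):
    for features, target in samples:
        weights = adjust(weights, features, target)
print([round(c, 2) for c in weights])
```

Each coefficient moves in proportion to how much its feature contributed to the error, so features that are absent in a position (t = 0) leave their weights untouched.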
For example: making dinner can be described as lay the table, cook dinner, serve dinner. We
could treat laying the table as one action even though it involves a sequence of actions.
Consider a blocks world example in which ON(C,B) and ON(A,TABLE) are true.
STRIPS can achieve ON(A,B) in four steps:
UNSTACK(C,B), PUTDOWN(C), PICKUP(A), STACK(A,B)
STRIPS now builds a macro-operator MACROP with preconditions ON(C,B), ON(A,TABLE),
postconditions ON(A,B), ON(C,TABLE) and the four steps as its body.
MACROP can now be used in future operation.
But it is not very general. The above can be easily generalised with variables used in place of the
blocks.
However generalisation is not always that easy (See Rich and Knight).
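Building MACROP can be sketched by composing STRIPS operators: a precondition of a later step must hold at the start unless an earlier step produced it, and the add and delete lists accumulate step by step. The operator encodings below (precondition, add and delete lists with CLEAR, HOLDING and ARMEMPTY) are an assumed standard blocks-world formulation using ON(x,TABLE) as in the text, so the composed preconditions also include CLEAR and ARMEMPTY facts that the summary above omits. Generalising with variables corresponds to calling `body` with other block names.

```python
from typing import FrozenSet, NamedTuple

class Op(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]

# Assumed STRIPS-style blocks-world operators (ON(x,TABLE) as in the notes)
def unstack(x, y):
    return Op(f"UNSTACK({x},{y})", frozenset({f"ON({x},{y})", f"CLEAR({x})", "ARMEMPTY"}),
              frozenset({f"HOLDING({x})", f"CLEAR({y})"}), frozenset({f"ON({x},{y})", "ARMEMPTY"}))

def putdown(x):
    return Op(f"PUTDOWN({x})", frozenset({f"HOLDING({x})"}),
              frozenset({f"ON({x},TABLE)", "ARMEMPTY"}), frozenset({f"HOLDING({x})"}))

def pickup(x):
    return Op(f"PICKUP({x})", frozenset({f"CLEAR({x})", f"ON({x},TABLE)", "ARMEMPTY"}),
              frozenset({f"HOLDING({x})"}), frozenset({f"ON({x},TABLE)", "ARMEMPTY"}))

def stack(x, y):
    return Op(f"STACK({x},{y})", frozenset({f"CLEAR({y})", f"HOLDING({x})"}),
              frozenset({"ARMEMPTY", f"ON({x},{y})"}), frozenset({f"CLEAR({y})", f"HOLDING({x})"}))

def macro(name, steps):
    """Collapse a sequence of operators into one macro-operator."""
    pre, add, delete = set(), set(), set()
    for op in steps:
        if op.pre & delete:
            raise ValueError(f"{op.name} needs a fact that an earlier step deleted")
        pre |= op.pre - add                    # needed facts not produced earlier
        add = (add - op.delete) | op.add
        delete = (delete - op.add) | op.delete
    return Op(name, frozenset(pre), frozenset(add), frozenset(delete))

def body(x, y, z):
    """Move x onto y after clearing z off y (variables instead of fixed blocks)."""
    return [unstack(z, y), putdown(z), pickup(x), stack(x, y)]

m = macro("MACROP", body("A", "B", "C"))
print(sorted(m.pre))    # includes ON(C,B) and ON(A,TABLE)
print(sorted(m.add))    # includes ON(A,B) and ON(C,TABLE)
```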
Learning by Chunking
Chunking involves similar ideas to Macro Operators and originates from psychological ideas on
memory and problem solving.
The computational basis is in production systems (studied earlier).
SOAR is a system that uses production rules to represent its knowledge. It also employs chunking
to learn from experience.
Inductive Learning
This involves the process of learning by example -- where a system tries to induce a general rule
from a set of observed instances.
This involves classification -- assigning, to a particular input, the name of a class to which it
belongs. Classification is important to many problem solving tasks.
A learning system has to be capable of evolving its own class descriptions:
Initial class definitions may not be adequate.
The world may not be well understood or rapidly changing.
The task of constructing class definitions is called induction or concept learning
1. Select one known instance of the concept. Call this the concept definition.
2. Examine definitions of other known instances of the concept. Generalise the definition to
include them.
3. Examine descriptions of near misses. Restrict the definition to exclude these.
Both steps 2 and 3 rely on comparison, and both similarities and differences need to be identified.
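The three steps above can be sketched with concepts as sets of features: positive instances generalise the definition by dropping unshared features, and near misses add required ("must") and forbidden ("must-not") features. The arch features are a hypothetical example, and representing descriptions as flat feature sets (rather than Winston's structural networks) is a simplifying assumption.

```python
def learn(first: set, positives: list, near_misses: list):
    """Winston-style concept learning over feature sets (a sketch)."""
    definition = set(first)                     # step 1: first instance is the definition
    for example in positives:                   # step 2: generalise by dropping unshared features
        definition &= example
    must, must_not = set(), set()
    for miss in near_misses:                    # step 3: restrict using near misses
        must |= definition - miss               # a feature the near miss lacks is essential
        must_not |= miss - definition           # a feature only the near miss has is forbidden
    return definition, must, must_not

def matches(concept, example: set) -> bool:
    """An example fits if it has every defining feature and no forbidden one."""
    definition, must, must_not = concept
    return definition <= example and must <= example and not (must_not & example)

first = {"two posts", "lintel", "lintel on posts", "posts apart", "brick lintel"}
positives = [{"two posts", "lintel", "lintel on posts", "posts apart", "wedge lintel"}]
near_misses = [{"two posts", "lintel", "lintel on posts", "posts touch"}]
concept = learn(first, positives, near_misses)
print(concept)
print(matches(concept, {"two posts", "lintel", "lintel on posts", "posts apart", "plank lintel"}))
```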
Version Spaces
Let a particular card be the first card in the sample set. We are told that it is an odd black
card (a positive instance).
So the most specific concept is that card alone; the least specific concept is still all our cards.
Next card (another positive instance): we need to modify our most specific concept to indicate
the generalisation of the set, something like "odd and black cards". The least specific concept
remains unchanged.
Next card (a negative instance): now we can modify the least specific set to exclude that card.
As more exclusions are added, we will narrow this to all black cards and all odd cards.
NOTE that negative instances cause least specific concepts to become more specific and
positive instances similarly affect the most specific.
If the two sets become the same set then the result is guaranteed and the target concept is
met.
S is unaffected.
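A version space over a tiny hypothesis language can be computed directly with the list-then-eliminate algorithm (keep every hypothesis consistent with the examples, then read off the most specific set S and most general set G). This is a plainly named simpler alternative to the incremental candidate-elimination procedure described above. Reducing cards to two attributes, parity and colour, with "?" meaning "any value", is an assumption for illustration.

```python
from itertools import product

ATTRS = [("odd", "even"), ("black", "red")]   # hypothetical card attributes: parity, colour

def covers(h, x):
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def version_space(examples):
    """List-then-eliminate: keep every conjunctive hypothesis consistent with the examples."""
    hyps = list(product(*[vals + ("?",) for vals in ATTRS]))
    vs = [h for h in hyps if all(covers(h, x) == label for x, label in examples)]
    S = [h for h in vs if not any(g != h and more_general_or_equal(h, g) for g in vs)]
    G = [h for h in vs if not any(g != h and more_general_or_equal(g, h) for g in vs)]
    return S, G

stream = [(("odd", "black"), True), (("odd", "red"), False), (("even", "black"), False)]
for i in range(1, len(stream) + 1):
    S, G = version_space(stream[:i])
    print(stream[i - 1], "S =", S, "G =", G)
```

The negative examples make G more specific while S stays fixed, and after the third card S and G coincide on odd black cards.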
Decision Trees
Quinlan in his ID3 system (1986) introduced the idea of decision trees.
ID3 is a program that can build trees automatically from given positive and negative instances.
Basically, each leaf of a decision tree asserts a positive or negative concept. To classify a
particular input, we start at the top and follow assertions down until we reach an answer (Fig 28).
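The tree-building step can be sketched in Python using ID3's information-gain criterion: at each node, choose the attribute whose split most reduces the entropy of the class labels, and stop when a node is pure. The training instances are hypothetical, and the tree is stored as nested dictionaries purely for convenience.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    labels = [r["class"] for r in rows]
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r["class"] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, attrs):
    """Build a tree as nested dicts {attribute: {value: subtree-or-leaf}}."""
    labels = [r["class"] for r in rows]
    if len(set(labels)) == 1:                    # pure node: the leaf asserts that class
        return labels[0]
    if not attrs:                                # no tests left: fall back on the majority
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest)
                   for v in {r[best] for r in rows}}}

def classify(tree, example):
    """Start at the root and follow the tests down to a leaf (unseen values raise KeyError)."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[example[attr]]
    return tree

# Hypothetical training instances: "+" is a positive instance, "-" a negative one
rows = [
    {"size": "small", "colour": "red", "class": "+"},
    {"size": "small", "colour": "blue", "class": "+"},
    {"size": "large", "colour": "red", "class": "-"},
    {"size": "large", "colour": "blue", "class": "-"},
    {"size": "small", "colour": "green", "class": "+"},
]
tree = id3(rows, ["colour", "size"])
print(tree)
print(classify(tree, {"size": "large", "colour": "green"}))
```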
Generalisation
the explanation is generalized as far as possible while still describing the goal
concept.
EBL example
Goal: To get to Brecon -- a picturesque welsh market town famous for its mountains (beacons)
and its Jazz festival.
The training data is:
near(Cardiff, Brecon),
airport(Cardiff)
In this case, the operational criterion is: we must express the concept definition in pure
description language syntax.
If we analyse the proof (say with an ATMS), we can learn a few general rules from it.
Since Brecon appears in the database, when we abstract things we must explicitly record the use
of the fact:
near(Cardiff, x) → holds(loc(x), result(drive(Cardiff, x), result(fly(Cardiff), s')))
This states if x is near Cardiff we can get to it by flying to Cardiff and then driving. We
have learnt this general rule.
This states that we can get to Brecon by flying to another nearby airport and driving from there.
We could add airport(Swansea) and get an alternative means of travel plan.
Finally we could actually abstract out both Brecon and Cardiff to get a general plan:
near(x, y) ∧ airport(x) → holds(loc(y), result(drive(x, y), result(fly(x), s')))
Discovery
Discovery is a restricted form of learning in which one entity acquires knowledge without the
help of a teacher.
o 250 heuristics represent hints about activities that might lead to interesting
discoveries.
o How to employ functions, create new concepts, generalisation etc.
Hypothesis and test based search.
Agenda control of discovery process.
Analogy
Analogy involves a complicated mapping between what might appear to be two dissimilar
concepts.
Bill is built like a large outdoor brick lavatory.
He was like putty in her hands
Humans quickly recognise the abstractions involved and understand the meaning.
There are two methods of analogical problem solving studied in AI.
Transformational Analogy
Look for a similar solution and copy it to the new situation making suitable substitutions where
appropriate.
E.g. Geometry.
If you know about lengths of line segments and a proof that certain lines are equal (Fig. 29) then
we can make similar assertions about angles.
Derivational Analogy
Transformational analogy does not look at how the problem was solved -- it only looks at the
final solution.
The history of the problem solution - the steps involved - are often relevant.
Carbonell (1986) showed that derivational analogy is a necessary component in the transfer of
skills in complex domains:
In translating Pascal code to LISP -- line by line translation is no use. You will have
to reuse the major structural and control decisions.
One way to do this is to replay a previous derivation and modify it when necessary.
If initial steps and assumptions are still valid copy them across.
Otherwise, alternatives need to be found, in best-first search fashion.
Reasoning by analogy becomes a search in T-space -- means-end analysis.
Knowledge Base
It contains domain-specific and high-quality knowledge. Knowledge is required to exhibit
intelligence. The success of any ES majorly depends upon the collection of highly accurate
and precise knowledge.
What is Knowledge?
o Data is a collection of facts. Information is data and facts organized about the
task domain. Data, information, and past experience combined together are
termed knowledge.
Components of Knowledge Base
o The knowledge base of an ES is a store of both factual and heuristic knowledge.
Factual Knowledge − It is the information widely accepted by the
Knowledge Engineers and scholars in the task domain.
Heuristic Knowledge − It is about practice, accurate judgment, one’s
ability of evaluation, and guessing.
Knowledge representation
o It is the method used to organize and formalize the knowledge in the knowledge
base. It is in the form of IF-THEN-ELSE rules.
Knowledge Acquisition
o The success of any expert system majorly depends on the quality, completeness,
and accuracy of the information stored in the knowledge base.
o The knowledge base is formed by readings from various experts, scholars, and
the Knowledge Engineers. The knowledge engineer is a person with the qualities
of empathy, quick learning, and case analyzing skills.
o He acquires information from the subject expert by recording, interviewing, and
observing him at work, etc.
o He then categorizes and organizes the information in a meaningful way, in the
form of IF-THEN-ELSE rules, to be used by the inference engine. The knowledge
engineer also monitors the development of the ES.
Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducing a
correct, flawless solution.
In case of knowledge-based ES, the Inference Engine acquires and manipulates the
knowledge from the knowledge base to arrive at a particular solution.
In case of rule based ES, it −
o Applies rules repeatedly to the facts, which are obtained from earlier rule
application.
o Adds new knowledge into the knowledge base if required.
o Resolves conflicts between rules when multiple rules are applicable to a particular case.
To recommend a solution, the Inference Engine uses the following strategies −
o Forward Chaining
o Backward Chaining
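Both strategies can be sketched over a small rule base in Python. The car-fault rules are hypothetical, conflict resolution is simply rule order, and the backward chainer assumes the rules contain no cycles.

```python
# Hypothetical rule base: (set of conditions, conclusion)
RULES = [
    ({"engine won't start", "lights are dim"}, "battery is flat"),
    ({"battery is flat"}, "recommend: charge battery"),
    ({"engine won't start", "lights are bright"}, "check starter motor"),
]

def forward_chain(facts: set, rules=RULES) -> set:
    """Data-driven: fire any rule whose conditions hold, add its conclusion,
    and repeat until no rule adds anything new."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal: str, facts: set, rules=RULES) -> bool:
    """Goal-driven: a goal holds if it is a known fact, or if some rule concludes it
    and all of that rule's conditions can themselves be proved."""
    if goal in facts:
        return True
    return any(conclusion == goal and all(backward_chain(c, facts, rules) for c in conditions)
               for conditions, conclusion in rules)

observed = {"engine won't start", "lights are dim"}
print(sorted(forward_chain(observed)))
print(backward_chain("recommend: charge battery", observed))   # True
print(backward_chain("check starter motor", observed))         # False
```

Forward chaining derives everything the facts support; backward chaining only explores rules relevant to the goal being asked about.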
User Interface
User interface provides interaction between user of the ES and the ES itself.
It generally uses natural language processing so that it can be used by a user who is
well-versed in the task domain.
The user of the ES need not necessarily be an expert in Artificial Intelligence.
It explains how the ES has arrived at a particular recommendation. The explanation may
appear in the following forms −
o Natural language displayed on screen.
o Verbal narrations in natural language.
o Listing of rule numbers displayed on the screen.
o The user interface makes it easy to trace the credibility of the deductions.
MYCIN is one example of an expert system rule. All the rules we show are English versions of
the actual rules that the systems use.
R1 (sometimes called XCON) is a program that configures DEC VAX systems. Its rules
look like this:
Notice that R1’s rules, unlike MYCIN’s, contain no numeric measures of certainty. In the
task domain with which R1 deals, it is possible to state exactly the correct thing to be
done in each particular set of circumstances. One reason for this is that there exists a good
deal of human expertise in this area. Another is that since R1 is doing a design task, it is
not necessary to consider all possible alternatives; one good one is enough. As a result,
probabilistic information is not necessary in R1.
PROSPECTOR is a program that provides advice on mineral exploration. Its rules look
like this:
In PROSPECTOR, each rule contains two confidence estimates. The first indicates the
extent to which the presence of the evidence described in the condition part of the rule
suggests the validity of the rule’s conclusion. In the PROSPECTOR rule shown above, the
number 2 indicates that the presence of the evidence is mildly encouraging. The second
confidence estimate measures the extent to which the evidence is necessary to the validity
of the conclusion or, stated another way, the extent to which the lack of the evidence
indicates that the conclusion is not valid.
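PROSPECTOR is usually described as using a subjective Bayesian (odds-likelihood) scheme in which each rule carries a sufficiency ratio LS and a necessity ratio LN: if the evidence is observed, the prior odds of the hypothesis are multiplied by LS, and if the evidence is known to be absent, they are multiplied by LN. The sketch below assumes that reading of the two confidence estimates. The English rule versions in this text use a different display scale, so the ratios here (LS = 4, LN = 0.2) and the prior of 0.1 are hypothetical.

```python
def to_odds(p):
    return p / (1.0 - p)


def to_prob(odds):
    return odds / (1.0 + odds)


def update(prior, ls, ln, evidence_present):
    """Odds-likelihood update of one hypothesis by one rule (assumed PROSPECTOR-style).

    ls > 1: observing the evidence raises belief (sufficiency).
    ln < 1: the evidence being absent lowers belief (necessity).
    """
    ratio = ls if evidence_present else ln
    return to_prob(ratio * to_odds(prior))


# Hypothetical numbers, not taken from PROSPECTOR's knowledge base.
prior = 0.1
print(round(update(prior, ls=4.0, ln=0.2, evidence_present=True), 3))   # 0.308
print(round(update(prior, ls=4.0, ln=0.2, evidence_present=False), 3))  # 0.022
```

An LN close to 1 would describe evidence that is not necessary: its absence barely changes the belief in the conclusion.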
DESIGN ADVISOR is a system that critiques chip designs. Its rules look like this:
The system gives advice to a chip designer, who can accept or reject the advice. If the advice is
rejected, the system can exploit a justification-based truth maintenance system to revise
its model of the circuit. The first rule shown here says that an element should be criticized
for poor resettability if the sequential level count is greater than two, unless its signal is
currently believed to be resettable.
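The UNLESS clause makes the rule defeasible: a critique stands only while nothing is believed that defeats it. A minimal sketch, assuming hypothetical `Element` and `DesignModel` classes and element names: rejecting a critique records the defeating belief, and re-evaluating the rule withdraws the critique. A real justification-based TMS would record justifications and retract only the affected conclusions instead of re-running the rule over every element.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    sequential_level_count: int


@dataclass
class DesignModel:
    # Beliefs that defeat the critique; the designer adds to them by rejecting advice.
    believed_resettable: set = field(default_factory=set)

    def critiques(self, elements):
        """Criticize poor resettability if the sequential level count exceeds two,
        UNLESS the element's signal is currently believed to be resettable."""
        return [e.name for e in elements
                if e.sequential_level_count > 2
                and e.name not in self.believed_resettable]

    def reject(self, name):
        """Designer rejects a critique: record the defeating belief so the
        critique disappears the next time the rule is evaluated."""
        self.believed_resettable.add(name)


# Hypothetical circuit elements.
elements = [Element("U1", 3), Element("U2", 1), Element("U3", 4)]
model = DesignModel()
print(model.critiques(elements))  # ['U1', 'U3']
model.reject("U1")
print(model.critiques(elements))  # ['U3']
```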
There are now several commercially available shells that serve as the basis for many of the expert
systems currently being built. These shells provide much greater flexibility in representing
knowledge and in reasoning with it than MYCIN did. They typically support rules, frames, truth
maintenance systems, and a variety of other reasoning mechanisms.
Early expert systems shells provided mechanisms for knowledge representation, reasoning and
explanation. But as experience with using these systems to solve real-world problems grew, it
became clear that expert system shells needed to do something else as well. They needed to
make it easy to integrate expert systems with other kinds of programs.
EXPLANATION
In order for an expert system to be an effective tool, people must be able to interact with it
easily. To facilitate this interaction, the expert system must have the following two capabilities in
addition to the ability to perform its underlying task:
Explain its reasoning:
o In many of the domains in which expert systems operate, people will not accept
results unless they have been convinced of the accuracy of the reasoning process
that produced those results. This is particularly true, for example, in medicine,
where a doctor must accept ultimate responsibility for a diagnosis, even if that
diagnosis was arrived at with considerable help from a program.
Acquire new knowledge and modifications of old knowledge:
o Since expert systems derive their power from the richness of the knowledge bases
they exploit, it is extremely important that those knowledge bases be as complete
and as accurate as possible. One way to get this knowledge into a program is
through interaction with the human expert. Another way is to have the program
learn expert behavior from raw data.
KNOWLEDGE ACQUISITION
How are expert systems built? Typically, a knowledge engineer interviews a domain expert to
elicit expert knowledge, which is then translated into rules. After the initial system is built, it
must be iteratively refined until it approximates expert-level performance. This process is
expensive and time-consuming, so it is worthwhile to look for more automatic ways of
constructing expert knowledge bases.
While no totally automatic knowledge acquisition systems yet exist, there are many programs
that interact with domain experts to extract expert knowledge efficiently. These programs
provide support for the following activities:
Entering knowledge
Maintaining knowledge base consistency
Ensuring knowledge base completeness
The most useful knowledge acquisition programs are those that are restricted to a particular
problem-solving paradigm, e.g., diagnosis or design. It is important to be able to enumerate the
roles that knowledge can play in the problem-solving process. For example, if the paradigm is
diagnosis, then the program can structure its knowledge base around symptoms, hypotheses and
causes. It can identify symptoms for which the expert has not yet provided causes.
Since one symptom may have multiple causes, the program can ask for knowledge about how to
decide when one hypothesis is better than another. If we move to another type of problem
solving, say design, these diagnosis-specific strategies no longer apply, and the program needs
other ways of profitably interacting with an expert.
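For the diagnosis paradigm, the two checks described above (symptoms for which no causes have been given yet, and symptoms with several competing causes) are easy to mechanize once the knowledge base is organized around symptoms and causes. A hypothetical sketch; the function name, the prompt wording, and the car-engine entries are illustrative only:

```python
def acquisition_prompts(symptoms, causes_of):
    """Suggest questions a diagnosis-oriented acquisition tool might ask.

    symptoms:  symptoms the expert has entered
    causes_of: dict mapping a symptom to the hypotheses that could cause it
    """
    prompts = []
    for s in symptoms:
        causes = causes_of.get(s, [])
        if not causes:
            # Completeness: the expert has not yet provided any cause.
            prompts.append(f"What can cause '{s}'?")
        elif len(causes) > 1:
            # Several candidate causes: ask how to prefer one over another.
            prompts.append(f"How do you decide between {' and '.join(causes)} for '{s}'?")
    return prompts


# Hypothetical knowledge base entries.
symptoms = ["engine stalls", "black smoke", "engine knocks"]
causes_of = {
    "engine stalls": ["fuel starvation", "faulty ignition"],
    "black smoke": ["rich mixture"],
}
for p in acquisition_prompts(symptoms, causes_of):
    print(p)
```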
MOLE (Knowledge Acquisition System)
It is a system for heuristic classification problems, such as diagnosing diseases. In particular, it is
used in conjunction with the cover-and-differentiate problem-solving method. An expert system
produced by MOLE accepts input data, comes up with a set of candidate explanations or
classifications that cover (or explain) the data, then uses differentiating knowledge to determine
which one is best. The process is iterative, since explanations must themselves be justified, and
it continues until ultimate causes are ascertained.
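A toy version of a single cover-and-differentiate pass, assuming covering knowledge is stored as a map from each hypothesis to the symptoms it can explain, and using "an observed finding rules the hypothesis out" plus "prefer the hypothesis that explains the most symptoms" as a crude stand-in for MOLE's differentiating knowledge. It does not show the iterative step in which the chosen explanations must themselves be explained; all names and data are hypothetical.

```python
def cover_and_differentiate(symptoms, findings, covers, rules_out):
    """One cover-and-differentiate pass (illustrative only).

    covers:    hypothesis -> set of symptoms it can explain (covering knowledge)
    rules_out: hypothesis -> set of findings that count against it
    Returns, per symptom, the surviving hypothesis that explains the most
    observed symptoms, or None if every candidate was ruled out.
    """
    symptoms = set(symptoms)
    diagnosis = {}
    for s in sorted(symptoms):
        # Cover: every hypothesis that could explain this symptom.
        candidates = [h for h, explained in covers.items() if s in explained]
        # Differentiate: drop hypotheses contradicted by an observed finding.
        candidates = [h for h in candidates if not (rules_out.get(h, set()) & findings)]
        # Prefer the hypothesis that explains the largest share of the data.
        diagnosis[s] = max(candidates, key=lambda h: len(covers[h] & symptoms),
                           default=None)
    return diagnosis


# Hypothetical covering and differentiating knowledge.
covers = {
    "low oil pressure": {"engine knocks", "warning light"},
    "bad fuel": {"engine knocks"},
    "faulty sensor": {"warning light"},
}
rules_out = {"faulty sensor": {"sensor tests ok"}}
print(cover_and_differentiate({"engine knocks", "warning light"},
                              {"sensor tests ok"}, covers, rules_out))
# {'engine knocks': 'low oil pressure', 'warning light': 'low oil pressure'}
```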
MOLE interacts with a domain expert to produce a knowledge base that a system called MOLE-p
(for MOLE-performance) uses to solve problems. The acquisition proceeds through several steps:
1. Initial Knowledge base construction.
MOLE asks the expert to list common symptoms or complaints that might require
diagnosis. For each symptom, MOLE prompts for a list of possible explanations.
MOLE then iteratively seeks out higher-level explanations until it comes up with a
set of ultimate causes. During this process, MOLE builds an influence network
similar to a belief network.
The expert provides covering knowledge, that is, the knowledge that a
hypothesized event might be the cause of a certain symptom.
2. Refinement of the knowledge base.
MOLE now tries to identify the weaknesses of the knowledge base. One approach
is to find holes and prompt the expert to fill them. It is difficult, in general, to
know whether a knowledge base is complete, so instead MOLE lets the expert
watch MOLE-p solving sample problems. Whenever MOLE-p makes an incorrect
diagnosis, the expert adds new knowledge. There are several ways in which
MOLE-p can reach the wrong conclusion. It may incorrectly reject a hypothesis
because it does not feel that the hypothesis is needed to explain any symptom.
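Step 1 above, in which MOLE keeps asking for explanations of explanations until only ultimate causes remain, amounts to growing a graph outward from the symptoms. A minimal breadth-first sketch, assuming the expert's answers are supplied by a callback (here a canned dictionary with hypothetical power-plant entries) and that a node with no further explanations is treated as an ultimate cause:

```python
from collections import deque


def build_influence_network(symptoms, ask_expert):
    """Elicit explanations until only ultimate causes remain.

    ask_expert(node) returns the possible explanations of `node`;
    an empty list means the expert treats it as an ultimate cause.
    Returns (edges, ultimate_causes), each edge being (cause, effect).
    """
    edges, ultimate = [], set()
    queue, seen = deque(symptoms), set(symptoms)
    while queue:
        node = queue.popleft()
        explanations = ask_expert(node)
        if not explanations:
            ultimate.add(node)
        for cause in explanations:
            edges.append((cause, node))  # covering knowledge: cause may explain node
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return edges, ultimate


# Canned answers standing in for an interactive session with the expert.
answers = {
    "high exhaust temperature": ["dirty heat exchanger", "excess air"],
    "excess air": ["air leak"],
}
edges, causes = build_influence_network(["high exhaust temperature"],
                                        lambda n: answers.get(n, []))
print(sorted(causes))  # ['air leak', 'dirty heat exchanger']
```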
MOLE has been used to build systems that diagnose problems with car engines, problems in steel-
rolling mills, and inefficiencies in coal-burning power plants. For MOLE to be applicable,
however, it must be possible to pre-enumerate solutions or classifications. It must also be practical
to encode the knowledge in terms of covering and differentiating.
One problem-solving method useful for design tasks is called propose-and-revise.
Propose-and-revise systems build up solutions incrementally. First, the system proposes an extension to the
current design. Then it checks whether the extension violates any global or local constraints.
Constraint violations are then fixed, and the process repeats.
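The propose-and-revise cycle can be sketched as a loop that extends the design one parameter at a time and applies a repair whenever a constraint fails. Everything below is a hypothetical illustration: the parameter names, the numbers, and the policy of applying the fix attached to the first violated constraint are assumptions, and the sketch does not recompute parameters that were derived from a value a fix later changed.

```python
def propose_and_revise(steps, constraints, max_revisions=10):
    """Build a design one parameter at a time, repairing constraint violations.

    steps:       list of (parameter, propose_fn); propose_fn(design) -> value
    constraints: list of (check_fn, fix_fn); check_fn(design) -> True if satisfied,
                 fix_fn(design) edits the design in place
    """
    design = {}
    for param, propose in steps:
        design[param] = propose(design)  # propose an extension
        revisions = 0
        while True:
            violated = [fix for check, fix in constraints if not check(design)]
            if not violated:
                break
            if revisions == max_revisions:
                raise RuntimeError(f"gave up revising after proposing {param!r}")
            violated[0](design)  # revise: apply the first applicable fix
            revisions += 1
    return design


# Hypothetical design task: size a beam for a load, with a minimum stock size.
steps = [
    ("load", lambda d: 40),
    ("beam_size", lambda d: d["load"] // 10),
    ("cost", lambda d: d["beam_size"] * 3),
]
constraints = [
    (lambda d: d.get("beam_size", 5) >= 5, lambda d: d.update(beam_size=5)),
]
print(propose_and_revise(steps, constraints))
# {'load': 40, 'beam_size': 5, 'cost': 15}
```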
SALT Program
The SALT program provides mechanisms for eliciting this knowledge from the expert. Like
MOLE, SALT builds a dependency network as it converses with an expert. Each node stands for a
value of a parameter that must be acquired or generated. There are three kinds of links:
Contributes-to: Associated with the first type of link are procedures that allow SALT to
generate a value for one parameter based on the value of another.
Constraints: Rules out certain parameter values.
Suggests-revision-of: points to ways in which a constraint violation can be fixed.
3. Constraining links should have suggests-revision-of links associated with them. These
include constraint links that are created when dependency loops are broken.
Control knowledge is also important. It is critical that the system propose extensions and
revisions that lead toward a design solution. SALT allows the expert to rate revisions in terms of
how much trouble they tend to produce.
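SALT's dependency network, the completeness check in item 3, and the expert's trouble ratings for revisions can be pictured with a small data structure. A sketch under stated assumptions: links are `(from, to)` pairs, each suggests-revision-of entry carries an expert-supplied trouble rating (lower means less trouble), and the class name, node names, and ratings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DependencyNetwork:
    # Each link is a (from_node, to_node) pair; nodes stand for parameter values.
    contributes_to: set = field(default_factory=set)
    constraints: set = field(default_factory=set)
    # constraint link -> list of (revision, trouble rating) pairs
    suggests_revision_of: dict = field(default_factory=dict)

    def constraints_without_fixes(self):
        """Constraint links with no suggests-revision-of knowledge yet,
        i.e. places where the expert should be asked how to repair a violation."""
        return sorted(c for c in self.constraints if not self.suggests_revision_of.get(c))

    def best_revision(self, constraint):
        """Pick the revision the expert rated as least troublesome."""
        return min(self.suggests_revision_of[constraint], key=lambda r: r[1])[0]


# Hypothetical network.
net = DependencyNetwork(
    contributes_to={("load", "beam_size")},
    constraints={("max_beam_size", "beam_size"), ("max_weight", "total_weight")},
    suggests_revision_of={
        ("max_beam_size", "beam_size"): [("use stronger material", 2), ("reduce load", 5)],
    },
)
print(net.constraints_without_fixes())                      # [('max_weight', 'total_weight')]
print(net.best_revision(("max_beam_size", "beam_size")))    # use stronger material
```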
SALT compiles its dependency network into a set of production rules. As with MOLE, an expert
can watch the production system solve problems and can override the system’s decision. At that
point, the knowledge base can be changed or the override can be logged for future inspection.