NATURAL LANGUAGE PROCESSING Unit 1 Till Good Grammar


NATURAL LANGUAGE PROCESSING

Natural language processing (NLP) refers to the branch of computer science—and more
specifically, the branch of artificial intelligence or AI—concerned with giving computers the
ability to understand text and spoken words in much the same way human beings can.
Humans communicate through some form of language either by text or speech.
• For computers and humans to interact, computers need to understand the natural languages
used by humans.
• Natural language processing is all about making computers learn, understand, analyse,
manipulate, and interpret natural(human) languages.
• NLP sits at the intersection of computer science, the study of human language, and
artificial intelligence.
• Processing of natural language is required when you want an intelligent system like a robot to
perform as per your instructions, when you want to obtain a decision from a dialogue-based
clinical expert system, etc.
• The ability of machines to interpret human language is now at the core of many applications
that we use every day: chatbots, email classification and spam filters, search engines, grammar
checkers, voice assistants, and social language translators.
• The input and output of an NLP system can be speech or written text.
Applications of Natural Language Processing:
Natural language research encompasses various applications, raising questions about computer
systems' understanding of language.
Text-Based Applications and Understanding:
• Text-based applications encompass diverse tasks like document retrieval, information
extraction, language translation, and text summarization.
• Basic matching techniques may face challenges in handling nuanced language expressions
and complex retrieval tasks.
• There is increasing demand for sophisticated understanding systems capable of
comprehensively interpreting linguistic intricacies.
• The computation of representations is crucial for effective handling of complex queries in
natural language processing.
• Story understanding tasks, resembling reading comprehension tests, provide valuable
evaluation opportunities.
• Evaluation allows for an in-depth assessment of a system's proficiency in processing
narrative contexts.
• The significance of linguistic analysis is highlighted in ensuring the efficacy of natural
language processing in various text-based applications.
Dialogue-Based Applications
• Dialogue-based applications involve various forms of interaction, including spoken
language interaction, question-answering, automated customer service, tutoring, and
problem-solving systems.
• The dynamics of dialogue systems necessitate active participation, acknowledgment
mechanisms, clarification sub-dialogues, and the maintenance of natural, smooth-flowing
conversations.
• Speech recognition focuses on identifying individual words from speech signals, yet true
understanding involves feeding this input to a natural language understanding system for
more profound comprehension.
• In spoken language interaction, recognizing the difference between speech recognition and
full understanding is crucial for effective system performance.
• Question-answering systems leverage natural language to query databases, requiring robust
understanding for accurate and relevant responses.
• Automated customer service applications aim to perform tasks like banking transactions or
order processing through natural language interactions, demanding nuanced comprehension
abilities.
• Tutoring systems, such as automated mathematics tutoring, rely on effective interaction
with students, necessitating in-depth understanding to provide tailored assistance.
• Problem-solving systems engaged in cooperative planning and scheduling require natural
language understanding for efficient collaboration and task execution.
• Recognition of the distinct challenges in spoken language interaction, such as the need for
clarification and acknowledgment, is vital for designing effective dialogue-based
applications.
Evaluation in Natural Language Understanding:
• Examines the variations in how understanding is measured across different applications.
• Recognizes the challenge of establishing criteria for determining the effectiveness of
natural language understanding systems.
Black Box Evaluation: Testing Overall System Performance:
• Black box evaluation stands out as the primary method employed to comprehensively
assess the overall performance of a system, requiring the execution of tasks without
delving into the intricacies of its internal workings.
Limitations of Early Black Box Evaluation:
• A cautionary stance is adopted to highlight potential misinterpretations that may arise
during the initial phases of black box evaluation, emphasizing that short-term success
does not guarantee sustained effectiveness over the long run.
Glass Box Evaluation for Component Analysis
• The focus shifts to glass box evaluation, a method centered around deconstructing the
system into its individual subcomponents, aiming to meticulously analyze the internal
structure and evaluate the functionality of each discrete part.
Challenges of Glass Box Evaluation:
• Glass box evaluation is difficult because there is no agreement on how to define the
different parts of the system. This lack of consensus makes it hard to reach a clear and
unified understanding in the field.
• The significance of consensus in defining components is underscored, ensuring that
assessments of system performance are not only reliable but also consistently applied.
Case Study: ELIZA Program - Mid-1960s AI Example:
• Offers a tangible example of conversational AI from MIT in the mid-1960s.
• Demonstrates an early attempt at simulating intelligent conversation.
• Utilized a database containing keywords, patterns, and output
specifications.
• Implemented an algorithm that identified keywords, selected responses,
and generated coherent replies.
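The keyword-and-pattern approach described above can be sketched roughly as follows. This is a minimal illustration only: the rule table, the reply templates, and the function name `respond` are hypothetical and are not taken from ELIZA's original script.

```python
import re

# Hypothetical rule table: (keyword pattern, output specification).
# These rules are illustrative and are NOT ELIZA's original script.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]
DEFAULT_REPLY = "Please go on."


def respond(utterance):
    """Return the reply for the first rule whose keyword pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Fill the template with the text captured by the pattern.
            return template.format(*match.groups())
    return DEFAULT_REPLY


print(respond("I am sad."))
print(respond("Hello there"))
```

Note that nothing here models meaning: the program only matches surface patterns, which is why such a system can appear conversational without understanding anything.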
Significance of Evaluation Methods:
• Recognizes the pivotal role of evaluation methods in appraising natural language
understanding systems.
• Emphasizes the complementary insights provided by both black box and glass box
evaluations into the overall performance of a system.
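To make the contrast between the two evaluation styles concrete, here is a small sketch under stated assumptions: a toy two-stage system (`tokenize`, then `classify`), toy test inputs, and toy gold answers, all hypothetical. Black box evaluation scores only the final output; glass box evaluation scores one component on its own.

```python
QUESTION_WORDS = {"can", "who", "what", "where"}  # hypothetical cue words


def tokenize(text):
    """Component 1: naive whitespace tokenizer (leaves punctuation attached)."""
    return text.lower().split()


def classify(tokens):
    """Component 2: label an utterance by looking at its first word."""
    return "question" if tokens and tokens[0] in QUESTION_WORDS else "statement"


def system(text):
    """The whole system, as seen from outside."""
    return classify(tokenize(text))


def accuracy(predicted, gold):
    """Fraction of items where the prediction equals the gold answer."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy test set (illustrative data, not a real benchmark).
INPUTS = ["Can you lift that rock?", "John ate the cat.", "Who sold the book?"]
GOLD_LABELS = ["question", "statement", "question"]
GOLD_TOKENS = [
    ["can", "you", "lift", "that", "rock", "?"],
    ["john", "ate", "the", "cat", "."],
    ["who", "sold", "the", "book", "?"],
]

# Black box: only the end-to-end output is checked.
black_box = accuracy([system(t) for t in INPUTS], GOLD_LABELS)
# Glass box: the tokenizer component is checked in isolation.
glass_box_tokenizer = accuracy([tokenize(t) for t in INPUTS], GOLD_TOKENS)
print(black_box, glass_box_tokenizer)
```

In this toy case the end-to-end score is perfect while the tokenizer never matches the gold tokens, which illustrates why the two kinds of evaluation give complementary information.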
Different Levels of Language Analysis:
• A language system should know a lot about words, how they create sentences, and the
rules for making sentences make sense.
• Understanding how words come together to form sentences is important for effective
communication.
• The system needs to understand what each word means and how they work together in
different situations.
• Putting together the meanings of individual words helps the system grasp the overall
meaning of a sentence.
• Apart from just language rules, the system needs to know about various topics and facts.
• Using reasoning abilities is important for tasks like drawing conclusions, answering
questions, and having logical conversations.
• To fully understand language, the system needs to blend knowledge of language rules,
general information, and good reasoning skills.
• Being part of conversations effectively means recognizing context and adjusting language
understanding accordingly.
Knowledge for Language Understanding:
- Phonetic and Phonological Knowledge:
- It involves understanding how words are related to the sounds that represent them, especially
crucial for systems using speech.
- Morphological Knowledge:
- This type of knowledge deals with understanding how words are constructed from more
basic meaning units called morphemes. For example, the word "friendly" derives its meaning
from the noun "friend" and the suffix "-ly," which turns a noun into an adjective.
- Syntactic Knowledge:
- This knowledge focuses on understanding how words can be assembled to create
grammatically correct sentences. It determines the structural role of each word in a sentence and
identifies subparts of phrases.
- Semantic Knowledge:
- Semantic knowledge revolves around understanding the meanings of words and how these
meanings combine in sentences to form overall sentence meanings. It concerns
context-independent meaning: what a sentence means irrespective of the context in which it is used.
- Pragmatic Knowledge:
- Pragmatic knowledge involves understanding how sentences are used in different situations
and how usage influences the interpretation of the sentence.
- Discourse Knowledge:
- Discourse knowledge relates to understanding how the sentences immediately preceding
impact the interpretation of the next sentence. This is particularly important for interpreting
pronouns and understanding the temporal aspects of conveyed information.
- World Knowledge:
- World knowledge encompasses general information about the structure of the world that
language users need to maintain a conversation. It includes understanding each language user's
beliefs and goals. These knowledge categories are not distinct; a specific fact may involve
aspects from multiple levels, and algorithms might need to consider various levels
simultaneously.
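The morphological example given above ("friendly" = "friend" + "-ly") can be sketched as a simple suffix-stripping routine. This is only an illustration: the tiny lexicon, the suffix table, and the function name `analyze` are assumptions, and real morphological analysis must handle spelling changes and irregular forms.

```python
# Hypothetical suffix table; only the "-ly" entry comes from the text above.
SUFFIXES = {
    "ly": "noun -> adjective",
    "ness": "adjective -> noun",
    "s": "plural",
}
# Hypothetical lexicon of known stems.
LEXICON = {"friend", "kind", "cat"}


def analyze(word):
    """Split a word into (stem, suffix) when the stem is a known word."""
    # Try longer suffixes first so "ness" is tested before "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in LEXICON:
            return stem, suffix
    return word, None  # no analysis found: treat the word as a single morpheme


print(analyze("friendly"))
```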
Representations and Understanding:
• Understanding sentences and texts involves creating precise representations of their
meanings.
• This is necessary because words often have multiple meanings, or senses, which can lead
to ambiguity.
• Disambiguation becomes crucial for making accurate inferences and modeling
understanding.
• Instead of using the sentence itself as a representation, which is inadequate due to
ambiguity, a more precise language is needed.
• This precision comes from mathematics and logic, utilizing formally specified
representation languages.
• These representation languages should meet certain criteria.
• They must be both precise and unambiguous, allowing for the expression of every
distinct reading of a sentence as a distinct formula in the representation.
• Additionally, the representation should capture the intuitive structure of natural language
sentences.
• This means that structurally similar sentences should have similar representations, and
the meanings of paraphrased sentences should be closely related.
• To achieve this, formal languages are employed for expressing various levels of analysis,
including syntactic structure, context-independent word and sentence meanings, and
general world knowledge.
• These languages use simple building blocks, such as atomic symbols, and are designed to
provide clear and unambiguous representations that align with the structure and meaning
of natural language sentences.
Syntax: Representing Sentence Structure
• The syntactic structure of a sentence reveals how words relate to each other, indicating
their grouping into phrases, the modifications between words, and the central importance
of specific words.
• This structure also identifies relationships between phrases and stores information about
the sentence structure for later processing. For example, consider these sentences:
• Grammatical judgments are not the primary goal in natural language understanding, but
certain checks, like subject-verb agreement, are crucial for eliminating potential
ambiguities.
• Ill-formed sentences, like "John are in the corner" or "John put the book," highlight issues
such as number agreement and missing required constituents ("put" needs a location).
• Most syntactic representations rely on context-free grammars, depicting sentence
structure through tree diagrams.
For example, the sentence "Rice flies like sand" has two possible structures. The first structure
involves a noun phrase (NP) describing a type of fly, "rice flies," and a verb phrase (VP)
stating their liking for sand. In the second structure, the sentence is formed from a noun
phrase describing a substance, rice, and a verb phrase stating its flying behavior, resembling sand
(when thrown). These structures provide insights into the parts of speech for each word, with
"like" serving as a verb (V) in the first reading and a preposition (P) in the second.
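The two readings described above (a kind of fly that likes sand, versus rice that flies the way sand does) can be written down as two trees over the same four words. The sketch below uses nested tuples; the exact node labels (for instance treating "rice" as N inside the first NP) and the helper name `tagged_words` are assumptions made for illustration.

```python
# Reading 1: "rice flies" is a noun phrase; "like" is the verb.
reading_1 = (
    "S",
    ("NP", ("N", "rice"), ("N", "flies")),
    ("VP", ("V", "like"), ("NP", ("N", "sand"))),
)

# Reading 2: "rice" is the noun phrase; "flies" is the verb; "like" is a preposition.
reading_2 = (
    "S",
    ("NP", ("N", "rice")),
    ("VP", ("V", "flies"), ("PP", ("P", "like"), ("NP", ("N", "sand")))),
)


def tagged_words(tree):
    """Return (word, part-of-speech) pairs from left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]  # pre-terminal node directly above a word
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


print(tagged_words(reading_1))
print(tagged_words(reading_2))
```

Both trees cover the same word sequence, so the ambiguity lies entirely in the structure and in the part of speech assigned to "flies" and "like".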
The Logical Form
• The structure of a sentence alone doesn't capture its meaning.
• For instance, the noun phrase (NP) "the catch" can have different meanings depending on
whether the context is a baseball game or a fishing expedition. Identifying the correct
sense of words, such as the ambiguity in "catch," is crucial.
• However, even after resolving this ambiguity, understanding the intended meaning
requires accounting for the specific situation in which the sentence is used.
• These challenges fall into two categories: context-independent meaning and
context-dependent meaning. Knowledge about English, like the various senses of "catch," is
context-independent, while understanding that "the catch" refers to what Jack caught while
fishing yesterday is context-dependent.
• The logical form represents the context-independent meaning of a sentence.
• The logical form encodes potential word senses and identifies semantic relationships
between words and phrases.
• Abstract semantic roles, such as AGENT, THEME, and TO-POSS, capture relationships
between the verb and its noun phrases (NPs).
• For example, in sentences 1 and 2, both describing a selling event, "John" is the seller
(AGENT), "the book" is the object being sold (THEME), and "Mary" is the buyer (TO-
POSS).
• Determining semantic relationships can eliminate certain word senses. In the sentence "Jack
invited Mary to the Halloween ball," the word "ball" takes on the sense of a formal dance
event, as the verb "invite" aligns with this interpretation.
• Semantic interpretation involves considering how individual word meanings combine to
create coherent sentence meanings, exploiting interconnections between word meanings
to reduce possible senses for each word.
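The ideas above (semantic roles, and using semantic relationships to discard word senses) can be sketched with plain dictionaries. The role names AGENT, THEME, and TO-POSS come from the text; the sense labels, the semantic types, the role name `TO-EVENT`, and the function `allowed_senses` are hypothetical.

```python
# Logical form for the selling event described in the text.
selling_event = {
    "EVENT": "SELL",
    "AGENT": "John",       # the seller
    "THEME": "the book",   # the object being sold
    "TO-POSS": "Mary",     # the buyer
}

# Hypothetical sense inventory: sense name -> semantic type.
WORD_SENSES = {
    "ball": {"BALL-TOY": "physical-object", "BALL-DANCE": "social-event"},
}

# Hypothetical restriction: one is invited *to* a social event.
ROLE_RESTRICTIONS = {("invite", "TO-EVENT"): "social-event"}


def allowed_senses(verb, role, noun):
    """Keep only the senses of `noun` compatible with the verb's role restriction."""
    senses = WORD_SENSES[noun]
    required = ROLE_RESTRICTIONS.get((verb, role))
    if required is None:
        return sorted(senses)  # no restriction known: every sense survives
    return sorted(s for s, sem_type in senses.items() if sem_type == required)


print(allowed_senses("invite", "TO-EVENT", "ball"))
```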
Final Meaning Representation:
• The system requires a general knowledge representation (KR) to represent and reason
about its application domain.
• This KR serves as the language for encoding all specific knowledge related to the
application.
• Contextual interpretation aims to take a representation of a sentence's structure and its
logical form, mapping it into an expression in the KR, enabling the system to perform
tasks within the domain.
• For instance, in a question-answering application, a question might be mapped to a
database query, while in a story-understanding application, a sentence could be mapped
to expressions representing the described situation.
• The chosen final representation language is typically the first-order predicate calculus
(FOPC).
The Flow of Information:
Syntactic Processing:
• The parser, using word knowledge and grammar rules, organizes sentences into
structures, making it more efficient by combining syntax and meaning.
E.g., "Cats are chased by dogs."
Syntactic structure: Subject (Cats) - Verb (are chased) - Agent (by dogs).
Logical form: Theme (Cats) - Action (chased) - Agent (dogs).
• This example illustrates how the syntactic structure and the logical form diverge in the
passive voice: the syntactic subject (Cats) is the Theme, not the Agent, in the logical form.
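A minimal sketch of the mapping from syntactic structure to logical form for the passive example, together with an assumed active counterpart ("Dogs chase cats", which is not in the original notes). The function name `logical_form`, its parameters, and the use of the verb lemma "chase" are illustrative choices.

```python
def logical_form(subject, verb, voice, obj=None, by_phrase=None):
    """Map a simple clause to role assignments (ACTION, AGENT, THEME)."""
    if voice == "active":
        # Active: the syntactic subject performs the action.
        return {"ACTION": verb, "AGENT": subject, "THEME": obj}
    if voice == "passive":
        # Passive: the syntactic subject undergoes the action;
        # the agent, if present, appears in the by-phrase.
        return {"ACTION": verb, "AGENT": by_phrase, "THEME": subject}
    raise ValueError("voice must be 'active' or 'passive'")


passive = logical_form("cats", "chase", "passive", by_phrase="dogs")
active = logical_form("dogs", "chase", "active", obj="cats")
print(passive == active)
```

The two clauses differ in syntactic structure but receive the same logical form, which is the point of separating the two levels.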
Contextual Processing:
• Shapes sentence structures and meanings into a final representation by addressing details
like identifying objects and understanding speaker intentions.
• Uses knowledge of what's being talked about and the context of the conversation.
• Example: Deals with understanding intentions in sentences like "Can you lift that rock?"
Application Reasoning:
• If needed, the system performs tasks based on the application using the finalized
representation.
Utterance Planning:
• Plans how the response should sound or look, considering the context and grammar rules.
Realization:
• Turns the planned response into actual words, adapting for spoken or written forms.
• Example: For spoken output, the response is passed to speech synthesis; for written output,
the response text is produced directly.
Bidirectional Grammar for Understanding and Generation:
• Grammar rules are used for both understanding and creating sentences.
• Ideally, grammar should work in both directions, even though, in practice, it might be
adjusted for specific tasks.
GRAMMARS AND PARSING:
Grammar and Sentence Structure:
Tree Representation of Sentences:
• Sentences can be represented as trees, where the sentence is the root and its major
subparts (e.g., noun phrases, verb phrases) are branches.
• Nodes in the tree are labeled, such as NP for noun phrase, VP for verb phrase, V for verb,
etc.
• Each node represents a linguistic unit (e.g., NAME John, ART the, N cat).
• The tree structure is a graphical representation of the sentence's grammatical structure.
Terminology:
• Trees are a special form of graphs with labeled nodes connected by links.
• The top node is the root and nodes at the bottom are leaves.
• Links connect parent nodes to child nodes.
• The ancestors of a node are its parent, its parent's parent, and so on; a node is dominated
by each of its ancestors.
• The root node dominates all other nodes.
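The terminology above (root, leaves, parent, ancestors, dominance) maps directly onto a small tree data structure. The sketch below is one possible encoding; the class name `Node` and its method names are assumptions, and the example tree is the one for "John ate the cat".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by label
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child):
        """Attach `child` below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def is_leaf(self):
        return not self.children

    def ancestors(self):
        """Parent, grandparent, ... up to the root."""
        result, node = [], self.parent
        while node is not None:
            result.append(node)
            node = node.parent
        return result

    def dominates(self, other):
        """A node dominates every node it is an ancestor of."""
        return any(a is self for a in other.ancestors())


# Tree for "John ate the cat".
root = Node("S")
np_node = root.add(Node("NP"))
john = np_node.add(Node("NAME")).add(Node("John"))
vp = root.add(Node("VP"))
vp.add(Node("V")).add(Node("ate"))
obj = vp.add(Node("NP"))
obj.add(Node("ART")).add(Node("the"))
cat = obj.add(Node("N")).add(Node("cat"))

print([a.label for a in john.ancestors()])
```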
Grammar and Rewrite Rules:
• To construct a tree structure for a sentence, legal structures for the language are defined
by a set of rewrite rules.
1. S -> NP VP
2. VP -> V NP
3. NP -> NAME
4. NP -> ART N
5. NAME -> John
6. V -> ate
7. ART -> the
8. N -> cat

• The rules above (Grammar 3.2) describe how an S (sentence) can be expanded into noun
phrases, verb phrases and other components.
• This is a context-free grammar (CFG): every rule has a single symbol (the mother) on the
left-hand side.
• Terminal symbols are words that cannot be further decomposed, while nonterminal
symbols (e.g., NP, VP) can be.
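One way to hold the eight rewrite rules in a program is a dictionary from each left-hand-side symbol to its list of possible right-hand sides. The variable names below are assumptions; the rules themselves are the ones listed above.

```python
# Each key is the single left-hand-side symbol of a rule;
# each value lists the alternative right-hand sides for that symbol.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "NAME": [["John"]],
    "V": [["ate"]],
    "ART": [["the"]],
    "N": [["cat"]],
}

# Nonterminals are the symbols that can be rewritten (they appear on a left-hand side).
NONTERMINALS = set(GRAMMAR)

# Terminals are the symbols that only ever appear on a right-hand side: the words.
TERMINALS = {
    symbol
    for alternatives in GRAMMAR.values()
    for rhs in alternatives
    for symbol in rhs
} - NONTERMINALS

print(sorted(TERMINALS))
```

Because every key is a single symbol, this encoding can only express context-free rules.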
Derivations:
A grammar derives a sentence through a sequence of rewrite rules starting from the start
symbol (S).
The provided example demonstrates the derivation of the sentence "John ate the cat" using
Grammar 3.2.
S
=> NP VP (rewriting S)
=> NAME VP (rewriting NP)
=> John VP (rewriting NAME)
=> John V NP (rewriting VP)
=> John ate NP (rewriting V)
=> John ate ART N (rewriting NP)
=> John ate the N (rewriting ART)
=> John ate the cat (rewriting N)
Derivations are the basis for sentence generation and parsing.
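The derivation above always rewrites the leftmost nonterminal. A short sketch that replays it, given the sequence of rule numbers to apply, is shown below; the function name `derive` and the rule-number interface are assumptions made for illustration.

```python
# Rules numbered as in the grammar listing: number -> (left-hand side, right-hand side).
RULES = {
    1: ("S", ["NP", "VP"]),
    2: ("VP", ["V", "NP"]),
    3: ("NP", ["NAME"]),
    4: ("NP", ["ART", "N"]),
    5: ("NAME", ["John"]),
    6: ("V", ["ate"]),
    7: ("ART", ["the"]),
    8: ("N", ["cat"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def derive(rule_numbers, start="S"):
    """Apply each rule to the leftmost nonterminal; return every intermediate form."""
    form = [start]
    steps = [" ".join(form)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        index = next((i for i, s in enumerate(form) if s in NONTERMINALS), None)
        if index is None or form[index] != lhs:
            raise ValueError(f"rule {number} does not apply to the leftmost nonterminal")
        form[index:index + 1] = rhs  # replace the symbol with the rule's right-hand side
        steps.append(" ".join(form))
    return steps


for step in derive([1, 3, 5, 2, 6, 4, 7, 8]):
    print(step)
```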
Sentence Generation and Parsing:
• Sentence generation involves constructing legal sentences by randomly choosing rewrite
rules.
• Parsing identifies the structure of sentences given a grammar.
• Two basic searching methods are described: top-down (starting with S and exploring
ways to rewrite symbols) and bottom-up (starting with words and using rewrite rules
backward).
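Both ideas above can be sketched on the same small grammar: generation by randomly choosing rewrite rules, and top-down search that starts from S and tries to rewrite symbols until they match the words. The function names are assumptions, and the recognizer relies on this grammar having no left-recursive rules (otherwise the naive recursion would not terminate).

```python
import random

GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "NAME": [["John"]],
    "V": [["ate"]],
    "ART": [["the"]],
    "N": [["cat"]],
}


def generate(symbol="S", rng=None):
    """Expand `symbol` into words by picking a rewrite rule at random."""
    if rng is None:
        rng = random.Random()
    if symbol not in GRAMMAR:  # terminal symbol: it is already a word
        return [symbol]
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(part, rng))
    return words


def recognize_top_down(symbols, words):
    """True if the list `symbols` can be rewritten into exactly the list `words`."""
    if not symbols:
        return not words
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:  # terminal: must match the next input word
        return bool(words) and words[0] == first and recognize_top_down(rest, words[1:])
    # Nonterminal: try each way of rewriting it.
    return any(recognize_top_down(rhs + rest, words) for rhs in GRAMMAR[first])


print(" ".join(generate(rng=random.Random(0))))
print(recognize_top_down(["S"], "John ate the cat".split()))
```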
Bottom-Up Parsing Example:
John ate the cat
=> NAME ate the cat (rewriting John)
=> NAME V the cat (rewriting ate)
=> NAME V ART cat (rewriting the)
=> NAME V ART N (rewriting cat)
=> NP V ART N (rewriting NAME)
=> NP V NP (rewriting ART N)
=> NP VP (rewriting V NP)
=> S (rewriting NP VP)
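The bottom-up trace above can be reproduced with a small greedy reducer that uses the rewrite rules backward. This is a sketch under assumptions: rules are tried in a fixed order (word-level rules first) with no backtracking, which is enough for this tiny grammar but not a general parsing algorithm.

```python
# Rules as (left-hand side, right-hand side); word-level rules come first
# so that all words are rewritten before phrases are built.
RULES = [
    ("NAME", ["John"]),
    ("V", ["ate"]),
    ("ART", ["the"]),
    ("N", ["cat"]),
    ("NP", ["NAME"]),
    ("NP", ["ART", "N"]),
    ("VP", ["V", "NP"]),
    ("S", ["NP", "VP"]),
]


def reduce_once(form):
    """Replace the first matching right-hand side with its left-hand side."""
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                return form[:i] + [lhs] + form[i + n:]
    return None  # no rule applies


def parse_bottom_up(words):
    """Return (list of intermediate forms, whether S was reached)."""
    form = list(words)
    steps = [" ".join(form)]
    while form != ["S"]:
        form = reduce_once(form)
        if form is None:
            return steps, False
        steps.append(" ".join(form))
    return steps, True


steps, accepted = parse_bottom_up("John ate the cat".split())
for step in steps:
    print(step)
```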
Tree Representation as Parse Tree:
A tree representation (e.g., Figure 3.1) serves as a record of CFG rules that account for the
structure of the sentence.
It reflects the parsing process, whether top-down or bottom-up.
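To show how a parse tree records the rules used, the sketch below extends a top-down search so that it returns the tree instead of a yes/no answer. Trees are nested tuples of the form (label, child, ...); the function names are assumptions, and the approach again relies on the grammar not being left-recursive.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "NAME": [["John"]],
    "V": [["ate"]],
    "ART": [["the"]],
    "N": [["cat"]],
}


def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` can cover words from `start`."""
    if symbol not in GRAMMAR:  # terminal: must equal the word at this position
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(rhs, words, start, (symbol,))


def expand(rhs, words, position, partial):
    """Parse the symbols of `rhs` one after another, growing the tuple `partial`."""
    if not rhs:
        yield partial, position
        return
    for subtree, next_position in parse(rhs[0], words, position):
        yield from expand(rhs[1:], words, next_position, partial + (subtree,))


def parse_sentence(sentence):
    """Return the first tree that covers the whole sentence, or None."""
    words = sentence.split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


print(parse_sentence("John ate the cat"))
```

Each tuple in the result corresponds to one application of a rewrite rule, so the tree is the same whether the rules were found top-down or bottom-up.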
What Makes a Good Grammar:
Generality
• A good grammar should be capable of analyzing a wide range of sentences in the target
language.
• The grammar should not be overly specific or limited to certain sentence structures. It
should accommodate the diversity of sentences found in the language.
Selectivity
• The grammar should be able to identify and flag non-sentences or structurally
problematic constructions.
• It's crucial that the grammar can distinguish between well-formed sentences and those
that violate the syntactic rules of the language.
Understandability
• The grammar rules should be clear, concise and easily interpretable by both linguists and
computational systems.
• An easily understandable grammar facilitates effective communication among linguists,
aids in the development of parsing algorithms and contributes to the overall clarity of
linguistic analyses.
Simplicity and Elegance
• A good grammar tends to be as simple and elegant as possible while capturing the
complexities of the language.
• A simple grammar is more likely to be generalizable and easier to work with.
Unnecessary complexity can hinder both human understanding and the development of
computational models.
Consistency
• Grammar rules should be internally consistent and not contradict each other.
• Consistency ensures that the grammar provides coherent analyses of sentences.
Inconsistencies can lead to ambiguity and make it challenging to derive meaningful
insights from the grammar.
Applicability Across Domains
• A good grammar should apply to various domains, genres and registers within the
language.
• Languages are used in diverse contexts, and a versatile grammar can handle different
linguistic styles and topics.
Tests for Constituency
• The proposed constituents (e.g., noun phrases, verb phrases) in the grammar should pass
tests like conjunction and insertion to validate their linguistic status.
• These tests ensure that the proposed grammatical structures align with how language
functions, enhancing the reliability of the grammar.
Incremental Development
• A good grammar evolves, with new rules added carefully and with attention to their
interactions with existing rules.
• Incremental development allows linguists to refine the grammar based on new linguistic
insights and ensures that the grammar remains coherent.
Robustness
• A good grammar should be robust and capable of handling variations, errors and
ambiguities present in natural language.
• Real-world language can be messy and a robust grammar ensures that NLP systems can
handle linguistic noise effectively.
Expressiveness
• The grammar should be expressive enough to capture complex linguistic phenomena and
nuances.
• Expressiveness is crucial for understanding subtle variations in meaning, such as
sarcasm, irony, or other figurative language.
