Deep Learning for Natural Language Processing
About this ebook
Inside Deep Learning for Natural Language Processing you’ll find a wealth of NLP insights, including:
An overview of NLP and deep learning
One-hot text representations
Word embeddings
Models for textual similarity
Sequential NLP
Semantic role labeling
Deep memory-based NLP
Linguistic structure
Hyperparameters for deep NLP
Deep learning has advanced natural language processing to exciting new levels and powerful new applications! For the first time, computer systems can achieve "human" levels of summarizing, making connections, and other tasks that require comprehension and context. Deep Learning for Natural Language Processing reveals the groundbreaking techniques that make these innovations possible. Stephan Raaijmakers distills his extensive knowledge into useful best practices, real-world applications, and the inner workings of top NLP algorithms.
About the technology
Deep learning has transformed the field of natural language processing. Neural networks recognize not just words and phrases, but also patterns. Models infer meaning from context, and determine emotional tone. Powerful deep learning-based NLP models open up a goldmine of potential uses.
About the book
Deep Learning for Natural Language Processing teaches you how to create advanced NLP applications using Python and the Keras deep learning library. You’ll learn to use state-of-the-art tools and techniques including BERT and XLNet, multitask learning, and deep memory-based NLP. Fascinating examples give you hands-on experience with a variety of real-world NLP applications. Plus, the detailed code discussions show you exactly how to adapt each example to your own uses!
What's inside
Improve question answering with sequential NLP
Boost performance with linguistic multitask learning
Accurately interpret linguistic structure
Master multiple word embedding techniques
About the reader
For readers with intermediate Python skills and a general knowledge of NLP. No experience with deep learning is required.
About the author
Stephan Raaijmakers is professor of Communicative AI at Leiden University and a senior scientist at The Netherlands Organization for Applied Scientific Research (TNO).
Table of Contents
PART 1 INTRODUCTION
1 Deep learning for NLP
2 Deep learning and language: The basics
3 Text embeddings
PART 2 DEEP NLP
4 Textual similarity
5 Sequential NLP
6 Episodic memory for NLP
PART 3 ADVANCED TOPICS
7 Attention
8 Multitask learning
9 Transformers
10 Applications of Transformers: Hands-on with BERT
Deep Learning for Natural Language Processing
Stephan Raaijmakers
To comment go to liveBook
Manning
Shelter Island
For more information on this and other Manning titles go to
www.manning.com
Copyright
For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: [email protected]
©2022 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
♾ Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
ISBN: 9781617295447
brief contents
Part 1. Introduction
1 Deep learning for NLP
2 Deep learning and language: The basics
3 Text embeddings
Part 2. Deep NLP
4 Textual similarity
5 Sequential NLP
6 Episodic memory for NLP
Part 3. Advanced topics
7 Attention
8 Multitask learning
9 Transformers
10 Applications of Transformers: Hands-on with BERT
Bibliography
contents
Front matter
preface
acknowledgments
about this book
about the author
about the cover illustration
Part 1. Introduction
1 Deep learning for NLP
1.1 A selection of machine learning methods for NLP
The perceptron
Support vector machines
Memory-based learning
1.2 Deep learning
1.3 Vector representations of language
Representational vectors
Operational vectors
1.4 Vector sanitization
The hashing trick
Vector normalization
2 Deep learning and language: The basics
2.1 Basic architectures of deep learning
Deep multilayer perceptrons
Two basic operators: Spatial and temporal
2.2 Deep learning and NLP: A new paradigm
3 Text embeddings
3.1 Embeddings
Embedding by direct computation: Representational embeddings
Learning to embed: Procedural embeddings
3.2 From words to vectors: Word2Vec
3.3 From documents to vectors: Doc2Vec
Part 2. Deep NLP
4 Textual similarity
4.1 The problem
4.2 The data
Authorship attribution and verification data
4.3 Data representation
Segmenting documents
Word-level information
Subword-level information
4.4 Models for measuring similarity
Authorship attribution
Verifying authorship
5 Sequential NLP
5.1 Memory and language
The problem: Question Answering
5.2 Data and data processing
5.3 Question Answering with sequential models
RNNs for Question Answering
LSTMs for Question Answering
End-to-end memory networks for Question Answering
6 Episodic memory for NLP
6.1 Memory networks for sequential NLP
6.2 Data and data processing
PP-attachment data
Dutch diminutive data
Spanish part-of-speech data
6.3 Strongly supervised memory networks: Experiments and results
PP-attachment
Dutch diminutives
Spanish part-of-speech tagging
6.4 Semi-supervised memory networks
Semi-supervised memory networks: Experiments and results
Part 3. Advanced topics
7 Attention
7.1 Neural attention
7.2 Data
7.3 Static attention: MLP
7.4 Temporal attention: LSTM
7.5 Experiments
MLP
LSTM
8 Multitask learning
8.1 Introduction to multitask learning
8.2 Multitask learning
8.3 Multitask learning for consumer reviews: Yelp and Amazon
Data handling
Hard parameter sharing
Soft parameter sharing
Mixed parameter sharing
8.4 Multitask learning for Reuters topic classification
Data handling
Hard parameter sharing
Soft parameter sharing
Mixed parameter sharing
8.5 Multitask learning for part-of-speech tagging and named-entity recognition
Data handling
Hard parameter sharing
Soft parameter sharing
Mixed parameter sharing
9 Transformers
9.1 BERT up close: Transformers
9.2 Transformer encoders
Positional encoding
9.3 Transformer decoders
9.4 BERT: Masked language modeling
Training BERT
Fine-tuning BERT
Beyond BERT
10 Applications of Transformers: Hands-on with BERT
10.1 Introduction: Working with BERT in practice
10.2 A BERT layer
10.3 Training BERT on your data
10.4 Fine-tuning BERT
10.5 Inspecting BERT
Homonyms in BERT
10.6 Applying BERT
Bibliography
index
front matter
preface
Computers have been trying hard to make sense of language in recent decades. Supported by disciplines like linguistics, computer science, statistics, and machine learning, the field of computational linguistics, or natural language processing (NLP), has come into full bloom, with numerous scientific journals, conferences, and active industry participation. Big tech companies like Google, Facebook, IBM, and Microsoft appear to have prioritized their efforts in natural language analysis and understanding, and progressively offer datasets and helpful open source software to the natural language processing community. Currently, deep learning increasingly dominates the NLP field.
To someone who is eager to join this exciting field, the high pace at which new developments take place in the deep learning–oriented NLP community may seem daunting. There seems to be a large gap between descriptive, statistical, and more traditional machine learning approaches to NLP on the one hand, and the highly technical, procedural approach of deep learning neural networks on the other hand. This book aims to bridge this gap a bit, through a gentle introduction to deep learning for NLP. It targets students, linguists, computer scientists, practitioners, and all other people interested in artificial intelligence. Let’s refer to these groups of people as NLP engineers. When I was a student, lacking a systematic computational linguistics program in those days, I pretty much pieced together a personal—and necessarily incomplete—NLP curriculum. It was a tough job. My motivation for writing this book has been to make this journey a bit easier for aspiring NLP engineers, and to give you a head start by introducing you to the fundamentals of deep learning–based NLP.
I sincerely believe that to become an NLP engineer with the ambition to produce innovative solutions, you need to possess advanced software development and machine learning skills. You need to fiddle with algorithms and come up with new variants yourself. Much like the 17th-century Dutch scientist Antonie van Leeuwenhoek, who designed and produced his own microscopes for experimentation, the modern-day NLP engineer creates their own digital instruments for studying and analyzing language. Whenever an NLP engineer succeeds in building a model of natural language that adheres to the facts (that is, one that is observationally adequate), not only industrial (that is, practical) but also scientific progress has been made. I invite you to adopt this mindset, to continuously observe how humans process language, and to contribute to the wonderful field of NLP, where, in spite of algorithmic progress, so many topics are still open!
acknowledgments
I wish to thank my employer, TNO (the Netherlands Organisation for Applied Scientific Research), for supporting the realization of this book. My thanks go to students from the faculties of Humanities and Science at Leiden University and assorted readers of the book for their feedback on the various MEAP versions, including corrections of typos and other errors. I would also like to thank the Manning staff—in particular, development editor Dustin Archibald, production editor Keri Hales, and proofreader Katie Tennant—for their enduring support, encouragement, and, above all, patience.
At my request, Manning transfers all author fees to UNICEF. Through your purchase of this book, you contribute to a better future for children in need, and that need is even more acute in 2022. UNICEF is committed to ensuring special protection for the most disadvantaged children—victims of war, disasters, extreme poverty, all forms of violence and exploitation, and those with disabilities
(www.unicef.org/about-us/mission-statement). Many thanks for your help.
To all the reviewers: Alejandro Alcalde Barros, Amlan Chatterjee, Chetan Mehra, Deborah Mesquita, Eremey Vladimirovich Valetov, Erik Sapper, Giuliano Araujo Bertoti, Grzegorz Mika, Harald Kuhn, Jagan Mohan, Jorge Ezequiel Bo, Kelum Senanayake, Ken W. Alger, Kim Falk Jørgensen, Manish Jain, Mike F. Cuddy, Mortaza Doulaty, Ninoslav Čerkez, Philippe Van Bergen, Prabhuti Prakash, Ritwik Dubey, Rohit Agarwal, Shashank Polasa Venkata, Sowmya Vajjala, Thomas Peklak, Vamsi Sistla, and Vlad Navitski, thank you—your suggestions helped make this a better book.
about this book
This book will give you a thorough introduction to deep learning applied to a variety of language analysis tasks, supported by actual hands-on code. Explicitly linking the evergreens of computational linguistics (such as part-of-speech tagging, textual similarity, topic labeling, and Question Answering) to deep learning will help you become proficient in deep learning–based natural language processing (NLP). Beyond this, the book covers state-of-the-art approaches to challenging new problems.
Who should read this book
The intended audience for this book is anyone working in NLP: computational linguists, software engineers, and students. The field of machine learning–based NLP is vast and comprises a daunting number of formalisms and approaches. With deep learning entering the stage, many are eager to get their feet wet but may shy away from the highly technical nature of deep learning and the fast pace of this field—new approaches, software, and papers emerge on a daily basis. This book will bring you up to speed.
This book is not for those who wish to become proficient in deep learning in a general manner, readers in need of an introduction to NLP, or anyone desiring to master Keras, the deep learning Python library we use. Manning offers two books that fill these gaps and can be read as companions to this book: Natural Language Processing in Action (Hobson Lane, Cole Howard, and Hannes Hapke, 2019; www.manning.com/books/natural-language-processing-in-action) and Deep Learning with Python (François Chollet, 2021; www.manning.com/books/deep-learning-with-python-second-edition). If you want a quick and thorough introduction to Keras, visit https://keras.io/getting_started/intro_to_keras_for_engineers.
How this book is organized: A road map
Part 1, consisting of chapters 1, 2, and 3, introduces the history of deep learning, the basic architectures of deep learning for NLP and their implementation in Keras, and how to represent text for deep learning using embeddings and popular embedding strategies.
Part 2, consisting of chapters 4, 5, and 6, focuses on assessing textual similarity with deep learning, processing long sequences with memory-equipped models for Question Answering, and then applying such memory models to other NLP.
Part 3, consisting of chapters 7, 8, 9, and 10, starts by introducing neural attention, then moves on to the concept of multitask learning, using Transformers, and finally getting hands-on with BERT and inspecting the embeddings it produces.
About the code
The code we develop in this book is somewhat generic. Keras is a dynamic library, and while I was writing the book, some things changed, including the now-exclusive dependency of Keras on TensorFlow as a backend (a Keras backend is low-level code for performing efficient neural network computations). The changes are limited, but occasionally you may need to adapt the syntax of your code if you're using the latest Keras version (version 2.0 and above).
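As a small illustration of the kind of syntax adaptation this may involve (this sketch is mine, not the book's, and the toy model in it is a made-up example), older standalone-Keras imports typically become TensorFlow-bundled imports:

# Older standalone Keras style (may still work if the keras package is installed):
#   from keras.models import Sequential
#   from keras.layers import Dense
# Current TensorFlow-bundled style:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# A made-up toy model, just to show that the surrounding code stays the same
model = Sequential([
    Dense(16, activation='relu', input_shape=(100,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Apart from such import-path changes, the model-building code itself usually carries over unchanged.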
In the book, we draw pragmatic inspiration from public domain, open source code and reuse code snippets that are handy. Specific sources include the following:
The Keras source code base, which contains many examples addressing NLP
The code accompanying the companion book Deep Learning with Python
Popular and excellent open source websites like https://adventuresinmachinelearning.com and https://machinelearningmastery.com
Blogs like http://karpathy.github.io
Coder communities like Stack Overflow
The emphasis of the book is more on outlining algorithms and code and less on achieving academic state-of-the-art results. However, starting from the basic solutions and approaches outlined throughout the book, and backed up by the many practical code examples, you will be empowered to reach better results.
This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text.
In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In some cases, even this was not enough, and listings include line-continuation markers (➥). Code annotations accompany many of the listings, highlighting important concepts.
You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/deep-learning-for-natural-language-processing. The complete code for the examples in the book is available for download from the Manning website at https://www.manning.com/books/deep-learning-for-natural-language-processing, and from GitHub at https://github.com/stephanraaijmakers/deeplearningfornlp.
liveBook discussion forum
Purchase of Deep Learning for Natural Language Processing includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/deep-learning-for-natural-language-processing/discussion. You can also learn more about Manning's forums and the rules of conduct at https://livebook.manning.com/discussion.
Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking him some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
about the author
Stephan Raaijmakers
received his education as a computational linguist at Leiden University, the Netherlands. He obtained his PhD on machine learning–based NLP from Tilburg University. He has been working since 2000 at TNO, the Netherlands Organisation for Applied Scientific Research, an independent organization founded by law in 1932, aimed at enabling business and government to apply scientific knowledge, contributing to industrial innovation and societal welfare. Within TNO, he has worked on many machine learning–intensive projects dealing with language. Stephan is also a professor of communicative AI at Leiden University (LUCL, Leiden University Centre for Linguistics). His chair focuses on deep learning–based approaches to human-machine dialogue.
about the cover illustration
The figure on the cover of Deep Learning for Natural Language Processing, titled "Paisan de dalecarlie," or "Peasant, Dalecarlia," is from an image held by the New York Public Library in the Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Picture Collection. Each illustration is finely drawn and colored by hand.
In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.
Part 1. Introduction
Part 1 introduces the history of deep learning, relating it to other forms of machine learning–based natural language processing (NLP; chapter 1). Chapter 2 discusses the basic architectures of deep learning for NLP and their implementation in Keras. Chapter 3 discusses how to represent text for deep learning using embeddings and focuses on Word2Vec and Doc2Vec, two popular embedding strategies.
1 Deep learning for NLP
This chapter covers
Taking a short road trip through machine learning applied to NLP
Learning about the historical roots of deep learning
Introducing vector-based representations of language
Language comes naturally to humans but has historically been hard for computers to grasp. This book addresses the application of recent, cutting-edge deep learning techniques to automated language analysis. In the last decade, deep learning has emerged as the vehicle of the latest wave in artificial intelligence (AI). Results have consistently redefined the state of the art for a plethora of data analysis tasks in a variety of domains. For an increasing number of deep learning algorithms, better-than-human (human-parity or superhuman) performance has been reported: for instance, speech recognition in noisy conditions and medical diagnosis based on images. Current deep learning–based natural language processing (NLP) outperforms all pre-existing approaches by a large margin. What exactly makes deep learning so suitable for these intricate analysis tasks, in particular language processing? This chapter presents some of the background necessary to answer this question and guides you through a selection of important topics in machine learning for NLP.
We first examine a few main approaches to machine learning: the neural perceptron, support vector machines, and memory-based learning. After that, we look at historical developments leading to deep learning and address vector representations: encoding data (notably, textual) with numerical representations suitable for processing by neural networks.
Let’s start by discussing a few well-known machine learning–based NLP algorithms in some detail, illustrated with a handful of practical examples to whet your appetite. After that, we present the case for deep learning–based NLP.
1.1 A selection of machine learning methods for NLP
Let’s start with a quick (and necessarily incomplete) tour of machine learning–based NLP (see figure 1.1). Current natural language processing heavily relies on machine learning. Machine learning has its roots in statistics, building among others on the seminal work by Thomas Bayes and Pierre-Simon Laplace in the 18th and 19th centuries and the least-squares methods for curve approximation by Legendre in 1812. The field of neural computing started with the work of McCulloch and Pitts in 1943, who put forward a formal theory (and logical calculus) of neural networks. It would take until 1950 before learning machines were proposed by Alan Turing.
Figure 1.1 Machine learning for NLP. A first look at neural machine learning, plus background on support vector machines and memory-based learning.
All machine learning algorithms that perform classification (labeling) share a single goal: to arrive at linear separability of data that is labeled with classes: labels that indicate a (usually exclusive) category to which a data point belongs. Data points presented to a machine learning algorithm typically consist of vector representations of descriptive traits. These representations constitute a so-called input space. The subsequent processing, manipulation, and abstraction of the input space during the learning stage of a self-learning algorithm yields a feature space. Some of this processing can be done external to the algorithm: raw data can be converted to features as part of a preprocessing stage, which technically creates an input space consisting of features. The output space consists of class labels that separate the various data points in a dataset based on the class boundaries. The essence of deep learning, as we will see, is to learn abstract representations in the feature space. Figure 1.2 illustrates how deep learning mediates between inputs and outputs: through abstract representations derived from the input data.
Figure 1.2 From input space to output space (labels). Deep learning constructs intermediate, abstract representations of input data, mapping an input space to a feature space. Through this mapping, it learns to relate input to output: to map the input space to an output space (encoding class labels or other interpretations of the input data).
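To make the notions of input space and feature space a bit more tangible, here is a minimal sketch (not code from the book) that turns two made-up sentences into count-based feature vectors with scikit-learn; the example sentences and the choice of CountVectorizer are purely illustrative assumptions.

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat",        # the raw input: plain strings
        "the dog chased the cat"]
cv = CountVectorizer()                   # maps tokens to count-valued features
X = cv.fit_transform(docs)               # the feature space: one row of counts per document
print(cv.vocabulary_)                    # which column corresponds to which word
print(X.toarray())                       # each document as a numerical vector

Each row of X is one data point in the resulting feature space; a classifier then learns boundaries between such rows and the class labels of the output space.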
Training a machine learning component involves learning boundaries between classes, which may depend on complex functions. The burden of learning class separability can be alleviated by smart feature preprocessing. Learning the class boundaries occurs by performing implicit or explicit transformations on linearly inseparable input spaces. Figure 1.3 shows a non-linear class boundary: a line separating objects in two classes that cannot be modeled by a linear function f(x) = ax + b. The function corresponding to this line is a non-linear classifier. A real-world example would be a bowl of multicolored marbles mixed in such a way that they cannot be separated from each other by means of a straight plate (like a flat scoop).
Figure 1.3 Non-linear classifier. The two classes (indicated with circles and triangles) cannot be separated with a straight line.
A linear function that separates classes with a straight line is a linear classifier and would produce a picture like figure 1.4.
Figure 1.4 Linear classifier. The two classes (indicated with circles and triangles) can be separated with a straight line.
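To see this difference in practice, the following minimal sketch (my own illustration, not code from the book) fits a linear and a non-linear classifier on a deliberately entangled two-class dataset; the dataset generator and both model choices are assumptions made purely for this example.

from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two classes arranged as concentric rings: no straight line can separate them
X, y = make_circles(n_samples=500, noise=0.1, factor=0.3, random_state=0)

linear_clf = LogisticRegression().fit(X, y)      # can only draw a straight boundary
nonlinear_clf = SVC(kernel='rbf').fit(X, y)      # can draw a curved boundary

print('linear accuracy:    ', linear_clf.score(X, y))     # stays close to chance level
print('non-linear accuracy:', nonlinear_clf.score(X, y))  # close to perfect

On such data, the linear model can hardly do better than guessing, while the non-linear model separates the two classes almost perfectly.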
We now briefly address three types of machine learning approaches that have had major uptake in NLP:
The single-layer perceptron and its generalization to the multilayer perceptron
Support vector machines
Memory-based learning
While there is a lot more to the story, these three types embody, respectively, the neural or cognitive, eager, and lazy types of machine learning. All of these approaches relate naturally to the deep learning approach to natural language analysis, which is the main topic of this book.
1.1.1 The perceptron
In 1957, the first implementation of a biologically inspired machine learning component was realized: Rosenblatt’s perceptron. This device, implemented on physical hardware, allowed the processing of visual stimuli captured by a square array of 400 (20 by 20) photosensitive cells. The weights of this network were set by electromotors driving potentiometers. The learning part of this perceptron was based on a simple one-layer neural network, which effectively became the archetype of neural networks (see figure 1.5).
Figure 1.5 Rosenblatt’s perceptron: the fruit fly of neural machine learning. It represents a single neuron receiving several inputs and generating (by applying a threshold) a single output value.
Suppose you have a vector of features that describe aspects of a certain object of interest, like the words in a document, and you want to create a function from these features to a binary label (for instance, you want to decide if the document conveys a positive or negative sentiment). The single-layer perceptron is capable of doing this. It produces a binary output y (0 or 1) from a weighted combination of input values x1...xn, based on a threshold θ and a bias b: y = 1 if w1x1 + w2x2 + ... + wnxn + b > θ, and y = 0 otherwise.
The weights w1, ...wn are learned from annotated training data consisting of input vectors labeled with output labels. The thresholded unit is called a neuron. It receives the summed and weighted input v. So, assume we have the set of weights and associated inputs shown in table 1.1.
Table 1.1 Weighted input
Then their summed and weighted output v is the sum of each input multiplied by its corresponding weight, plus the bias b.
This simplistic network is able to learn a specific set of functions that address the class of linearly separable problems: problems that are separable in input space with a linear function. Usually, these are the easier problems in classification. It is quite common for data to be heavily entangled. Consider undoing a knot in two separate ropes. Some knots are easy and can be undone in one step. Other knots need many more steps. This is the business of machine learning algorithms: undoing the intertwining of data objects living in different classes. For NLP, the single-layer perceptron nowadays plays a marginal role, but it underlies several derived algorithms that strive for simplicity, such as online learning (Bottou 1998).
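Because the concrete weights and inputs of table 1.1 are not reproduced above, the following minimal sketch uses made-up numbers to show how a single neuron combines its weighted inputs, adds the bias, and applies the threshold; all values are illustrative assumptions, not the book's.

import numpy as np

x = np.array([0.5, 1.0, 0.2])     # hypothetical input values x1..x3
w = np.array([0.4, -0.3, 0.9])    # hypothetical learned weights w1..w3
b = 0.1                           # bias
theta = 0.0                       # threshold

v = np.dot(w, x) + b              # summed and weighted input: 0.20 - 0.30 + 0.18 + 0.10 = 0.18
y = 1 if v > theta else 0         # thresholded output: v exceeds theta, so y = 1
print(v, y)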
A practical example of a perceptron classifier is the following. We set out to build a document classifier that categorizes raw texts as being broadly about either atheism or medical topics. The popular 20 newsgroups dataset (http://qwone.com/~jason/20Newsgroups), one of the most widely used datasets for building and evaluating document classifiers, consists of newsgroup (Usenet) texts distributed over 20 hand-assigned topics. Here is what we do:
Make a subselection for two newsgroups of interest: alt.atheism and sci.med.
Train a simple perceptron on a vector representation of the documents in these two classes. A vector is nothing more than a container (an ordered list of a finite dimension) for numerical values.
The vector representation is based on a statistical representation of words called TF.IDF, which we discuss in section 1.3.2. For now, just assume TF.IDF is a magic trick that turns documents into vectors that can be fed to a machine learning algorithm.
Don’t worry if you don’t completely understand the following listing right now. It’s here to give you an idea of what the code looks like for a basic perceptron.
Listing 1.1 A simple perceptron-based document classifier
from sklearn.linear_model import Perceptron                                        ①
from sklearn.datasets import fetch_20newsgroups                                    ②
categories = ['alt.atheism', 'sci.med']                                            ③
train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)    ④
perceptron = Perceptron(max_iter=100)                                              ⑤
from sklearn.feature_extraction.text import CountVectorizer                        ⑥
cv = CountVectorizer()
X_train_counts = cv.fit_transform(train.data)
from sklearn.feature_extraction.text import TfidfTransformer                       ⑦
tfidf_tf = TfidfTransformer()
X_train_tfidf = tfidf_tf.fit_transform(X_train_counts)
perceptron.fit(X_train_tfidf, train.target)                                        ⑧
test_docs = ['Religion is widespread, even in modern times',
             'His kidney failed',
             'The pope is a controversial leader',
             'White blood cells fight off infections',
             'The reverend had a heart attack in church']                          ⑨
X_test_counts = cv.transform(test_docs)                                            ⑩
X_test_tfidf = tfidf_tf.transform(X_test_counts)
pred = perceptron.predict(X_test_tfidf)                                            ⑪
for doc, category in zip(test_docs, pred):                                         ⑫
    print('%r => %s' % (doc, train.target_names[category]))
① Import a basic perceptron classifier from sklearn.
② Import a routine for fetching the 20 newsgroups dataset from sklearn.
③ Limit the categories of the dataset.
④ Obtain documents for our category selection.
⑤ Our perceptron is defined. It will be trained for 100 iterations.
⑥ The familiar CountVectorizer is fit on our training data.
⑦ Load, fit, and deploy a TF.IDF transformer from sklearn. It computes TF.IDF representations of our count vectors.
⑧ The perceptron is trained on the TF.IDF vectors.
⑨ Our test documents.