Nagesh Rao - Learning Python-CyberPlus Infotech (2021)
Python
B. Nagesh Rao
All rights reserved. No part of this book may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, without the prior written
permission of the publisher, except in the case of brief quotations embedded in critical
articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of
the information presented. However, the information contained in this book is sold
without warranty, either express or implied. Neither the author, nor the publisher, nor its
dealers and distributors will be held liable for any damages caused or alleged to be
caused directly or indirectly by this book.
Published by:
CyberPlus Infotech Pvt. Ltd.
32/5, 8th Main, 11th Cross,
Malleswaram, Bengaluru – 560 003.
INDIA.
www.cyberplusindia.com
For my parents, who always allowed me to follow my own path...
He says he admires the simplicity and omnipotence of C, the support for OOP in C++,
the ease of coding in and maintainability of Java, and the brevity of Perl! He adds that
what makes Python a very appropriate language for him is that Python can be
extended using C/C++, can run within the JVM and communicate with Java, and
provides the brevity of Perl without its obscurity! As a big fan of the Object Oriented
Programming paradigm, he loves the fact that Python is completely object-oriented!
CyberPlus Infotech Pvt. Ltd. is an open-source company founded in the year 2000,
headquartered in Bengaluru, the Silicon Valley of India! The main activities of the
company are software development, corporate training, and professional mentoring.
The team in CyberPlus has developed close to 200 software applications, utilities and
libraries. They excel at developing object-oriented wrappers for various open-source
libraries, like Textuality (a wrapper on ncurses for Textual User Interface
development), GUICPP (a wrapper on GTK+ for GUI development) and GLOO (a
wrapper on OpenGL for 3D graphics programming and rendering).
The trainers in CyberPlus have conducted more than 200 batches of corporate
training for various clients in India like Nokia (formerly Nokia Siemens Networks, and
prior to that, Siemens Communications Systems), IBAB (Institute for Bioinformatics
and Applied Biotechnology), Delphi Technical Center, OpenStream Technologies,
Alcatel Lucent (now acquired by Nokia), Altiostar (formerly Radio Mobile Access),
Wipro Technologies, Comptel (now acquired by Nokia), Samsung India, Indecomm,
Unisys and Stryker India.
They have conducted training programs on C, C++, STL, Java, UNIX/Linux, Advanced
Linux, Advanced C, Advanced C++, Design Patterns, Perl, CGI, HTML, CSS,
Javascript, XML, Python, Data Structures, Analysis and Design of Algorithms, Finite
Automata and Formal Languages, Lex and Yacc, Microprocessors (8085 and 8086),
Advanced Microprocessors (80286 to Pentium), Computer Graphics, OpenGL,
Network Programming on Linux, SQL and MySQL, J2EE, etc.
They have mentored professionals and students to develop software at various client
locations including Wipro Technologies, ISRO, Nokia Siemens Networks and Siemens
Information Systems Limited.
Email: [email protected]
www.cyberplusindia.com
Learning Python i
even recollect all of them! This book’s existence is actually related to the fact that I
used to invent languages!
My fascination with gaming in particular led me to guide students to develop games in
Java using a game development library we had invented in CyberPlus. I found the
Java code too verbose and decided to invent a language specifically for writing and
expressing game logic. I developed a C++ game engine and a game interpreter using
C++ with Lex and Yacc, but what I realised later was that my game scripts looked
shockingly close to Python scripts!
Later when we were developing a game development framework in Java, I realised
that Python could be directly used for writing game scripts that could run within the
JVM!
Right from my teenage years I had always wanted to share my knowledge. While I did a lot
of stuff – from teaching friends and classmates, writing articles, course materials and
lab manuals – I still had one dream to be realised: writing books! This is my first book
to be published, though I had made attempts in the past to write a book on C and data
structures!
Let me be honest here: I have spent far more time on other programming languages
like C, C++, Java and Perl than on Python. It is probably preordained that my first
book to be published will be on Python rather than on those other languages. But as I
pointed out earlier, programming is different from programming languages and I was
able to learn and understand the nuances of Python quickly because I knew and
understood programming in these other languages very well!
The proof of my deep understanding of programming concepts and languages was
the feedback the participants would give after my training programs – even those who
knew the language(s) beforehand felt they learnt many new things in my training!
Many novices also appreciated the way concepts were taught to them. They felt I
could explain even highly technical concepts in a simple way! It is that belief that I can
contribute that led me to write this book, though there are many books and materials
out there teaching Python in a dozen different ways!
I sincerely hope this book delivers what I expect it to: novices get to learn Python
well and experienced programmers get to learn Python fast!
While there are many people I am thankful to in my life, and many such contributions
from others have indirectly made this book possible, I will only name those who have
been directly responsible for making this book what it is!
I thank Sumesh Kumar for being the first reviewer. I appreciate the fact that he has
always been eager and more than willing to help, in any and every way he could.
I thank Santosh Manoharan for all his inputs and critical analysis. I admire his
attention to detail in the review.
current form.
Many thanks to Prof. Nisha Choudhary, Asst. Prof., CSE, MS Engineering College,
Bangalore for framing the model VTU question papers on Python, added at the end of
this book. These will help students get an idea of what to expect in the examination and
help them score better marks.
Some colleges have already made the first edition of this book the recommended
textbook for Python and we are thankful to those colleges! Not only have they made
the right decision by introducing Python in their respective syllabi, but they have also evaluated
our book and decided that this should be the book their students use!
Since we had become associated with engineering colleges and their students
were becoming our customers, we decided to expand the book and add additional
chapters so that it completely covers the VTU syllabus. These additional chapters are
extensions that help students apply Python to solve common real world problems.
There are chapters added on Regular Expressions, Database Access and Parsing
HTML, XML and JSON. I hope that readers find the second edition even more
practical than the first and are able to accomplish more with Python!
As with the first edition of this book, my expectation remains the same: novices get
to learn Python well and experienced programmers get to learn Python fast!
The book Learning Python by Nagesh Rao is a well structured book for beginners.
The example programs at the end of the chapter exposes learners to practical
programming. The chapter 10 practical python is highlight of the book, which
shows various data structures implementation. Overall the book written in simple
language and is great for Python learners.
-Vindhya N. S.
Asst. Prof. (Computer Science & Engg.),
Dayananda Sagar Academy of Technology & Management

The book "Learning Python" by Mr. B. Nagesh Rao is a well written, well organised,
cohesive book on the subject. It can be referred by the beginner as well as an
intermediate programmer in Python. The flow of subject matter through the book is
logical and elaborate. From basics of python, to its features, installation, data-types,
variables, input output, control structures, derived data types like lists, tuples,
dictionaries, sets, strings have all been explained in detail with a good number of
examples. Functions, data structures like Stacks, Queues, Matrices have also been
discussed appropriately. The Object Oriented Programming principles have been
covered well with examples. The book provides nice coverage to Exception Handling
and File Handling in Python along with an introductory chapter on Modules as well.
-Prof. Sunilkumar S. Manvi
Principal, Reva Institute of Technology and Mgmt
& Director, School of Computing and Information Technology, Reva University

"Learning Python" has allowed me to delve into the world of Python with such ease
that i just didn't realize the amount of knowledge that I have been able to garner in a
span of a week. The book is just a master-class where the author's inquisitive mind
has worked phenomenally well in expressing complex and the most intricate details in
the most lucid way. It has done complete justice to all those individuals who intend to
make Python their primary skill. I am humbly and genuinely indebted to the author in
releasing this first edition and wish him all the very best in his professional career.
-Sumesh Kumar
Oracle

Impeccable!! Difficult things made easy to learn and almost impossible to forget! Very
well organised and structured. A great way of learning Python and programming as
such. Typical of B. Nagesh Rao – needs no more to say about it. He himself is the best
adjective to describe the book.
-Roopa Deepak
Freelance Trainer
1 INTRODUCTION
Compare text editors and IDEs and select the right development
environment for you
1.1 About This Book
Whether you are a novice to programming or a seasoned professional who is new to
Python, this book has been designed for you!
Here are the top 10 salient features of this book that make it stand apart and
make it special for you:
1. This book starts slowly! The fundamental concepts are taught gently to
ensure that you have a good foundation of basic concepts!
2. This book is well organised! The chapters, headings, subheadings and
content within the chapters have been planned very carefully so that you
conquer Python – chapter by chapter!
3. This book teaches interactively! The moment a new concept is taught,
there is code that immediately follows so that you understand how it looks to
the Python interpreter!
4. This book teaches even when you can’t practice! Not only do we show
you code immediately after teaching a concept, but we also provide output from a
real Python session so that you can imagine how Python reacts when you
type in a piece of code!
5. This book teaches good programming practices! It is not only important to
learn Python, but to also code like a professional. While it will definitely take a
little bit of time to metamorphose from novice to professional, we show you
best practices and pitfalls that will accelerate your journey!
6. This book presents programs that solve real problems! When it is time to
apply Python, we show you constructive programs that demonstrate how to
apply Python concepts!
7. We analyse every bit of code! Everything there is to analyse is analysed.
Code snippets and programs are followed by output, which is then followed by
analysis!
8. We compare Python with other programming languages! For the benefit
of those readers who already know other programming languages like C/C++,
Java or Perl, we provide tips that help them migrate to Python faster. These
tips are in separate boxes to ensure that they don’t disturb those readers who
are not savvy with these languages!
9. We have filtered out content! While this might appear disadvantageous or
counter-intuitive, we believe in presenting the most important concepts in
detail and probably even skipping some concepts that you can live without!
Call it the 80/20 rule if you will – we have decided to present in great detail
those 20% of the features that you will use 80% of the time!
10. Each heading and subheading is need-based! Our style of explaining a
new concept is by first establishing a need. We believe this makes it easier for
learners to understand not only what they are learning but also why!
Happy learning!
C:\Python36\;C:\Python36\Scripts\
$ wget https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz
Resolving www.python.org... done.
Connecting to www.python.org[194.109.137.226]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8,436,880 [application/x-tar]
...
The next step is to configure the installation. Depending on your requirement, there
are a couple of options. If you want this version of Python to be installed for all users
of the system (and you have root privileges via su or sudo), run configure:
$ ./configure
The above command will configure Python to be installed in /usr/local. If you do not
have root privileges or wish to make this a local installation for the current user, run
configure in this manner:
$ ./configure --prefix=/some/other/directory
Typically in such cases, the specified directory would be a subdirectory within the
current user’s home directory. Here is an example of how the above command would
actually be used:
$ ./configure --prefix=$HOME/py-352
The above command will configure Python to be installed in the directory py-352
under the current user’s home directory.
After the configuration is performed, run the make command:
$ make
The last step is to copy the final files to the destination location. If you had opted for a
global installation, run make install as root:
$ make install
The Python interpreter would now be available inside the bin subdirectory within the
installation directory. Considering our example of local directory “$HOME/py-352”, the
interpreter will now be available at '$HOME/py-352/bin/python3'. You could add
the directory to the PATH variable so that you could just type in python3 henceforth.
To do so, add this line to your ~/.bashrc file:
export PATH="$PATH:$HOME/py-352/bin"
python3
The “>>>” is the Python prompt (technically the Python primary prompt) – an
indication that the Python interpreter is ready to accept commands from you. You can
type in any valid Python statement or expression here and the result is immediately
displayed. We will use this feature to quickly test stuff when introducing a new
concept. Of course, for bigger pieces of code, we will prefer a full-fledged Python
script.
Let us first try out giving expressions at the Python prompt:
>>> 2+3
5
>>>
Observation:
1. 2+3 is a valid expression in Python. If it were an invalid expression, we would
have got an error message.
2. The Python interpreter responds by printing the value of the expression, 5 in
this case.
3. After finishing with the expression, the Python interpreter shows the Python
prompt again, waiting for the next command from us.
Let us also type in a valid Python statement and see what happens:
>>> import math
>>>
Observation:
1. The statement “import math” is a valid statement in Python and means that
we wish to import the math module. Had the statement been invalid, we
would have received an error message.
2. Valid Python statements are executed by the Python interpreter immediately,
though its effect might not always be visible. The Python interpreter prints the
result of the statement only if either the statement was actually an expression
(in which case, the value of the expression is printed), or we had explicitly
issued a Python statement to print something (using the print() function,
for instance, which we will see later on).
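The same distinction can be seen in a script. Here is a small sketch; the values are only illustrative. In a script, a bare expression is still evaluated, but its value is not echoed, so output appears only when print() is called explicitly.

```python
import math           # a statement: it executes silently

2 + 3                 # an expression: evaluated, but in a script nothing is shown
root = math.sqrt(16)  # an assignment statement: also silent
print(root)           # explicit output: prints 4.0
```

Typing the same lines at the >>> prompt would also echo 5 after 2 + 3, because the interactive interpreter prints the value of every expression.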
Now that we know how to work on the Python interpreter, let us also learn how to quit
the interpreter when we are done. Typing an end-of-file character (Control-D on
Unix/Linux, Control-Z followed by Enter on Windows) at the Python primary prompt causes the
interpreter to exit. You can also exit the interpreter by typing the quit() command:
>>> quit()
The interpreter’s line-editing features include the following (provided the system
supports them):
1. Interactive editing – the ability to edit the current line contents before
submission by using cursor keys, backspace and delete keys.
2. History substitution – the ability to reproduce previously entered lines by
using the up and down cursor keys.
3. Code completion – the ability to complete the current word when the user
presses Tab (or to list the options when there are multiple
possibilities).
The interpreter operates somewhat like the Unix shell: when called with standard input
connected to a tty device, it reads and executes commands interactively; when called
with a file name argument or with a file as standard input, it reads and executes a
script from that file.
A second way of starting the interpreter is:
$ python3 -c command [arg] ...
This launches the Python interpreter which executes the statement(s) in command,
analogous to the Unix shell’s -c option. Since Python statements often contain
spaces or other characters that are special to the shell, it is usually advisable to enclose
command in quotes (preferably single quotes).
Here’s an example:
$ python3 -c 'a=2+3; print(a)'
5
Observation:
1. We have given 2 statements: a=2+3 and print(a). Python understands
they are separate statements because of the semi-colon (;) between them.
The space given between them is only for readability.
2. Since the shell might have a problem with some special characters within the
Python statements (like semi-colon and space), we prefer to enclose the set
of statements within single quotes, the contents of which the shell will not
process.
3. After executing the set of statements, the Python interpreter automatically
terminates.
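The -c option can also be driven from Python itself. The sketch below launches a fresh interpreter with the two statements a=2+3 and print(a) from the observation above. It uses the standard subprocess module and assumes Python 3.7 or later, since capture_output and text were added in that version. The arguments are passed as a list, which bypasses the shell, so the quoting discussed above is not needed here.

```python
import subprocess
import sys

# sys.executable is the path of the interpreter running this script.
result = subprocess.run(
    [sys.executable, "-c", "a=2+3; print(a)"],
    capture_output=True,  # collect the child's stdout/stderr
    text=True,            # decode the output to str
    check=True,           # raise if the child exits with an error
)
print(result.stdout.strip())  # prints 5
```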
1.5.1 Editors
There are many editors available for users to code in Python. The most popular ones
which have been designed especially to suit the needs of Python programmers are:
1.5.2 IDEs
The IDEs most suitable for Python are:
1. PyCharm – Supported on Windows, Linux, MAC OS X (Website:
https://www.jetbrains.com/pycharm/download/)
2. Wing IDE - Supported on Windows, Linux, MAC OS X (Website:
http://www.wingware.com/downloads)
3. Pyzo – Supported on Windows, Linux, MAC OS X (Website:
http://test.pyzo.org/downloads.html)
4. PyScripter – Supported only on Windows (Website:
https://sourceforge.net/projects/pyscripter/)
Eclipse users who do not wish to switch to another IDE can make use of the PyDev plugin for
Python support.
script should start with “#!” followed immediately by the pathname of the
Python interpreter. For example, the first line of the script could be
“#!/usr/bin/python”.
NOTE:
Even if we use the first method of execution, there is no harm in retaining the first
line shown in method 2 as it would be considered to be merely a comment!
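The NOTE can be checked directly. The sketch below compiles a hypothetical two-line script whose first line is a shebang. It uses the common variant #!/usr/bin/env python3, which looks up python3 on the PATH. Python accepts the source without complaint because, to the interpreter, the "#!" line is just another comment.

```python
# A hypothetical script held in a string; its first line is a shebang.
source = '#!/usr/bin/env python3\ngreeting = "Hello World"\nprint(greeting)\n'

# compile() treats the "#!" line like any other "#" comment.
code = compile(source, "hello.py", "exec")

namespace = {}
exec(code, namespace)          # runs the script: prints Hello World
```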
1.7 Questions
1. List the salient features of Python.
2. How did Python evolve? Give a brief description of its development from its
early years to what it is now.
3. How would you install Python on the following platforms?
- Windows
- Linux
4. What alternative do you have to install Python locally assuming that you do
not have super-user permission and do not intend to install it globally?
5. Assuming that you have a default Python interpreter installed (e.g.
/usr/bin/python) and you wish to upgrade Python to a newer version,
how would you go about setting it up? Is it advisable to overwrite the existing
Python or have it as a separate entity?
6. How would you determine the version of Python that you are currently using?
7. How do you quit from a Python terminal in Windows & Linux? Is there a
common way of quitting on both platforms? If yes, what would that be?
8. What options do you find most useful when you look at the output of 'python
--help'?
9. At what stage would you prefer switching from a Python interpreter to writing
a full-fledged program?
10. When would you want to use the '-c' option with a Python interpreter?
SUMMARY
➢ Python is portable, and Python scripts work equally well on
Windows and Linux.
2 PYTHON BASICS
Work with the basic data types of Python and operate upon them.
2.1 Our First Python Script
The first program in any language is typically a “Hello World” program, and let's be no
different! Here is our first program that prints Hello World when executed:
HelloWorld.py
1. print("Hello World")
Output:
Hello World
Type the above program in your favourite text editor or IDE (See section 1.5 for help
in selecting) and execute it.
NOTE:
Of course, don't type in the line number (“1.”) in your programs! They have been
given only for your reference, and will help when we discuss larger programs.
Python relies on indentation rules to decide which statements belong to which block
(will be covered in section 3.2.1) and uses the newline character as the statement
terminator. In other words, a Python statement is normally understood to end at the end of
the line on which it is written.
Let us revisit HelloWorld.py and write multiple statements – of course, one per
line.
HelloWorld2.py
1. print("Hello World")
2. print("Hi from India!")
Output:
Hello World
Hi from India!
HelloWorld3.py
Output:
Hello World
Hi from India!
NOTE:
There is no harm in ending a statement with a semi-colon, though it is neither
required nor preferred.
1. print(\
2. "Hello \
3. World")
4. print("Hi fr\
5. om India!")
Output:
Hello World
Hi from India!
1. print('Hello World')
2. print('"Hi" from \'India\'!')
Output:
Hello World
"Hi" from 'India'!
1. print("Hello World")
2. print("'Hi' from \"India\"!")
Output:
Hello World
'Hi' from "India"!
Furthermore, single quotes and double quotes can be freely used inside triple quotes.
Of course, you cannot place three of the opening quote characters in succession, or
they would be taken as the closing triple quote and terminate the string. Single and
double quotes can still be escaped, if at all needed.
Here is the improved program:
HelloWorld7.py
1. print("""Hello World
2. 'Hi' from "India"!""")
Output:
Hello World
'Hi' from "India"!
Triple quotes differ from single quotes and double quotes in the following ways:
1. They allow strings to span multiple lines
2. They do not require a terminating backslash at the end of every line to
indicate continuation
3. They allow the embedding of both single quotes and double quotes without
the need for escaping (except when the quote character that opened the string
appears 3 times in succession, in which case escaping is required)
4. They preserve the newline character (that exists at the end of every line)
within the string
5. They can have a special purpose – that of providing documentation (covered
in section 10.15)
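A quick check of points 1, 3 and 4 above. The second string is an illustrative case where three double quotes must appear inside a """-delimited string, so they are escaped.

```python
# Points 1 and 4: the string spans two lines and keeps the newline.
text = """Hello World
'Hi' from "India"!"""
print(text)

# Point 3: three opening-quote characters in a row must be escaped
# (all three are escaped here for clarity).
tricky = """She typed \"\"\" by mistake"""
print(tricky)   # She typed """ by mistake
```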
2.2 Comments
Comments are pieces of text added within programs to improve readability.
Comments are not executed; the interpreter ignores their contents.
A comment in Python is indicated by the hash (#) symbol – everything from the hash
till the end of that line is considered to be a comment.
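One detail worth noting is that a hash inside a string literal is ordinary data, not the start of a comment. The variable name below is only illustrative.

```python
tag = "#python"   # the "#" inside the quotes is data; this text is the comment
print(tag)        # prints: #python
print(len(tag))   # prints: 7
```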
Here is another version of HelloWorld.py using comments:
HelloWorld8.py
Output:
Hello World
Hi from India!
A good programmer always uses comments to document why the code is doing what
it is doing, or to document facts that are either difficult to make out from the code or not
present in the code at all. Of course, the code itself should always be written so that
it clearly shows what it is doing. Sometimes comments are also added to
summarise what a block is doing – with the definition of a block also extending to
functions, classes and modules!
Sometimes, the comments require multiple lines, and we end up using many hashes.
There is a simpler way to implement such a comment using the concept of triple
quotes. Triple-quoted text is not a comment as such; it is technically a string. However, a
string that is neither assigned nor used does nothing on its own, so it ends up behaving like a comment!
Section 10.15 will introduce an interesting application of comments used for
documenting program elements, implemented using triple-quoted strings!
Here is a version that demonstrates both approaches:
HelloWorld9.py
1. # Hi!
2. # This program prints Hello World
3. # I hope you like it!
4. # ===============================
5. #
6. print("Hello World")
7. """
8. ====================================
9. Hi!
10. This program printed Hello World
11. Did you run it?
12. """
Output:
Hello World
• Any literal constant made up of only digits and the letters a-f (in upper-case or
lower-case) with a leading 0x or 0X (with an optional leading sign before
that) is considered to be a hexadecimal constant of type int.
Example:
>>> 15
15
>>> -25 #Negative integer
-25
>>> 0o15 #Octal integer
13
>>> -0o15 #Negative octal integer
-13
>>> 0o08 #Illegal octal integer
File "<stdin>", line 1
0o08
^
SyntaxError: invalid syntax
>>> 0x12abc4 #Hexadecimal integer
1223620
>>> -0X12aBc4 #Negative hexadecimal integer with mixed case
-1223620
>>>
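The same literals can be written in a script. This sketch reuses two values from the session above. Octal and hexadecimal literals are just other spellings of ordinary int values, and the case of the prefix and digits does not matter.

```python
octal = 0o15
hexa = 0x12abc4
print(octal, hexa)                            # 13 1223620
print(0X12ABC4 == hexa)                       # True
print(type(octal) is int, type(hexa) is int)  # True True
```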
NOTE:
Any output that contains the Python prompt >>> (as in the case above) implies that
this is a snapshot of an interactive python session and not the execution of a
Python script. An interactive Python session helps us get output instantaneously as
each statement is immediately processed and its result is displayed. This can help
as a learning tool. In such examples, whatever we need to type is shown in bold
and the rest is printed by the Python interpreter. When we want to combine multiple
statements together for a more complex job, we will shift to writing Python scripts
instead. See section 1.4 for information on how to launch the Python interpreter.
Objects of type int can also be created using the explicit constructor of class int.
This of course also resembles a function call for type conversion. Thus, the following
two statements are identical in function:
Example:
x = 2
x = int(2)
This built-in function is also helpful for converting other types like float and string
to the type int:
>>> int(23.99)
23
>>> int("25")
25
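Note that int(23.99) gives 23, not 24: converting a float with int() drops the fractional part, truncating toward zero. The sketch below contrasts this with math.floor() and the built-in round().

```python
import math

print(int(23.99))          # 23: the fractional part is dropped
print(int(-23.99))         # -23: truncation is toward zero
print(math.floor(-23.99))  # -24: floor always rounds down
print(round(23.99))        # 24: round() goes to the nearest integer
```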
The default constructor of int class will create an integer with value 0, and a 2-
argument constructor has been provided to convert a string representation of an
integer in a particular base into a normal integer. The constructors have the following
declaration:
Syntax:
int(x=0)
int(str_x, base=10)
Some examples follow:
>>> int()
0
>>> int(25)
25
>>> int("25")
25
>>> int("25",8)
21
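A few more uses of the two-argument form, with illustrative strings. When the base is 16, 8 or 2, a matching prefix such as 0x is allowed in the string. A base of 0 tells int() to infer the base from the prefix, just as for literals in code.

```python
print(int("ff", 16))    # 255
print(int("0x1f", 16))  # 31: a matching prefix is accepted
print(int("101", 2))    # 5
print(int("0o15", 0))   # 13: base 0 reads the base from the prefix
```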
Example     Result               Operation
2 * 3       6                    Multiplication
10 / 3      3.3333333333333335   Division
10 // 3     3                    Integer division
10 % 3      1                    Modulus/Remainder
2 + 3       5                    Addition
2 - 3       -1                   Subtraction
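The "integer division" operator // is more precisely floor division. It rounds the true quotient down, toward minus infinity, which matters for negative operands. The % operator returns the matching remainder, so (a // b) * b + (a % b) == a always holds. A small check with illustrative operands:

```python
for a, b in [(10, 3), (-10, 3), (10, -3)]:
    q, r = a // b, a % b
    print(a, b, q, r, q * b + r == a)
# 10 3 3 1 True
# -10 3 -4 2 True
# 10 -3 -4 -2 True
```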
NOTE:
If the example “10/3” above yields “3” as the answer, you are using an old version
of Python (2.x or earlier). In such a case, while you can continue and learn Python
through this book, it is recommended that you upgrade Python to the latest version
to derive maximum benefits.
NOTE:
When you encounter functions like “math.factorial()”, it indicates that we are
referring to the factorial() function of math module. For this to work, it is
necessary to execute the statement import math once before invoking any
function of that module in the session/script.
Objects of type int can be converted to other bases using these built-in functions:
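Python's built-in functions bin(), oct() and hex() perform such conversions. They return strings carrying the 0b, 0o and 0x prefixes, and int() with the matching base converts them back. A quick sketch with an illustrative value:

```python
n = 25
print(bin(n))           # 0b11001
print(oct(n))           # 0o31
print(hex(n))           # 0x19
print(int(oct(n), 8))   # 25: the round trip back to int
```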
The following bitwise operators are available on type int (have a look at section 20.2
if you want to learn bit manipulation):
2 | 3       3       Bitwise OR
2 ^ 3       1       Bitwise XOR
~2          -3      1's complement
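These results can be verified bit by bit: 2 is 0b10 and 3 is 0b11. The sketch also uses &, Python's bitwise AND operator. It shows that ~x always equals -x - 1 for Python integers.

```python
a, b = 2, 3              # binary 10 and 11
print(bin(a), bin(b))    # 0b10 0b11
print(a & b)             # 2: bits set in both
print(a | b)             # 3: bits set in either
print(a ^ b)             # 1: bits set in exactly one
print(~a, -a - 1)        # -3 -3
```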
2 == 3 False 2 is equal to 3?
NOTE:
These comparisons are available for all numeric types, and objects of different
types can also be compared using them. Equality tests (== and !=) between unrelated
types simply evaluate to False or True; if an ordering comparison (such as <) does not
make sense and is not implemented, a TypeError exception is raised instead.
Since these comparisons are available for all numeric types, they will not be repeated in the
following sections.
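A short illustration of mixed-type comparisons. It uses a try/except block, which is covered properly in the chapter on exception handling, to catch the TypeError.

```python
print(2 == 2.0)     # True: int and float compare by value
print(2 == "2")     # False: equality across unrelated types is just False

raised = False
try:
    2 < "2"         # ordering an int against a str is not implemented
except TypeError as err:
    raised = True
    print("TypeError:", err)
```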
NOTE:
The type int also supports Boolean operations, covered in section 2.3.5.2.
>>> 123.5
123.5
>>> 123.0
123.0
>>> 12e2
1200.0
>>> 123.5e2
12350.0
>>> 123.45678e2
12345.678
Objects of type float can also be created using the explicit constructor of class float.
This of course also resembles a function call for type conversion. Thus, the following
two statements are identical in function:
Example:
x = 2.5
x = float(2.5)
This built-in function is also helpful for converting other types like int and string to
the type float. Also, the default constructor of float class will create a float with
value 0.0.
>>> float(23)
23.0
>>> float("23")
23.0
>>> float()
0.0
You might have observed that some results are not mathematically correct: there is a
very small deviation, but it is noticeable in these examples. Python provides the Decimal
class (in the decimal module, not covered in this book) that behaves more as we would
expect, but it comes with its own cost in terms of efficiency.
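The classic case is 0.1 + 0.2. Neither value is stored exactly in binary floating point, so the sum is very slightly off. The sketch shows two common responses: comparing with a tolerance using math.isclose(), or using Decimal built from strings.

```python
import math
from decimal import Decimal

print(0.1 + 0.2)                        # 0.30000000000000004
print(0.1 + 0.2 == 0.3)                 # False
print(math.isclose(0.1 + 0.2, 0.3))     # True: compare with a tolerance
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```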
The following mathematical functions are available on type float:
NOTE:
The functions that start with math. are functions of the math module, which needs
to be imported prior to function call using the statement: import math.
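A few commonly used functions on float values, as an illustrative sample rather than the full list. One behaviour worth knowing is that round() sends exact halves to the nearest even integer.

```python
import math

x = 2.7
print(abs(-x))                      # 2.7
print(round(x))                     # 3
print(round(2.5), round(3.5))       # 2 4: ties go to the even neighbour
print(math.floor(x), math.ceil(x))  # 2 3
print(math.sqrt(6.25))              # 2.5
```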
NOTE:
The type float also supports comparison, like the ones covered in section 2.3.1.2
and Boolean operations, like the ones covered in section 2.3.5.2.
>>> (2+3j)
(2+3j)
>>> 2+3J
(2+3j)
>>> (2+3j).real
2.0
>>> (2+3j).imag
3.0
Objects of type complex can also be created using the explicit constructor of class
complex. This of course also resembles a function call for type conversion. Thus, the
following two statements are identical in function:
Example:
x = 2+3j
x = complex(2,3)
This built-in function is also helpful for converting other types like int and string to
the type complex. Also, the default constructor of complex class will create a
complex with value 0j, a 1-argument constructor that receives an int x will create a
complex with value x+0j and a 1-argument constructor that receives a string in the
format x or x+yj or x+yJ will create a complex object accordingly.
Examples:
>>> complex()
0j
>>> complex(2)
(2+0j)
>>> complex("2+3j")
(2+3j)
>>> complex("2+3J")
(2+3j)
>>> complex("2")
(2+0j)
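The string form must be written without spaces around the +. complex("2 + 3j") is rejected with a ValueError. The sketch also shows the real and imag attributes and the conjugate() method of a complex object.

```python
z = complex(2, 3)
print(complex("2+3j") == z)          # True
print(z.real, z.imag, z.conjugate()) # 2.0 3.0 (2-3j)

raised = False
try:
    complex("2 + 3j")                # spaces around + are not accepted
except ValueError:
    raised = True
    print("ValueError: malformed complex string")
```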
Just as the math module provides mathematical functions for dealing with real
numbers, the cmath module provides mathematical functions for dealing with
complex numbers. The module needs to be imported before use with the statement:
import cmath. We can convert any complex object into polar coordinates using the
polar() function, and convert polar coordinates into a complex object using the
rect() function. We can also extract the magnitude using abs() and the argument
(angle, in radians) using phase(). Examples are shown in the table below:
Examples:
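A short sketch of these cmath functions, using the illustrative value 3+4j, chosen because its magnitude is exactly 5. The round trip through rect() may differ from the original by tiny floating-point errors, so it is checked with cmath.isclose().

```python
import cmath

z = 3 + 4j
r, phi = cmath.polar(z)        # magnitude and angle (radians)
print(r)                       # 5.0
print(abs(z), cmath.phase(z))  # the same two values via separate calls
w = cmath.rect(r, phi)         # back to rectangular form
print(cmath.isclose(w, z))     # True
```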
NOTE:
The type complex also supports == and != comparisons, covered in section
2.3.1.2 and Boolean operations, covered in section 2.3.5.2.
TIP
If a string has many single-quotes, enclose it in double-quotes so that the quotes
need not be escaped. Similarly, if a string has many double-quotes, enclose it in
single-quotes. If a string has both quotes, enclose it in triple-quotes!
Objects of type str can also be created using the explicit constructor of class str.
This of course also resembles a function call for type conversion. Thus, the following
two statements are identical in function:
Example:
x = "abc"
x = str("abc")
This built-in function is also helpful for converting other types like int and float to
the type str. Also, the default constructor of str class will create an empty string ('').
>>> str(10)
'10'
>>> str(2.5)
'2.5'
>>> str()
''
NOTE:
Python strings support Unicode. If raw sequences of bytes (such as ASCII-encoded
data) are needed, Python provides a separate bytes class (section 20.3) for that,
though ordinary Python strings can of course also hold ASCII characters.
The following table shows the escape characters that can be used within strings:
Escape Sequence   Meaning
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                Alert (beep)
\b                Backspace
\f                Formfeed
\n                Newline
\r                Carriage return
\t                Tab
\v                Vertical tab
\ooo              Character with octal value ooo
\xhh              Character with hexadecimal value hh
If a string contains backslashes, but the intention is not to denote escape sequences,
the backslashes need to be escaped as shown in the table above. If, however, the
string contains many such backslashes, the entire string can be passed to Python as
a raw string using the prefix r or R. In such cases, all characters are retained within
the string without any special meaning and therefore no escape sequences are
processed. Examples of escape sequences and raw strings follow.
Examples:
>>> "\tHello\nWorld\b\n\101\x41"
'\tHello\nWorld\x08\nAA'
>>> r"\tHello\nWorld\b\n\101\x41"
'\\tHello\\nWorld\\b\\n\\101\\x41'
>>> R"\tHello\nWorld\b\n\101\x41"
'\\tHello\\nWorld\\b\\n\\101\\x41'
Observation:
1. The character A has an ASCII code of 65, which is 101 in octal and 41 in
hexadecimal. These codes have been used in the examples above.
2. Python processes certain escape sequences and represents them in a
different form (\b became \x08 and \x41 became A).
3. The r or R prefix makes every backslash a literal backslash character rather
than the start of an escape sequence. This is why the interpreter displays each
backslash doubled (\\) when it echoes a raw string.
NOTE:
The type str also supports comparisons (covered in section 2.3.1.2) and
Boolean operations (covered in section 2.3.5.2).
String concatenation can be achieved by using the “+” operator. Furthermore, string
literals separated only by whitespace are implicitly concatenated.
Examples:
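A short sketch of both forms of concatenation on literals; the last statement shows a common use of implicit concatenation, splitting one long literal across lines inside parentheses.

```python
x = "abc" + "xyz"        # explicit concatenation with +
y = "abc" "xyz"          # adjacent literals are joined automatically
print(x)                 # abcxyz
print(y)                 # abcxyz
print(x == y)            # True

# Implicit concatenation lets a long literal span several lines
message = ("Python joins adjacent "
           "string literals automatically")
print(message)
```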
NOTE:
Do not conclude from the above examples that the “+” operator is not required in
Python! Implicit concatenation works only on string literals written next to each
other, whereas “+” can concatenate the contents of two strings referred to by
variables at runtime. Check out the code snippet below:
>>> x="abc"
>>> y="xyz"
>>> x+y
'abcxyz'
>>> x y
File "<stdin>", line 1
x y
^
SyntaxError: invalid syntax
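Objects of type bool can also be created using the explicit constructor of class bool.
Thus, the following two statements are identical in function:
x = True
x = bool(True)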
This built-in function is also helpful for converting other types like int, float and str
to the type bool. Also, the default constructor of the bool class creates a bool with
value False.
>>> bool(0)
False
>>> bool(10)
True
>>> bool(0.0)
False
>>> bool(-12.5)
True
>>> bool("hello")
True
>>> bool("")
False
>>> bool()
False
NOTE:
Section 20.1.1.4 discusses conversion to bool in detail and can throw more light
on why we are getting the above output.
The and operator can work on other types too! To generalise its working, consider the
example x and y.
• If x evaluates to False, the result is x
• If x evaluates to True, the result is y
>>> 2 and 5
5
>>> 5 and 2
2
>>> 0 and 5
0
The or operator can work on other types too! To generalise its working, consider the
example x or y.
• If x evaluates to False, the result is y
• If x evaluates to True, the result is x
>>> 2 or 5
2
>>> 5 or 2
5
>>> 0 or 5
5
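The same rules apply to strings, where a non-empty string converts to True and the empty string to False (as the bool() examples earlier showed). A short sketch:

```python
a = "abc" and "xyz"    # "abc" converts to True, so the result is y: 'xyz'
b = "" and "xyz"       # "" converts to False, so the result is x: ''
c = "" or "guest"      # "" converts to False, so the result is y: 'guest'
d = "ram" or "guest"   # "ram" converts to True, so the result is x: 'ram'
print(repr(a), repr(b), repr(c), repr(d))   # 'xyz' '' 'guest' 'ram'
```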
Examples:
The not operator can work on other types too! To generalise its working, consider the
example not x.
• If x evaluates to False, the result is True
• If x evaluates to True, the result is False
>>> not 2
False
>>> not 5
False
>>> not 0
True
NOTE:
When other data types are used here instead of bool, a Boolean conversion takes
place as documented in section 20.1.1.4.
The type bool is a subclass of int (and therefore, just like int, a subtype of
numbers.Integral), so all operators that work on int also work on bool. In such cases,
it helps to remember to substitute True=1 and False=0.
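A quick check of the True=1, False=0 rule with ordinary int arithmetic:

```python
total = True + True            # 2, because True counts as 1
scaled = True * 10             # 10
shifted = False - 1            # -1, because False counts as 0
print(total, scaled, shifted)  # 2 10 -1
print(isinstance(True, int))   # True: bool is a subclass of int
```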
2.4 Identifiers
An identifier is a name given to a program element. Variable names, class names and
function names are some examples of identifiers.
The rules for naming identifiers in Python are similar to those of C, but Python also
permits Unicode characters:
1. The permissible characters are letters (including Unicode letters), digits and
underscores
2. The first character cannot be a digit. An underscore as the first character can
have special meanings that will be discussed in later sections
3. There is no limit on the length (though practical limits still apply!)
4. An identifier cannot have the same name as a keyword (discussed in section
2.5)
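These rules can be tested from code. In this sketch, is_valid_identifier is a hypothetical helper of our own: str.isidentifier() checks the character rules (1 and 2), but it does not reject keywords, so rule 4 is checked separately with the standard keyword module.

```python
import keyword

def is_valid_identifier(name):
    """Return True if name could be used as a variable name."""
    # isidentifier() checks the characters; iskeyword() enforces rule 4
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["total", "_count", "value2", "2value", "my-var", "naïve", "if"]:
    print(name, is_valid_identifier(name))
```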
Good programming practices dictate that identifier names be relevant and readable so
that a reader can figure out what the identifier represents. The coding conventions used
by Python professionals are:
1. Variable names and function names should be in lowercase with underscores
separating words. E.g.: variable_name, function_name()
2. Class names (and exception names, as exceptions are classes) should capitalize
the first letter of every word, with no separators between words. E.g.: ClassName,
ExceptionName
3. Constant names should be in uppercase with underscores separating words.
E.g.: CONSTANT_NAME
4. All other identifiers (package name, module name, method name, instance
variable name, local variable name, function parameter name) should follow
the convention of variable names
2.5 Keywords
Keywords are reserved words that have a special meaning to Python. These cannot be
used as identifiers. We will learn keywords on a need basis as and when it is time to
learn them. We have already seen some keywords like True, False, and, or and not.
Below is the list of all keywords in Python (version 3.7 and later) in alphabetical order:
and       as        assert    async     await
break     class     continue  def       del
elif      else      except    False     finally
for       from      global    if        import
in        is        lambda    None      nonlocal
not       or        pass      raise     return
True      try       while     with      yield
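Since the exact list can differ between Python versions, the interpreter you are running can report it through the standard keyword module:

```python
import keyword

print(len(keyword.kwlist))         # how many keywords this Python version has
print(keyword.kwlist)              # the keywords themselves, as a list of strings
print(keyword.iskeyword("pass"))   # True
print(keyword.iskeyword("print"))  # False: print is a built-in function
```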
2.6 Variables
Let us now discuss a few points about variables, their data types, their values and
operations on them before proceeding to input and output. We have already seen the
data types available in Python. These data types come into use when we have certain
values of that type and the entity that holds this value is a variable. Here are a few
important points about variables in Python:
#1 Variables can be created when required and where required
Unlike some of the more restrictive languages, we can start using a variable whenever
required without having to declare it at the beginning of a block. This gives us flexibility
and allows us to maintain focus on solving the problem at hand. A variable springs into
existence the moment a value is assigned to it. There is no concept of declaring its
data type.
x = 10 #x is created here
x = 25 * x
y = x #y is created here
x = 10 #x is created here
x = 25 * x #The value of x changes here
MAX = 100 #MAX is created here
MAX = 200 #The value of MAX changes here
>>> x=2
>>> type(x)
<class 'int'>
>>> x=2.5
>>> type(x)
<class 'float'>
>>> type(int)
<class 'type'>
>>> type(type)
<class 'type'>
#4 We can programmatically find out the data type of the value associated with
a variable, and can also determine whether the type is “compatible” with
another
While the type() function can give us the data type, what is more practical most of
the time is simply verifying whether the data type of a particular value is a particular
one or not (or whether it is a subtype of a particular one or not). For the built-in
numeric types, the hierarchy is covered in detail in section 20.1 and can be verified
from the examples below:
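One way to run such checks is the built-in isinstance() function, together with the abstract numeric types in the standard numbers module. A minimal sketch for the value 2.5:

```python
import numbers

x = 2.5
print(type(x))                          # <class 'float'>
print(isinstance(x, float))             # True
print(isinstance(x, numbers.Real))      # True
print(isinstance(x, numbers.Complex))   # True
print(isinstance(x, numbers.Number))    # True
print(isinstance(x, object))            # True
print(isinstance(x, int))               # False: float is not a subtype of int
print(issubclass(float, numbers.Real))  # True: compares types rather than values
```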
Observation:
1. The value 2.5 is of type float, which is a subtype of numbers.Real, which
is a subtype of numbers.Complex, which is a subtype of numbers.Number,
which is a subtype of object. The entire type hierarchy can be verified from
section 20.1.
2. Every value in Python is an object, which makes object the root of the entire
hierarchy.
#5 The operations permissible on a variable depend on the data type of the
value the variable is currently referring to and thus can change during the
execution of the script
Since we have established that the data type of a variable can change depending on
the data type of the value it is referring to, and the operations that can be performed
depend on the data type, what can and cannot be done with a variable depends on
the data type of the value the variable currently refers to. As the data type of the value
changes, so do the permissible operations! This is illustrated in the examples below:
>>> x=25
>>> x%7
4
>>> len(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()
>>> x='Hello'
>>> x%7
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting
>>> len(x)
5
#6 Values are objects and variables are merely references to these objects
Any data item (value) is an object, and when it is assigned to a variable, what the
variable actually holds is a reference to that object. Thus, values and variables are
very different! A single value (object) can have multiple references to it. Put another
way, multiple variables can refer to the same, identical object. Destroying a
variable decrements the number of references that particular value has. When this
reference count reaches 0 (an indication that no variable refers to the value any
more), the value is considered to be garbage and is a candidate for garbage collection.
Destroying a variable is immediate and results in the variable disappearing from the
program (till it is recreated if required). This however need not imply a destruction of
the value.
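The difference between a variable and the object it refers to can be seen with the is operator, which tests whether two references point to the very same object:

```python
x = "hello"
y = x                  # y now refers to the same str object as x
print(x is y)          # True: one object, two references
print(id(x) == id(y))  # True: both names report the same object identity
del x                  # removes only the name x, not the object
print(y)               # hello: the object is still reachable through y
```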
A variable that is no longer needed can be destroyed using the del statement. Keep in
mind the following:
1. There is no compulsion to use del to delete variables that are no longer
needed
2. del destroys only the reference; there is no guarantee that any object will
be freed from memory
3. Even if an object becomes eligible to be freed, there is no guarantee about
whether and when it will actually be freed
>>> x=5
>>> x
5
>>> del x
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> print('hello')
hello
>>> print(12)
12
>>> print(2.5)
2.5
>>> print(True)
True
>>> print("2.5")
2.5
>>> print(1.5 * 3)
4.5
>>> print("hello","world")
hello world
>>> print("2+3=",2+3)
2+3= 5
>>> print("hello","world",sep=',')
hello,world
>>> print("hello","world",sep=" <-> ")
hello <-> world
>>> print("Welcome","to","Python",sep=" <-> ")
Welcome <-> to <-> Python
>>> print("2+3=",2+3,sep="")
2+3=5
>>> print("2+3=",2+3,end="")
2+3= 5>>>
>>> print("Welcome","to","Python",sep="...",end="\n---===---\n")
Welcome...to...Python
---===---
>>> print("Welcome","to","Python",end="\n---===---\n",sep="...")
Welcome...to...Python
---===---
Observation:
1. The order in which the keyword arguments sep and end are given does not
matter
2. Keyword arguments must come at the end of the argument list. They cannot
precede any non-keyword (positional) argument.
>>> print()
>>>
NOTE:
The format() method of the str class does not print! It returns the formatted string,
which can then be printed using print() if required.
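A small sketch of positional placeholders in format(); each number inside braces selects the argument with that position, and an argument can be used more than once:

```python
greeting = "{0} {1}".format("Hello", "world")
print(greeting)                                      # Hello world
swapped = "{1} {0}".format("Hello", "world")
print(swapped)                                       # world Hello
repeated = "{0}, {0}! {1}".format("Hello", "world")
print(repeated)                                      # Hello, Hello! world
```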
In the above example, argument #0 is Hello and is substituted within the string
wherever {0} is present, and similarly argument #1 (world) is substituted wherever
{1} is present in the string.
Considering that we now know the various ways of specifying the argument and also
some simple formatting, let us see various combinations that we can use:
Combination Meaning
{0} Display argument #0
>>> name=input()
Ram
>>> language=input()
Python
>>> print('{} is learning {}'.format(name,language))
Ram is learning Python
2.9 Questions
1. Define a ‘statement’ in Python.
2. Which ASCII character signifies the end of a statement in Python?
3. Which ASCII character is used to separate statements in a single line in
Python?
4. How do triple quotes (""") help to form a comment in Python?
5. List the basic data types of Python.
6. How do we know the data type of a data item referred to by a variable in
Python?
7. Which operator is used for integer division in Python?
8. Which expression in Python is equivalent to a mathematical function pow()?
9. When does Python consider a given literal as a ‘float’ type?
10. How can a null string be represented in Python?
11. List any 3 operations that can be performed on the type ‘str’ in Python.
12. List any 2 operations that can be performed on the type ‘bool’ in Python.
13. List the rules for naming identifiers in Python.
14. Write a short note on variables and references in Python.
15. Write a short note on the complex type of Python.
16. How is the help() function of the Python interpreter different from the --help
option?
17. Write a short note on formatting strings in Python.
2.10 Exercises
1. Write a program to find if a given number is a perfect square or not.
2. Write Python statements to calculate and print the simple interest using
variables p, r and t for principal, rate of interest and time duration
respectively.
3. Write Python statements to create 2 complex objects and print their sum,
SUMMARY
➢ The basic data types of Python are int, float, complex, str and
bool.
1. #!/usr/bin/python3
2.
3. # Program that asks for the user's name and age
4. # and prints it back as a greeting!
5.
6. name = input("Enter your name: ")
7. age = int(input("Enter your age: "))
8.
9. print("Hi {}, you are {:d} years old!".format(name,age))
Output:
Observation:
1. The first line of the script is “#!/usr/bin/python3”. Recall from section 1.6
that this helps in executing the script implicitly. Regardless of whether we
intend to execute the script implicitly or pass it to the interpreter explicitly, we
will follow this style from now on.
2. In line 9, the “:d” format is being applied on age to print it in decimal (though
that is the default for an int anyway and would have printed the same as a
string too). This format requires age to be an int rather than a string that
input() returns. Therefore, this conversion is performed in line 7.
3. The input() function always returns a string. If we were expecting a
different type (like an int), it is a good idea to perform a conversion
immediately after obtaining the result from input() (as shown in line 7).
Let us now write a program that performs some basic operations on integers like:
• Finding its previous and next integer
• Finding its square and cube
• Finding its square root and factorial
int_demo.py
1. #!/usr/bin/python3
2.
3. # Program that performs basic operations on int
4.
5. import math
6.
7. x = int(input("Enter an integer: "))
8.
9. print("x={}".format(x))
10. print("x lies between {} and {}".format(x-1,x+1))
11. print("square(x)={} cube(x)={}".format(x**2,pow(x,3)))
12. print("sqrt(x)={}".format(math.sqrt(x)))
13. print("factorial(x)={}".format(math.factorial(x)))
Output:
Enter an integer: 5
x=5
x lies between 4 and 6
square(x)=25 cube(x)=125
sqrt(x)=2.23606797749979
factorial(x)=120
3. Python Control Structures 55
Observation:
1. Line 5 imports the math module that is required for functions like sqrt() and
factorial()
2. The square of an integer can be found out using x**2 or pow(x,2)
1. #!/usr/bin/python
2.
3. # Logarithms and anti-logarithms
4.
5. import math
6.
7. x = float(input("Enter a real number: "))
8.
9. print("Natural logarithm and anti-logarithm")
10. print("ln(x)={} exp(ln(x))={}".format(math.log(x),
math.exp(math.log(x))))
11.
12. print("\nCommon logarithm and anti-logarithm")
13. print("log(x)={} antilog(log(x))={}".format(math.log10(x),
pow(10,math.log10(x))))
14.
15. print("\nBase 2 logarithm and anti-logarithm")
16. print("log2(x)={} antilog2(log2(x))={}".format(math.log(x,2),
pow(2,math.log(x,2))))
Output:
Observation:
1. Line 5 imports the math module required for the functions used here
2. The log() function gives the natural logarithm by default, and exp() is the
natural anti-logarithm
3. The log10() function gives the common logarithm (base 10), and the common
anti-logarithm of x is 10 to the power x
4. For a logarithm to any other base, the log() function can be used with the
desired base as the second argument, as shown in line 16. The anti-logarithm
of x to base b is b to the power x
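Because these are floating-point calculations, the anti-logarithm of a logarithm gives back x only approximately. The sketch below (with x = 50.0 chosen arbitrarily) compares the results with math.isclose() instead of ==; math.log2() is a separate base-2 function in the math module.

```python
import math

x = 50.0                              # an arbitrary positive real number

print(math.log(x, 2), math.log2(x))   # two ways to get the base-2 logarithm
y = math.log10(x)
antilog = 10 ** y                     # common anti-logarithm of log10(x)
print(antilog)                        # close to 50.0, possibly with rounding error
print(math.isclose(antilog, x))       # True
```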
The next program deals with trigonometry:
trigonometry.py
1. #!/usr/bin/python
2.
3. # Trigonometry demo
4.
5. import math
6.
7. angle = float(input("Enter an angle in degrees: "))
8.
9. angle = math.radians(angle) # Conversion from degrees to radians
10.
11. print("sin(x) =",math.sin(angle))
12. print("cos(x) =",math.cos(angle))
13. print("tan(x) =",math.tan(angle))
14. print("cosec(x) =",1/math.sin(angle))
15. print("sec(x) =",1/math.cos(angle))
16. print("cot(x) =",1/math.tan(angle))
Output:
Observation:
1. The trigonometric functions work with radians and hence we first convert the
given angle in degrees to radians in line 9
2. There are no separate functions for sec(), cosec() and cot(), which are
the reciprocals of cos(), sin() and tan() respectively, as implemented in
lines 14-16
3. The value of cos(90°) is mathematically 0, but here we get
6.123233995736766e-17, which is
0.00000000000000006123233995736766 (nearly 0). This is because of the
limited precision of floating-point numbers as well as the approximation of π
used while converting to radians.
4. Similarly, tan(90°) is mathematically undefined (it tends to infinity, which
Python can represent as float("inf")). However, again due to precision issues
and the approximation of π, the value obtained is 1.633123935319537e+16,
which is 16331239353195370, a very large value rather than infinity. It is
also true that math.degrees(math.atan(1.633123935319537e+16))
will yield 90.0!
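When such tiny errors matter, a common approach is to round the result or compare it with a tolerance. Note that math.isclose() needs an explicit abs_tol when the expected value is 0; the tolerance 1e-12 below is an arbitrary choice.

```python
import math

c = math.cos(math.radians(90))
print(c)                                    # a tiny non-zero value, not exactly 0
print(round(c, 10))                         # 0.0
print(math.isclose(c, 0.0, abs_tol=1e-12))  # True
```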
1. #!/usr/bin/python
2.
3. # Demonstration of operations on a complex number
4. import math
5. import cmath
6.
7. r = float(input("Enter the real part of a complex number: "))
8. i = float(input("Enter the imaginary part of a complex number: "))
9.
10. c = complex(r,i) # Construct the complex number
11.
12. print("The complex number is",c)
13. print("\tIts real part is",c.real)
14. print("\tIts imaginary part is",c.imag)
15. print("Its magnitude is",abs(c))
16. print("Its angle is",math.degrees(cmath.phase(c)))
Output:
Observation:
1. We have imported the math module in line 4 because of the call to the
degrees() function in line 16.
2. We have imported the cmath module (complex mathematics) in line 5
because of the call to the phase() function, also in line 16.
3. Refer to section 2.3.3.2 for help with the mathematical functions used here.
1. #!/usr/bin/python
2.
3. # Program to find the simple interest and amount
4.
5. p = float(input("Enter the principal: "))
6. r = float(input("Enter the rate of interest (%pa): "))
7. t = float(input("Enter the duration (years): "))
8.
9. si = (p*t*r)/100
10. amount = p + si
11.
12. print("Simple interest={}".format(si))
13. print("Amount={}".format(amount))
Output:
3.2 Decisions
One of the fundamental control flow statements any programming language provides
is the support for decisions – to conditionally execute a piece of code. Python
provides the if statement to implement decisions. Like most other programming
languages, this has many forms – each of which will be discussed in the following
sections.
if condition: statement
if condition:
    statements
    ...
In the first form shown above, if the condition evaluates to True, the following
statement is executed. If the condition evaluates to False, that statement is skipped
and control moves on to the next line.
In the second form shown above, if the condition evaluates to True, all the
statements within the block are executed in sequence. If the condition evaluates to
False, all the statements within the block are skipped and control moves to the first
statement outside the block.
The statements inside the if block are identified by their indentation! To keep things
simple, for now we'll use a single tab character per level of indentation.
Let us write a program that illustrates this. Here is a program that accepts the user's
name and age and prints a message saying whether the user is eligible to vote or not,
using the premise that a person needs to be at least 18 years of age in order to be
able to vote.
vote1.py
1. #!/usr/bin/python3
2.
3. # Program that asks for the user's name and age
4. # and prints whether the user is eligible to vote or not!
5.
6. name = input("Enter your name: ")
7. age = int(input("Enter your age: "))
8.
9. if age >= 18: print("Hi {}! You can vote!".format(name))
10. if age < 18: print("Sorry {}! You can't vote!".format(name))
Output:
Observation:
1. The condition for a person to be able to vote is age >= 18. Similarly, the
condition for a person to not be able to vote is age < 18.
2. Only one of the two conditions above can be true for any given value of age
We have used the first syntax of the if statement for implementing this since we have
only a single statement to be executed if the condition is true. However, nothing
prevents us from using the second syntax and having just a single statement within the
statement block. This is shown below:
vote2.py
1. #!/usr/bin/python3
2.
3. # Program that asks for the user's name and age
4. # and prints whether the user is eligible to vote or not!
5.
6. name = input("Enter your name: ")
7. age = int(input("Enter your age: "))
8.
9. if age >= 18:
10.     print("Hi {}! You can vote!".format(name))
11.
12. if age < 18:
13.     print("Sorry {}! You can't vote!".format(name))
Output:
if condition: statement1
else: statement2

if condition:
    statement_block1
    ...
else: statement2

if condition: statement1
else:
    statement_block2
    ...

if condition:
    statement_block1
    ...
else:
    statement_block2
    ...
1. #!/usr/bin/python3
2.
3. # Program that asks for the user's name and age
4. # and prints whether the user is eligible to vote or not!
5.
6. name = input("Enter your name: ")
7. age = int(input("Enter your age: "))
8.
9. if age >= 18:
10.     print("Hi {}! You can vote!".format(name))
11. else:
12.     print("Sorry {}! You can't vote!".format(name))
The output is the same as that of the previous program and hence not duplicated here.
Isn't this program more readable and simpler than the previous one? It is also more
efficient, since the condition age >= 18 is tested only once instead of two conditions
being tested every time!
if condition1:
    statement_block1
    ...
elif condition2:
    statement_block2
    ...
elif condition3:
    statement_block3
    ...
...
else:
    statement_blockN
Observation:
1. We have used the statement block syntax above, but each condition (and the
else clause as well) could well be followed by single statements instead.
2. The complete syntax of the if statement is therefore an if block followed by
0 or more elif blocks followed by an optional else block.
3. The conditions are tested sequentially and the moment one of them evaluates
to True, the corresponding block is executed sequentially and all the other
blocks are skipped with control resuming after the if-else block.
4. If none of the conditions evaluate to True, the else block is executed if the
else clause is present. Otherwise, the entire if-else block is skipped.
Let's write a program to use this. Given an integer, let us classify it as being a positive
integer, negative integer or zero.
int_sign.py
1. #!/usr/bin/python3
2.
3. # Program that classifies an integer as being
4. # 1. A positive integer
5. # 2. A negative integer
6. # 3. Zero
7.
8. x = int(input("Enter an integer: "))
9.
10. if x > 0:
11.     print("{} is positive".format(x))
12. elif x < 0:
13.     print("{} is negative".format(x))
14. else:
15.     print("{} is zero".format(x))
There is probably nothing more to explain about the program – it should be pretty self-
explanatory! Do note however, that due to mutual exclusion of the conditions, we don't
have to explicitly check if the number is zero when it is neither positive nor negative.
In this program, the order in which we give the conditions does not matter, nor does it
matter which 2 of the 3 cases we frame conditions for. This is not always the case,
however, as will be demonstrated in the next program.
Let us write a program that classifies a given pair of coordinates. It has to tell us one
of the following:
1. The point is on the origin
2. The point is on the x-axis
3. The point is on the y-axis
4. The quadrant the point belongs to, otherwise
quadrant.py
1. #!/usr/bin/python
2.
3. # A script to tell whether a given point is on the origin,
4. # on the x- or y-axis, or in a particular quadrant.
5.
6. x = int(input("Enter the x-coordinate of the point: "))
7. y = int(input("Enter the y-coordinate of the point: "))
8.
9. if x == 0 and y == 0:
10.     print("The point lies on the origin")
11. elif y == 0:
12.     print("The point lies on the x-axis")
13. elif x == 0:
14.     print("The point lies on the y-axis")
15. elif x > 0 and y > 0:
16.     print("The point lies in the first quadrant")
17. elif x < 0 and y > 0:
18.     print("The point lies in the second quadrant")
19. elif x < 0 and y < 0:
20.     print("The point lies in the third quadrant")
21. else:
22.     print("The point lies in the fourth quadrant")
Output:
Observation:
1. We classify the quadrant to which the given point belongs only when the
point does not lie on either axis.
2. The given point lies on the y-axis if its x-coordinate is 0, and on the x-axis if
its y-coordinate is 0. However, if the point lies on the origin (with both
coordinates being 0), we say neither that it lies on the x-axis nor that it lies on
the y-axis.
3. The order of the conditions is therefore important here. If, for instance, we
decided to first frame a condition for the point lying on the y-axis, this would
be the condition: x==0 and y!=0.
4. The order in which we check for the quadrants still does not matter.
Let us write a program that finds the roots of a quadratic equation $ax^2 + bx + c = 0$
using the formula:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
We will also classify the roots of the equation based on the discriminant ($b^2 - 4ac$)
as follows:
1. Real and equal, if discriminant is 0
2. Real and distinct, if discriminant > 0
3. Imaginary, if discriminant < 0
quadratic_roots.py
1. #!/usr/bin/python
2.
3. # Script to classify and determine
4. # the roots of a quadratic equation
5.
6. import math
7.
8. a = int(input("Enter the value of a: "))
9. b = int(input("Enter the value of b: "))
10. c = int(input("Enter the value of c: "))
11.
12. discriminant = b**2 - 4*a*c
13.
14. if discriminant == 0:
15.     print("The roots are real and equal")
16.     x = -b/(2*a)
17.     print("The root is",x)
18. elif discriminant > 0:
19.     print("The roots are real and distinct")
20.     part1 = -b/(2*a)
21.     part2 = math.sqrt(discriminant)/(2*a)
22.     x1 = part1 + part2
23.     x2 = part1 - part2
24.     print("The roots are", x1, "and", x2)
25. else:
Output:
Observation:
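1. We take the discriminant as $\text{discriminant} = b^2 - 4ac$, so that
$$x = \frac{-b \pm \sqrt{\text{discriminant}}}{2a}$$
2. If the discriminant is 0, the formula reduces to:
$$x = \frac{-b}{2a}$$
3. If the discriminant is positive, the formula can be split into 2 terms as:
$$x = \frac{-b}{2a} \pm \frac{\sqrt{\text{discriminant}}}{2a}$$
4. If the discriminant is negative, the formula can be split into 2 terms as:
$$x = \frac{-b}{2a} \pm i\,\frac{\sqrt{-\text{discriminant}}}{2a}$$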
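The three cases in the observations above can be collected into a single function. This is a sketch of our own, not the book's listing: the function name quadratic_roots and its return value (a description plus a tuple of roots) are illustrative choices, and it assumes a is not 0. Following observation 4, the imaginary roots are built with complex().

```python
import math

def quadratic_roots(a, b, c):
    """Classify and return the roots of a*x**2 + b*x + c = 0 (assumes a != 0)."""
    discriminant = b**2 - 4*a*c
    part1 = -b / (2*a)
    if discriminant == 0:
        return "real and equal", (part1,)
    elif discriminant > 0:
        part2 = math.sqrt(discriminant) / (2*a)
        return "real and distinct", (part1 + part2, part1 - part2)
    else:
        part2 = math.sqrt(-discriminant) / (2*a)   # size of the imaginary part
        return "imaginary", (complex(part1, part2), complex(part1, -part2))

print(quadratic_roots(1, -3, 2))   # ('real and distinct', (2.0, 1.0))
print(quadratic_roots(1, 2, 1))    # ('real and equal', (-1.0,))
print(quadratic_roots(1, 2, 5))    # ('imaginary', ((-1+2j), (-1-2j)))
```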
>>> x=5
>>> if x%3 != 0: print(x)
...
5
Method #2:
>>> x=5
>>> if x%3==0: pass
... else: print(x)
...
5
Observation:
1. Both the methods are identical in functionality – different programmers prefer
different styles!
2. The first method is preferred by programmers who want their code to be
compact.
3. The second method is preferred by programmers who like “positive logic”
more than “negative logic” - programmers who find it easier to deal with
equality and true rather than inequality and false!
classify_word.py
1. #!/usr/bin/python
2.
3. # Program that accepts a word and classifies it
4.
5. word = input("Enter a word: ")
6.
7. if word.isalnum():
8.     print("Alphanumeric")
9.     if word.isalpha():
10.         print("Alphabetic")
11.         if word.isupper(): print("Uppercase")
12.         elif word.islower(): print("Lowercase")
13.         elif word.istitle(): print("Titlecase")
14.     elif word.isnumeric(): print("Numeric")
15. elif word.isspace(): print("Whitespace")
Output:
Observation:
1. Refer to section 2.3.4.2 for a description of the functions used in this program.
2. Observe that we have an if statement within the body of an if statement.
The if statement testing whether the characters are uppercase is within the
body of the if statement testing whether the characters are alphabetic, which
in turn is within the body of the if statement testing whether the characters
are alphanumeric.
3. In those cases where the body of an if statement consisted of just a single
print() call, we have used the simple statement syntax of the if
statement. In all other cases, we have used the compound statement form.
4. Just in case you're wondering, there are words that are alphabetic but neither
uppercase nor lowercase nor titlecase. Therefore we cannot use an else
clause instead of the elif in line 13. An example of a word that is neither of
these three is “heLLo”
5. Do observe the mutual exclusion of the conditions involved and why we have
decided to build our conditions in this manner.
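The example word from observation 4 can be checked directly against the same string methods:

```python
word = "heLLo"
print(word.isalnum(), word.isalpha())                   # True True
print(word.isupper(), word.islower(), word.istitle())   # False False False
```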
3.3 Loops
It is now time to explore the final basic control structure – loops. A loop is defined as a
set of statements executed over and over again.
• A finite loop is one that runs a finite number of times and then passes control
to the statement following it
• An infinite loop is one that never terminates (and therefore might require user
intervention to terminate it along with termination of the script, like pressing
control+C).
• A definite loop is one that runs a fixed number of times (and the number of
times it runs is evident in the script)
• An indefinite loop is one that runs as long as a condition is satisfied;
without knowing the actual data, it is impossible to predict exactly how many
times it will run.
We will eventually see examples of all these types.
statement, just like the if statement. Each time the loop runs, it is counted as an
iteration. Thus, a loop typically is used when we want multiple iterations, controlled
using a condition.
The syntax of the while loop is shown below:
while condition:
    statements
    ...
Let us write a program to print all integers from 1 to n, where n is given by the user.
while_demo.py
1. #!/usr/bin/python
2.
3. # Demonstration of while loop
4.
5. n = int(input("Enter a positive integer: "))
6.
7. i=1
8. while i<=n:
9.     print(i)
10.     i=i+1
Output:
Observation:
1. The loop runs as long as the condition i<=n is True and executes the body
each time the condition evaluates to True.
2. The loop terminates the moment the condition evaluates to False. In this
example, that happens when i>n.
3. In each iteration, we increment the variable i, as shown in line 10.
Let us write a program that tells whether a given number is prime or composite.
prime.py
1. #!/usr/bin/python
2.
3. # Program to classify a given number
4. # as prime or composite
5.
6. n = int(input("Enter a positive integer: "))
7.
8. isPrime=True
9. i=2
10. while i<=n/2:
11.     if n%i==0: isPrime=False
12.     i=i+1
13.
14. if isPrime: print("Prime")
15. else: print("Composite")
Output:
Observation:
1. A number is prime if it has no other factors apart from 1 and itself. Thus, the
first factor we need to check is 2 and the last factor we need to check is n/2,
since the only factor after that will be n.
2. The number 1 is neither prime nor composite and as such is not expected as
an input for our program.
3. Mathematically, we need not search for factors till n/2, we can search only till
√n, which could significantly bring down the number of iterations required
when a number is prime, but the savings might not be justifiable as finding the
1. #!/usr/bin/python
2.
3. # Program to classify a given number
4. # as prime or composite
5.
6. n = int(input("Enter a positive integer: "))
7.
8. isPrime=True
9. i=2
10. while i<=n/2:
11.     if n%i==0:
12.         isPrime=False
13.         break
14.     i=i+1
15.
16. if isPrime: print("Prime")
17. else: print("Composite")
This improved program works just like the previous one, and hence its output is not
shown here.
Observation:
1. The only change in the program is the addition of the break statement in
line 13, which changes the if statement in line 11 from the simple statement
syntax to the compound statement syntax.
2. The number of iterations can drop significantly if the number is composite. For
example, if we consider the input to be 100, while prime.py would have
made 49 iterations (from 2 to 50), prime2.py will use only 1 iteration (with
i=2)!
3. If the number is prime, however, the number of iterations will not change! For
example, the prime number 101 will require 49 iterations (2 to 50) in both
prime.py and prime2.py.
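Observation 3 of prime.py mentioned that the search for factors can stop at √n. The reason: if n = a × b with a ≤ b, then a × a ≤ n, so every composite number has a factor no larger than √n. The sketch below (our own, not one of the book's listings) applies that bound to the loop with break, writing the condition as i*i<=n to avoid floating-point square roots, and counts the iterations. The function name classify is hypothetical; functions are covered later in the book.

```python
def classify(n):
    """Return (label, iterations) using trial division only up to sqrt(n)."""
    is_prime = True
    iterations = 0
    i = 2
    while i * i <= n:              # same as i <= sqrt(n), but with integers only
        iterations = iterations + 1
        if n % i == 0:
            is_prime = False       # found a factor: composite
            break
        i = i + 1
    label = "Prime" if is_prime else "Composite"
    return label, iterations

print(classify(101))   # ('Prime', 9)     divisors 2 to 10 are tried
print(classify(100))   # ('Composite', 1) 2 is a factor straight away
```

For the prime 101, this loop makes 9 iterations instead of the 49 made by prime.py and prime2.py.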
while condition:
    statements
    ...
else:
    statements
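The else clause runs when the loop condition evaluates to False; it is skipped if
the loop is exited through a break statement. We could therefore use the else clause
to perform any action in response to the loop condition evaluating to False. Here is
another improvement on prime.py: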
prime3.py
1. #!/usr/bin/python
2.
3. # Program to classify a given number
4. # as prime or composite
5.
6. n = int(input("Enter a positive integer: "))
7.
8. i=2
9. while i<=n/2:
10.     if n%i==0:
11.         isPrime=False
12.         break
13.     i=i+1
14. else:
15.     isPrime=True
16.
17. if isPrime: print("Prime")
18. else: print("Composite")
Observation:
1. Observe that line 8 of prime.py has been removed! We are no longer
starting with an assumption that the number is prime!
2. Line 14 introduces an else clause which will be executed only if and when the
condition i<=n/2 evaluates to False – and this condition implies that the
number is prime as none of the numbers generated were factors. This
conclusion is recorded in line 15.
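One consequence of this rule is that the else clause also runs when the loop body never executes at all. The sketch below wraps the logic of prime3.py in a function (the name classify is our own) so that several inputs can be tried. For 2 and 3 the condition i<=n/2 is False on the very first check, so the else clause alone classifies them as prime.

```python
def classify(n):
    """prime3.py rewritten as a function so that it can be tried on several inputs."""
    i = 2
    while i <= n / 2:
        if n % i == 0:
            is_prime = False
            break              # leaving through break skips the else clause
        i = i + 1
    else:
        # Runs whenever the condition becomes False, including when it is
        # already False on the first check (n = 2 or n = 3)
        is_prime = True
    return "Prime" if is_prime else "Composite"

for n in (2, 3, 4, 9, 101):
    print(n, classify(n))
# 2 Prime
# 3 Prime
# 4 Composite
# 9 Composite
# 101 Prime
```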
for var in sequence:
    statements
In the above syntax, var is the loop control variable (also called loop index variable).
For each item in the sequence, the loop body will execute with var containing the
value of that item. The number of iterations of the loop will be equal to the number of
items in the sequence. Sequences will be covered later in various sections of the book
(since there are different types of sequences), but we will need one right now to show
the working of the for loop. Therefore, before we go on to an example, let us cover the
range() function that generates a sequence of numbers using an arithmetic
progression.
range(end)
range(start,end)
range(start,end,step)
NOTE:
When the syntax range(start, end) is used, the range() function generates
all numbers from start (including start) till end-1 (excludes end).
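To see what each form produces, the numbers can be collected with list(), which is used here only for display (lists are covered in Chapter 4). The variable names are our own:

```python
upto_5 = list(range(5))             # start defaults to 0; 5 is excluded
from_2_to_5 = list(range(2, 6))     # 6 is excluded
every_third = list(range(1, 10, 3)) # step of 3; 10 is excluded

print(upto_5)       # [0, 1, 2, 3, 4]
print(from_2_to_5)  # [2, 3, 4, 5]
print(every_third)  # [1, 4, 7]
```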
As always, the end value is excluded, even if the progression would otherwise reach it.
The step size can be negative in order to generate decreasing numbers as shown in
the example below:
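For instance (again using list() only to display the generated numbers; the variable names are our own):

```python
countdown = list(range(5, 0, -1))    # 0 is the end value, so it is excluded
evens_down = list(range(10, 0, -2))  # decreasing in steps of 2
empty = list(range(0, 5, -1))        # start is already below end: nothing to generate

print(countdown)   # [5, 4, 3, 2, 1]
print(evens_down)  # [10, 8, 6, 4, 2]
print(empty)       # []
```

With a negative step, the sequence stops before reaching end, and a start value that is already below end gives an empty sequence.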
prime4.py
1. #!/usr/bin/python
2.
3. # Program to generate all prime numbers till a given number
4.
5. lim = int(input("Enter the limit: "))
6.
7. for n in range(2,lim+1):
8.     i=2
9.     while i<=n/2:
10.         if n%i==0:
11.             break
12.         i=i+1
13.     else:
14.         print(n)
Output:
Observation:
1. Line 7 generates all numbers from 2 till the given number, lim. Note that we
have used lim+1 in the range() function because the range() function
excludes the last value. The body of the loop is more or less the code from
prime3.py.
2. We no longer need the variable isPrime that we had used in prime3.py, as is
evident from the code above! This simplifies our code and eliminates a
variable.
3. If we encounter a prime number, we print it as shown in line 14. If we
encounter a composite number, we do nothing but simply continue on with the
next number.
To demonstrate the else clause of the for loop, we will rewrite prime4.py using only for loops:
prime5.py
1. #!/usr/bin/python
2.
3. # Program to generate all prime numbers till a given number
4.
5. lim = int(input("Enter the limit: "))
6.
7. for n in range(2,lim+1):
8.     for i in range(2,int(n/2)+1):
9.         if n%i==0:
10.             break
11.     else:
12.         print(n)
Observation:
1. The program works just like prime4.py and produces identical output. We
have replaced the inner while loop with an inner for loop.
2. The int() function in line 8 is required as the range() function works only
on integers. In Python 3, the / operator always produces a float (even 4/2
gives 2.0), and range() does not accept a float argument. Of course, we can
also elect to use the // operator to guarantee that we get an integer after
division.
3. As usual, the “+1” in line 8 is because the range() function excludes the end
value.
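The point about int() can be checked directly. In this sketch (variable names are our own), an even number is used on purpose to show that n/2 is a float even when the division is exact:

```python
n = 8
half = n / 2                  # true division always gives a float: 4.0

try:
    range(2, half + 1)        # range() rejects float arguments
except TypeError as err:
    print("TypeError:", err)

with_int = list(range(2, int(n / 2) + 1))   # the approach used in prime5.py
with_floor = list(range(2, n // 2 + 1))     # // already yields an int
print(half, with_int, with_floor)           # 4.0 [2, 3, 4] [2, 3, 4]
```

For the non-negative values used in this chapter, int(n/2) and n//2 agree; they differ for negative n, because int() truncates toward zero while // rounds down.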
prime6.py
1. #!/usr/bin/python
2.
3. # Program to generate all prime numbers till a given number,
4. # excluding those primes that end with the digit 3.
5.
6. lim = int(input("Enter the limit: "))
7.
8. for n in range(2,lim+1):
9.     if n%10 == 3: continue
10.     for i in range(2,int(n/2)+1):
11.         if n%i==0:
12.             break
13.     else:
14.         print(n)
Output:
Observation:
1. Observe that this program does not print those primes that end with 3 – in this
example, we observe that 3 and 13 are not present in the output despite
being prime because they end with the digit 3.
2. The only change introduced from prime5.py to prime6.py is in line 9.
3. A number that ends with the digit 3 gives a remainder of 3 when divided by 10
– that is the condition used in line 9.
4. For all those numbers that end with the digit 3, we continue immediately with
the next iteration. We do not even bother to find out whether the number is
prime or not.
5. Of course, we can also write the program in such a way that we first find out if
the number is prime, and if so, check inside the else clause in line 13 whether
the number ends with the digit 3, and then decide whether to print it. (A loop's
else clause cannot be turned into an elif; elif belongs only to the if statement.)
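Observation 5 can be sketched as follows. The result is collected in a list (lists are covered in Chapter 4) so that it can be checked, and the function name primes_not_ending_in_3 is our own:

```python
def primes_not_ending_in_3(lim):
    """Primes from 2 to lim whose last digit is not 3 (primality is checked first)."""
    result = []
    for n in range(2, lim + 1):
        for i in range(2, n // 2 + 1):
            if n % i == 0:
                break              # composite: the else clause is skipped
        else:
            # Only primes reach this point; now look at the last digit
            if n % 10 != 3:
                result.append(n)
    return result

print(primes_not_ending_in_3(30))   # [2, 5, 7, 11, 17, 19, 29]
```

This version tests 3, 13 and 23 for primality before discarding them, so prime6.py, which skips such numbers with continue before the inner loop starts, does less work.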
sum_digits.py
1. #!/usr/bin/python
2.
3. # Program to print the sum of digits of a given number
4.
5. n=int(input("Enter an integer: "))
6. sum=0
7.
8. while n>0:
9.     digit=n%10
10.     sum+=digit
11.     n//=10
12.
13. print("The sum is",sum)
Output:
Observation:
1. In line 11, we use the // operator to perform integer division. Furthermore,
//= will integer divide the LHS by the RHS and store the result in the LHS.
2. The program works even for the special case where 0 is the input. The loop
body will not get executed and sum will be printed directly (and sum is
initialized to 0).
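As listed, sum_digits.py prints 0 for any negative input, because the condition n>0 is False from the start. The sketch below handles that under our own assumption that a negative number's digits are those of its absolute value. It also uses the built-in divmod(), which returns the quotient and remainder together, and renames sum to total, since a variable named sum hides the built-in sum() function. The function name sum_of_digits is hypothetical.

```python
def sum_of_digits(n):
    """Sum of the decimal digits of n; a negative n is treated as abs(n) (our assumption)."""
    n = abs(n)
    total = 0
    while n > 0:
        n, digit = divmod(n, 10)   # same as (n // 10, n % 10)
        total += digit
    return total

print(sum_of_digits(1234))    # 10
print(sum_of_digits(-1234))   # 10
print(sum_of_digits(0))       # 0
```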
1. #!/usr/bin/python
2.
3. # Program to print the reverse of a given number
4.
5. n=int(input("Enter an integer: "))
6. reverse=0
7.
8. while n>0:
9.     digit = n%10
10.     reverse = reverse*10 + digit
11.     n//=10
12.
13. print("The reverse is",reverse)
Output:
Observation:
1. This program is based on sum_digits.py, with sum replaced by reverse
and a change in line 10.
2. As an example, the reverse of the number 789 can be mathematically
obtained as ((9*10)+8)*10+7. That is the logic used in line 10.
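Tracing the loop for 789 makes that formula visible. The sketch below is the loop from the program above with a hypothetical list trace added, which records (digit, reverse, n) after every iteration:

```python
n = 789
reverse = 0
trace = []                      # (digit, reverse, n) after each iteration
while n > 0:
    digit = n % 10
    reverse = reverse * 10 + digit
    n //= 10
    trace.append((digit, reverse, n))

for step in trace:
    print(step)
# (9, 9, 78)
# (8, 98, 7)
# (7, 987, 0)
```

Note that trailing zeros are lost: with this loop, the reverse of 120 comes out as 21.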
palindrome.py
1. #!/usr/bin/python
2.
3. # Program to determine if a given number
4. # is a palindrome or not.
5.
6. n=int(input("Enter an integer: "))
7. num=n
8. reverse=0
9.
10. while n>0:
11.     digit = n%10
12.     reverse = reverse*10 + digit
13.     n//=10
14.
15. if num==reverse: print("The number is a palindrome")
16. else: print("The number is not a palindrome")
Output:
As a final example, let's write a program that will print all 3-digit Armstrong numbers –
numbers which are equal to the sum of the cubes of their digits. For example,
153 = 1³ + 5³ + 3³ = 1 + 125 + 27.
armstrong.py
1. #!/usr/bin/python
2.
3. # Program to print all 3-digit Armstrong numbers
4.
5. for num in range(100,1000):
6.     sum=0
7.     n=num
8.     while n>0:
9.         sum += (n%10)**3
10.         n //= 10
11.
12.     if num==sum: print(num)
Output:
153
370
371
407
Observation:
1. We use the range() function to generate all 3-digit integers – from 100
(inclusive) to 1000 (exclusive) in line 5.
2. We use the same logic that we had used in sum_digits.py to extract the
digits from the number and find the sum – except that the sum here is the sum
of the cubes of the digits rather than the sum of the digits (line 9).
3. We could have directly used a formula like (n//100)**3 +
(n//10%10)**3 + (n%10)**3 to find the sum of the cubes of the digits of
a given number n. A loop not only increases the readability, but is also general
enough to work with any number of digits, if required.
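The loop version generalizes easily. In the commonly used general definition (an assumption here, since armstrong.py only deals with 3 digits), each digit is raised to the power of the number of digits, which is the cube for 3-digit numbers. A sketch, with the hypothetical helpers is_armstrong and armstrong_numbers:

```python
def is_armstrong(num):
    """True if num equals the sum of its digits, each raised to the power of the digit count."""
    power = len(str(num))        # number of decimal digits (num assumed positive)
    total = 0
    n = num
    while n > 0:
        total += (n % 10) ** power
        n //= 10
    return total == num

def armstrong_numbers(start, stop):
    """Armstrong numbers in range(start, stop)."""
    found = []
    for num in range(start, stop):
        if is_armstrong(num):
            found.append(num)
    return found

print(armstrong_numbers(100, 1000))    # [153, 370, 371, 407]
print(armstrong_numbers(1000, 10000))  # [1634, 8208, 9474]
```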
Observation:
1. We need to import the sys module that contains the exit() function!
2. Ideally, we should pass an exit code as an argument to sys.exit() that can
tell others the reason for our script’s termination! An exit code of 0 is assumed
to mean a normal, successful termination whereas a non-zero exit code is
assumed to mean an error or unsuccessful termination! When an argument is
not passed to sys.exit(), it is assumed to mean a normal, successful
termination, equivalent to sys.exit(0)!
Technically, all the approaches above terminate the script by raising the
SystemExit exception. This means that, as a programmer, you could just as well
terminate your script at any time by executing the following statement:
raise SystemExit
Note:
1. The statement raise SystemExit does not require any modules to be
imported, and it is technically what sys.exit(), exit() and quit() do to
terminate a script!
2. The most professional way of terminating a Python script is by invoking
sys.exit(), passing an exit code if non-zero.
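Since sys.exit() works by raising SystemExit, the exit code it carries can be inspected by catching the exception. The sketch below does this purely for demonstration, and the helper names are our own. A real script would normally let the exception propagate so that the interpreter exits with that code. A code of None, which is what sys.exit() without an argument and a bare raise SystemExit carry, is reported to the operating system as 0.

```python
import sys

def exit_code_of(action):
    """Call action() and return the code carried by the SystemExit it raises."""
    try:
        action()
    except SystemExit as exc:   # caught here only to inspect it
        return exc.code

def no_argument():
    sys.exit()

def success():
    sys.exit(0)

def failure():
    sys.exit(3)

def bare_raise():
    raise SystemExit

codes = []
for action in (no_argument, success, failure, bare_raise):
    codes.append(exit_code_of(action))
print(codes)   # [None, 0, 3, None]
```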
3.5 Questions
1. Write a short note on all forms of the if statement of Python along with
syntax and examples.
2. Write a short note on all forms of the while statement of Python along with
syntax and examples.
3. Explain the break and continue statements with examples.
4. Write a short note on terminating control that includes the following:
1. Terminating a Loop
2. Terminating a Function
3. Terminating a Block of Code
4. Terminating a Script
5. Write a short note on the range() function with examples.
3.6 Exercises
1. Write a program to tell if a given number is negative, zero or positive.
2. Write a program to print whether a given number is even or odd.
3. Write a program to print all the even and odd numbers (two separate
sequences) up to a given number.
4. Write a program to check if the given character is a letter, a digit or a
punctuation character.
5. Write a program to check if the given letter is a consonant or a vowel.
6. Write a program to count the number of vowels and consonants in the given
input string.
7. Write a program to check if the given number is prime or not.
8. Write a program to print all the prime numbers up to a given number.
9. Write a program to check if the given year is a leap year or not.
10. Write a program to check if the given string is a palindrome.
11. Write a program to print factorial of a given number.
12. Write a program to generate the first n terms of the Fibonacci series.
13. Write a program that finds the factors of the numbers between 2 and 10. Also
print the numbers that are prime.
14. Write a program to print the grade for a given percentage of marks. The grade
should be A for marks>=80, B for marks>=60, C for marks>=40 and D
otherwise.
15. Write a program to find if the given point lies on the x-axis, y-axis or on the
origin, given its coordinates.
16. Write a program to find the distance between the two points. User should
enter the coordinates of the two points.
17. Write a program to print whether the given triangle is an equilateral triangle,
isosceles triangle or scalene triangle. User should enter the values for the
sides of the triangle.
SUMMARY
➢ Python does not permit empty blocks – such blocks should have
at least the pass statement.
4 LISTS
Deal with nested lists and write programs to solve problems using
lists.
LISTS
A list in Python is defined as an ordered sequence of elements that can be
dynamically altered.
• Ordered means that each item in a list has an index based on its position in
the list.
• Sequence means that the elements are arranged in order, based on their
indices, and a sequential traversal through the list will give us the elements in
the order of their indices.
• The phrase “dynamically altered” means that each item in the list can be
changed and the list itself can be changed – more items can be added, and
existing items can be replaced or removed.
Lists are dynamically growable and shrinkable – elements can be added and removed
on the fly.
>>> x=int(5)
>>> x
5
>>> L=list([5,2,3])
>>> L
[5, 2, 3]
Since a list is a collection of values, square brackets ([]) are used to enclose
the collection and denote a list. In fact, just as we have integer literals whose
data type is understood to be int, we have list literals that use [], whose data
type is understood to be list:
>>> x=5
>>> x
5
>>> L=[5,2,3]
>>> L
[5, 2, 3]
From now on, we will stick to the [] syntax for convenience and brevity, unless forced
to use the list() function.
NOTE:
While list is a data type, the contents of a list are objects. Thus, there is no
distinction like list of int vs. list of float.
>>> L=[5,2,3]
>>> L[0]
5
>>> L[1]
2
>>> L[2]
3
>>> L[-1]
3
>>> L[-2]
2
>>> L[-3]
5
>>> L=[5,2,3]
>>> L
[5, 2, 3]
>>> L[0]=0
>>> L
[0, 2, 3]
>>> L[0]=L[1]+L[2]
>>> L
[5, 2, 3]
>>> L=[5,2,3]
>>> L
[5, 2, 3]
>>> len(L)
3
>>> L=[5]
>>> L
[5]
>>> len(L)
1
>>> L=[]
>>> L
[]
>>> len(L)
0
>>> L=[5,2,3]
>>> for i in L: print(i)
...
5
2
3
>>> L=[5,2,3]
>>> 3 in L
True
>>> 4 in L
False
>>> 4 not in L
True
>>> L=[5,2,3,2]
>>> L.count(3)
1
>>> L.count(2)
2
>>> L.count(4)
0
list.index(x[,i[,j]])
>>> L=[5,2,3,2]
>>> L.index(2)
1
>>> L.index(4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 4 is not in list
>>> L=[5,2,3,2]
>>> L.index(2,2)
3
In the above example, the first occurrence of 2 is in index 1. If we want to search for
the next occurrence of 2, we need to start searching from index 2 onwards, which is
what the example does.
Form #3: list.index(x,i,j)
If the third argument (j) is given, the search will start at index i but will stop at index j
(excluding index j). Remember that if the element is not found within this range, a
ValueError is raised.
>>> L=[5,2,3,2]
>>> L.index(3,0,2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 3 is not in list
In the above example, we are searching for the element 3 in the list L starting from
index 0 and restricting our search to index 2 (2 excluded). Thus, our search is
restricted to the indices 0 and 1, and the element 3 is not found in these indices.
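The second argument of index() makes it possible to find every occurrence of an element. In the sketch below (variable names are our own), count() first tells us how many matches exist, so index() is never asked for a match that is not there and cannot raise ValueError:

```python
L = [5, 2, 3, 2]
x = 2
positions = []
start = 0
for _ in range(L.count(x)):      # exactly as many searches as there are matches
    pos = L.index(x, start)      # search from index start onwards
    positions.append(pos)
    start = pos + 1              # resume just after the match we found
print(positions)   # [1, 3]
```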
>>> L=[5,2,3,2]
>>> L[1:3] # Slice of L from 1 to 3 (exclusive)
[2, 3]
>>> L[1:] # Slice of L from 1 till the end of the list
[2, 3, 2]
>>> L[:3] # Slice of L from the beginning of the list till 3
[5, 2, 3]
>>> L[:] # Slice spanning the entire list
[5, 2, 3, 2]
In case you are of the opinion that L[:] is the same as L, know that it is not: L[:]
creates a new list containing the same elements. This will be dealt with in detail in
section 4.8.3.
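A quick way to see the difference is to compare the two with == (same elements?) and is (same object?). The variable name copy_of_L is our own:

```python
L = [5, 2, 3, 2]
copy_of_L = L[:]
print(copy_of_L == L)   # True: the same elements
print(copy_of_L is L)   # False: a different list object

copy_of_L[0] = 9        # changing the slice copy...
print(L)                # [5, 2, 3, 2] ...leaves the original untouched
```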
Let us now see examples of using list slices to modify the contents of a list.
The first example shows how to replace a few elements of a list with a different set of
elements without changing the list size:
>>> L=[5,2,3,2]
>>> L[1:3]=[3,2]
>>> L
[5, 3, 2, 2]
The number of elements assigned to a list slice need not equal the number of
elements in the slice! Here is an example of inserting more elements than are
removed:
>>> L=[5,2,3,2]
>>> L[1:3]=[3,0,0,2]
>>> L
[5, 3, 0, 0, 2, 2]
Similarly, here is an example of inserting fewer elements than are removed:
>>> L=[5,2,3,2]
>>> L[1:3]=[9]
>>> L
[5, 9, 2]
>>> L=[5,2,3,2]
>>> L[1:2]=[9]
>>> L
[5, 9, 3, 2]
The above example, though valid, is an overly complicated way of doing a simple
operation of L[1]=9!
>>> L=[5,2,3,2]
>>> L[1:1]=[9,8,7]
>>> L
[5, 9, 8, 7, 2, 3, 2]
The above example shows that it is not necessary to really replace elements –
elements can be merely inserted.
>>> L=[5,2,3,2]
>>> L[1:2]=[]
>>> L
[5, 3, 2]
Similarly, the above example shows that it is not necessary to insert anything –
elements can be simply deleted.
>>> L=[5,2,3,2]
>>> L[2:]=[]
>>> L
[5, 2]
The above example shows how to truncate a list.
>>> L=[5,2,3,2]
>>> L[:2]=[]
>>> L
[3, 2]
The above example similarly removes the elements at indices less than 2 (the first two elements).
>>> L=[5,2,3,2]
>>> L[:]=[]
>>> L
[]
The above example clears the list (deletes all elements from the list). We will cover
more techniques for doing this later on.
list.append(x)
>>> L=[5,2,3]
>>> L.append(9)
>>> L
[5, 2, 3, 9]
NOTE:
list.append(x) is similar to list[len(list):]=[x]
list.extend(L)
The list.extend(L) function appends all the elements of the list L to the list list:
>>> L1=[5,2,3]
>>> L2=[7,8,9]
>>> L1.extend(L2)
>>> L1
[5, 2, 3, 7, 8, 9]
NOTE:
list.extend(L) is similar to list[len(list):]=L
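The difference between append() and extend() shows up when the argument is itself a list. In this sketch (variable names are our own), append() adds the whole list as a single element, while extend() adds its elements one by one. extend() also accepts any iterable, such as a range:

```python
a = [5, 2, 3]
a.append([7, 8])        # the list [7, 8] becomes ONE new element
b = [5, 2, 3]
b.extend([7, 8])        # 7 and 8 are added individually
c = [1]
c.extend(range(2, 4))   # any iterable works, not only lists

print(a, len(a))   # [5, 2, 3, [7, 8]] 4
print(b, len(b))   # [5, 2, 3, 7, 8] 5
print(c)           # [1, 2, 3]
```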
To add elements at any desired location within the list, the list.insert(i,x)
function can be used, which inserts the element x at index i within the list list:
>>> L=[5,2,3]
>>> L.insert(2,9)
>>> L
[5, 2, 9, 3]
NOTE:
list.insert(i,x) is similar to list[i:i]=[x]
list.append(x) is similar to list.insert(len(list),x)
Note that inserting at an index beyond the end of the list will end up in an insertion at
the end of the list (an append operation):
>>> L=[5,2,3,2]
>>> L.insert(50,9)
>>> L
[5, 2, 3, 2, 9]
>>> L=[5,2,3]
>>> del L[1]
>>> L
[5, 3]
The del statement can also be used to delete a range of elements from a list:
>>> L=[5,2,3,7,8,9]
>>> del L[1:4]
>>> L
[5, 8, 9]
>>> L=[5,2,3,7,8,9]
>>> del L[1:5:2]
>>> L
[5, 3, 8, 9]
In the last example where we have used del L[1:5:2], the elements from index 1
onwards till index 5 (excluding 5) are deleted in steps of 2, that is, the elements
at indices 1 and 3.
As a special case of the above, del can be used to remove all elements of a list,
retaining the list variable and making the list empty:
>>> L=[5,2,3,2]
>>> del L[:]
>>> L
[]
Finally, the del statement can be used to delete the list variable itself:
>>> L=[5,2,3]
>>> del L
>>> L
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'L' is not defined
list.remove(x)
If we wish to delete a specific element regardless of its position, we can use
list.remove(x), which searches for and removes the first occurrence of x in the
list list. If the element x is not present in the list, it raises a ValueError:
>>> L=[5,2,3,2]
>>> L.remove(2)
>>> L
[5, 3, 2]
>>> L.remove(2)
>>> L
[5, 3]
>>> L.remove(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list
list.pop([i])
If an index (i) is provided, it deletes the element at index i in the list list and returns
the element that was deleted. If no index is provided, it deletes the last element in
the list list and returns the element that was deleted:
>>> L=[5,2,3,2]
>>> L.pop(1)
2
>>> L
[5, 3, 2]
>>> L.pop()
2
>>> L
[5, 3]
>>> L.pop(10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: pop index out of range
As can be seen from the last example above, giving an invalid index to pop() results
in an IndexError!
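To compare the three ways of deleting a single element, the sketch below applies each to a fresh copy of the same list (variable names are our own). pop() and del work by position, remove() works by value, and only pop() gives back the deleted element:

```python
L1 = [5, 2, 3, 2]
removed = L1.pop(1)   # by index; returns the deleted element
L2 = [5, 2, 3, 2]
del L2[1]             # by index; del is a statement, so nothing is returned
L3 = [5, 2, 3, 2]
L3.remove(2)          # by value; deletes only the first occurrence

print(removed)        # 2
print(L1, L2, L3)     # [5, 3, 2] [5, 3, 2] [5, 3, 2]
```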
list.clear()
While most of the functions discussed earlier delete one or more elements from a list,
the list.clear() function deletes all elements from the list list and makes it
empty:
>>> L=[5,2,3,2]
>>> L.clear()
>>> L
[]
NOTE:
L.clear() is similar to del L[:]
>>> L1=[1,3]
>>> L2=[7,8]
>>> L3=L1+L2
>>> L3
[1, 3, 7, 8]
The += operator will concatenate the list on the RHS to the list on the LHS:
>>> L=[1,3]
>>> L+=[7,8]
>>> L
[1, 3, 7, 8]
NOTE:
L1 += L2 is similar to L1.extend(L2)
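The similarity to extend() has a practical consequence: += modifies the existing list in place, whereas L1 = L1 + L2 builds a new list and rebinds the variable. The difference is visible when two variables refer to the same list, as in this sketch (variable names are our own):

```python
a = [1, 3]
alias_a = a          # a second name for the same list
a += [7, 8]          # extends that list in place

b = [1, 3]
alias_b = b
b = b + [7, 8]       # builds a new list; alias_b still refers to the old one

print(alias_a)       # [1, 3, 7, 8]
print(alias_b)       # [1, 3]
print(b)             # [1, 3, 7, 8]
```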
>>> L=[5,2,3]*3
>>> L
[5, 2, 3, 5, 2, 3, 5, 2, 3]
NOTE:
list * n is the same as n * list!
Similarly, the *= operator repeats the list on the LHS as many times as the integer
on the RHS, and assigns the resultant list to the LHS variable:
>>> L=[5,2,3]
>>> L*=3
>>> L
[5, 2, 3, 5, 2, 3, 5, 2, 3]
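Repetition copies references to the same elements rather than the elements themselves. This needs care with lists inside lists (nested lists are covered later in this chapter). In the sketch below (variable names are our own), repeating a list that contains a list produces three references to one inner list, while building the rows in a loop gives three independent inner lists:

```python
shared = [[0] * 3] * 3          # three references to ONE inner list
shared[0][0] = 1
print(shared)     # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

separate = []
for _ in range(3):
    separate.append([0] * 3)    # a fresh inner list on every iteration
separate[0][0] = 1
print(separate)   # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```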
>>> L1=[5,2,3]
>>> L2=L1
>>> L2
[5, 2, 3]
>>> L1[0]=9
>>> L1
[9, 2, 3]
>>> L2
[9, 2, 3]
In order to copy all the elements of a list to another (copying the list object instead of
the list reference), the list.copy() function can be used as follows:
>>> L1=[5,2,3]
>>> L2=L1.copy()
>>> L2
[5, 2, 3]
>>> L1[0]=9
>>> L1
[9, 2, 3]
>>> L2
[5, 2, 3]
NOTE:
L2=L1.copy() is similar to L2=L1[:]
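Both list.copy() and L1[:] make a shallow copy: a new outer list whose elements are the same objects. If an element is itself a list, it is shared between the original and the copy. The standard copy module's deepcopy() copies inner lists as well. A sketch (variable names are our own):

```python
import copy

outer = [1, [2, 3]]
shallow = outer.copy()          # new outer list, but the inner list is shared
deep = copy.deepcopy(outer)     # inner lists are copied too

outer[1][0] = 99                # modify the inner list through the original
print(shallow)   # [1, [99, 3]]
print(deep)      # [1, [2, 3]]
```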
Another form of copying assigns list elements to the variables in another list. Here is an example:
>>> [x,y]=[7,8]
>>> x
7
>>> y
8
Each element of the RHS list is copied to the corresponding variable in the LHS list. A
simpler way to achieve this will be covered in section 5.7.3. But do note that the sizes
of the two lists have to be the same for such an assignment to work.
>>> L=[5,2,3]
>>> min(L)
2
>>> L=[5,2,3]
>>> max(L)
5
>>> L=[5,2,3]
>>> L.reverse()
>>> L
[3, 2, 5]
>>> L=[5,2,3]
>>> L.sort()
>>> L
[2, 3, 5]
>>> L=[5,2,3]
>>> L.sort(reverse=True)
>>> L
[5, 3, 2]
Observation:
1. L.sort() is the same as L.sort(reverse=False)
2. The list.sort() function changes the order of elements in the original list.
In order to retrieve a sorted list without affecting the original list, the
sorted(list) function can be used, which sorts the elements of the list
list in ascending order and returns the new list without affecting the list
list. In case we wish to sort the elements in descending order, we can use
sorted(list,reverse=True).
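3. Sorting a list in reverse order gives the same result as sorting it and then
reversing it. Thus, for lists of numbers such as these, list.sort(reverse=True)
is functionally identical to the sequence of statements list.sort() followed by
list.reverse(). (For elements that compare equal but are distinct, such as 1 and
1.0, sort(reverse=True) keeps them in their original relative order, whereas
reversing afterwards does not.)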
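The difference between sort() and sorted() is easy to miss: sort() changes the list in place and returns None, so writing L = L.sort() would leave L set to None. A sketch (variable names are our own):

```python
L = [5, 2, 3]
result = L.sort()                    # sorts L itself
print(result, L)                     # None [2, 3, 5]

M = [5, 2, 3]
ascending = sorted(M)                # a new list; M is unchanged
descending = sorted(M, reverse=True)
print(M, ascending, descending)      # [5, 2, 3] [2, 3, 5] [5, 3, 2]
```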
countries.py
1. #!/usr/bin/python
2.
3. # Program to store country names and print the names
4. # of those countries whose length is greater than 5.
5.
6. countries=["India","Pakistan","Sri Lanka","China"]
7.
8. for country in countries:
9.     if len(country)>5: print(country)
Output:
Pakistan
Sri Lanka
Let us improve the above program and let the user type in the names of as many
countries as required – entering them one by one and typing in “end” when done. We
will then print the names of those countries whose length is greater than 5.
countries2.py
1. #!/usr/bin/python
2.
3. # Program to accept country names and print the names
4. # of those countries whose length is greater than 5.
5.
6. countries=[]
7.
8. while True:
9.     country=input("Enter a country name (or 'end' to terminate): ")
10.     if country=="end": break
11.     countries.append(country)
12.
13. for country in countries:
14.     if len(country)>5: print(country)
Output:
Observation:
1. The program is similar to countries.py, except for the addition of lines 8-11
and initializing the list countries to an empty list in line 6.
2. We use the list.append() function in line 11 to add a new country to the
list countries.
3. We break out of the loop in line 10 if and when we encounter “end” from the
user.
Let us make the program more productive: we will now receive the names of countries
from the user and print the names of their capitals! To do this, we need to have a
“database” of country capitals with us. We can store the names of capitals in a list, say
called capitals. We will also store the names of the corresponding countries in a
list, say countries, in the same order as the capitals stored in capitals. Thus, for a
given index i, countries[i] will tell us the country name while capitals[i] will
tell us the corresponding capital. We will then accept a country name from the user,
search for that country in our list countries using the index() method, use the retrieved
index to locate the capital in the capitals list and print out the information. We will
repeat this in a loop till the user enters “end” as in the previous program.
countries3.py
1. #!/usr/bin/python
2.
3. # Program to accept country names and print their capitals
4.
5.
6. countries=["India","Pakistan","Sri Lanka","China"]
7. capitals=["New Delhi","Islamabad","Sri Jayawardenepura Kotte","Beijing"]
8.
9. while True:
10.     country=input("Enter a country name (or 'end' to terminate): ")
11.     if country=="end": break
12.     index=countries.index(country)
13.     capital=capitals[index]
14.     print("The capital of {} is {}".format(country,capital))
Output:
Observation:
1. The above program will report a ValueError and terminate if a country
name that is not present in our list countries is entered. Section 7.7 will
show how we can respond to cases where the country is not listed with us.
2. We will see a better way of implementing this program in section 5.10.
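Until then, one way to avoid the ValueError with what we already know is to test membership with the in operator before calling index(). A sketch, with a hypothetical function capital_message and our own wording for the "not found" message:

```python
countries = ["India", "Pakistan", "Sri Lanka", "China"]
capitals = ["New Delhi", "Islamabad", "Sri Jayawardenepura Kotte", "Beijing"]

def capital_message(country):
    """Return a message for country without letting index() raise ValueError."""
    if country in countries:                # check membership first...
        index = countries.index(country)    # ...so index() is sure to succeed
        return "The capital of {} is {}".format(country, capitals[index])
    return "Sorry, {} is not in our list".format(country)

print(capital_message("China"))   # The capital of China is Beijing
print(capital_message("Nepal"))   # Sorry, Nepal is not in our list
```

This searches the list twice for a known country (once for in, once for index()), which is acceptable for a short list.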
Similarly, we'll now write a program that accepts a single-digit number from the user
and prints it in words. We will store these words in a list and use the digit as an index.
digit2word.py
1. #!/usr/bin/python
2.
3. # Program to accept a single digit and print it in words
4.
5. digit=int(input("Enter a digit: "))
6.
7. words=["zero","one","two","three","four",\
8. "five","six","seven","eight","nine"]
9.
10. print("{} is {}".format(digit,words[digit]))
Output:
Enter a digit: 5
5 is five
Observation:
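1. The list defined in line 7 has been split into two lines since it was long, to aid
readability. (Inside brackets, the trailing backslash is optional; Python
continues the statement onto the next line automatically.)
2. Any input outside the range 0 to 9 gives an incorrect result: 10 or more raises
an IndexError, and so does anything below -10, while -1 to -10 are silently
accepted as negative indices (so -1 is printed as nine).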
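The negative-index behaviour and a simple guard against it are sketched below. The function name digit_in_words and its choice to return None for invalid input are our own:

```python
words = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]   # no backslash needed inside []

print(words[-1])   # nine: -1 is a valid index counting from the end

def digit_in_words(digit):
    """Return the word for a digit 0-9, or None for any other integer."""
    if 0 <= digit <= 9:
        return words[digit]
    return None

print(digit_in_words(5), digit_in_words(-1), digit_in_words(10))   # five None None
```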
Let us build on the previous example to write a program that converts a number into
words! Thus, 1234 will be printed as one thousand two hundred and thirty four.
1. #!/usr/bin/python
2.
3. # Program to print a number in words
4.
5. onesWords=('zero','one','two','three','four',\
6. 'five','six','seven','eight','nine')
7. tensWords=('','ten','twenty','thirty','forty',\
8. 'fifty','sixty','seventy','eighty','ninety')
9. teensWords=('','eleven','twelve','thirteen','fourteen',\
10. 'fifteen','sixteen','seventeen','eighteen','nineteen')
11.
12. n=int(input("Enter an integer: "))
13.
14. if n==0: print("Zero")
15. else:
16.     thousands=n//1000
17.     hundreds=n//100%10
18.     tens=n//10%10
19.     ones=n%10
20.
21.     if thousands>0: print("{} thousand ".format(onesWords[thousands]),end='')
22.     if hundreds>0: print("{} hundred ".format(onesWords[hundreds]),end='')
23.     if (thousands>0 or hundreds>0) and (tens>0 or ones>0): print("and ",end='')
24.     if tens==1 and ones>0: print(teensWords[ones],end='')
25.     else:
26.         if tens>0: print(tensWords[tens],'',end='')
27.         if ones>0: print(onesWords[ones],end='')
28.     print()
Observation:
1. The tuple onesWords (line 5) keeps track of the words for single digit
numbers. The tuple tensWords (line 7) keeps track of the words for
describing tens. The tuple teensWords (line 9) keeps track of the words for
11 to 19. These are the words that will be used for output. Additional words
that can be included are “thousand”, “hundred” and “and”, apart from
spaces to separate the words. In these tuples, note that the index 0 is
generally unused, but present nevertheless to ensure that the indices work
correctly.
2. The thousands part, hundreds part, tens part and units part are extracted in
lines 16-19.
3. We need to print an “and” between the hundreds and tens part if either a
thousands or hundreds part exists and a tens or units part exists. This is done
in line 23.
4. Since we are printing in stages using multiple print() calls, and the
print() function prints a newline at the end by default which we do not
want, we have used end='' to prevent anything from being printed after
every print() call, and have finally called print() in line 28 to print a
newline.
5. The input 0 is a special case and hence is handled in line 14.
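An alternative that avoids juggling end='' is to collect the words in a list and join them with spaces at the end, using the string method join(). The sketch below follows the same logic as the program above, but returns a string instead of printing, so that it can be tested. The names number_to_words, ONES, TENS and TEENS are our own. Like the original, it assumes 0 ≤ n ≤ 9999 (a larger n makes the thousands digit exceed 9 and raises IndexError), and it returns "zero" in lowercase.

```python
ONES = ('zero', 'one', 'two', 'three', 'four',
        'five', 'six', 'seven', 'eight', 'nine')
TENS = ('', 'ten', 'twenty', 'thirty', 'forty',
        'fifty', 'sixty', 'seventy', 'eighty', 'ninety')
TEENS = ('', 'eleven', 'twelve', 'thirteen', 'fourteen',
         'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen')

def number_to_words(n):
    """Return n (assumed 0 <= n <= 9999) in words."""
    if n == 0:
        return "zero"
    thousands = n // 1000
    hundreds = n // 100 % 10
    tens = n // 10 % 10
    ones = n % 10

    words = []                               # collected pieces, joined at the end
    if thousands > 0:
        words += [ONES[thousands], "thousand"]
    if hundreds > 0:
        words += [ONES[hundreds], "hundred"]
    if (thousands > 0 or hundreds > 0) and (tens > 0 or ones > 0):
        words.append("and")
    if tens == 1 and ones > 0:
        words.append(TEENS[ones])            # 11 to 19
    else:
        if tens > 0:
            words.append(TENS[tens])
        if ones > 0:
            words.append(ONES[ones])
    return " ".join(words)

print(number_to_words(1234))   # one thousand two hundred and thirty four
print(number_to_words(305))    # three hundred and five
print(number_to_words(15))     # fifteen
```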
>>> L1=[1,2,3]
>>> L2=[4,5]
>>> L3=[6,7,8,9]
>>> L=[0,L1,L2,L3,100]
>>> L
[0, [1, 2, 3], [4, 5], [6, 7, 8, 9], 100]
Observation:
1. L1, L2 and L3 are lists
2. L is a list containing 5 elements, including L1, L2 and L3. This makes L an
outer list with L1, L2 and L3 being inner lists.
Let us continue with the above illustration and prove that indeed the outer list is
considered to have 5 elements, 3 of which are lists:
>>> L1=[1,2,3]
>>> L2=[4,5]
>>> L3=[6,7,8,9]
>>> L=[0,L1,L2,L3,100]
>>> L
[0, [1, 2, 3], [4, 5], [6, 7, 8, 9], 100]
>>> len(L)
5
>>> type(L)
<class 'list'>
>>> type(L[0])
<class 'int'>
>>> type(L[1])
<class 'list'>
Observation:
1. We can verify that len(L) gives us 5 – each element counts as one,
irrespective of its type.
2. We can verify that the type of L[0], the first element of the outer list, which is
an integer, is class 'int'. However, the type of L[1], the second element
of the outer list, which is a list itself, is class 'list'.
Just as how we can index elements of the outer list, we can also index elements of
the inner list as well, as shown in the example below:
>>> L1=[1,2,3]
>>> L2=[4,5]
>>> L3=[6,7,8,9]
>>> L=[0,L1,L2,L3,100]
>>> L[3]
[6, 7, 8, 9]
>>> L[3][1]
7
Observation:
1. L[3] refers to the index 3 (fourth element) of the outer list, which is an inner
list with contents [6,7,8,9].
2. Index 1 (second element) of this inner list is 7.
3. Thus L[3] is our list L3 and L[3][1] is the same as L3[1], which is 7.
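Double indexing is most often used with a list of rows, such as a matrix. A sketch (variable names are our own) that visits every element with one loop nested inside another and computes the sum of each row:

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

row_sums = []
for row in matrix:            # each row is an inner list
    total = 0
    for value in row:         # each value is an element of that inner list
        total += value
    row_sums.append(total)

print(row_sums)        # [6, 15]
print(matrix[1][2])    # 6: row index 1, then column index 2
```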
We can nest lists within lists to any depth depending on our requirement! This is
illustrated below:
>>> L=[4,[6,3,[7,2,5,[7,7,6,[4,1,0]],8,4],9],2]
>>> L
[4, [6, 3, [7, 2, 5, [7, 7, 6, [4, 1, 0]], 8, 4], 9], 2]
>>> L[1]
[6, 3, [7, 2, 5, [7, 7, 6, [4, 1, 0]], 8, 4], 9]
>>> L[1][2]
[7, 2, 5, [7, 7, 6, [4, 1, 0]], 8, 4]
>>> L[1][2][3]
[7, 7, 6, [4, 1, 0]]
>>> L[1][2][3][3]
[4, 1, 0]
>>> L[1][2][3][3][0]
4
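Since the depth of nesting can vary, a fixed number of nested loops cannot visit every element of such a list. A function that calls itself for each inner list can. The sketch below uses recursion and the built-in isinstance(); the name flatten is our own, and it assumes that only lists (not tuples or other sequences) are nested:

```python
def flatten(items):
    """Return a flat list of all the non-list elements of items, at any depth."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))   # recurse into the inner list
        else:
            flat.append(item)
    return flat

L = [4, [6, 3, [7, 2, 5, [7, 7, 6, [4, 1, 0]], 8, 4], 9], 2]
print(flatten(L))   # [4, 6, 3, 7, 2, 5, 7, 7, 6, 4, 1, 0, 8, 4, 9, 2]
```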
4.12 Questions
1. Define a list in Python.
2. Explain the different ways of searching an element within a list.
3. Write a short note on list slices.
4. Differentiate between the append() and extend() methods of list with
examples.
5. Differentiate between del and pop() on a list with examples.
6. Differentiate between the remove() and pop() methods on lists with
examples.
7. Write a short note on operators that can be used on lists.
8. How is list assignment different from the list copy() method? Explain with
examples.
9. Write a short note on sorting of lists.
4.13 Exercises
1. Write a program to add all the items in a list.
2. Write a program to get the largest and smallest number in a list.
3. Write a program to remove duplicates from a list.
4. Write a program that prints whether the two given lists have at least one
member in common.
5. Write a program to convert a list of characters into a string.
6. Write a program to check whether two lists are circularly identical.
SUMMARY
➢ The len() function gives the number of elements in a list and the
count() method returns the number of occurrences of a
particular element in the list.
➢ The min() and max() functions return the smallest and largest
elements in a list, respectively.
5 TUPLES
TUPLES
A tuple in Python is defined as an immutable ordered sequence of elements.
• Immutable means that the contents of a tuple cannot be changed once
created.
• Ordered means that each item in a tuple has an index based on its position in
the tuple.
• Sequence means that the elements are arranged in order, based on their
indices, and a sequential traversal through the tuple will give us the elements
in the order of their indices.
A tuple is very closely related to a list, and hence it might be helpful to compare them.
We will do this in section 5.9 after having covered various properties of tuples, but for
now, do remember a very important distinguishing property of tuples: they are
immutable!
>>> T=tuple([5,2,3])
>>> T
(5, 2, 3)
NOTE:
The tuple() function accepts an iterable object – an object over which it can
iterate and fetch values one by one. Lists are examples of iterables, and there are
many other iterable types.
>>> T=tuple()
>>> T
()
>>> T=(5,2,3)
>>> T
(5, 2, 3)
From now on, we will stick to the () syntax for convenience and brevity.
The parentheses are used primarily for readability; it is actually the presence of the
comma that tells Python that we are dealing with a tuple! It is also possible to leave
out the parentheses if required:
>>> T=5,2,3
>>> T
(5, 2, 3)
NOTE:
While lists are generally homogeneous, tuples are generally heterogeneous. Of
course, Python does not prohibit us from having heterogeneous lists and
homogeneous tuples!
>>> T=()
>>> T
()
>>> T=5,
>>> T
(5,)
>>> T=(5,)
>>> T
(5,)
>>> T=(5,2,3)
>>> T[0]
5
>>> T[1]
2
>>> T[2]
3
>>> T[-1]
3
>>> T[-2]
2
>>> T[-3]
5
>>> T=(5,2,3)
>>> T
(5, 2, 3)
>>> T[0]=9
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> T=(5,2,3)
>>> T
(5, 2, 3)
>>> len(T)
3
>>> T=(5,)
>>> len(T)
1
>>> T=()
>>> len(T)
0
>>> T=(5,2,3)
>>> for i in T: print(i)
...
5
2
3
>>> T=(5,2,3)
>>> 3 in T
True
>>> 4 in T
False
>>> 4 not in T
True
>>> T=(5,2,3,2)
>>> T.count(3)
1
>>> T.count(2)
2
>>> T.count(4)
0
tuple.index(x[,i[,j]])
>>> T=(5,2,3,2)
>>> T.index(2)
1
>>> T.index(4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: tuple.index(x): x not in tuple
>>> T=(5,2,3,2)
>>> T.index(2,2)
3
In the above example, the first occurrence of 2 is in index 1. If we want to search for
the next occurrence of 2, we need to start searching from index 2 onwards, which is
what the example does.
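The second form can be applied repeatedly to find every position of an element. The minimal sketch below uses a hypothetical helper named all_indices. Each search restarts just past the previous match, and the loop stops when ValueError is raised. It catches that error with try/except; exception handling is discussed in section 13.

```python
def all_indices(t, x):
    """Return a list of every index at which x occurs in tuple t."""
    positions = []
    start = 0
    while True:
        try:
            i = t.index(x, start)   # search from 'start' onwards
        except ValueError:          # no further occurrence
            return positions
        positions.append(i)
        start = i + 1               # resume just after this match

print(all_indices((5, 2, 3, 2), 2))   # [1, 3]
print(all_indices((5, 2, 3, 2), 4))   # []
```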
Form #3: tuple.index(x,i,j)
If the third argument (j) is given, the search starts at index i and stops at index j
(excluding index j). Remember that if the element is not found within this range,
ValueError is raised.
>>> T=(5,2,3,2)
>>> T.index(3,0,2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: tuple.index(x): x not in tuple
In the above example, we are searching for the element 3 in the tuple T starting from
index 0 and restricting our search to index 2 (2 excluded). Thus, our search is
restricted to the indices 0 and 1, and the element 3 is not found in these indices.
>>> T=(5,2,3,2)
>>> T[1:3] # Slice of T from 1 to 3 (exclusive)
(2, 3)
>>> T[1:] # Slice of T from 1 till the end of the tuple
(2, 3, 2)
>>> T[:3] # Slice of T from the beginning of the tuple till 3
(5, 2, 3)
>>> T[:] # Slice spanning the entire tuple
(5, 2, 3, 2)
NOTE:
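Since tuples are immutable, T[:] has no need to make a copy. Unlike with lists, it simply
gives back the very same tuple object T (in CPython, the expression T[:] is T evaluates to True)!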
>>> T1=(1,3)
>>> T2=(7,8)
>>> T3=T1+T2
>>> T3
(1, 3, 7, 8)
The += operator concatenates the tuple on the LHS with the tuple on the RHS and assigns the result to the LHS variable:
>>> T=(1,3)
>>> T+=(7,8)
>>> T
(1, 3, 7, 8)
NOTE:
T1 += T2 creates a new tuple holding the result of T1+T2 and assigns this new
tuple to T1. Remember that tuples are immutable, so new elements cannot be
added to an existing tuple.
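Because += builds a new tuple rather than modifying the old one, any other name that still refers to the original tuple is unaffected. The sketch below shows this with the is operator, which tests whether two names refer to the same object. A list is included for contrast:

```python
T = (1, 3)
alias = T             # both names refer to the same tuple object
T += (7, 8)           # builds a new tuple (1, 3, 7, 8) and rebinds T to it
print(T)              # (1, 3, 7, 8)
print(alias)          # (1, 3): the original tuple is untouched
print(T is alias)     # False

# For contrast, += on a list extends the existing list in place
L = [1, 3]
L_alias = L
L += [7, 8]
print(L_alias)        # [1, 3, 7, 8]: both names see the change
```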
>>> T=(5,2,3)*3
>>> T
(5, 2, 3, 5, 2, 3, 5, 2, 3)
NOTE:
tuple * n is the same as n * tuple!
Similarly, the *= operator repeats the tuple on the LHS as many times as the number
on the RHS, and assigns the resultant tuple to the LHS variable:
>>> T=(5,2,3)
>>> T*=3
>>> T
(5, 2, 3, 5, 2, 3, 5, 2, 3)
>>> T1=(5,[],3)
>>> T2=T1
>>> T2
(5, [], 3)
>>> T1[1].append(9)
>>> T1
(5, [9], 3)
>>> T2
(5, [9], 3)
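Since tuples are immutable, we could not change the tuple itself. Instead, we placed a
list inside the tuple and changed the contents of that list. T2 reflects the change
because T2=T1 copied only the reference to the tuple. In fact, since tuples are
immutable, most ways of copying a tuple end up copying only its reference!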
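Sometimes a genuinely independent copy is needed, including a copy of any mutable member such as the list above. In that case, the standard library function copy.deepcopy() can be used. A minimal sketch:

```python
import copy

T1 = (5, [], 3)
T2 = copy.deepcopy(T1)   # also copies the inner list, so T2 is a separate tuple
T1[1].append(9)          # modifies only the list held by T1

print(T1)                # (5, [9], 3)
print(T2)                # (5, [], 3)
print(T2 is T1)          # False
```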
Tuples can be assigned to tuples – a tuple of values can be assigned to a tuple of
variables. The syntax is even simpler because it is the comma, not the parentheses,
that denotes a tuple, so the parentheses can be omitted. Have a look at these
examples of multiple assignment:
>>> x = y = 5
>>> x
5
>>> y
5
>>> x,y = 2,3
>>> x
2
>>> y
3
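Python evaluates the whole right-hand side before assigning anything. This makes tuple assignment a convenient way to swap two variables or to unpack a tuple into separate names. A short sketch:

```python
x, y = 2, 3
x, y = y, x          # the RHS tuple (3, 2) is built first, then unpacked
print(x, y)          # 3 2

point = (4, 7)
px, py = point       # unpacking: the number of names must match the tuple length
print(px, py)        # 4 7
```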
>>> T=(5,2,3)
>>> min(T)
2
>>> T=(5,2,3)
>>> max(T)
5
>>> T=(5,2,3)
>>> sorted(T)
[2, 3, 5]
>>> T=(5,2,3)
>>> sorted(T,reverse=True)
[5, 3, 2]
>>> L=[5,2,3]
>>> T=tuple(L)
>>> T
(5, 2, 3)
>>> L=list(T)
>>> L
[5, 2, 3]
Both of them support a very similar set of operations as far as searching for an
element within them is concerned.
The biggest difference between them is the fact that tuples are immutable and lists are
dynamically alterable! This single difference perhaps makes the strongest distinction
between lists and tuples and can help you decide what to use when!
If you want a list whose contents are “frozen” (known at the time of creation and do
not change after that), switch over to a tuple. If you want a tuple whose contents are
manipulated later in the code, switch over to a list.
Also, it might be helpful to keep in mind that lists are generally homogeneous whereas
tuples are generally heterogeneous. If you find yourself dealing with a heterogeneous
list, maybe you wanted a tuple there instead. Do note, however, that Python does not
enforce these conventions and has no problem dealing with a heterogeneous list or
homogeneous tuple, per se.
1. #!/usr/bin/python
2.
3. # Program to accept country names and print their capitals
4.
5.
6. countries=("India","Pakistan","Sri Lanka","China")
7. capitals=("New Delhi","Islamabad","Sri Jayawardenepura Kotte","Beijing")
8.
9. while True:
10.     country=input("Enter a country name (or 'end' to terminate): ")
11.     if country=="end": break
12.     index=countries.index(country)
13.     capital=capitals[index]
14.     print("The capital of {} is {}".format(country,capital))
Observation:
1. The program is identical to countries3.py, except that in lines 6 and 7, we
have used () instead of []. In other words, we are dealing with tuples
instead of lists.
2. Tuples are generally somewhat more efficient than lists (for example, they
usually take up less memory). Therefore, if you find a list whose contents never
change, consider switching over to a tuple.
Next, we'll write a program to generate the first 'n' terms of the Fibonacci sequence –
a sequence of numbers where each number is the sum of the previous two numbers
in the series.
fib.py
1. #!/usr/bin/python
2.
3. # Program to generate the first 'n' Fibonacci terms
4.
5. t1 = t2 = 1
6.
7. n = int(input("How many terms? "))
8.
9. for i in range(n):
10.     print(t1)
11.     t1,t2 = t2,t1+t2
Observation:
1. The first 2 terms are 1 and 1 (some people consider the first 2 terms to be 0
and 1). This is achieved with a multiple assignment in line 5. (In case you
wish to start the series from 0, line 5 can be replaced by the tuple
assignment t1,t2=0,1.)
2. Within the loop that runs n times (line 9), we print one term per iteration (line
10). The series moves on because of the tuple assignment in line 11. The
right-hand side t2,t1+t2 is evaluated completely, using the old values, before
anything is assigned. So t1 receives the old t2 and t2 receives the old t1+t2.
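The same tuple-assignment idea can be wrapped in a function that returns the terms as a list instead of printing them. This is a minimal sketch. The function name fibonacci and its optional first and second parameters are our own choices and are not part of fib.py:

```python
def fibonacci(n, first=1, second=1):
    """Return a list of the first n Fibonacci terms."""
    terms = []
    t1, t2 = first, second
    for _ in range(n):
        terms.append(t1)
        t1, t2 = t2, t1 + t2   # both values on the right use the old t1 and t2
    return terms

print(fibonacci(7))         # [1, 1, 2, 3, 5, 8, 13]
print(fibonacci(7, 0, 1))   # [0, 1, 1, 2, 3, 5, 8]
```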
5.11 Questions
1. How are tuples different from lists? When do we prefer a list over a tuple and
vice-versa?
2. Write a short note on singleton tuples.
3. Write a short note on searching within tuples.
4. Write a short note on tuple slices.
5. Write a short note on the operators that can be used with tuples.
5.12 Exercises
1. Write a program to convert a tuple of values into a tuple of singleton tuples of
values. Thus, (1,2,3) should get converted to ((1,),(2,),(3,)).
2. Write a program to count the number of elements in a list until it finds a tuple
in the list.
3. Write a program to add corresponding elements of two given tuples, giving
rise to a new tuple.
4. Write a program to multiply each element of a given tuple with a given integer,
giving rise to a new tuple.
5. Write a program that classifies a triangle as being equilateral, isosceles or
scalene, given its 3 vertices as a tuple of tuples.
SUMMARY
6 SETS
A set is an unordered collection of unique elements.
• Unordered means that elements in a set do not have an index by which they
can be addressed.
• Unique means that duplicates are not allowed (any attempt to add an element
that already exists in the set is silently ignored).
NOTE:
A set can contain only hashable elements – those elements that have a hash code.
Though hashing is outside the scope of this book, do note that immutable
collections (like tuples) are hashable in Python as long as all their elements are
hashable, whereas mutable collections (like lists) are not. Thus, for example, a set
can contain a tuple of numbers, but can't contain a list!
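A short sketch of the rule in practice. A tuple of numbers can be placed in a set. A list is rejected with a TypeError, and so is a tuple that contains a list:

```python
ok = {(1, 2), (3, 4)}          # tuples of numbers are hashable
print(len(ok))                 # 2

rejected = []
for candidate in ([1, 2], (1, [2])):
    try:
        {candidate}            # building a set requires hashing the element
    except TypeError as err:   # e.g. "unhashable type: 'list'"
        rejected.append(candidate)
        print("rejected:", candidate, "-", err)
```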
>>> S=set([1,2,3,4,5])
>>> S
{1, 2, 3, 4, 5}
>>> S=set(range(1,6))
>>> S
{1, 2, 3, 4, 5}
NOTE:
The set constructor receives an iterable (such as a list, a tuple or a range) and
extracts its elements to create a set. Do not mistake the above examples to mean the
construction of a set that contains a list! Remember from the previous note that sets
cannot contain lists, as lists are mutable!
Just as [] is used to denote list literals, {} is used to denote set literals:
>>> S={2,4,6,8}
>>> S
{8, 2, 4, 6}
As can be seen from the example above, the order of the elements within a set is not
under our control as sets are unordered collections. Sets support the mathematical
set operations of finding the union, intersection and difference. We will explore these
operations soon.
>>> S=set(range(1,6))
>>> S
{1, 2, 3, 4, 5}
>>> len(S)
5
>>> S=set()
>>> S
set()
>>> len(S)
0
>>> S=set(range(1,6))
>>> for i in S: print(i)
...
1
2
3
4
5
>>> S=set(range(1,6))
>>> 2 in S
True
>>> 9 in S
False
>>> 9 not in S
True
>>> S=set(range(1,6))
>>> S
{1, 2, 3, 4, 5}
>>> S.add(9)
>>> S
{1, 2, 3, 4, 5, 9}
>>> S1={2,4,6,8}
>>> S1
{8, 2, 4, 6}
>>> S2={1,3,5,7}
>>> S2
{1, 3, 5, 7}
>>> S1.update(S2)
>>> S1
{1, 2, 3, 4, 5, 6, 7, 8}
It is also possible to find the union of a set with another and then assign that set to the
original variable. We will examine how to find the union soon.
>>> S=set(range(1,6))
>>> S
{1, 2, 3, 4, 5}
>>> S.pop()
1
>>> S
{2, 3, 4, 5}
>>> S.pop()
2
>>> S
{3, 4, 5}
>>> S=set(range(1,6))
>>> S
{1, 2, 3, 4, 5}
>>> S.remove(3)
>>> S
{1, 2, 4, 5}
>>> S.remove(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 3
>>> S=set(range(1,6))
>>> S
{1, 2, 3, 4, 5}
>>> S.discard(3)
>>> S
{1, 2, 4, 5}
>>> S.discard(3)
>>> S
{1, 2, 4, 5}
>>> S=set(range(1,6))
>>> S
{1, 2, 3, 4, 5}
>>> S.clear()
>>> S
set()
>>> S1={1,2}
>>> S2=S1
>>> S2
{1, 2}
>>> S1.remove(1)
>>> S1
{2}
>>> S2
{2}
To copy the elements of a set into a new, independent set, the set.copy()
function can be used:
>>> S1={1,2}
>>> S2=S1.copy()
>>> S2
{1, 2}
>>> S1.remove(1)
>>> S1
{2}
>>> S2
{1, 2}
>>> S1={1,2}
>>> S2={2,3}
>>> S1.union(S2)
{1, 2, 3}
>>> S1
{1, 2}
>>> S1={1,2}
>>> S2={2,3}
>>> S1 | S2
{1, 2, 3}
>>> S1
{1, 2}
>>> {1,2}.union({2,3})
{1, 2, 3}
>>> {1,2} | {2,3}
{1, 2, 3}
It is also possible to find the union of multiple sets by passing multiple arguments to
union() or using the | operator multiple times as follows:
>>> {1}.union({2},{3},{4})
{1, 2, 3, 4}
>>> {1} | {2} | {3} | {4}
{1, 2, 3, 4}
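There is one practical difference between the two spellings. Set methods such as union() accept any iterable as an argument, whereas operators such as | require sets on both sides. A short sketch:

```python
S = {1, 2}
u = S.union([2, 3], (4,))       # methods accept lists, tuples, ranges, ...
print(sorted(u))                # [1, 2, 3, 4]

try:
    S | [2, 3]                  # the operator insists on a set on both sides
    operator_ok = True
except TypeError:
    operator_ok = False
print(operator_ok)              # False

v = S | set([2, 3])             # convert the list first, then use the operator
print(sorted(v))                # [1, 2, 3]
```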
The set.update(s) function also finds the union, but stores the result back into the
set on which it is called (this was introduced in section 6.6.1.2):
>>> S1={1,2}
>>> S2={2,3}
>>> S1.update(S2)
>>> S1
{1, 2, 3}
>>> S1={1,2}
>>> S2={2,3}
>>> S1 |= S2
>>> S1
{1, 2, 3}
It is also possible to find the union of multiple sets and store the result in a set.
You can either pass multiple arguments to update(), or use the | operator multiple
times on the right-hand side of the |= operator, as follows:
>>> S={1}
>>> S.update({2},{3},{4})
>>> S
{1, 2, 3, 4}
>>> S={1}
>>> S |= {2} | {3} | {4}
>>> S
{1, 2, 3, 4}
>>> S1={1,2}
>>> S2={2,3}
>>> S1.intersection(S2)
{2}
>>> S1
{1, 2}
>>> S1={1,2}
>>> S2={2,3}
>>> S1 & S2
{2}
>>> S1
{1, 2}
>>> {1,2}.intersection({2,3})
{2}
>>> {1,2} & {2,3}
{2}
>>> {1,2,3}.intersection({2,3,4},{1,3,5})
{3}
>>> {1,2,3} & {2,3,4} & {1,3,5}
{3}
>>> S1={1,2}
>>> S2={2,3}
>>> S1.intersection_update(S2)
>>> S1
{2}
>>> S1={1,2}
>>> S2={2,3}
>>> S1 &= S2
>>> S1
{2}
It is also possible to find the intersection of multiple sets and store the result in a
set. You can either pass multiple arguments to intersection_update(), or use the &
operator multiple times on the right-hand side of the &= operator, as follows:
>>> S={1,2,3}
>>> S.intersection_update({2,3,4},{1,3,5})
>>> S
{3}
>>> S={1,2,3}
>>> S &= {2,3,4} & {1,3,5}
>>> S
{3}
>>> S1={1,2}
>>> S2={2,3}
>>> S1.difference(S2)
{1}
>>> S1
{1, 2}
>>> S1={1,2}
>>> S2={2,3}
>>> S1 - S2
{1}
>>> S1
{1, 2}
>>> {1,2}.difference({2,3})
{1}
>>> {1,2} - {2,3}
{1}
It is also possible to find the difference of multiple sets by passing multiple arguments
to difference() or using the - operator multiple times as follows:
>>> {1,2,3}.difference({2,4,6},{3,6,9})
{1}
>>> {1,2,3} - {2,4,6} - {3,6,9}
{1}
>>> S1={1,2}
>>> S2={2,3}
>>> S1.difference_update(S2)
>>> S1
{1}
>>> S1={1,2}
>>> S2={2,3}
>>> S1 -= S2
>>> S1
{1}
It is also possible to subtract the union of multiple sets from a set and store the
result back in that set. You can either pass multiple arguments to difference_update(),
or use the | operator multiple times on the right-hand side of the -= operator, as follows:
>>> S={1,2,3}
>>> S.difference_update({2,4,6},{3,6,9})
>>> S
{1}
>>> S={1,2,3}
>>> S -= {2,4,6} | {3,6,9}
>>> S
{1}
>>> S1={1,2}
>>> S2={2,3}
>>> S1.symmetric_difference(S2)
{1, 3}
>>> S1
{1, 2}
>>> S1={1,2}
>>> S2={2,3}
>>> S1 ^ S2
{1, 3}
>>> S1
{1, 2}
>>> {1,2}.symmetric_difference({2,3})
{1, 3}
>>> {1,2} ^ {2,3}
{1, 3}
>>> S1={1,2}
>>> S2={2,3}
>>> S1.symmetric_difference_update(S2)
>>> S1
{1, 3}
>>> S1={1,2}
>>> S2={2,3}
>>> S1 ^= S2
>>> S1
{1, 3}
>>> S1={1,2}
>>> S2={2,3}
>>> S1.isdisjoint(S2)
False
>>> S1.isdisjoint({3,4})
True
>>> {1,2}.issubset({2,3})
False
>>> {1,2}.issubset({1,2,3})
True
>>> {1,2} <= {2,3}
False
>>> {1,2} <= {1,2,3}
True
The set.issuperset(s) function tells whether the set on which it is called is a
superset of the set s or not. The >= operator plays the same role.
>>> {1,2}.issuperset({2,3})
False
>>> {1,2,3}.issuperset({2,3})
True
>>> {1,2} >= {2,3}
False
>>> {1,2,3} >= {2,3}
True
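Note that <= and >= also return True for two equal sets. For a proper subset or superset, where the two sets must not be equal, Python provides the < and > operators. A short sketch:

```python
A = {1, 2}
print(A <= {1, 2})       # True: every set is a subset of itself
print(A < {1, 2})        # False: a proper subset must be strictly smaller
print(A < {1, 2, 3})     # True
print({1, 2, 3} > A)     # True: proper superset
```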
1. #!/usr/bin/python
2.
3. # Demonstration of sets
4.
5. item_count_A = int(input("How many items does A have? "))
6. items_A = set()
7. for i in range(item_count_A):
8.     item = input("Enter item #{} of A: ".format(i+1))
9.     items_A.add(item)
10.
11. item_count_B = int(input("How many items does B have? "))
12. items_B = set()
13. for i in range(item_count_B):
14.     item = input("Enter item #{} of B: ".format(i+1))
15.     items_B.add(item)
16.
17. print("Number of items A has:",len(items_A))
18. print("Number of items B has:",len(items_B))
19.
20. print("All items with A and B:")
21. for item in items_A | items_B: print(item)
22.
23. print("Items common to A and B:")
24. for item in items_A & items_B: print(item)
25.
26. print("Items unique to A:")
27. for item in items_A - items_B: print(item)
28.
29. print("Items unique to B:")
30. for item in items_B - items_A: print(item)
31.
32. print("Items with only A and only B:")
33. for item in items_A ^ items_B: print(item)
6.9 Questions
1. What is a Set?
2. Define the following with respect to given 2 sets with examples:
1. Set Union
2. Set Intersection
3. Set Difference
4. Set Symmetric Difference
5. Disjoint Sets
6. Subset
7. Superset
3. In what ways can we create a set? Illustrate with simple examples.
4. Is it possible to access a specific element in a set using its index?
Explain.
5. What do you observe when you declare a set with some elements and try
displaying its contents? What do you infer from this?
6. Can a set contain a list? What kind of collection is a set capable of
storing?
7. How do we determine the size of a set?
8. How can we add multiple elements from a list to a set? Demonstrate with
an example.
9. Illustrate with examples how to remove elements from a set using the
below:
1. pop()
2. remove()
3. discard()
10. What do you observe when you perform the below operations on an
empty set?
1. pop()
2. remove()
3. discard()
11. How can we empty a set?
12. List all possible ways of finding:
1. Union of 2 sets with examples.
2. Intersection of 2 sets with examples.
3. Difference of 2 sets with examples
13. Illustrate with a simple example how to evaluate the symmetric difference
of 2 sets using 'symmetric_difference' and the '^' operator.
14. In what way does the 'symmetric_difference_update' function differ
from 'symmetric_difference'? Illustrate with an example.
15. In what way does the 'difference' function differ from
'symmetric_difference'?
16. Given 2 sets A & B, how do we confirm if the below are true:
1. A and B are disjoint sets
2. A is a subset of B
3. A is a superset of B
6.10 Exercises
1. Add a large number of unique numbers (e.g. 10000) to a set and use
the 'in' operator to search for an element, noting the time it takes to find it.
Repeat the task using a for loop to iterate through the set in search
of the element, and then compare the time taken in both scenarios.
2. Write a program to convert a list of integers into a list of unique integers using
sets.
3. Assume that we have 2 lists, each having only IP addresses as its contents.
Write a program to determine if the 2 lists have the same IP addresses (albeit
in different positions) in the most efficient manner.
SUMMARY
7 DICTIONARIES
A dictionary is a collection of key-value pairs subject to the constraint that all the keys
should be unique.
The keys of a dictionary can be considered to be members of a set (and are thus
unique), with each key keeping track of a value.
>>> D=dict([('apple','red'),('grapes','green')])
>>> D
{'apple': 'red', 'grapes': 'green'}
Observation:
1. A dictionary can be created in a variety of ways – the above technique is one
way by which an iterable (where each element of the iterable has 2 elements
– a key and a value) can be converted into a dictionary.
2. In the above example, the iterable is a list and each element of the list is a
tuple with 2 elements – a key and a value.
3. In older versions of Python (3.5 and earlier), the order of the keys of a
dictionary was arbitrary and not under our control. Since Python 3.7, however,
dictionaries are guaranteed to preserve the order in which their keys were
inserted.
Just as [] is used to denote list literals, {} is used to denote dictionary literals.
Recall from section 6.1 that {} is also used to enclose the members of a set. How does
Python differentiate between sets and dictionaries then? A set consists of multiple
elements within {} separated by commas. A dictionary consists of multiple key-value
pairs within {}, with a colon separating each key from its value and commas
separating the pairs.
>>> D={'apple':'red','grapes':'green'}
>>> D
{'apple': 'red', 'grapes': 'green'}
From now on, we will stick to the {} syntax for convenience and brevity.
NOTE:
Since braces ({}) with commas (,) denote sets and braces with colons (:) denote
dictionaries, the question arises: what do empty braces mean? The answer is that
empty braces always denote an empty dictionary, and therefore an empty set can
only be created by using set().
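A quick check with type() confirms the rule stated in the note:

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>
print(type({1, 2}))         # <class 'set'>: commas only
print(type({1: 2}))         # <class 'dict'>: a colon makes it a dictionary
```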
>>> D={'apple':'red','grapes':'green'}
>>> D['apple']
'red'
>>> D['grapes']
'green'
An attempt to access a value using a key that does not exist in the dictionary results in
a KeyError:
>>> D={'apple':'red','grapes':'green'}
>>> D['mango']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'mango'
New values can be assigned to existing keys using an assignment as shown below.
Also, if at the time of assignment the referenced key does not exist, it will be created:
>>> D={'apple':'red','grapes':'green'}
>>> D['grapes']='purple'
>>> D
{'apple': 'red', 'grapes': 'purple'}
>>> D['mango']='yellow'
>>> D
{'apple': 'red', 'grapes': 'purple', 'mango': 'yellow'}
>>> D={'apple':'red','grapes':'green'}
>>> len(D)
2
>>> D={'apple':'red'}
>>> len(D)
1
>>> D={}
>>> len(D)
0
>>> D={'apple':'red','grapes':'green'}
>>> for K in D: print(K)
...
apple
grapes
The keys of a dictionary can be explicitly retrieved using the dict.keys() function,
which returns a view of the keys within the dictionary dict. This means that if and
when the keys of the dictionary dict change, the view changes automatically as
well:
>>> D={'apple':'red','grapes':'green'}
>>> K=D.keys()
>>> for i in K: print(i)
...
apple
grapes
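Because keys() returns a view rather than a copy, later changes to the dictionary are visible through the same view. A minimal sketch follows. The printed order assumes Python 3.7 or later, where dictionaries keep insertion order:

```python
D = {'apple': 'red', 'grapes': 'green'}
K = D.keys()              # a view, not a snapshot
D['mango'] = 'yellow'     # add a key after creating the view
del D['apple']            # and remove one
print(list(K))            # ['grapes', 'mango']
print('mango' in K)       # True
```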
>>> D={'apple':'red','grapes':'green'}
>>> V=D.values()
>>> for i in V: print(i)
...
red
green
NOTE:
The order in which the values are returned is the same as the order in which their
keys are returned.
Of course, it is also possible to iterate through the keys and print out the
corresponding values:
>>> D={'apple':'red','grapes':'green'}
>>> for K in D: print(D[K])
...
red
green
Observation:
1. In the above example, we are iterating through the keys of the dictionary D.
2. We are printing D[K], which is the value associated with the key K in the
dictionary D.
>>> D={'apple':'red','grapes':'green'}
>>> for K,V in D.items():
... print(K,V)
...
apple red
grapes green
>>> D={'apple':'red','grapes':'green'}
>>> 'apple' in D
True
>>> 'mango' in D
False
>>> 'mango' not in D
True
>>> D={'apple':'red','grapes':'green'}
>>> K='apple'
>>> V=D[K]
>>> print(V)
red
>>> D={'apple':'red','grapes':'green'}
>>> D.get('apple')
'red'
>>> D.get('mango')
>>>
Observation:
1. D.get('mango') is supposed to return the value of the key mango in the
dictionary D, but since that key does not exist, it returns None instead. The
interactive interpreter does not display a result of None, so there is no output at all.
The complete form is dict.get(key,default), which returns the value
corresponding to the key key in the dictionary dict if the key exists, and returns the
value default if the key is not found.
>>> D={'apple':'red','grapes':'green'}
>>> D.get('apple','blue')
'red'
>>> D.get('mango','blue')
'blue'
>>> D.get('mango')
>>>
Observation:
1. D.get('apple','blue') returns the value corresponding to the key
apple in the dictionary D. The corresponding value is red and is hence
returned.
2. D.get('mango','blue') tries to return the value of the key mango in the
dictionary D, but since that key does not exist it returns the value blue
instead.
3. D.get('mango') returns None, since the key mango does not exist in the
dictionary D. As the interactive interpreter does not display None, there is no
output at all.
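A common use of the two-argument form is counting occurrences. Here, a key that has not been seen yet should behave as if its count were 0. The minimal sketch below uses a made-up list of words. The printed order assumes Python 3.7 or later:

```python
words = ['apple', 'mango', 'apple', 'grapes', 'apple']
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1   # 0 is used when w has not been seen yet
print(counts)   # {'apple': 3, 'mango': 1, 'grapes': 1}
```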
Given a value, the following example shows how to fetch any one key with the given
value:
>>> D={'apple':'red','grapes':'green','banana':'yellow','mango':'green'}
>>> V='green'
>>> for K in D:
... if D[K]==V:
... print(K)
... break
...
grapes
Observation:
1. The break statement is just an optimization mechanism to ensure that once
we find any suitable key, we abort the search process.
2. If the break statement is removed and no other changes are made to the
code, the code ends up listing all the keys that have the given value.
>>> D={'apple':'red','grapes':'green','banana':'yellow','mango':'green'}
>>> V='green'
>>> for K in D:
... if D[K]==V: print(K)
...
grapes
mango
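The same search can also be written compactly as a list comprehension that collects every matching key. The sketch below uses the same dictionary. The order of the result assumes Python 3.7 or later, where dictionaries keep insertion order:

```python
D = {'apple': 'red', 'grapes': 'green', 'banana': 'yellow', 'mango': 'green'}
V = 'green'
matches = [K for K in D if D[K] == V]   # every key whose value equals V
print(matches)    # ['grapes', 'mango']
```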
>>> D={'apple':'red','grapes':'green'}
>>> 'mango' in D
False
>>> D['mango']='yellow'
>>> 'mango' in D
True
>>> D={'apple':'red','grapes':'green'}
>>> D.setdefault('apple')
'red'
>>> D.setdefault('pear')
>>> D
{'apple': 'red', 'grapes': 'green', 'pear': None}
In the more complete form, the dict.setdefault(key,default) function returns
the value associated with the key key in the dictionary dict if the key exists, and if
the key does not exist, creates it with a corresponding value of default and returns
the same value.
>>> D={'apple':'red','grapes':'green'}
>>> D.setdefault('apple','blue')
'red'
>>> D
{'apple': 'red', 'grapes': 'green'}
>>> D.setdefault('mango','blue')
'blue'
>>> D
{'apple': 'red', 'grapes': 'green', 'mango': 'blue'}
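The two-argument form of setdefault() is handy for grouping. In that case, each key maps to a list that must be created the first time the key is seen. The minimal sketch below uses made-up data. The printed order assumes Python 3.7 or later:

```python
fruits = [('apple', 'red'), ('grapes', 'green'), ('cherry', 'red')]
by_colour = {}
for name, colour in fruits:
    # create an empty list for a new colour, then append to whichever list is returned
    by_colour.setdefault(colour, []).append(name)
print(by_colour)   # {'red': ['apple', 'cherry'], 'green': ['grapes']}
```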
>>> D={'apple':'red','grapes':'green'}
>>> del D['apple']
>>> D
{'grapes': 'green'}
>>> del D['mango']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'mango'
>>> D={'apple':'red','grapes':'green'}
>>> D.popitem()
('grapes', 'green')
>>> D
{'apple': 'red'}
>>> D.popitem()
('apple', 'red')
>>> D
{}
>>> D.popitem()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'popitem(): dictionary is empty'
>>> D={'apple':'red','grapes':'green'}
>>> D.pop('apple')
'red'
>>> D
{'grapes': 'green'}
>>> D.pop('grapes')
'green'
>>> D
{}
>>> D.pop('mango')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'mango'
>>> D={'apple':'red','grapes':'green'}
>>> D.pop('apple','blue')
'red'
>>> D
{'grapes': 'green'}
>>> D.pop('apple','blue')
'blue'
>>> D
{'grapes': 'green'}
>>> D={'apple':'red','grapes':'green'}
>>> D.clear()
>>> D
{}
countries5.py
1. #!/usr/bin/python
2.
3. # Program to accept country names and print their capitals
4.
5.
6. capitals={"India":"New Delhi","Pakistan":"Islamabad","Sri Lanka":"Sri Jayawardenepura Kotte","China":"Beijing"}
7.
8. while True:
9.     country=input("Enter a country name (or 'end' to terminate): ")
10.     if country=="end": break
11.     print("The capital of {} is {}".format(country,capitals[country]))
Observation:
1. This is the most efficient implementation of this program so far – fetching a
value from a dictionary using its key is a very fast operation.
2. The program will still report a KeyError if an unknown country is entered.
This is because a KeyError is generated if we attempt to access the value of
a key that does not exist in a dictionary. We can either use the dict.get()
and check if the returned value was None or can use exception handling
(discussed in section 13).
Let us improve the previous program by ensuring that the program does not crash
even if an unknown country name is given by the user. The program should merely
say that the country is not found in its database!
countries6.py
1. #!/usr/bin/python
2.
3. # Program to accept country names and print their capitals
4.
5.
6. capitals={"India":"New Delhi","Pakistan":"Islamabad","Sri Lanka":"Sri Jayawardenepura Kotte","China":"Beijing"}
7.
8. while True:
9.     country=input("Enter a country name (or 'end' to terminate): ")
10.     if country=="end": break
11.     capital=capitals.get(country)
12.     if capital is None: print("Sorry, country not found!")
13.     else: print("The capital of {} is {}".format(country,capital))
roman2dec.py
1. #!/usr/bin/python
2.
3. # Program to convert a single Roman numeral to decimal and words
4.
5. roman={'I':1,'V':5,'X':10,'L':50,'C':100,'D':500,'M':1000}
6.
7. n=input("Enter a single Roman uppercase numeral: ").upper()
8. n=roman[n]
9. print("Decimal:",n)
10.
11. onesWords=('zero','one','two','three','four',\
12. 'five','six','seven','eight','nine')
13. tensWords=('','ten','twenty','thirty','forty',\
14. 'fifty','sixty','seventy','eighty','ninety')
15. teensWords=('','eleven','twelve','thirteen','fourteen',\
16. 'fifteen','sixteen','seventeen','eighteen','nineteen')
17.
18. thousands=n//1000
19. hundreds=n//100%10
20. tens=n//10%10
21. ones=n%10
22.
23. if thousands>0: print("{} thousand ".format(onesWords[thousands]),end='')
24. if hundreds>0: print("{} hundred ".format(onesWords[hundreds]),end='')
25. if (thousands>0 or hundreds>0) and (tens>0 or ones>0): print("and ",end='')
26. if tens==1 and ones>0: print(teensWords[ones],end='')
27. else:
28.     if tens>0: print(tensWords[tens],'',end='')
29.     if ones>0: print(onesWords[ones],end='')
30. print()
Observation:
1. The code is mostly taken from num2words.py, but lines 5-9 are different.
2. Line 5 introduces a dictionary to keep track of the decimal equivalent of
Roman numerals. The Roman numerals have been given in uppercase. We
need to keep this in mind as the user might give lowercase input which cannot
be directly used as a key in this dictionary.
3. Line 7 does not use the int() function that we used in num2words.py as
the input here is not an integer but a string. Furthermore, since the dictionary
keys are in uppercase, we have used the upper() function (introduced in
section 2.3.4.2) to convert the user input to uppercase.
4. Line 8 extracts the corresponding value from the dictionary of Roman
numerals – basically converting the Roman numeral to its decimal equivalent
using the dictionary. For simplicity, the input is not validated, so an invalid
numeral results in a KeyError.
5. Line 9 prints the resultant decimal number and the rest of the program then
proceeds to convert it to words.
7.8 Questions
1. What is a dictionary?
2. Explain all the ways by which we can access elements of a dictionary with
examples.
3. Is there a way by means of which you can ensure that dictionary elements are
maintained in the same order in which they were added?
4. Explain all possible ways of deleting elements from a dictionary with
examples.
5. Illustrate the usage of the function setdefault() used on a dictionary when
passed with arguments in the below scenarios:
1. An element pair
2. A single element which is already present as a key
3. A single element which does not exist as a key in the dictionary
6. Illustrate with an example how to extract a key of a dictionary given its value.
7. Given 2 dictionaries D1={'One':1, 'Two':2} and D2={'Three':3},
state how we can update D1 with D2 in a single step so that the resultant
dictionary has 3 key-value pairs.
8. Illustrate how to swap keys with values in a dictionary.
7.9 Exercises
1. Prepare a dictionary that has odd numbers from 1 to 100 as keys with values
as their cubes.
2. Write a program that accepts only numbers from a user and at the time of exit,
display the number of times the user had entered each number.
3. A shop maintains the inventory of the following commodities in a
dictionary:
comm = {
'chair' : 10,
'table' : 22,
'sofa-set': 2,
'tv-unit': 0,
'fan': 10,
'table-lamp': 0,
'iron-box': 0,
'bed': 30
}
and their prices in another dictionary:
price = {
'chair': 2365,
'table': 37752,
'sofa-set': 23299,
'tv-unit': 120344,
'bed': 40226
}
Accept the number of items for each commodity sold from standard input and
at the end do the following:
1. Evaluate the income made by the shop owner.
2. If the user enters a commodity that is not available in the shop, display an
appropriate message.
3. If the user enters a quantity that is beyond the available limit, display the size
of the inventory.
4. Display the inventory available.
SUMMARY
➢ The for loop can be naturally used to iterate through the keys of a
dictionary. It is also possible to iterate through the keys using the keys()
method, through the values using the values() method, and through the
key-value pairs using the items() method.
8 STRINGS
A string is an immutable sequence of characters. Strings are one of the most heavily
used data types in programming. In Python, strings are instances of the class str.
Being a sequence, a string can be traversed character by character using a
simple for loop.
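For instance, the following loop prints each character of a string on its own line. The sample word is arbitrary:

```python
word = "Python"
for ch in word:          # each iteration yields a one-character string
    print(ch)
print(list(word))        # ['P', 'y', 't', 'h', 'o', 'n']
```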
>>> type(123)
<class 'int'>
>>> type('123')
<class 'str'>
>>> str(123)
'123'
>>> type(str(123))
<class 'str'>
>>> int('123')
123
>>> type(int('123'))
<class 'int'>
str.startswith(substring[,start=0[,end=len(str)]])
Forms:
1. str.startswith(substring)
2. str.startswith(substring,start)
3. str.startswith(substring,start,end)
We can also provide a tuple of strings instead of a single string for substring.
The method then returns True if the string starts with any one of them.
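A few calls illustrating the three forms and the tuple variant. The sample filename is arbitrary:

```python
filename = "report.csv"
print(filename.startswith("rep"))               # True
print(filename.startswith("port", 2))           # True: checks filename[2:]
print(filename.startswith("rep", 0, 2))         # False: only "re" is examined
print(filename.startswith(("data", "report")))  # True: matches one tuple item
```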
This form is the least useful form and is almost useless when a single substring is
provided. It can be of use when a tuple of substrings is provided.
str.endswith(substring[,start=0[,end=len(str)]])
Forms:
1. str.endswith(substring)
2. str.endswith(substring,start)
3. str.endswith(substring,start,end)
We can also provide a tuple of strings instead of a single string for substring.
The method then returns True if the string ends with any one of them.
Form #2: str.endswith(substring,start)
The function returns True if the string str ends with the string substring, with the
check starting from the index start within the string str. In other words, it checks
whether the slice str[start:] ends with substring. The substring does not have to
begin exactly at index start.
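A few calls illustrating this form. The sample string is arbitrary:

```python
s = "hello"
print(s.endswith("lo", 1))                          # True: "ello" ends with "lo"
print(s.endswith("hello", 1))                       # False: "ello" is too short
print(s.endswith(("lo", "xy"), 3))                  # True: "lo" matches a tuple item
print(s.endswith("lo", 1) == s[1:].endswith("lo"))  # True: same as slicing first
```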
This form is the least useful form and is almost useless when a single substring is
provided. It can be of use when a tuple of substrings is provided.
Complete syntax:
str.find(substring[,start=0[,end=len(str)]])
Forms:
1. str.find(substring)
2. str.find(substring,start)
3. str.find(substring,start,end)
This form is useful if we want to search for a specific occurrence when there are
multiple occurrences of a substring in a string. In fact, we can write a program that
uses this form to show us all occurrences of a substring in a string:
finddemo.py
1. #!/usr/bin/python
2.
3. # Demonstration of searching for all occurrences
4. # of a substring in a string
5.
6. string = input("Enter the string to search in:")
7. substring = input("Enter the substring to search for:")
8.
9. index=-1
10. indices=[]
11. while True:
12.     index = string.find(substring,index+1)
13.     if index==-1: break
14.     indices.append(index)
15.
16. print(string)
17. col=0
18. for i in indices:
19.     print(' '*(i-col), end='')
20.     print('^',end='')
21.     col=i+1
22. print()
Output:
Observation:
1. We accept the main string from the user in line 6 and store it in the variable
string. We accept the substring to be searched for from the user in line 7
and store it in the variable substring.
2. We have an infinite loop in line 11 to search for all occurrences of the
substring within the main string. We find the index of the next match in line 12
and break out of the loop in line 13 if there are no more matches.
3. To ensure that each time we search we don’t end up searching the same
position in the string, we use a starting position of index+1 in line 12.
4. To further ensure that this code works correctly and we start searching from
index 0, we initialise index to -1 in line 9 so that the expression index+1 in
line 12 evaluates to 0 for the first search.
str.rfind(substring[,start=0[,end=len(str)]])
Forms:
1. str.rfind(substring)
2. str.rfind(substring,start)
3. str.rfind(substring,start,end)
Practically, this form is not very useful and has few applications. The third form is
more useful though.
Form #3: str.rfind(substring,start,end)
This form is similar to the previous form, except that the search ends at index end
(excluding end) instead of ending at the end of the string. If no occurrence of
substring lies entirely between indices start and end-1, this function fails and returns
-1.
This form is useful if we want to search for a specific occurrence when there are
multiple occurrences of a substring in a string. In fact, we can write a program that
uses this form to show us all occurrences of a substring in a string:
rfinddemo.py
1. #!/usr/bin/python
2.
3. # Demonstration of searching for all occurrences
4. # of a substring in a string in reverse order
5.
6. string = input("Enter the string to search in:")
7. substring = input("Enter the substring to search for:")
8.
9. index=len(string)-len(substring)+1
10. indices=[]
11. while True:
12.     index = string.rfind(substring,0,index+len(substring)-1)
13.     if index==-1: break
14.     indices.append(index)
15.
16. print(string)
17. col=0
18. for i in reversed(indices):
19.     print(' '*(i-col), end='')
20.     print('^',end='')
21.     col=i+1
22. print()
Output:
Observation:
1. This program is similar to the program finddemo.py covered in section
8.2.2.1. The changes are in lines 4, 9, 12 and 18.
2. Since we get the indices from right to left, each time the starting index should
be 0 but the ending index should decrease based on each successful search.
3. Also, since we obtain indices from right to left, we finally reverse this order in
line 18.
str.index(substring[,start[,end]])
Forms:
1. str.index(substring)
2. str.index(substring,start)
3. str.index(substring,start,end)
str.rindex(substring[,start[,end]])
Forms:
1. str.rindex(substring)
2. str.rindex(substring,start)
3. str.rindex(substring,start,end)
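index() and rindex() accept the same forms as find() and rfind(). The difference is what happens when the substring is absent. find() and rfind() return -1, whereas index() and rindex() raise a ValueError. A minimal sketch with an assumed string:

```python
s = "banana"
first = (s.find("na"), s.index("na"))      # (2, 2): leftmost occurrence
last = (s.rfind("na"), s.rindex("na"))     # (4, 4): rightmost occurrence
not_found = s.find("xy")                   # -1: find() reports failure by value
try:
    s.index("xy")                          # index() reports failure by raising
    raised = False
except ValueError:
    raised = True
print(first, last, not_found, raised)      # (2, 2) (4, 4) -1 True
```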
str.count(substring[,start[,end]])
Forms:
1. str.count(substring)
2. str.count(substring,start)
3. str.count(substring,start,end)
>>> "abcdabadac".count("ab")
2
>>> "abcdabadac".count("ab",1)
1
>>> "abcdabadac".count("ab",0,3)
1
str.partition(separator)
The partition() function searches for the first (leftmost) occurrence of a separator
within a string and returns a tuple containing 3 parts: the part before the separator, the
separator itself and the part after the separator. In case the separator was not found in
the string, it will still return a tuple containing 3 parts: the entire string followed by 2
null strings.
In the above example, the string is partitioned at “:”, giving rise to “India” as the part
before the “:”, the “:” separator itself and “New Delhi” as the part after the “:”.
In the above example, the string is partitioned at the first “:” with “New Delhi
China:Beijing” being the part after the first “:”.
In this example, since the separator “-” was not found in the string, the entire string is
the first element of the tuple and the remainder elements are null strings.
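The three partition() cases described above can be reproduced as follows. The exact strings of the original examples are not available here, so these strings are assumptions chosen to match the descriptions:

```python
a = "India:New Delhi".partition(":")
b = "India:New Delhi China:Beijing".partition(":")
c = "India:New Delhi".partition("-")
print(a)   # ('India', ':', 'New Delhi')
print(b)   # ('India', ':', 'New Delhi China:Beijing')  split at the first ':'
print(c)   # ('India:New Delhi', '', '')                separator not found
```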
str.rpartition(separator)
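rpartition() mirrors partition(). It splits at the last (rightmost) occurrence of the separator. When the separator is absent, the two null strings come first and the entire string is the last element. A short sketch, using strings similar to those described for partition():

```python
d = "India:New Delhi China:Beijing".rpartition(":")
e = "India:New Delhi".rpartition("-")
print(d)   # ('India:New Delhi China', ':', 'Beijing')
print(e)   # ('', '', 'India:New Delhi')
```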
str.split([separator[,maxsplits]])
Forms:
1. str.split()
2. str.split(separator)
3. str.split(separator,maxsplits)
>>> "125,342,264".split(",")
['125', '342', '264']
>>> "125,,342,264".split(",")
['125', '', '342', '264']
>>> "53269985".split("99")
['5326', '85']
Note:
1. The separator is a string and can be made up of multiple characters
2. If two separators occur in immediate succession, a null string is assumed to
be present between them and is counted as one of the split pieces
Special cases:
Since the separator is None, splitting is similar to form #1, but with a maximum of 3
splits only.
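The difference between form #1 (no separator) and an explicit separator, and the special case of a None separator with maxsplits, can be seen side by side. A minimal sketch with assumed strings:

```python
text = "  alpha  beta gamma\tdelta epsilon"
f1 = text.split()              # form 1: runs of whitespace, no null strings produced
f2 = "a,b,,c".split(",")       # form 2: ['a', 'b', '', 'c']
f3 = "a,b,,c".split(",", 1)    # form 3: at most 1 split -> ['a', 'b,,c']
special = text.split(None, 3)  # None separator with maxsplits=3
print(f1)        # ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
print(special)   # ['alpha', 'beta', 'gamma', 'delta epsilon']
```

Once the 3 splits are used up, the remainder "delta epsilon" is returned as one piece, with its inner space intact.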
>>> "125,342,264".split(",",-1)
['125', '342', '264']
Since maxsplits is -1 and separator is not None, splitting is similar to form #2.
str.splitlines([keepends])
Forms:
1. str.splitlines()
2. str.splitlines(keepends)
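Form #1 splits a string at line boundaries (such as \n and \r\n) and drops them. Form #2 with keepends set to True keeps each line ending attached to its line. Unlike split("\n"), splitlines() does not produce a trailing null string when the text ends with a newline. A minimal sketch with assumed strings:

```python
s = "first\nsecond\r\nthird"
plain = s.splitlines()            # ['first', 'second', 'third']
kept = s.splitlines(True)         # ['first\n', 'second\r\n', 'third']
by_split = "a\nb\n".split("\n")   # ['a', 'b', '']
by_lines = "a\nb\n".splitlines()  # ['a', 'b']
print(plain, kept, by_split, by_lines)
```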
str.join(iterable)
The join() function concatenates all the strings in the iterable sequence (list,
tuple, etc.) using the string str as the separator between them and returns the new
string. As a special case, if str is a null string, there won't be any separator between
the strings.
Examples:
>>> ",".join(("12","24","36"))
'12,24,36'
>>> " and ".join(("12","24","36"))
'12 and 24 and 36'
>>> "".join(("12","24","36"))
'122436'
Complete syntax:
str.lstrip([chars])
Forms:
1. str.lstrip()
2. str.lstrip(chars)
Note:
1. If the string str does not start with whitespace (form #1), or with any of the
characters in chars (form #2), this function simply returns the original string
without any changes
>>> "233Hello466World599".lstrip("0123456789")
'Hello466World599'
str.rstrip([chars])
Forms:
1. str.rstrip()
2. str.rstrip(chars)
Note:
1. If the string str does not end with whitespace (form #1), or with any of the
characters in chars (form #2), this function simply returns the original string
without any changes
>>> "233Hello466World599".rstrip("0123456789")
'233Hello466World'
str.strip([chars])
Forms:
1. str.strip()
2. str.strip(chars)
Note:
1. If the string str neither starts nor ends with whitespace (form #1), or with any
of the characters in chars (form #2), this function simply returns the original
string without any changes
>>> "233Hello466World599".strip("0123456789")
'Hello466World'
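The whitespace-stripping behaviour of form #1 of all three functions can be compared on one assumed string. The last line also shows that chars is treated as a set of characters, not as a literal prefix or suffix:

```python
s = "  \tHello World \n"
left = s.lstrip()      # 'Hello World \n'
right = s.rstrip()     # '  \tHello World'
both = s.strip()       # 'Hello World'
# Leading and trailing characters are removed while they belong to the set {w, c, o, m, .}
site = "www.example.com".strip("wcom.")   # 'example'
print(repr(left), repr(right), repr(both), repr(site))
```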
str.replace(old,new[,count])
Forms:
1. str.replace(old,new)
2. str.replace(old,new,count)
>>> "Delhi-Bombay-Bangalore-Bombay".replace("Bombay","Mumbai")
'Delhi-Mumbai-Bangalore-Mumbai'
>>> "Delhi-Bombay-Bangalore-Bombay".replace("Bombay","Mumbai",1)
'Delhi-Mumbai-Bangalore-Bombay'
We will examine substitution later – first we'll concentrate on the template string.
${thousand}000
${a}_${b}
Form #3: $$
Since the '$' symbol has a special meaning as an indication of a placeholder, what if
we want a simple '$' character within the template string? The solution is to use two '$'
characters instead as shown in the examples below:
$$1000
$$x=10
Now that we know how to frame template strings, let us see the code behind that in
Python. Here is an example:
Observation:
1. We need the Template class which is present in the string module. The
first line takes care of that.
2. We create a Template object using a template string and store a reference
to that object in the variable t.
3. Using this template, we'll be able to create multiple strings by substituting
values, which we will see soon.
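The book's own template example is not reproduced here. The sketch below assumes hypothetical template strings and uses the ${name} and $$ forms described above. It jumps slightly ahead by calling substitute(), which the book covers next:

```python
from string import Template

# Hypothetical templates: $$ gives a literal '$', ${...} delimits an identifier
t = Template("$$${amount} for ${qty} item(s)")
price = t.substitute(amount=1500, qty=3)     # '$1500 for 3 item(s)'

t2 = Template("${thousand}000")
value = t2.substitute(thousand=12)           # '12000'
print(price, value)
```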
1. template.substitute(dict)
2. template.substitute(keyargs)
3. template.substitute(dict,keyargs)
NOTE:
The statement “from string import Template” needs to be done only once
per Python session. Repetitions of this statement are tolerated, however.
NOTE:
The order of the keys in the dictionary in form #1 and the order of the keyword
arguments in form #2 do not matter!
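Forms #1 and #2 produce the same result, regardless of the order of keys or keyword arguments. If any placeholder is left without a value, substitute() raises a KeyError. A minimal sketch with a hypothetical template:

```python
from string import Template

t = Template("$x + $y = $z")
s1 = t.substitute({"x": 2, "y": 3, "z": 5})   # form 1: a dictionary
s2 = t.substitute(z=5, y=3, x=2)              # form 2: keyword arguments, any order
print(s1, "|", s2)                            # 2 + 3 = 5 | 2 + 3 = 5

try:
    t.substitute(x=2)                         # y and z have no values
except KeyError as err:
    missing = err.args[0]                     # 'y', the first missing identifier
```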
Observation:
1. The value of the template identifier z is taken from the keyword argument
(100), even though the dictionary contained a different value (14) since
preference is given to keyword arguments.
2. The values of the template identifiers x and y are taken from the dictionary
since no keyword arguments were provided for them.
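The precedence rule described above can be sketched as follows. The values 14 and 100 for z come from the observation. The values for x and y, and the template string itself, are hypothetical:

```python
from string import Template

t = Template("x=$x, y=$y, z=$z")
d = {"x": 5, "y": 9, "z": 14}       # x and y values are hypothetical
result = t.substitute(d, z=100)     # keyword z=100 takes precedence over d["z"]
print(result)                       # x=5, y=9, z=100
```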
str.maketrans(x[,y[,z]])
Note:
1. This is a static method (section 12.4.4) of str class and hence we will literally
use str to invoke it, not any string instance.
2. The meanings of the parameters x, y and z depend on how many parameters
have been given. We will explore each case separately.
Forms:
1. str.maketrans(dict)
2. str.maketrans(lookup,replacement)
3. str.maketrans(lookup,replacement,discard)
str.maketrans(dict)
The actual translation will be done by the translate() function, but the translation
mechanism is finalised here. When the translate() function performs the
translation of a string using this dictionary, it performs the following steps for each
character found in the string to be translated:
1. If the character's ordinal is found in the keys of the dictionary and its
corresponding value is not None, the value is substituted instead.
2. In the above case, if the value found is None, the original character is silently
deleted in the translated string.
3. If the character's ordinal was not found in the keys of the dictionary, the
character is retained as it is without any changes in the translated string.
Here is an example of building a translation table that translates octal numbers to
lowercase alphabets:
>>> d={'1':'a','2':'b','3':'c','4':'d','5':'e','6':'f','7':'g','8':None,'9':None}
>>> t=str.maketrans(d)
>>> "hello 0468 world".translate(t)
'hello 0df world'
Observation:
1. The keys of the dictionary are the characters 1 to 9 (in quotes), not integers!
Note that the keys can be either ordinals (integers) or characters. An integer key
such as 1 would stand for the character with ordinal 1, not the digit 1; the digit
characters 0 to 9 have ordinals 48-57.
2. The octal system only supports digits from 0 to 7. Hence, we decide to
eliminate 8 and 9 using the value None for them.
3. We decide to retain the character 0 as it is and hence it is not present in the
dictionary.
4. Thus, 0 is retained, 1-7 are replaced by a-g and 8-9 are deleted. All other
characters are retained.
str.maketrans(lookup,replacement)
In the second form, given 2 strings of equal length, each character of the string
lookup will be mapped on to the corresponding character in the string replacement
during translation. Characters in the string to be translated that are not present in
lookup will be retained during translation without any changes.
Here is an example that maps lowercase vowels to uppercase:
>>> t=str.maketrans("aeiou","AEIOU")
>>> "This is a demonstration".translate(t)
'ThIs Is A dEmOnstrAtIOn'
str.maketrans(lookup,replacement,discard)
This third form is similar to the second form discussed in the previous section, except
for the third parameter: a string of characters to be discarded during translation.
While the second form retains all characters not found in lookup during translation, the
third form deletes every character present in the string discard. Only characters that
are present in neither lookup nor discard are retained without changes.
Here is the same example we used in the previous section for converting lowercase
vowels to uppercase, but with the added feature of discarding spaces:
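The vowel example with spaces discarded, as a minimal sketch of the third form:

```python
t = str.maketrans("aeiou", "AEIOU", " ")          # third argument: characters to delete
result = "This is a demonstration".translate(t)
print(result)                                     # ThIsIsAdEmOnstrAtIOn
```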
str.ljust(width[,fill])
Forms:
1. str.ljust(width)
2. str.ljust(width, fill)
>>> s="Hello"
>>> s.ljust(20)
'Hello               '
>>> s.ljust(2)
'Hello'
>>> s.ljust(20,"=")
'Hello==============='
>>> s.ljust(2,"=")
'Hello'
str.rjust(width[,fill])
Forms:
1. str.rjust(width)
2. str.rjust(width, fill)
>>> s.rjust(20)
'               Hello'
>>> s.rjust(2)
'Hello'
>>> s.rjust(20,"=")
'===============Hello'
>>> s.rjust(2,"=")
'Hello'
str.center(width[,fill])
Forms:
1. str.center(width)
2. str.center(width, fill)
>>> s.center(20)
'       Hello        '
>>> s.center(2)
'Hello'
>>> s.center(20,"=")
'=======Hello========'
>>> s.center(2,"=")
'Hello'
Syntax:
str.zfill(width)
Examples:
>>> "123".zfill(5)
'00123'
>>> "+123".zfill(5)
'+0123'
str.expandtabs(tabsize=8)
Forms:
1. str.expandtabs()
2. str.expandtabs(tabsize)
In the first form, the tabsize is assumed to be 8 whereas in the second form, we can
provide the tabsize.
The tabsize is used to divide the string into zones, starting from column 0. Thus, with
a tabsize of 8, the zones will start from column 0, 8, 16, 24, etc. With a tabsize of 2,
the zones will start from 0, 2, 4, 6, etc. A tab character is supposed to move the cursor
to the beginning of the next zone. This effect is simulated by this function by adding
spaces instead of the tabs.
Examples:
>>> s="a\tb\t\tc"
>>> s.expandtabs()
'a       b               c'
 0123456789012345678901234 : Reference Ruler
>>> s.expandtabs(2)
'a b   c'
 0123456 : Reference Ruler
NOTE:
The “Reference Ruler” in the output above has been added for readability in this
book and is not printed by the Python interpreter!
{argumentNumber:argumentFormat}
{argumentName:argumentFormat}
Both the parts are optional. If neither argumentNumber nor argumentName is given,
the arguments are taken in order. If the argumentFormat is not given, no special
formatting is applied: the value is displayed as it is, using as many columns as
necessary. If argumentFormat is omitted, the separating colon is not required.
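The ways of selecting an argument can be compared in one place. A minimal sketch with assumed values. The last line combines an argument number and an argument name with an argumentFormat (width and alignment, covered later):

```python
a = "{} and {}".format("tea", "coffee")              # in order: 'tea and coffee'
b = "{1} and {0}".format("tea", "coffee")            # by number: 'coffee and tea'
c = "{drink} at {time}".format(drink="tea", time=5)  # by name: 'tea at 5'
d = "{0:>6}|{name:<6}|".format(42, name="ab")        # with a format: '    42|ab    |'
print(a, b, c, d, sep="\n")
```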
The complete syntax of the argumentFormat is shown below:
[[fill]align][sign][#][0][width][grouping][.precision][type]
Different parts of this syntax will be dealt with in detail in different sections as
documented in the below table:
Let us start with the format type that documents the type of the field.
[[fill]align][sign][#][0][width][grouping][.precision][type]
Type Meaning
b Binary Format
d Decimal Format
o Octal Format
Type Meaning
n Number formatted using current locale
NOTE:
If no format is selected for an integer, it will be displayed in decimal (the d format)
Examples:
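A sketch of the integer format types listed above, applied to one assumed value. The empty specification shows the default (decimal) behaviour. The n type depends on the current locale, although a 2-digit number like 12 is unaffected by digit grouping:

```python
n = 12
shown = {spec: ("{:" + spec + "}").format(n) for spec in ["", "b", "d", "o", "n"]}
print(shown)   # {'': '12', 'b': '1100', 'd': '12', 'o': '14', 'n': '12'}
```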
Type Meaning
f Fixed-Point Format
e Exponentiation Format
% Percentage Format
NOTE:
If no format type is given for a real number, it is displayed much like the general
format (the g format), except that at least 1 digit is always shown after the decimal
point when fixed-point notation is used (12.0 rather than 12).
Examples:
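A sketch of the real-number format types with assumed values. The last two lines show the difference between the default and the g format noted above:

```python
x = 12.5
fixed = "{:f}".format(x)          # '12.500000' (6 fractional digits by default)
expo = "{:e}".format(x)           # '1.250000e+01'
percent = "{:%}".format(0.125)    # '12.500000%' (value multiplied by 100)
default = "{}".format(12.0)       # '12.0': at least one digit after the point
general = "{:g}".format(12.0)     # '12': the g format drops it
print(fixed, expo, percent, default, general)
```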
NOTE:
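Format types cannot be used for type conversion! For instance, the d format type
cannot be used to convert a real number to an integer; any such attempt results in an
error. (The reverse is permitted: an integer can be displayed using a real-number
format type such as f, which is why '{:.0f}'.format(12) works.)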
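The error described above can be observed directly. The sketch also shows the permitted direction (int shown with f) and an explicit int() conversion done before formatting. The values are assumed:

```python
try:
    "{:d}".format(12.5)             # d cannot turn a float into an integer
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

as_float = "{:f}".format(12)        # '12.000000': an int may be shown with f
as_int = "{:d}".format(int(12.5))   # convert explicitly first: '12'
```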
[[fill]align][sign][#][0][width][grouping][.precision][type]
NOTE:
If the given width is insufficient, it will use up as many columns as needed ignoring
the field width request! Thus, the field width can be considered to be the minimum
field width!
Examples:
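A minimal sketch of width as a minimum field width, with assumed values:

```python
a = "{:6d}".format(12)       # '    12': 6 columns, numbers right-aligned by default
b = "{:2d}".format(12345)    # '12345': width 2 is insufficient, so it is ignored
c = "{:8}".format("abc")     # 'abc     ': strings left-aligned by default
print(repr(a), repr(b), repr(c))
```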
When a width is given, and the width exceeds the default width requirement of a field,
spaces are used by default to fill in the rest of the columns. In such cases, the
alignment controls where the spaces get added. The possible values for this and their
interpretation are shown in the table below:
Alignment Meaning
< Left-align field (add spaces to the right of the content)
= Right-align numeric field (add spaces on the left, but after any sign)
Examples:
Note:
1. The = alignment is helpful to produce a list of numbers, all aligned with their
signs in one column.
2. When no alignment is specified, strings are left-aligned whereas numbers are
right-aligned by default.
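Besides the < and = alignments shown in the table, Python's format() also accepts > (right-align) and ^ (centre). All four are sketched below with an assumed value and a width of 6:

```python
left = "{:<6d}".format(12)     # '12    '
right = "{:>6d}".format(12)    # '    12'
centre = "{:^6d}".format(12)   # '  12  '
signed = "{:=6d}".format(-12)  # '-   12': sign kept in the first column
plain = "{:>6d}".format(-12)   # '   -12': sign stays next to the digits
print(repr(left), repr(right), repr(centre), repr(signed), repr(plain))
```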
While the alignment uses spaces by default to fill the unused columns, any other
character can be used to fill in the unused columns. We will revisit the same examples
given in the previous section, using “*” instead of the default spaces:
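A minimal sketch of a fill character, assuming "*" as the fill and a width of 6. A fill character is only recognised when an alignment character follows it:

```python
left = "{:*<6d}".format(12)     # '12****'
right = "{:*>6d}".format(12)    # '****12'
centre = "{:*^6d}".format(12)   # '**12**'
signed = "{:*=6d}".format(-12)  # '-***12'
print(left, right, centre, signed)
```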
The exact effect of precision depends on what we are displaying and which format
type we are using. We will therefore see individual cases:
>>> '{:.2f}'.format(12.3456789)
'12.35'
>>> '{:.12f}'.format(12.3456789)
'12.345678900000'
Note:
1. If the precision is less than the number of fractional digits, the fractional digits
are rounded.
2. If the precision is greater than the number of fractional digits, trailing zeros are
added.
>>> '{:.5g}'.format(12.3456789)
'12.346'
>>> '{:.10g}'.format(12.3456789)
'12.3456789'
>>> '{:.15g}'.format(12.3456789)
'12.3456789'
>>> '{:5.2s}'.format('abcdefghijkl')
'ab   '
>>> '{:5.4s}'.format('abcdefghijkl')
'abcd '
NOTE:
In the above examples, the width is 5 implying that 5 columns are reserved for the
field. The alignment is left by default and spaces are used by default for filling the
field. The precision controls how many characters from the string are displayed.
[[fill]align][sign][#][0][width][grouping][.precision][type]
The presence of a 0 before the width indicates 0-padding instead of space padding:
>>> '{:5d}'.format(12)
'   12'
>>> '{:05d}'.format(12)
'00012'
>>> '{:05d}'.format(-12)
'-0012'
[[fill]align][sign][#][0][width][grouping][.precision][type]
The sign controls how positive and negative numbers are differentiated and handled.
The possibilities are listed in the table below:
Sign Meaning
- Negative numbers are prefixed with '-' and positive numbers have
no prefix
+ Negative numbers are prefixed with '-' and positive numbers are
prefixed with '+'
space (' ') Negative numbers are prefixed with '-' and positive numbers are
prefixed with a space (' ')
The default behaviour (when no sign is used) is that of the '-' sign in the table.
Examples:
>>> '{:d}'.format(12)
'12'
>>> '{:d}'.format(-12)
'-12'
>>> '{:-d}'.format(12)
'12'
>>> '{:-d}'.format(-12)
'-12'
>>> '{:+d}'.format(12)
'+12'
>>> '{:+d}'.format(-12)
'-12'
>>> '{: d}'.format(12)
' 12'
>>> '{: d}'.format(-12)
'-12'
[[fill]align][sign][#][0][width][grouping][.precision][type]
The grouping decides what separator will be used between groups of digits. If no
grouping is specified, by default no separator is used. If a comma (,) is used as
grouping, then groups of digits will be separated by commas as shown below:
>>> '{:,d}'.format(123456789)
'123,456,789'
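Grouping also combines with precision, and Python (3.6 and later) accepts an underscore as an alternative grouping character. For binary, octal and hexadecimal types, the underscore groups every 4 digits. A minimal sketch with assumed values:

```python
a = "{:,d}".format(123456789)       # '123,456,789'
b = "{:_d}".format(123456789)       # '123_456_789'
c = "{:,.2f}".format(1234567.891)   # '1,234,567.89'
d = "{:_x}".format(0xFFFFFFFF)      # 'ffff_ffff'
print(a, b, c, d)
```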
[[fill]align][sign][#][0][width][grouping][.precision][type]
The presence of the '#' in the format specification indicates a request for an alternate
representation – a representation different from the default one. This alternate
representation exists only for numbers and differs from type to type.
For instance:
1. The alternate representation for binary, octal and hexadecimal types is the
presence of '0b', '0o' and '0x' respectively.
2. The alternate representation for the 'f' and 'F' types is to display the decimal
point even when there is no fractional part.
3. The alternate representation for the 'g' and 'G' types is to retain trailing zeros.
Examples:
>>> '{:b}'.format(12)
'1100'
>>> '{:#b}'.format(12)
'0b1100'
>>> '{:o}'.format(12)
'14'
>>> '{:#o}'.format(12)
'0o14'
>>> '{:x}'.format(12)
'c'
>>> '{:#x}'.format(12)
'0xc'
>>> '{:X}'.format(12)
'C'
>>> '{:#X}'.format(12)
'0XC'
>>> '{:.0f}'.format(12)
'12'
>>> '{:#.0f}'.format(12)
'12.'
>>> '{:g}'.format(1200000000)
'1.2e+09'
>>> '{:#g}'.format(1200000000)
'1.20000e+09'
>>> '{:G}'.format(1200000000)
'1.2E+09'
>>> '{:#G}'.format(1200000000)
'1.20000E+09'
8.8 Questions
1. Explain any 5 operations that can be done on strings in Python with
examples.
2. How is the index() function different from find() function in Python?
3. Explain the lstrip(), rstrip() and strip() methods of str class.
4. Explain template processing using the Template class in Python.
5. Explain string translation in Python.
6. Explain the string functions used for string justification in Python.
7. Write a short note on the format() method of str class.
8.9 Exercises
1. Write a Python script to calculate the number of occurrences of a given
character in any given string.
2. Write a Python script to count the number of occurrences of each word in a
given string.
3. Write a Python script to form a string out of the first 2 and last 2 characters
from the given string. For example, given the string “I am a student”, the
output expected is “I nt”.
SUMMARY
9 REGULAR EXPRESSIONS
Deal with groups within search patterns and extract grouped parts.
9.1 Introduction to Regular Expressions
A regular expression is a pattern matching expression – it is an expression made up
of operators and operands that help specify what kind of pattern should be searched
for and matched within a string. Applications of regular expressions include:
1. Checking whether a string conforms to a particular pattern or not (e.g.
whether a string is a valid email id, valid integer, valid name, valid address,
etc.)
2. Checking whether a string contains a part that matches a particular pattern
(e.g. checking if a string contains coordinates, website URLs or email
addresses)
3. Counting the number of occurrences of substrings that match a particular
pattern (e.g. counting the number of operators present in an expression)
4. Locating the substrings that match a particular pattern within a string (e.g.
locating keywords or identifiers within a computer program statement)
5. Substituting substrings that match a particular pattern with a replacement
string (e.g. replacing all email addresses with the string 'EMAIL')
6. Substituting substrings that match a particular pattern with a replacement
string that possibly reuses the content found in order to reorganise the string
(e.g. converting dates in mm-dd-yy format to dd/mm/yy format)
What makes a regular expression based search different from the string search
functions that we saw in section 8.2 is that in the case of regular expressions, we are
not necessarily searching for a particular string, but are instead possibly searching for
multiple strings that all follow a common pattern. If we are searching for one specific
email address in a string, we can use normal string functions. But if we are
searching for all email addresses related to cyberplusit.com, or possibly any and
every email address without knowing what email addresses to expect, we need
regular expressions.
A regular expression is made up of operands and operators. Operands are merely
characters that have to be found while operators are used to enforce rules pertaining
to the occurrences of operands.
Before we start framing regular expressions for matching, keep these points in mind:
1. What matters currently is whether there is a match or not (Boolean).
2. It does not matter where the match is.
3. It does not matter what the match is.
4. It does not matter how many matches are there.
9. Regular Expressions 215
Of course, eventually we will learn to handle points 2-4 above. We will later learn how
to:
1. Find out where the matches are present in the given string.
2. Find out what parts of the string matched.
3. Find out how many matches were there.
A good starting point is framing regular expressions that don't contain operators. Here
is our first example of a regular expression (without any operators) – not:
cannot pot
nothing nod
footnote night
As can be seen from these examples, the ones that are matched contain the
characters n, o and t consecutively, in that order, with nothing in between them. What
does not matter, however, is where it was found within the string and how many times
it was found.
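The Boolean view described above can be checked with Python's re module. re.search() returns a match object, or None when nothing matches, so only a truth test is used here. A minimal sketch using the words from the example above:

```python
import re

words = ["cannot", "nothing", "footnote", "pot", "nod", "night"]
# True if the RE "not" occurs anywhere in the word
found = {w: re.search("not", w) is not None for w in words}
print(found)
```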
We will now explore some basic operators that can be used in regular expressions.
NOTE:
We will use the abbreviation RE to denote Regular Expression from now on for
brevity.
nut night
native spent
internet nest
against sat
knotted nun
nut against
net nest
knit constant
native Net
net nut
punctual against
9.2.1.4 Matching a Character Within a Negated Set Using the [^] Operator
Finally, the negated character set indicated by [^] implies a placeholder for a single
character with the constraint that the character can be any one apart from the ones
listed within [].
Thus, while [0-9] represents a digit, [^0-9] represents a non-digit.
This can be combined with our understanding of sections 9.2.1.2 and 9.2.1.3 to create
a regular expression like [^a-emx-z] which will match any single character that is
either not a lowercase alphabet, or if it is, is only one of f, g, h, i, j, k, l, n, o, p, q,
r, s, t, u, v, w.
Thus, the RE n[^aeiou]t matches any string that contains n followed by a character
that is not a lowercase vowel, followed by t, as shown below:
punctual nt
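A minimal sketch checking n[^aeiou]t on a few assumed words. It also lists which characters of an assumed string the set [^a-emx-z] accepts:

```python
import re

words = ["punctual", "nt", "net", "spent"]
found = {w: re.search("n[^aeiou]t", w) is not None for w in words}
print(found)   # {'punctual': True, 'nt': False, 'net': False, 'spent': False}

# Characters of "amazing fox" accepted by [^a-emx-z]
kept = re.findall("[^a-emx-z]", "amazing fox")
print(kept)    # ['i', 'n', 'g', ' ', 'f', 'o']
```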
Table 20: Escape Sequences for Standard Regular Expression Character Classes
Thus, the RE \w\D\d will match any string that contains a word-forming character
followed by a non-digit followed by a digit, as shown below:
9e9 abc
cc3 c33
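The \w\D\d example can be verified the same way. The RE is written as a raw string, a notation explained later in this chapter, so that the backslashes reach the RE engine unchanged:

```python
import re

words = ["9e9", "cc3", "abc", "c33"]
found = {w: re.search(r"\w\D\d", w) is not None for w in words}
print(found)   # {'9e9': True, 'cc3': True, 'abc': False, 'c33': False}
```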
Thus, the RE go*d matches any string that contains g followed by 0 or more o
followed by d as shown below:
god go
good gold
gooooood old
good gd
gooooood go
good gooooood
good gooooood
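A minimal sketch contrasting go*d with go+d on assumed words. Note that gd matches the first RE but not the second:

```python
import re

words = ["god", "good", "gooooood", "gd", "go", "gold", "old"]
star = [w for w in words if re.search("go*d", w)]   # 0 or more o
plus = [w for w in words if re.search("go+d", w)]   # 1 or more o
print(star)   # ['god', 'good', 'gooooood', 'gd']
print(plus)   # ['god', 'good', 'gooooood']
```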
The form {m,} implies minimum m occurrences of the preceding symbol without any
upper limit.
The *, + and ? operators can be considered to be special shortforms of the {}
operator as shown below:
* {0,}
+ {1,}
? {0,1}
As a special case, the form {m} represents exactly m occurrences of the preceding
symbol.
Thus, the RE go{2}d is exactly identical to the RE good – both can match strings
that contain g followed by exactly 2 occurrences of o, followed by d.
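The {m}, {m,} and {m,n} forms can be compared on a few assumed words:

```python
import re

words = ["gd", "god", "good", "goood"]
exactly_two = [w for w in words if re.search("go{2}d", w)]    # ['good']
two_or_more = [w for w in words if re.search("go{2,}d", w)]   # ['good', 'goood']
one_or_two = [w for w in words if re.search("go{1,2}d", w)]   # ['god', 'good']
print(exactly_two, two_or_more, one_or_two)
```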
Operator Meaning
{m,n} At least m occurrences and at most n occurrences
ld odd
going swallowed
builder gd
pango ol
When combined with the () operator, the choice is restricted within the parentheses.
Thus, the RE g(ol|ran)d can match any string that contains g followed by either ol
or ran followed by d, restricting the choice within the parentheses, as shown below:
grand golrand
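The effect of the parentheses can be seen by removing them. Without them, | splits the whole RE into the two choices gol and rand. A minimal sketch with assumed words:

```python
import re

words = ["grand", "gold", "golrand", "brandy"]
grouped = [w for w in words if re.search("g(ol|ran)d", w)]
ungrouped = [w for w in words if re.search("gol|rand", w)]
print(grouped)     # ['grand', 'gold']
print(ungrouped)   # ['grand', 'gold', 'golrand', 'brandy']
```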
nothing net
NOTE:
The escape sequence \A at the beginning of a RE plays the same role as the ^
operator. Thus, the RE ^not is identical to \Anot.
cannot net
NOTE:
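The escape sequence \Z at the end of a RE plays almost the same role as the $
operator. Thus, the RE not$ behaves like not\Z; the only difference is that $ also
matches just before a newline at the very end of the string, whereas \Z matches only
at the very end.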
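A minimal sketch of the anchors on assumed words. The last two checks show the one case where $ and \Z differ, a string that ends with a newline:

```python
import re

r1 = bool(re.search(r"^not", "nothing"))    # True
r2 = bool(re.search(r"^not", "cannot"))     # False
r3 = bool(re.search(r"\Anot", "nothing"))   # True, same as ^not
r4 = bool(re.search(r"not$", "cannot"))     # True
r5 = bool(re.search(r"not\Z", "cannot"))    # True, same as not$
r6 = bool(re.search(r"not$", "cannot\n"))   # True: $ matches before a final newline
r7 = bool(re.search(r"not\Z", "cannot\n"))  # False: \Z needs the very end
print(r1, r2, r3, r4, r5, r6, r7)
```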
The \b escape sequence matches a null string at a word boundary. Its purpose is to
ensure a match at the right place. Thus, the RE \bnot will match those strings that
contain not that starts at a word boundary, as shown below:
The RE not\b will match those strings that contain not that ends at a word
boundary, as shown below:
The RE \bnot\b will match those strings that contain not that starts at and ends at a
word boundary – or in other words, strings that contain the word not, as shown
below:
The RE not\B will match those strings that contain not that does not end at a word
boundary, as shown below:
The RE \Bnot\B will match those strings that contain not that neither starts nor
ends at a word boundary, as shown below:
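The five word-boundary REs described above can be compared on a small set of assumed words:

```python
import re

words = ["nothing", "cannot", "not", "knotted"]
patterns = [r"\bnot", r"not\b", r"\bnot\b", r"not\B", r"\Bnot\B"]

# For each RE, collect the words in which it finds a match
matches = {p: [w for w in words if re.search(p, w)] for p in patterns}
for p, found in matches.items():
    print(p, "->", found)
```

The expected groupings are: \bnot matches nothing and not; not\b matches cannot and not; \bnot\b matches only not; not\B matches nothing and knotted; and \Bnot\B matches only knotted.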
process escape sequences within strings. The regular expression escape sequences
are not known to the interpreter as such and could result in a problem if processed
before the regular expression engine sees the RE string. The solution is to escape
every backslash by prefixing each with a backslash. The table below shows some
regular expressions and how they can be represented as strings in Python:
RE String Representation
\d "\\d"
\d\d-\d\d-\d\d "\\d\\d-\\d\\d-\\d\\d"
\\d "\\\\d"
Observation:
1. Basically, each '\' in the RE has to be represented as '\\' in a string to ensure
that it remains as a backslash within the string.
2. The last example shows a RE that matches a backslash (\) followed by a d.
As usual, each backslash has to be prefixed by a backslash and hence we
have 4 backslashes within the string.
>>> "Hello\n"
'Hello\n'
>>> r"Hello\n"
'Hello\\n'
>>> R"Hello\n"
'Hello\\n'
Observation:
1. Observe that the escape sequence '\n' is not recognised when the r or R
prefix is used. Thus, '\n' becomes 2 characters: '\' and 'n'. The '\' character
is represented in Python as '\\'
NOTE:
We will use raw strings for all regular expressions henceforth as a professional
practice.
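The escaped strings from the earlier table and their raw-string equivalents are identical, which is why raw strings are a safe, more readable choice. A minimal sketch; the searched texts are assumed:

```python
import re

same_d = "\\d" == r"\d"                                   # True
same_date = "\\d\\d-\\d\\d-\\d\\d" == r"\d\d-\d\d-\d\d"   # True
same_backslash_d = "\\\\d" == r"\\d"                      # True

m = re.search(r"\d\d-\d\d-\d\d", "Due on 12-05-21")
print(m.group())                       # 12-05-21

# r"\\d" matches a literal backslash followed by d
m2 = re.search(r"\\d", "C:\\data\\dir")
print(m2.start(), repr(m2.group()))    # 2 '\\d'
```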
^[A-Za-z]+( [A-Za-z]+)*$
Observation:
1. The first part [A-Za-z]+ is to ensure that the name starts with alphabets
(and not a space)
2. The second part ( [A-Za-z]+)* allows for 0 or more occurrences of
words (sequences of alphabets preceded by a space)
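The simple name RE can be tried on a few assumed names. Only words separated by a single space pass:

```python
import re

NAME_RE = r"^[A-Za-z]+( [A-Za-z]+)*$"
names = ["Nagesh Rao", "Nagesh  Rao", " Nagesh", "Rao ", "R2D2"]
valid = {n: re.search(NAME_RE, n) is not None for n in names}
print(valid)
```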
Here is a more complex RE that handles all the above 3 enhancements as well:
^[A-Za-z]+([ .\-\'][A-Za-z]+)*$
Observation:
1. The first part [A-Za-z]+ ensures that the name starts with alphabets
2. The part [ .\-\'] is a placeholder for a word separator, which could be a
space, a period, a hyphen or an apostrophe. The hyphen (-) has to be
escaped as otherwise it will be wrongly construed to mean a range within the
character set. The apostrophe (') has to be escaped if the RE is enclosed
within single quotes, else it will mean the end of the string. The period (.)
need not be escaped, even though it is a RE operator, as it is present within [].
3. The following part [A-Za-z]+ requires alphabets to be present after every
such separator and prevents a name from ending with a stray separator.
4. Finally, the separator and the part following that is optional and can be
present multiple times – thus the grouping and the *.
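A minimal sketch of the extended name RE on assumed names. Separators are accepted between words but never at the start or end, and never twice in a row:

```python
import re

NAME_RE = r"^[A-Za-z]+([ .\-\'][A-Za-z]+)*$"
names = ["Mary-Jane O'Brien", "D'Souza", "B.Nagesh", "Rao-", "-Rao", "Rao--Nagesh"]
valid = {n: re.search(NAME_RE, n) is not None for n in names}
print(valid)
```

As written, a separator must be followed directly by letters. That is why a form like "B.Nagesh" passes while "B. Nagesh" (period followed by a space) would not.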
^[A-Za-z_]\w{2,15}$
Observation:
1. The first part [A-Za-z_] ensures that the username does not start with a
digit.
2. The second part \w{2,15} permits any word forming character (alphabets,
digits and underscores) with the constraint of a length between 2 and 15
characters. Together with the first character matched earlier, it ensures a
length of 3-16 characters.
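The username RE can be checked against the 3-16 character limit with assumed inputs. In Python 3, \w in a str pattern also matches non-ASCII letters by default. Passing the real re.ASCII flag restricts it to the alphabets, digits and underscore mentioned above:

```python
import re

USER_RE = r"^[A-Za-z_]\w{2,15}$"
names = ["nagesh_rao", "_tmp", "9lives", "ab", "a" * 16, "a" * 17]
valid = [n for n in names if re.search(USER_RE, n)]
print(valid)   # ['nagesh_rao', '_tmp', 'aaaaaaaaaaaaaaaa']

unicode_ok = re.search(USER_RE, "jos\u00e9") is not None               # True
ascii_ok = re.search(USER_RE, "jos\u00e9", re.ASCII) is not None       # False
```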
^[A-Za-z0-9!@#$%^&*()\-+_=]{8,}$
Observation:
1. The hyphen (-) within [] has to be escaped to prevent it from being treated as
an operator.
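A minimal sketch of the password RE on assumed inputs. As written, the RE only restricts the allowed characters and the minimum length. It does not require a digit or a symbol, so "password" is accepted:

```python
import re

PASS_RE = r"^[A-Za-z0-9!@#$%^&*()\-+_=]{8,}$"
candidates = ["Secret@123", "abc123", "pass word1", "password"]
valid = {c: re.search(PASS_RE, c) is not None for c in candidates}
print(valid)   # {'Secret@123': True, 'abc123': False, 'pass word1': False, 'password': True}
```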
^((\d{2}-\d{2}-\d{2}(\d{2})?)|(\d{2}/\d{2}/\d{2}(\d{2})?))$
Observation:
1. We have 2 patterns – one that supports hyphens (-) as the separator and one
that supports slashes (/) as the separator.
2. We provide a choice between the 2 patterns using the | operator, and enclose
the whole choice in an outer pair of parentheses so that ^ and $ apply to both
patterns (without it, ^ would anchor only the first pattern and $ only the second).
We do not prefer using [\-/] instead as the separator as this allows mixing the 2
separators within a single date!
3. Each digit is represented by \d and {2} ensures exactly 2 occurrences (we
could have also written it as \d\d instead of \d{2}).
4. A 4-digit year is supported using an optional 2 digits after compulsory 2 digits:
\d{2}(\d{2})?
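The role of the outer parentheses can be demonstrated by comparing the date RE with a version that lacks them. The sample dates are assumed:

```python
import re

DATE_RE = r"^((\d{2}-\d{2}-\d{2}(\d{2})?)|(\d{2}/\d{2}/\d{2}(\d{2})?))$"
# The same choice without the outer parentheses
LOOSE_RE = r"^(\d{2}-\d{2}-\d{2}(\d{2})?)|(\d{2}/\d{2}/\d{2}(\d{2})?)$"

samples = ["12-05-21", "12/05/2021", "12-05/21", "12-05-2021xyz"]
strict = {s: re.search(DATE_RE, s) is not None for s in samples}
loose = {s: re.search(LOOSE_RE, s) is not None for s in samples}
print(strict)   # only the first two match
print(loose)    # '12-05-2021xyz' also slips through: its first pattern has no $
```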
^(\+\d{1,4})?(\d[\- ]?){8,10}$
Observation:
1. The first part (\+\d{1,4})?, is for the country code. The ? at the end
makes it optional. The + is escaped to prevent it from being treated as the +
operator of regular expressions. \d{1,4} allows 1-4 digits following the +.
2. The second part (\d[\- ]?){8,10} allows for 8-10 additional digits, also
permitting an optional hyphen (-) or space after each digit. The hyphen is
escaped since it has a special meaning within the character class.
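A minimal sketch of the phone RE on assumed numbers. Based on the RE as written, a separator is allowed only after a digit of the second part. A space directly after the country code is therefore rejected:

```python
import re

PHONE_RE = r"^(\+\d{1,4})?(\d[\- ]?){8,10}$"
numbers = ["+919845012345", "98450 12345", "2345-6789", "12345", "+91 98450 12345"]
valid = {n: re.search(PHONE_RE, n) is not None for n in numbers}
print(valid)
```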
[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}
Observation:
1. The first part, [a-z0-9._%+-]+, permits 1 or more occurrences of the
specified characters. This part matches the username. Of course, the
characters can be tweaked as necessary.
2. The second part, [a-z0-9.-]+, permits 1 or more occurrences of the
specified characters. We do not expect % and + to be part of domainname,
unlike the username. This part matches the domainname. This part also
takes care of subdomains like “mail.abc” or non-TLDs like the “.co” in
“.co.in”.
3. The third part, [a-z]{2,}, permits 2 or more characters to be part of the
top-level domain (TLD), such as com or in.
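Unlike the previous REs, this one has no ^ and $, so it can locate addresses inside a longer text. The sketch uses re.findall(), which returns all non-overlapping matches as a list when the RE has no groups. The sample texts are hypothetical. The character classes are lowercase only, so the real re.IGNORECASE flag is passed for uppercase input:

```python
import re

EMAIL_RE = r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"
text = "Write to info@cyberplusit.com or a.b@mail.abc.co.in today"
found = re.findall(EMAIL_RE, text)
print(found)   # ['info@cyberplusit.com', 'a.b@mail.abc.co.in']

upper = re.findall(EMAIL_RE, "Mail: INFO@CYBERPLUSIT.COM", re.IGNORECASE)
print(upper)   # ['INFO@CYBERPLUSIT.COM']
```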
^((\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])\.){3}(\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])$
Observation:
1. Each integer should be in the range of 0 to 255. This includes all single-digit
numbers and all double-digit numbers. This justifies the RE \d{1,2}.
2. A 3-digit number that starts with 0 or 1 can be followed by any 2 digits. This
justifies the RE [01]\d{2}.
3. A 3-digit number that starts with 2 should be confined to 255. If the second
digit is in the range of 0 to 4, the third digit can be anything. This justifies the
RE 2[0-4]\d.
4. Finally, a 3-digit number that starts with 25 can only have the third digit in the
range of 0 to 5. This justifies the RE 25[0-5].
5. Since each integer can be any of the above, we use the RE
\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5]
6. The first 3 integers of the IP address will be followed by a dot (.), which needs
to be escaped to prevent it from being treated as an operator. Each integer is
grouped together with its dot and the group is repeated 3 times. This justifies
((\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])\.){3}.
7. The last integer has the same pattern as the previous 3, except for the dot.
This justifies the final part (\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5]).
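The IP address RE is long enough that building it from a smaller piece can make it easier to read. The minimal sketch below assembles it from a hypothetical OCTET string, keeping each of the first 3 integers grouped with its dot, and checks a few sample addresses.

```python
import re

# One integer in the range 0-255 (the name OCTET is illustrative)
OCTET = r"(\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])"
# 3 integers each followed by a dot, then a final integer
IP_RE = r"^(" + OCTET + r"\.){3}" + OCTET + r"$"

for ip in ["192.168.1.1", "10.0.0.255", "256.1.1.1", "1.2.3"]:
    print(ip, bool(re.search(IP_RE, ip)))
```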
^\d+$
^[+-]?\d+$
Observation:
1. The sign can be + or -. The “-” is not escaped since it is the last character in
the character set. When “-” is the first or last character in the character set,
escaping is not required. In all other cases, escaping would be required. It
might be a good idea for beginners to always escape “-” within character
classes to be on the safer side. With experience, you could perhaps elect to
drop escaping when not required in order to increase readability.
2. The sign is made optional using the ? operator.
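The two integer REs can be compared side by side on a few sample strings, as in the sketch below. The names UNSIGNED_RE and SIGNED_RE are only illustrative.

```python
import re

UNSIGNED_RE = r"^\d+$"
SIGNED_RE = r"^[+-]?\d+$"

# Each line shows: the string, unsigned match?, signed match?
for s in ["42", "+42", "-7", "4-2", "--7", ""]:
    print(repr(s), bool(re.search(UNSIGNED_RE, s)), bool(re.search(SIGNED_RE, s)))
```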
If we wish to match a certain number of digits, like let’s say a 3-digit integer, the RE
could be:
^\d{3}$
Observation:
1. This RE will match any 3-digit integer in the range of 000-999. It will not,
however, match integers with fewer digits, like 25.
If we wish to accept any integer up to 3 digits wide, for example, the RE can be:
^(\d?){2}\d$
Observation:
1. The \d? means an optional digit. The {2} means 2 such optional digits. The
\d after that means a compulsory digit.
2. The RE therefore means: 2 optional digits followed by a compulsory digit.
That compulsory digit matches the units place while the optional digits can
match the hundreds place and tens place.
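The sketch below tries ^(\d?){2}\d$ on a few strings. It also compares it with ^\d{1,3}$, which is offered here as an equivalent, more compact alternative rather than the form used in the text; the loop at the end checks that the 2 REs agree on strings of 0 to 5 digits.

```python
import re

UP_TO_3 = r"^(\d?){2}\d$"
SHORTER = r"^\d{1,3}$"   # alternative way of writing the same check

for s in ["7", "25", "999", "1234", ""]:
    print(repr(s), bool(re.search(UP_TO_3, s)))

# Both REs agree on every string of 0 to 5 digits
for n in range(6):
    s = "9" * n
    assert bool(re.search(UP_TO_3, s)) == bool(re.search(SHORTER, s))
```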
If we wish to accept an integer within arbitrary intervals, the RE becomes more
complex. For example, here is a RE for accepting a 3-digit integer between 000 and
255:
^([01]\d\d|2[0-4]\d|25[0-5])$
Observation:
1. The RE is made up of 3 choices, covered below.
2. The first choice is [01]\d\d which matches all 3-digit integers that start with
0 or 1, matching 000 to 199.
3. The second choice is 2[0-4]\d which matches integers from 200 to 249.
4. The third choice is 25[0-5] which matches integers from 250 to 255.
^([01]?\d?\d|2[0-4]\d|25[0-5])$
Observation:
1. The RE is made up of 3 choices, covered below.
2. The first choice is [01]?\d?\d, which covers up to 2 optional digits followed
by a compulsory digit, with additional constraints on the first optional digit that
it should be 0 or 1. This takes care of all single digit integers, all 2-digit
integers and all 3-digit integers starting with 0 or 1, hence matching 0-199 as
well as 000-199.
3. The second choice is 2[0-4]\d, which matches all 3-digit integers starting
with 2, and hence matches the range 200-249.
4. The third choice is 25[0-5], which matches all 3-digit integers in the range of
250-255.
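Because the range is small, both REs can be checked exhaustively. The sketch below tests every number from 0 to 999, written both plainly and zero-padded to 3 digits, and confirms that each RE accepts exactly the numbers up to 255 in the forms it is meant to handle.

```python
import re

EXACT_3 = r"^([01]\d\d|2[0-4]\d|25[0-5])$"      # exactly 3 digits, 000-255
UP_TO_255 = r"^([01]?\d?\d|2[0-4]\d|25[0-5])$"  # 0-255, padded or not

for n in range(1000):
    plain, padded = str(n), f"{n:03d}"
    ok = n <= 255
    assert bool(re.search(EXACT_3, padded)) == ok
    assert bool(re.search(UP_TO_255, plain)) == ok
    assert bool(re.search(UP_TO_255, padded)) == ok
print("all checks passed")
```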
^[-+]?\d*\.?\d+([eE][-+]?\d+)?$
Observation:
1. The [-+]? takes care of an optional leading sign.
2. The \d*\.?\d+ takes care of an optional whole-number part, an optional
decimal point and a compulsory fractional part. It also takes care of situations
where there is no fractional part – the \d+ then matches the integral part instead.
3. The ([eE][-+]?\d+)? takes care of the exponent notation (e or E),
followed by an optional sign, followed by an integral exponent part. This entire
part is optional.
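The real-number RE can be tried on a few sample strings as shown below. One detail worth noticing: a string like "5." is rejected, because \d+ requires at least one digit after the decimal point, even though Python's own float() accepts "5.".

```python
import re

REAL_RE = r"^[-+]?\d*\.?\d+([eE][-+]?\d+)?$"

for s in ["3.14", "-.5", "+42", "6.02e23", "1E-9", "5.", "e10", "1.2.3"]:
    print(s, bool(re.search(REAL_RE, s)))
```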
\([-+]?\d+,[-+]?\d+\)
Observation:
1. The x- and y-coordinates are each matched using the same RE pattern. The
“(”, “,” and “)” are matched as literal characters. Since parentheses are
operators, they need to be escaped. Since “,” is not an operator, it is not escaped.
2. The pattern [-+]?\d+ supports an optional sign followed by an integer. This
RE matches only integers, but we could also use the pattern we had used to
match real numbers instead.
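To see the coordinate RE pick points out of a longer string, the sketch below uses re.findall(), which is covered later in this chapter, on a made-up piece of text. The last point contains a space after the comma, which this RE does not allow.

```python
import re

POINT_RE = r"\([-+]?\d+,[-+]?\d+\)"
text = "Path: (0,0) -> (3,-4) -> (+10,7) -> (1, 2)"

print(re.findall(POINT_RE, text))   # ['(0,0)', '(3,-4)', '(+10,7)']
```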
import re
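The first 3 functions we will examine are match(), fullmatch() and search(). All of
them look for a regular expression match within a string, but they differ in where the
match must occur: fullmatch() requires the entire string to match, match() requires
the match to start at the beginning of the string, and search() accepts a match
anywhere within the string.
Thus, if the RE “not” is used to search within different strings, the table below
summarises which functions will detect a match:
String    fullmatch()   match()   search()
not       Yes           Yes       Yes
note      No            Yes       Yes
cannot    No            No        Yes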
Note:
1. While fullmatch() is the most restrictive function, requiring a complete
match, search() is the least restrictive function tolerating a match anywhere
within the string.
2. The presence of a leading ^ or \A within the RE will make even the
search() function behave like the match() function (as long as the re.M flag
is not used). The absence of these will not make any difference to the
match() function.
3. The presence of a leading ^ or \A and a trailing \Z within the RE will make the
search() and match() functions behave like the fullmatch() function. A
trailing $ has almost the same effect, except that $ also matches just before a
newline at the very end of the string. The absence of these will not make any
difference to the fullmatch() function.
4. Anything matched by fullmatch() will definitely be matched by the
match() and search() functions too.
5. Anything matched by match() will definitely be matched by the search()
function too.
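The table and notes above can be reproduced with the short script below, which applies all 3 functions to the same strings and then shows anchors making search() behave like the stricter functions.

```python
import re

for s in ["not", "note", "cannot"]:
    print(s,
          bool(re.fullmatch("not", s)),   # the whole string must match
          bool(re.match("not", s)),       # the match must start at index 0
          bool(re.search("not", s)))      # the match may be anywhere

# Anchors make search() behave like match() and fullmatch()
print(bool(re.search("^not", "cannot")))   # False, just like match()
print(bool(re.search("^not$", "note")))    # False, just like fullmatch()
```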
re.search(pattern,string[,flags])
Forms:
1. re.search(pattern,string)
2. re.search(pattern,string,flags)
>>> import re
>>> re.search(r"\d","Here are 5 apples")
<re.Match object; span=(9, 10), match='5'>
>>> re.search(r"\d","Here are five apples")
>>>
NOTE:
The reason we are using a raw string (r"") for the regular expression has been
covered in section 10.3.3.
Since this function returns a match object when a match is found and None
otherwise, we can use it as though it returns a Boolean value, as shown below:
>>> import re
>>> if re.search(r"\d","5"): print("Yes")
... else: print("No")
...
Yes
re.fullmatch(pattern,string[,flags])
Forms:
1. re.fullmatch(pattern,string)
2. re.fullmatch(pattern,string,flags)
>>> import re
>>> re.fullmatch(r"\d+","1234")
<re.Match object; span=(0, 4), match='1234'>
>>> re.fullmatch(r"\d+","12a34")
>>>
Since this function returns a match object when a match is found and None
otherwise, we can use it as though it returns a Boolean value, as shown below:
>>> import re
>>> if re.fullmatch(r"\d+","1234"): print("Yes")
... else: print("No")
...
Yes
Form #2: re.fullmatch(pattern,string,flags)
This form is identical in functionality to the previous form, but additionally allows us to
provide flags that modify the search behaviour slightly. These flags are discussed in
detail in section 10.4.4.
re.match(pattern,string[,flags])
Forms:
1. re.match(pattern,string)
2. re.match(pattern,string,flags)
>>> import re
>>> re.match(r"\d+","1234a")
<re.Match object; span=(0, 4), match='1234'>
>>> re.match(r"\d+","a1234")
>>>
Since this function returns a match object when a match is found and None
otherwise, we can use it as though it returns a Boolean value, as shown below:
>>> import re
>>> if re.match(r"\d+","1234a"): print("Yes")
... else: print("No")
...
Yes
absence of any of these flags. Multiple flags can be combined using the bitwise OR
operator (|). Adding them with the + operator also works, provided the same flag is
not added twice, but | is the safer choice.
re.I / re.IGNORECASE
Regular expression matches are case-sensitive by default. This flag forces a case-
insensitive match as demonstrated in the examples below:
>>> import re
>>> re.search("apple", "Here are some APPLES")
>>> re.search("apple", "Here are some APPLES", re.I)
<re.Match object; span=(14, 19), match='APPLE'>
Observation:
1. In the first example above, the RE “apple” was unable to match the substring
“APPLE” because the search is case-sensitive.
2. In the second example, the same RE was able to match the same string because
of the case-insensitive comparison specified by re.I.
re.M / re.MULTILINE
It is typically expected when dealing with regular expressions that the string is a single
line of text. The “^” and “$” operators therefore imply a match at the beginning of the
string (and the line) and at the end of the string (and the line) respectively. If the string
is a multi-line string that contains newline characters, these operators will be happily
unaware of it. To force the “^” and “$” operators to anchor at the beginning and end of
each line within the string (identified by the presence of the newline character), this
flag can be specified.
>>> import re
>>> re.search("^This", "That is good.\nThis is bad.")
>>> re.search("^This", "That is good.\nThis is bad.", re.M)
<re.Match object; span=(14, 18), match='This'>
Observation:
1. In the first example, the RE “^This” is unable to match the string since the
string does not start with “This”.
2. In the second example, the same RE is able to match the same string since
the string contains a line that starts with “This”.
re.S / re.DOTALL
Section 10.2.1.1 introduced the fact that the “.” operator can match any one single
character. By default, however, the “.” operator does not match the newline
character, since a string is assumed to contain a single line of text. When a
multi-line string is provided, the “.” operator can therefore match any character in
that string except the newline character. To make the “.” operator match even the
newline character, this flag can be used.
>>> import re
>>> re.search("e.T","One\nTwo")
>>> re.search("e.T","One\nTwo", re.S)
<re.Match object; span=(2, 5), match='e\nT'>
Observation:
1. In the first example, the “.” does not match the newline character between “e”
and “T”.
2. In the second example, the same RE is able to match the same string as the
“.” can now match even the newline character.
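There are a few more flags that can be used, but the flags discussed here are the
most commonly used ones. The table below summarises the flags:
Flag                    Meaning
re.A / re.ASCII         Ensure that the shorthand operators \w, \W, \d, \D, \s, \S,
                        \b and \B apply only to ASCII characters in a Unicode string
re.I / re.IGNORECASE    Perform case-insensitive matching
re.L / re.LOCALE        Make \w, \W, \b, \B and case-insensitive matching depend on
                        the current locale (allowed only with bytes patterns)
re.M / re.MULTILINE     Ensure that “^” and “$” operators work at line boundaries in a
                        multi-line string
re.S / re.DOTALL        Make the “.” operator match the newline character as well
re.X / re.VERBOSE       Allow whitespace and comments within the RE for readability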
>>> import re
>>> re.search("E.T","One\nTwo")
>>> re.search("E.T","One\nTwo",re.S+re.I)
<re.Match object; span=(2, 5), match='e\nT'>
>>> re.search("E.T","One\nTwo",re.IGNORECASE|re.DOTALL)
<re.Match object; span=(2, 5), match='e\nT'>
Observation:
1. In the first example, there was no match as the “.” does not match the
newline character and the comparison is case-sensitive.
2. In the second example, there is a match since the flags re.S and re.I take
care of the 2 reasons why there wasn’t a match earlier. The 2 flags were
combined using the + operator.
3. The third example is equivalent to the second, except that it uses the full
names of the flags and combines them using the | operator.
\d\d-\d\d-\d{4}
So far, this RE can only be used to search within strings for substrings that match
this pattern. Using the concept of groups, it is also possible to extract the day, the
month and the year separately. To support this extraction of group content, the RE
needs to be rewritten as follows:
(\d\d)-(\d\d)-(\d{4})
\n
In the above syntax, n is the group number. Thus, \1 refers to the content matched in
the first group, \2 refers to the content matched in the second group, and so on.
Given our sample string, \1 will represent 25, \2 will represent 12 and \3 will
represent 2016.
are rules that help restore sanity! Consider the following RE:
(((\d)(\d))-((\d)(\d))-(\d{4}))
The only thing we need here is a rule for numbering, and that rule is as follows: the
number of a group is assigned based on the position of its opening parenthesis
within the RE, counting from the left. Thus, the numbering is as follows:
(((\d)(\d))-((\d)(\d))-(\d{4}))
^^^   ^     ^^   ^     ^
123   4     56   7     8
For the sample string “25-12-2016”, the content of each back-reference (and hence,
group) will be as follows:
Back-reference Content
\1 25-12-2016
\2 25
\3 2
\4 5
\5 12
\6 1
\7 2
\8 2016
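The numbering in the table can be checked with m.group(n), a match object method described later in this chapter, which returns the content of group n. The sketch below prints the content of each of the 8 groups for the same sample string.

```python
import re

m = re.fullmatch(r"(((\d)(\d))-((\d)(\d))-(\d{4}))", "25-12-2016")

# Print each back-reference number next to the content of that group
for n in range(1, 9):
    print(f"\\{n}", m.group(n))
```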
match.string
match.re
Example:
>>> import re
>>> m=re.fullmatch(r"(\d\d?)/(\d\d?)/(\d\d){1,2}", "25/12/2016")
>>> m.string
'25/12/2016'
>>> m.re
re.compile('(\\d\\d?)/(\\d\\d?)/(\\d\\d){1,2}')
Observation:
1. This example uses a RE to match a UK-format date. This time, we are storing
the return value from fullmatch (which will be a match object) in a variable
m.
2. m.string gives us the original string: '25/12/2016'
3. m.re gives us the compiled regular expression object that produced the match:
re.compile('(\\d\\d?)/(\\d\\d?)/(\\d\\d){1,2}')
Of course, there is no significant advantage in extracting the string and RE from a
match object, but this is just a start. We would be more interested in extracting the
contents of the groups that were found, and this is discussed in the next section.
match.groups(default=None)
The match.groups() method returns the contents of all the groups in the form of a
tuple. If any group did not participate in the match, its content will be the default
value provided (None by default).
Example:
>>> import re
>>> m=re.fullmatch(r"(\d\d?)/(\d\d?)/(\d\d){1,2}", "25/12/2016")
>>> m.groups()
('25', '12', '16')
Observation:
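1. Each parenthesized group content is an item within the tuple.
2. The order of the items within the tuple matches the order of the groups in the
RE.
3. The third item is '16' and not '2016': when a group is repeated using a
quantifier like {1,2}, only the content of its last repetition is retained.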
>>> import re
>>> m=re.fullmatch(r"(\d\d)/(\d\d)?/(\d\d\d\d)", "25//2016")
>>> m.groups()
('25', None, '2016')
>>> m.groups("-")
('25', '-', '2016')
Observation:
1. In this RE, the second group is made optional and the string does not provide
any value for the same.
2. When m.groups() is used, the content for the second group is considered
to be None.
3. When m.groups("-") is used, the content “-” is understood to be the
default content for any group that did not participate in the match.
>>> import re
>>> m=re.fullmatch(r"(\d\d)/(\d\d)?/(\d\d\d\d)", "25//2016")
>>> m.lastindex
3
Observation:
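1. m.lastindex gives the index of the last group that participated in the match.
Here, group 3 matched, so the value is 3.
2. Group 2 did not participate in the match (its content is None), but that does
not affect the result, since group 3 comes after it. m.lastindex is therefore not
always the total number of groups; that number is available as m.re.groups.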
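The difference between m.lastindex and the total number of groups shows up when the trailing groups do not participate. The sketch below uses a variant of the RE in which the year group is also optional; this variant is only for illustration.

```python
import re

# Both the month and the year groups are optional in this variant
m = re.fullmatch(r"(\d\d)/(\d\d)?/(\d\d\d\d)?", "25//")
print(m.groups())     # ('25', None, None)
print(m.lastindex)    # 1: the last group that matched is group 1
print(m.re.groups)    # 3: the RE contains 3 groups in total

# The example from the text: group 3 matched, so lastindex is 3
m2 = re.fullmatch(r"(\d\d)/(\d\d)?/(\d\d\d\d)", "25//2016")
print(m2.lastindex)   # 3
```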
Individual group contents can be extracted using the match.group() method, the
syntax being as follows:
match.group(groupNum=0,...)
The match.group() method returns the content of the specified group. If the group
is not specified, it is assumed to be 0. An argument of 0 (either implicitly or explicitly)
is considered to mean the entire match that took place. If multiple arguments are
provided, the return value is a tuple with as many values as the arguments passed,
with corresponding content.
Examples:
>>> import re
>>> m=re.fullmatch(r"(\d\d)/(\d\d)?/(\d\d\d\d)", "25/12/2016")
>>> m.group(1)
'25'
>>> m.group(2)
'12'
>>>
>>> m.group(3)
'2016'
>>> m.group(4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: no such group
>>> m.group(0)
'25/12/2016'
>>> m.group()
'25/12/2016'
>>> m.group(1,3)
('25', '2016')
>>> m.group(3,1)
('2016', '25')
Observation:
1. m.group(1), m.group(2) and m.group(3) give us the content of groups
1, 2 and 3 respectively.
2. m.group(4) raises an IndexError as the group is not found.
3. m.group(0) and m.group() are identical and give us the entire match,
which in this case is the entire string.
4. m.group(1,3) gives us the contents of groups 1 and 3 respectively as
elements within a tuple.
5. m.group(3,1) gives us the contents of groups 3 and 1 respectively as
elements within a tuple. Note that this is different from m.group(1,3): the
order in which the groups are specified decides the order of the results.
match.start(group=0)
match.end(group=0)
match.span(group=0)
The match.start() method returns the start index of the specified group in the
string.
The match.end() method returns the end index of the specified group in the string.
The match.span() method returns a tuple containing the start and end indices of
the specified group in the string.
In all the 3 methods, if the group is not specified, it is implied to be 0. A group
argument of 0 means the entire match.
Examples:
>>> import re
>>> m=re.fullmatch(r"(\d\d)/(\d\d)/(\d\d\d\d)", "25/12/2016")
>>> m.groups()
('25', '12', '2016')
>>> m.start(1)
0
>>> m.end(1)
2
>>> m.span(1)
(0, 2)
>>> m.start(2)
3
>>> m.end(2)
5
>>> m.span(2)
(3, 5)
>>> m.start(3)
6
>>> m.end(3)
10
>>> m.span(3)
(6, 10)
>>> m.start()
0
>>> m.end()
10
>>> m.span()
(0, 10)
>>> m.start(0)
0
>>> m.end(0)
10
>>> m.span(0)
(0, 10)
Observation:
1. As can be verified, the first group lies between indices 0 and 2. The second
group lies between indices 3 and 5. The third group lies between indices 6
and 10.
2. The entire string is the match and therefore the entire match is between the
indices 0 and 10.
match.expand(template)
Examples:
>>> import re
>>> m=re.fullmatch(r"(\d\d)/(\d\d)/(\d\d\d\d)", "25/12/2016")
>>> m.expand(r"\2-\1-\3")
'12-25-2016'
>>> m.expand(r"US format: \2-\1-\3")
'US format: 12-25-2016'
Observation:
1. The statement m.expand(r"\2-\1-\3") is an attempt to convert a UK-
format date to a US-format date. The RE was used to process a UK-format
date (dd/mm/yyyy), which we now wish to convert to a US-format date (mm-
dd-yyyy).
2. \1, \2 and \3 represent the contents of groups 1, 2 and 3 respectively.
3. Any other characters apart from group content substitution requests are
retained verbatim.
Behaviour #1: If the RE does not contain any groups, this function returns a list of all
contents in the string string that matched the pattern pattern:
>>> import re
>>> re.findall(r"\d+", "4 four 25 twenty five 12 twelve")
['4', '25', '12']
Observation:
1. The RE will match 1 or more occurrences of digits.
2. There are 3 such matches within the given string.
3. The function returns a list of the 3 matches found.
Behaviour #2: If the RE contains 1 group, this function returns a list of all contents in
the string string that matched the group in the RE pattern:
>>> import re
>>> re.findall(r"(\d\d)", "4 four 25 twenty five 12 twelve")
['25', '12']
Observation:
1. Our RE contains 1 group: (\d\d).
2. There are 2 matches for the group: 25 and 12.
3. The function returns a list of the content that matched the group.
Another example:
>>> import re
>>> re.findall(r"\d(\d)", "4 four 25 twenty five 12 twelve")
['5', '2']
Observation:
1. The RE matches 2 consecutive digits, but the second digit is a member of a
group.
2. The list returned contains values that matched the group, not the entire RE.
Behaviour #3: If the RE contains more than 1 group, this function returns a list of
tuples – each tuple for 1 match containing the content that matched each group:
>>> import re
>>> re.findall(r"(\d)(\d)", "4 four 25 twenty five 12 twelve")
[('2', '5'), ('1', '2')]
Observation:
1. The RE contains 2 groups.
2. Each element of the returned list is a tuple with 2 values.
3. The list contains as many elements as the number of matches.
4. Each tuple contains as many values as the number of groups.
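Behaviour #3 is handy for pulling structured data out of text. In the sketch below, a made-up configuration string is parsed into key/value pairs: each match yields a 2-item tuple, and dict() turns the list of tuples into a dictionary.

```python
import re

config = "host=example.org port=8080 debug=yes"   # made-up input

# 2 groups, so findall() returns a list of (key, value) tuples
pairs = re.findall(r"(\w+)=(\w[\w.]*)", config)
print(pairs)   # [('host', 'example.org'), ('port', '8080'), ('debug', 'yes')]

settings = dict(pairs)
print(settings["port"])   # 8080
```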
Example:
>>> import re
>>> for m in re.finditer(r"(\d)(\d)", "4 four 25 twenty five 12 twelve"):
... print(m.groups())
...
('2', '5')
('1', '2')
Observation:
1. Since finditer() returns an iterator, we can directly iterate through the
result using a loop construct as shown.
2. Each iteration provides a single match object corresponding to a match
found. Methods of match objects were dealt with in section 10.6.
Forms:
1. re.split(pattern, string)
2. re.split(pattern, string, parts)
3. re.split(pattern, string, parts, flags)
>>> import re
>>> re.split(r"\d\d", "4 four 25 twenty five 12 twelve")
['4 four ', ' twenty five ', ' twelve']
Observation:
1. The RE for the split specifies that splitting should occur wherever 2
consecutive digits are present - “25” and “12” in our example.
2. After splitting, there are 3 parts which are returned as a list of strings.
If the RE pattern contains any groups, then the contents of the groups are also
returned in-between the split parts:
>>> import re
>>> re.split(r"(\d)(\d)", "4 four 25 twenty five 12 twelve")
['4 four ', '2', '5', ' twenty five ', '1', '2', ' twelve']
>>> import re
>>> re.split(r"\d\d", "4 four 25 twenty five 12 twelve", 1)
['4 four ', ' twenty five 12 twelve']
Observation:
1. We have asked for 1 split and therefore the splitting process stops after 1
split.
2. The remainder of the string after the last split is also returned. Thus, if we
ask for n splits, we get at most n+1 strings: the n parts split off, plus the
remainder.
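The limit on the number of splits can also be passed by its keyword name, maxsplit, which is the name used by the re module itself. Passing it by keyword is worth getting used to: from Python 3.13 onwards, passing maxsplit (and flags) to re.split() positionally is deprecated. A short sketch:

```python
import re

text = "4 four 25 twenty five 12 twelve"

print(re.split(r"\d\d", text, maxsplit=1))
# ['4 four ', ' twenty five 12 twelve']

# Splitting on runs of whitespace, at most 2 times
print(re.split(r"\s+", "a  b   c d", maxsplit=2))
# ['a', 'b', 'c d']
```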
Forms:
1. re.sub(pattern, replacement, string)
2. re.sub(pattern, replacement, string, count)
3. re.sub(pattern, replacement, string, count, flags)
4. re.sub(pattern, function, string)
5. re.sub(pattern, function, string, count)
6. re.sub(pattern, function, string, count, flags)
>>> import re
>>> re.sub(r"\d+","'n'","There are 5 apples and 3 mangoes")
"There are 'n' apples and 'n' mangoes"
Observation:
1. The original string argument, string, is not changed by this function. The
new string with replacements made is merely returned.
We can also use back-references (\1, \2, etc.) to use content that matched in the RE
within the replacement string:
>>> import re
>>> re.sub(r"(\d+)/(\d+)/(\d{4})", r"\2-\1-\3", "25/12/2016")
'12-25-2016'
Observation:
1. We are attempting to match a UK-format date and convert it into a US-format
equivalent.
2. Since the replacement string contains backslashes, either we should prefix
each backslash with another backslash or use raw strings – we prefer the
latter for simplicity and readability.
3. The back-reference \1 refers to the content that matched the first group, \2
for content that matched the second group and so on.
>>> import re
>>> re.sub(r"\d+","'n'","There are 5 apples and 3 mangoes", 1)
"There are 'n' apples and 3 mangoes"
>>> re.sub(r"\d+","'n'","There are 5 apples and 3 mangoes", 5)
"There are 'n' apples and 'n' mangoes"
>>> re.sub(r"\d+","'n'","There are 5 apples and 3 mangoes", 0)
"There are 'n' apples and 'n' mangoes"
Observation:
1. In the first example, only the first occurrence of a sequence of digits was
replaced.
2. In the second example, while we have asked for maximum 5 replacements,
there were only 2 occurrences of the pattern and both were replaced.
3. In the third example, a count of 0 means that there is no limit on the number of
replacements, so the result is identical to the example we saw for form #1.
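As with re.split(), the count can be passed by its keyword name, which is what the re module calls it; from Python 3.13 onwards, passing count (and flags) to re.sub() positionally is deprecated. The sketch also confirms that the original string is left unchanged.

```python
import re

s = "There are 5 apples and 3 mangoes"

once = re.sub(r"\d+", "'n'", s, count=1)
print(once)   # There are 'n' apples and 3 mangoes
print(s)      # There are 5 apples and 3 mangoes (unchanged)
```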
>>> import re
>>> def f(m):
... if int(m.string[m.start():m.end()])<10: return "few"
... else: return "many"
...
>>> re.sub(r"\d+",f,"There are 15 apples and 3 mangoes")
'There are many apples and few mangoes'
Observation:
1. Note that when we are invoking the re.sub() function, we specify f and not
f() - in other words, we are providing the function name and not calling the
function ourselves.
2. The function f will receive a match object m each time a match is found.
Recall from section 10.6 that m.string gives us the original string that was
searched, m.start() gives us the starting index within the string where the
match was found and m.end() gives us the ending index within the string
where the match was found. Thus, m.string[m.start():m.end()]
identifies the substring that was matched within the original string (m.group()
gives the same substring more directly).
3. The function f converts the matched substring into an integer and compares it
with 10 – if less than 10, the function returns the string "few", else it returns
the string "many".
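The replacement function can be written a little more compactly using m.group(), which returns the same substring as m.string[m.start():m.end()]. The sketch below does that, and adds a second, hypothetical function, double, to show that the function can compute any replacement as long as it returns a string.

```python
import re


def few_or_many(m):
    # m.group() is the matched text, same as m.string[m.start():m.end()]
    return "few" if int(m.group()) < 10 else "many"


def double(m):
    # The replacement must be a string, so the number is converted back
    return str(int(m.group()) * 2)


print(re.sub(r"\d+", few_or_many, "There are 15 apples and 3 mangoes"))
# There are many apples and few mangoes
print(re.sub(r"\d+", double, "5 apples and 3 mangoes"))
# 10 apples and 6 mangoes
```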
>>> import re
>>> def f(m):
... if int(m.string[m.start():m.end()])<10: return "few"
... else: return "many"
...
>>> re.sub(r"\d+",f,"There are 15 apples and 3 mangoes", 1)
'There are many apples and 3 mangoes'
Observation:
1. We have asked for maximum 1 replacement and hence the number “3”
remains in the original string.
2. This form is similar to form #2, but uses a replacement function instead of a
replacement string.
3. As discussed in form #2, if the count is 0, this form works the same way as
form #4 and replaces all occurrences of the pattern pattern in the string
string.
The re.subn() function works the same way as re.sub(), but returns a tuple
containing the new string as well as the total number of substitutions performed:
Example:
>>> import re
>>> re.subn(r"\d+","'n'","There are 5 apples and 3 mangoes")
("There are 'n' apples and 'n' mangoes", 2)
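Since re.subn() returns a tuple, its result can be unpacked directly into 2 variables, as sketched below. When nothing matches, the string comes back unchanged along with a count of 0, which makes it easy to tell whether any replacement happened.

```python
import re

new_text, count = re.subn(r"\d+", "'n'", "There are 5 apples and 3 mangoes")
print(new_text)   # There are 'n' apples and 'n' mangoes
print(count)      # 2

print(re.subn(r"\d+", "'n'", "No numbers here"))   # ('No numbers here', 0)
```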
re.compile(pattern, flags=0)
Forms:
1. re.compile(pattern)
2. re.compile(pattern, flags)
>>> import re
>>> c=re.compile(r"\d\d")
Once we have access to a regex object, these are the methods we can invoke on it:
Observation:
1. These methods behave identically to the corresponding functions in the re
module.
2. Most of these methods receive optional pos and endpos parameters – the
pos parameter specifies where to start matching within the string, and the
endpos parameter specifies where to stop matching in the string. When pos
is omitted, the matching starts at the beginning of the string and when
endpos is omitted, the matching ends at the end of the string. Note that pos is
not the same as slicing the string: the “^” operator still matches only at the real
beginning of the string (index 0) or, with re.M, just after a newline, but not at
pos. endpos, on the other hand, behaves as if the string were only endpos
characters long, so the “$” operator can match at endpos.
3. The RE is not present in these methods since it is already placed within the
regex object. The string against which the RE has to be matched can change
and is hence passed as a parameter.
4. The flags are not present as parameters in these methods since they are
already placed within the regex object. If a different set of flags are required,
another regex object instance has to be created for that using
re.compile().
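The sketch below shows a compiled regex object being reused, along with the pos and endpos parameters. The variable names c, s, start and end are illustrative only.

```python
import re

c = re.compile(r"\d\d")
s = "4 four 25 twenty five 12 twelve"

print(c.findall(s))              # ['25', '12']
print(c.search(s, 10).group())   # 12: the search starts at index 10
print(c.search(s, 0, 8))         # None: only "4 four 2" is searched

start = re.compile(r"^\d")
print(start.search("a1", 1))     # None: ^ still refers to index 0, not pos

end = re.compile(r"\d$")
print(end.search("12ab"))                # None: the string ends with "b"
print(end.search("12ab", 0, 2).group())  # 2: $ matches at endpos
```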
9.11 Questions
1. Explain regular expression operators with examples.
2. Write a short note on anchoring operators of regular expressions.
3. Differentiate between the match(), fullmatch() and search() functions
of the re module.
4. Write a short note on flags that can be used while dealing with regular
expressions.
5. Briefly explain how we can work with match objects in Python.
9.12 Exercises
1. Write a Python script to check if a given string contains an email address or
not.
2. Write a Python script that accepts a line of CSV (Comma Separated Values)
and checks if the 3rd field contains exactly an email address or not, given that
there are more than 3 fields.
3. Write a Python script that scans through a given piece of text and extracts all
unique email addresses from it.
4. Write a Python script that reads in a piece of text and prints it out masking out
email addresses. Thus, an email address “[email protected]”
should become “h********[email protected]”.
SUMMARY
➢ Regular Expressions are pattern matching expressions.
They help us formally frame patterns of strings we are
interested in searching.
➢ The split() function can be used to split a string into parts based
on a RE for specifying the delimiter string.
10 FUNCTIONS
10.1 Introduction to Functions
Functions (loosely equivalent to subroutines/procedures, all of which are types of
subprograms) are self-contained blocks of code to perform a specific task. Functions
are used for the following reasons:
1. They help reduce code redundancy: when the same piece of code is
required in multiple places within a program, the piece of code can be
converted into a function, which can then be called whenever and wherever
required within the program.
2. They help increase code reusability: a function once defined can be called
any number of times and can even be called from other programs (see
section 15 for a discussion of modules).
3. They help improve program clarity: putting all instructions in one place
results in cluttering, making the program difficult to read, understand and
maintain; whereas organising them into functions makes it far more
manageable.
Functions, being subprograms, can assist the main program in its activities. During
execution of a Python script, control always starts flowing from the first line of the
main program (the part that is outside of all functions) and flows down sequentially
line by line by default, executing all the lines in a sequence. This flow can be altered
by certain programming constructs like decisions and loops, and is also altered when
function calls are made. Thus, a function defined in a program will not be executed
unless explicitly called from the main program. Of course, a function that is called can
then call other functions as well!
def function_name(parameters):
function_body
...
The def keyword specifies that we are defining a function. The function name should
be a valid identifier. A function can optionally receive parameters, different forms of
which will be covered in the next few sections. The function body is a sequence of
instructions and can span any number of lines (including blank lines). As is the case
10. Functions 263
everywhere in Python, the indentation level decides which statements belong to this
function.
Here is the definition of a function called hi that prints “Hello World” when called:
>>> def hi():
...     print("Hello World")
...
>>> hi()
Hello World
NOTE:
In an interactive Python session, once a function has been defined, it can be called
any number of times without requiring a redefinition.
Functions once defined can be redefined later to perform a different task. The latest
definition seen will be used when a function call is executed:
A function name without the parentheses is a reference to the function and as such
can be stored in variables and used later as a replacement for the original function
name:
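The redefinition and function-reference behaviour described above can be seen in the short script below; the names hi and greet are illustrative.

```python
def hi():
    print("Hello World")


hi()                 # prints: Hello World


def hi():            # redefining hi replaces the earlier definition
    print("Hello again")


hi()                 # prints: Hello again

greet = hi           # no parentheses: greet refers to the function itself
greet()              # prints: Hello again
print(greet is hi)   # True: both names refer to the same function object
```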
Observation:
1. The values sent to the function at the place of call (2 and 3) are arguments to
the function.
2. The variables in which values are received by the function in its definition (x
and y) are parameters.
3. At the time of function call, the argument 2 is sent to the parameter x and the
argument 3 is sent to the parameter y.
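The listing these observations refer to is not reproduced here. As a hypothetical stand-in, the function add below receives the arguments 2 and 3 in its parameters x and y.

```python
def add(x, y):          # x and y are parameters (hypothetical function)
    return x + y


result = add(2, 3)      # 2 and 3 are arguments: x receives 2, y receives 3
print(result)           # 5
```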
Observation:
1. The syntax of default arguments is that a value is assigned to the parameter.
This is the default value which will be used when the actual argument is
missing in the call.
2. A positional parameter cannot follow a default argument. Hence default
arguments are always at the end of the parameter list.
3. The default value need not be used and default arguments can be treated as
normal arguments as shown in the example f(10,20,30,40).
4. When default arguments are omitted at the place of call, the default value is
assumed - like how 3 is assumed to be the value of d in f(10,20,30).
5. When multiple default arguments are present and some are omitted, it is
assumed that the omission is from right to left. Thus, in the call
f(10,20,30), it is assumed that d is omitted and the value of c is 30.
In practice, we would put the default argument that is most often omitted at the
rightmost end of the parameter list. (Section 10.6.2 shows how we can
explicitly override a default argument without providing any value for previous
default arguments).
6. The positional arguments cannot be omitted, as shown in f(10).
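The listing for f is not shown above. The minimal sketch below assumes it is defined as f(a, b, c=2, d=3), which matches the default values mentioned in these observations; the function simply returns what it received so that the defaults are visible.

```python
def f(a, b, c=2, d=3):
    # Return the values received so we can see which defaults were used
    return (a, b, c, d)


print(f(10, 20, 30, 40))   # (10, 20, 30, 40): no defaults used
print(f(10, 20, 30))       # (10, 20, 30, 3): d takes its default
print(f(10, 20))           # (10, 20, 2, 3): c and d take their defaults

try:
    f(10)                  # b has no default, so it cannot be omitted
except TypeError as e:
    print("TypeError:", e)
```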
Observation:
1. In the above example, the si() function is designed to receive arguments for
principal, time duration and rate of interest – in that order.
2. The example si(1000,3,9.5) passes 1000 as p, 3 as t and 9.5 as r.
3. The example si(p=1000,r=9.5,t=3) passes 1000 as p, 3 as t and 9.5
as r. Note that the order of the arguments given here does not match the order
of the parameters, but since we have explicitly identified each parameter by its
name, there is no confusion.
4. We can also mix positional arguments with keyword arguments as shown in
the example si(1000,r=9.5,t=3).
NOTE:
A keyword argument cannot be followed by a non-keyword argument.
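The si() listing is not shown above; the sketch below assumes, for illustration only, that it returns the simple interest p*t*r/100. The last part uses the built-in compile() to show that a positional argument placed after a keyword argument is rejected as a SyntaxError before any call is made.

```python
def si(p, t, r):
    # Assumed formula: principal x time (years) x rate (% per year) / 100
    return p * t * r / 100


print(si(1000, 3, 9.5))          # positional arguments
print(si(p=1000, r=9.5, t=3))    # keyword arguments, in any order
print(si(1000, r=9.5, t=3))      # positional followed by keyword arguments

# A keyword argument followed by a positional one does not even compile
try:
    compile("si(p=1000, 3, 9.5)", "<example>", "eval")
except SyntaxError as e:
    print("SyntaxError:", e.msg)
```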
Observation:
1. In the first example, f(10,20,30), 30 is assumed to be the value of c and d
is assumed to have been omitted and thus assumed to be the default value 3.
2. In the second example, f(10,20,d=30), the keyword argument d=30
explicitly states that the value of d should be 30. The value of c is assumed to
have been omitted and thus assumed to be the default value 2.
It is, however, possible to pass additional keyword arguments that are not even
present in the formal parameter list! These additional keyword arguments can then be
received as a dictionary whose keys are the argument names and whose values are the
argument values. The special ** prefix is used for this purpose, as shown in
the example below:
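A minimal sketch of such a function, assuming the signature f(a, b, **x) implied by the observations that follow (the print and return are added here only to make the dictionary visible):

```python
# a and b are ordinary parameters; x collects any extra keyword arguments.
def f(a, b, **x):
    print("a =", a, "b =", b, "x =", x)
    return x

f(1, 2, c=2, d=3)     # a = 1 b = 2 x = {'c': 2, 'd': 3}
f(1, c=2, b=5, d=3)   # b matches a parameter, so x = {'c': 2, 'd': 3}
```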
Observation:
1. The function f receives 2 positional parameters a and b and any number of
keyword arguments.
2. In the call f(1,2,c=2,d=3), the values 1 and 2 are passed to the positional
parameters a and b respectively; whereas 2 keyword arguments c and d are
created with values 2 and 3 respectively.
3. The additional keyword arguments are stored in a dictionary and passed as
the value for the dictionary parameter x. Section 7.1 discussed the ordering of
dictionary keys; for this particular dictionary, Python 3.6 and later guarantee that
the keys (the argument names) appear in the same order as the keyword arguments
in the call.
4. The dictionary contains only those keyword arguments that do not match a
parameter in the parameter list. Thus, in the example f(1,c=2,b=5,d=3), the
keyword argument b is bound to the parameter b and will not be found in the
dictionary.
Observation:
1. The special * prefix is used to denote that we are dealing with variable
arguments.
2. These variable arguments are packed together into a tuple that is accessible
as a single parameter in the function (x, in our example).
3. A simple for loop can help iterate through all the arguments passed, in order.
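A minimal sketch of a variable-argument function consistent with these observations. The name sum follows the text (it shadows the built-in sum() in this script), and the extra print of the tuple is added here only to show the packing:

```python
# Hypothetical variable-argument sum(): all positional arguments land in x.
def sum(*x):
    print("received:", x)   # x is a tuple, e.g. (2, 3, 4)
    s = 0
    for i in x:             # visit the arguments in the order they were passed
        s += i
    print(s)

sum(2, 3, 4)   # received: (2, 3, 4) then 9
sum()          # received: () then 0
```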
Observation:
1. In this function, a and b are positional parameters. It is mandatory to provide
arguments for these or else we will get an error as demonstrated in the
example sum().
2. The variable s is initialized to the sum of a and b instead of 0 as was the case
in the previous example.
The * parameter (x in our previous example) is greedy and will take up all remaining
positional arguments. Therefore, any parameter placed after the variable arguments
can never receive a positional argument and must be given as a keyword argument,
as demonstrated by this example:
>>> sum(2,a=3,b=4)
9
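The definition behind this example is not shown. A minimal sketch that reproduces it, assuming a and b are declared after *x, which makes them keyword-only parameters:

```python
# Assumed definition: a and b follow *x, so they are keyword-only.
def sum(*x, a, b):
    s = a + b
    for i in x:
        s += i
    return s

print(sum(2, a=3, b=4))   # 9: 2 goes into x, a and b are given by keyword

try:
    sum(2, 3, 4)          # all three values go into x; a and b stay unfilled
except TypeError as e:
    print("TypeError:", e)
```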
Observation:
1. In the above example, a is a positional parameter and is mandatory, b is a
default argument and is optional, x is the collection of variable arguments and
is optional. Thus, at least 1 argument in the function call is compulsory.
2. When only 1 argument is provided, the argument is considered to be the
value of the positional parameter a and parameter b takes on the value 100.
3. When 2 arguments are provided, they are assumed to be the values of a and
b respectively and x is empty.
4. When more than 2 arguments are provided, the first 2 arguments are
considered to be the values of a and b respectively and the rest will be
present in x in the same sequence as in the call.
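A sketch of a function matching these observations, assuming the signature sum(a, b=100, *x); it also reproduces the keyword-argument calls that follow:

```python
# Assumed definition: a is positional, b defaults to 100, x collects the rest.
def sum(a, b=100, *x):
    s = a + b
    for i in x:
        s += i
    return s

print(sum(1))           # 101: a=1, b=100, x=()
print(sum(1, 2))        # 3:   a=1, b=2,   x=()
print(sum(1, 2, 3, 4))  # 10:  a=1, b=2,   x=(3, 4)
print(sum(b=2, a=5))    # 7:   both given as keyword arguments
```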
Of course, the positional parameter and default argument can be provided as keyword
arguments if required as shown in the examples below:
>>> sum(b=2,a=5)
7
>>> sum(7,b=2)
9
>>> sum(3,a=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sum() got multiple values for argument 'a'
Observation:
1. If the first argument is provided, it is assigned to the positional parameter a.
Reassignment using keyword arguments is not permitted.
>>> sum(3,5,b=4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sum() got multiple values for argument 'b'
Observation:
1. If a second argument is provided, it is assigned to the parameter b.
Reassignment using keyword arguments is not permitted.
>>> print(1,2,3,sep=':',end='\n*****\n')
1:2:3
*****
Here is a function sum that receives variable arguments along with a default argument
following it:
Observation:
1. In the example sum(2,3,4), all the arguments go into x, leaving negate
with the default value, False.
2. In the example sum(2,3,4,negate=True), the keyword argument
negate=True ensures that the parameter negate takes on the value True.
All other arguments are stored in x.
3. In the example sum(2,3,4,True), all arguments are stored in x. Thus, the
final argument True is also a part of x and counted as 1 during addition.
Specifically, the value True is not taken to be the value for negate.
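The function itself is not shown above. A minimal sketch, assuming negate simply flips the sign of the total (that behaviour is an assumption); because negate follows *x, it can only be set by keyword:

```python
# Assumed definition: negate is a keyword-only default argument after *x.
def sum(*x, negate=False):
    s = 0
    for i in x:
        s += i
    return -s if negate else s

print(sum(2, 3, 4))               # 9
print(sum(2, 3, 4, negate=True))  # -9
print(sum(2, 3, 4, True))         # 10: True lands in x and counts as 1
```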
return [value]
Forms:
1. return
2. return value
Form #1 will be covered in this section whereas form #2 will be covered in the next
section.
Observation:
1. The function hi did not end up printing the string “Heya!” since the return
statement preceding it resulted in a transfer of control back to the caller.
2. Unconditional return statements such as the one shown in this example are
practically useless as the rest of the statements never get executed. It is
common for a function to have multiple return statements, but in such cases
the return statement would be conditional – probably present within an if
statement's body.
Observation:
1. In the above examples, each call to the function hi prints “Hello World”,
in addition to anything else that we have given.
2. There is an important difference between hi and hi(): hi is a reference to the
function, whereas hi() is a function call that results in the function being
executed and its return value being substituted at the place of call!
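A minimal sketch of a function hi consistent with both sets of observations (the book's exact bodies are assumed): it prints a greeting, returns early with a bare return, and is then used both as a reference and as a call.

```python
def hi():
    print("Hello World")
    return            # bare return: control goes back to the caller
    print("Heya!")    # never executed

ref = hi              # no parentheses: just a reference, nothing runs
print(ref is hi)      # True
result = hi()         # a call: prints Hello World
print(result)         # None: a bare return gives back None
```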
What is practically more useful, however, is that we can return values from
functions. The return statement also permits the optional return of a single value
that will be substituted at the place of call.
We have dealt with the sum function many times before, but each time it was printing
the result on the standard output. This time, we will modify it to return the sum instead
of printing it:
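A minimal sketch of such a modification, assuming the variable-argument version of sum; the caller now decides what to do with the result:

```python
def sum(*x):
    s = 0
    for i in x:
        s += i
    return s          # the value replaces the call expression

total = sum(2, 3, 4) * 10   # the returned 9 is used like any other value
print(total)                # 90
```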
As was discussed in the previous section, there can be multiple such return
statements within a function, but the moment one of them is executed, control is
returned to the caller and the return value is substituted at the place of call.
Let us apply this by writing a program to find the GCD of 2 integers using a function
called gcd which returns the result to the main script.
gcd.py
1. #!/usr/bin/python
2.
3. # Program to find the GCD of 2 integers using functions
4.
5. def gcd(x,y):
6.     if x==y: return x
7.     elif x>y: return gcd(x-y,y)
8.     else: return gcd(x,y-x)
9.
10. x,y = input("Enter 2 integers:").split(' ')
11. x,y = int(x),int(y)
12.
13. print("The GCD of {} and {} is {}".format(x,y,gcd(x,y)))
Output:
Enter 2 integers:36 60
The GCD of 36 and 60 is 12
Observation:
1. The gcd function is defined in line 5. A function definition should always
precede function calls.
2. The GCD is found by recursively subtracting the smaller integer from the
larger integer till the 2 integers become equal, at which point they converge at
the GCD. Other methods of finding the GCD can also be employed.
3. Line 10 reads in a single line as a string. This line contains 2 integers and
hence the split() function has been used to split the string at the space to
give us the 2 parts. The two parts are stored in x and y respectively, but are
still strings and not integers.
4. Line 11 converts the strings in x and y into integers and stores them back into
the same variables.
5. Line 13 calls the gcd function passing x and y as arguments and the returned
value is printed as part of the formatted string.
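One such method is Euclid's algorithm, which replaces repeated subtraction by the remainder operation. A sketch, using the hypothetical name gcd_euclid so it does not clash with the listing; math.gcd (available since Python 3.5) is used only as a cross-check:

```python
import math

def gcd_euclid(x, y):
    # Each step replaces (x, y) by (y, x % y); the GCD is unchanged,
    # and the loop ends when the remainder becomes 0.
    while y != 0:
        x, y = y, x % y
    return x

print(gcd_euclid(36, 60))                      # 12
print(gcd_euclid(36, 60) == math.gcd(36, 60))  # True
```

Unlike the subtraction-based version, this loop does not recurse. The recursive version makes one call per subtraction, so very unequal inputs such as gcd(1, 100000) would exceed Python's default recursion limit of 1000.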
1. #!/usr/bin/python
2.
3. # Program to classify a given number
4. # as prime or composite using functions.
5.
6. def isPrime(n):
7.     for i in range(2,int(n/2)+1):
8.         if n%i==0: return False
9.     return True
10.
11. n = int(input("Enter a positive integer: "))
12.
13. if isPrime(n): print("Prime")
14. else: print("Composite")
Observation:
1. The main program starts from line 11 and uses the isPrime() function to
determine if the given number is prime or not.
2. The isPrime() function is loosely based on the code of prime3.py, with
optimizations to reduce code.
3. The int() function is used in line 7 to prevent floating point results due to
division. The +1 is provided because the range function excludes the ending
value, and we need the last value to be n/2.
4. If we find that the number has a factor (and hence is composite) inside the
loop (in line 8), we immediately return False to indicate that the number is
not prime. No other statements will be executed within the loop or the function
and control returns to the main program (in line 13).
5. If control comes out of the loop, it is an indication that the number is prime
and we unconditionally return True to indicate the same.
Of course, we do not necessarily need the temporary variable t in the function, and
the function can as well be defined as follows:
Of course, we do not necessarily need the temporary variable l in the function, and
the function can as well be defined as follows:
Let us write a program that uses a function to generate all prime factors of a given
integer. We can then find the prime factors of 2 integers and use this information to
generate the GCD of the 2 integers (which will be the product of the common prime
factors):
gcd2.py
1. #!/usr/bin/python
2.
3. # Program to find the GCD using functions
4. # to find the prime factors
5.
6. def getPrimeFactors(n):
7.     factors = []
8.     factor=2
9.     while n>1:
10.         while n%factor == 0:
11.             factors.append(factor)
12.             n //= factor
13.         factor = factor+1
14.     return factors
15.
16. def gcd(x,y):
17.     factors1 = getPrimeFactors(x)
18.     factors2 = getPrimeFactors(y)
19.     gcd=1
20.     for i in factors1:
21.         if i in factors2:
22.             factors2.remove(i)
23.             gcd *= i
24.     return gcd
25.
26. x,y = input("Enter 2 integers:").split(' ')
27. x,y = int(x),int(y)
28. print("The GCD of {} and {} is {}".format(x,y,gcd(x,y)))
Output:
Observation:
1. The getPrimeFactors() function defined in line 6 returns a list containing
all the prime factors of a given integer. For example, the prime factors that,
when multiplied, give us 12 are 2, 2 and 3.
2. The gcd() function in line 16 finds and returns the GCD of 2 integers. It first
finds the prime factors of both the integers using getPrimeFactors(). For
each prime factor of the first number, it checks if there is an identical prime
factor of the second number. If so, it removes that factor from the second list
(so that it is matched only once) and multiplies it into the GCD. Finally, the
GCD is returned.
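The matching step in gcd() is a multiset intersection of the two factor lists. A sketch of the same idea using collections.Counter, whose & operator keeps each element with the smaller of its two counts; gcd_from_factors is a hypothetical name, and math.prod requires Python 3.8 or later:

```python
from collections import Counter
from math import prod

def getPrimeFactors(n):
    factors = []
    factor = 2
    while n > 1:
        while n % factor == 0:
            factors.append(factor)
            n //= factor          # integer division keeps n an int
        factor += 1
    return factors

def gcd_from_factors(x, y):
    # Common prime factors, repetitions included: {2: 2, 3: 1} for 36 and 60.
    common = Counter(getPrimeFactors(x)) & Counter(getPrimeFactors(y))
    return prod(common.elements())   # product of no factors is 1

print(getPrimeFactors(12))       # [2, 2, 3]
print(gcd_from_factors(36, 60))  # 12
```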
Of course, we do not necessarily need the temporary variable d in the function, and
the function can as well be defined as follows:
>>> x=2
>>> def f():
... x=3
... print(x)
...
>>> print(x)
2
>>> f()
3
>>> print(x)
2
Observation:
1. The line x=2 introduces a global variable with name x and value 2.
2. The function f assigns a value to x, so this x is assumed to be a local
variable that is local to the function f. This variable is created when
the function is called and is destroyed when the function returns. During the
lifetime of the function, however, it exists with a value of 3.
3. Any variable that is assigned a value inside a function is assumed to be local
to that function by default. A variable that is only read (never assigned) inside a
function is looked up in the enclosing scopes and then in the global scope. Any
reference to a variable outside functions is a global reference.
4. Any reference to x outside of all functions is a reference to the global variable
x, with the value 2. Any reference to x inside the function f is a reference to
the local variable x, with the value 3.
What if in the previous example, we want the function to use the global variable x
instead of creating a new local variable? We make this intention known to Python
using the keyword global to seek access to one or more global variables as shown
in the syntax below:
global name1[,name2[,...]]
>>> x=2
>>> def f():
... global x
... x=3
... print(x)
...
>>> print(x)
2
>>> f()
3
>>> print(x)
3
Observation:
1. The statement global x makes it possible for the function f to assign to the
global variable x, which otherwise would have been construed to mean a local
variable x.
2. A function that needs to assign to (rebind) a global variable has to
compulsorily use the global statement to request access. Merely reading a
global variable does not require the global statement.
3. Once the global statement is used to request access to a global variable, it
is not possible to have a local variable with the same name.
4. A function can request access to multiple global variables simultaneously
using either a single global statement and a list of all global variables or
using multiple global statements.
Observation:
1. The function f is the outer function and the function g is the inner function.
2. As is the case everywhere else in Python, encountering a function definition
does not automatically result in a function call. An inner function is useless
unless used in some way. The typical ways of use are by making a call (as
done in this example) or by returning it (as will be done in the following part).
3. The function g belongs to the scope of the function f (g is local to f) and is
hence accessible only by f. Any attempt to access it from outside f, as shown
in the example, will result in an error.
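A minimal sketch of an outer function f with an inner function g, consistent with the observations above (the body of g is illustrative):

```python
def f():
    def g():              # g is created each time f runs and is local to f
        return "inside g"
    return g()            # f can call g like any other function

print(f())                # inside g
try:
    g()                   # g is not visible in the global scope
except NameError as e:
    print("NameError:", e)
```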
Here are the reasons why we might want to deal with inner functions, with each
category of application covered in detail in following sections:
1. Inner functions do not pollute the global namespace. Thus, if the functionality
provided by the inner function is known to benefit only the outer function, it
makes sense to “hide” the inner function in the outer function without polluting
the global namespace. This also makes it possible to have multiple [inner]
functions with the same name since they are all in different scopes!
2. Sometimes, the outer function is simply a wrapper around the inner function
with the main work being done by the inner function. The purpose of the
outer function could be to perform validation (verifying that the inputs to the
function are acceptable), logging (recording the call in some file or data
structure), choosing (selecting one inner function from amongst several), etc.
3. The outer function can also act like a generator: a function that helps
manufacture a suitable (inner) function and returns that function for external
use! (Such functions are more commonly called function factories; in Python, the
term generator usually refers to a function that uses yield.)
1. #!/usr/bin/python
2.
3. # Program to find the GCD using functions
4. # to find the prime factors, making use of inner functions.
5.
6. def gcd(x,y):
7.     def getPrimeFactors(n):
8.         factors = []
9.         factor=2
Output:
Observation:
1. This program is identical to gcd2.py, except for the fact that the
getPrimeFactors() function is now defined inside the function gcd, in line
7.
2. In gcd2.py, getPrimeFactors() was a global function, accessible to
any function in the script, but in this program it is a local function,
accessible only inside the function gcd().
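A complete, runnable sketch of this arrangement, following the description above: the logic of gcd2.py with getPrimeFactors() moved inside gcd(). The integer division (//=) and the accumulator name result are choices made here.

```python
def gcd(x, y):
    # getPrimeFactors is local to gcd: it is visible only inside this function.
    def getPrimeFactors(n):
        factors = []
        factor = 2
        while n > 1:
            while n % factor == 0:
                factors.append(factor)
                n //= factor
            factor += 1
        return factors

    factors1 = getPrimeFactors(x)
    factors2 = getPrimeFactors(y)
    result = 1
    for i in factors1:
        if i in factors2:
            factors2.remove(i)   # use each common factor only once
            result *= i
    return result

print(gcd(36, 60))   # 12
```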
prime8.py
1. #!/usr/bin/python
2.
3. # Program to classify a given number
4. # as prime or composite using functions.
5. import sys
6. def isPrime(n):
7.     def implIsPrime(n):
8.         for i in range(2,int(n/2)+1):
9.             if n%i==0: return False
10.         return True
11.
12.     if n<0: print("Negative integer specified!")
13.     elif n==0: print("Zero given!")
14.     elif n==1: print("1 is neither prime nor composite!")
15.     else: return implIsPrime(n)
16.     sys.exit()
17.
18. n = int(input("Enter a positive integer: "))
19.
20. if isPrime(n): print("Prime")
21. else: print("Composite")
Output:
Observation:
1. The original code of isPrime() from prime7.py is now present in
implIsPrime() inside isPrime() in this program.
2. The isPrime() function here checks the input given and validates it. If the
given input is not acceptable, it prints a suitable message and then uses the
sys.exit() function to terminate the script rather than returning to the caller.
Otherwise, it uses the implIsPrime() function (the actual implementation of
isPrime) to determine whether the integer is prime or not and returns the result.
3. A more professional approach of handling unacceptable input is by raising
exceptions, a concept that is covered later in section 13.
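As a preview, a minimal sketch of the exception-based approach, assuming ValueError as the exception type (a choice made here, not taken from section 13). The caller decides how to react instead of the function calling sys.exit():

```python
def isPrime(n):
    """Return True if n is prime; raise ValueError if n < 2."""
    if n < 2:
        # Signal bad input to the caller instead of terminating the script.
        raise ValueError("isPrime() needs an integer greater than 1, got {}".format(n))
    for i in range(2, n // 2 + 1):
        if n % i == 0:
            return False
    return True

try:
    print(isPrime(7))   # True
    print(isPrime(1))   # raises ValueError
except ValueError as e:
    print("Invalid input:", e)
```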
generator.py
1. #!/usr/bin/python
2.
3. # Program to demonstrate outer functions
4. # as generators using inner functions.
5. import math
6.
7. def power(n):
8.     def implPower(x):
9.         return int(math.pow(x,n))
10.     return implPower
11.
12. square=power(2)
13. cube=power(3)
14.
15. print("Equation: ax^3 + bx^2 + cx + d")
16. a,b,c,d = input("Enter the values of a, b, c, d:").split(' ')
17. a,b,c,d = int(a),int(b),int(c),int(d)
18. x = int(input("Enter the value of x:"))
19. result=a*cube(x) + b*square(x) + c*x + d
20. print("Result:",result)
Output:
Observation:
1. The power() function acts as a generator in the sense described earlier (more
commonly called a function factory): it returns a reference to a function! Thus,
line 12 creates a function and stores its reference in the variable square, and
line 13 similarly stores a function reference in the variable cube.
2. The variables square and cube can now be used as regular functions since
they are references to functions. They are used in line 19.
3. The power function returns implPower, which is a reference to a function.
4. The implPower function is an inner function of power and uses the local
context of power. The local variable n of power can be and is used in the
inner function implPower. Thus, there can be as many unique versions of
implPower as unique values of n passed to power()!
5. Thus, the reference in square is a reference to the implPower() function
with n=2 already substituted. Similarly in the case of cube, n=3.
6. For simplicity, we are dealing with integers here. This justifies the int() call
in lines 9, 17 and 18.
7. We are using the pow() function of the math module in line 9. Hence we are
importing the math module in line 5.
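The captured value of n can be inspected directly. A short sketch using the same power() definition; __closure__ and __code__.co_freevars are standard attributes of Python function objects:

```python
import math

def power(n):
    def implPower(x):
        return int(math.pow(x, n))
    return implPower

square = power(2)
cube = power(3)

# Each returned function carries its own captured value of n.
print(square.__code__.co_freevars)          # ('n',)
print(square.__closure__[0].cell_contents)  # 2
print(cube.__closure__[0].cell_contents)    # 3
print(square(5), cube(5))                   # 25 125
```

Using x ** n instead of int(math.pow(x, n)) would avoid a detour through floating point: math.pow always returns a float, so results above 2**53 may not be exact.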
A lambda can receive 0 or more parameters (just like functions) and end up returning
the value of the expression on execution. Here is the first example:
Observation:
1. We have created a lambda function using the keyword lambda. This function
accepts a single parameter x and returns its square.
2. We wish to execute the lambda function immediately with 4 as an argument.
We therefore use parentheses to represent the function reference returned by
the lambda keyword and pass 4 as an argument to it.
3. Even though this example is valid, it is useless practically as we could have
as well directly used the expression 4**2 instead. But this definitely serves
as a first valid example.
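A minimal sketch of the immediately invoked lambda described above, plus lambdas with zero and two parameters:

```python
# A lambda taking one parameter x and returning its square,
# called immediately by wrapping it in parentheses.
print((lambda x: x**2)(4))             # 16

# Zero or several parameters work just as with def.
print((lambda: "no parameters")())     # no parameters
print((lambda a, b: a + b)(3, 5))      # 8
```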
Since the keyword lambda manufactures a function and returns its reference, we can
store the function reference in any variable and then use that variable like a function.
This is illustrated below:
Observation:
1. We have created the same lambda expression as in the previous example,
but this time have stored it in the variable square.
2. We then use square to invoke the functionality whenever we want, as
many times as we want and with whatever arguments we want.
3. The use of lambda this way is again not very practical as we could have as
well defined a function called square that works in a similar fashion. But this
example illustrates the similarity in handling function references.
Lambda expressions are especially useful when function objects are required. A
function object is a reference to a function that can be stored (in variables), passed
around in the script (as arguments to functions and return values from functions) and
can be invoked whenever required. We will see examples of this in section 11.6.
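A short illustration of lambdas as function objects: one stored in a variable and called like a function, and one passed as the key argument of the built-in sorted(), which calls it once per element to obtain the sort key (the word list is arbitrary):

```python
square = lambda x: x**2          # a function object stored in a variable
print(square(3), square(10))     # 9 100

words = ["banana", "fig", "cherry", "kiwi"]
# sorted() calls the lambda on each word and orders the words by the result;
# words of equal length keep their original relative order.
print(sorted(words, key=lambda w: len(w)))
# ['fig', 'kiwi', 'banana', 'cherry']
```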
Observation:
1. The function f is designed to receive a single parameter.
2. We are passing a single tuple containing 4 values: (2,4,6,8).
3. The function receives the entire tuple (2,4,6,8) as a single collection in a.
This behaviour is useful when we specifically want a function to receive collections
like tuples, lists, sets and dictionaries. But what if we have multiple values already
stored in a collection and we wish to send them as individual arguments to a function?
Recollect the calc function discussed in section 10.10.1:
Given 2 values, this function will return a tuple containing their sum, difference,
product and quotient. If we have the 2 input values already in a list, we will not be able
to directly pass the list as an argument as it will be considered to be a single list
argument, as illustrated below:
*collection
**dictionary
Note:
1. The *collection syntax is used for tuples, lists and sets whereas the
**dictionary syntax is used specifically for dictionaries.
2. The *collection syntax expands the collection into arguments that can
either map on to corresponding parameters of a function or can be received
as variable arguments using the *args syntax discussed in section 10.7.
3. The **dictionary syntax expands the dictionary into keyword arguments
that can either map on to corresponding named parameters of a function or
can be received as additional keyword arguments using the **kargs syntax
discussed in section 10.6.
This is illustrated in the revised examples below:
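A minimal sketch of such revised calls. The calc() below is a hypothetical stand-in that returns the sum, difference, product and quotient of its 2 arguments, as described above; the actual definition from section 10.10.1 is not reproduced here.

```python
def calc(a, b):
    # Hypothetical stand-in for the calc() function of section 10.10.1.
    return a + b, a - b, a * b, a / b

L = [20, 4]
try:
    calc(L)                  # the whole list is one argument; b is missing
except TypeError as e:
    print("TypeError:", e)

print(calc(*L))              # *L expands to calc(20, 4): (24, 16, 80, 5.0)

D = {"b": 4, "a": 20}
print(calc(**D))             # **D expands to calc(b=4, a=20): (24, 16, 80, 5.0)

def f(*args, **kwargs):
    return args, kwargs

print(f(*(2, 4), **{"x": 1}))  # ((2, 4), {'x': 1})
```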
There are certain conventions recommended for forming good documentation strings:
1. The first line should be a 1-line summary of the function's purpose. This line
should typically start with an uppercase letter and end with a period,
avoiding the function name (unless it is being used as an English word). This
line must be indented as the rest of the function body is.
2. The first line (summary) is separated from the rest of the lines (description) by
a blank line.
3. The description can span any number of lines and can provide additional
information about the function and its specialities. These lines are generally
indented as the rest of the function body is (but indentation is not
compulsory). Up to Python 3.12, any spaces and tabs used for indentation are
retained as part of the documentation string by the Python parser; from Python
3.13 onwards, the compiler strips the common leading indentation. Other tools
can also be designed to ignore the indentation.
Here is a revised isPrime() function definition, modified from prime7.py:
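A sketch of such a definition, reconstructed from the documentation string shown in the __doc__ output that follows; the body is the earlier isPrime() logic. It is indented with spaces rather than the tab characters visible in that output, so on Python 3.12 and earlier the raw __doc__ would contain spaces instead of tabs. inspect.getdoc() returns a cleaned-up form of the string, similar to what help() displays.

```python
import inspect

def isPrime(n):
    """Returns whether n is prime or not.

    Given a non-negative integer n, this function
    returns True if n is prime and False otherwise.
    The argument n is assumed to be a positive integer
    greater than 1 (since 1 is neither prime nor composite).
    No argument validation is performed.
    """
    for i in range(2, int(n/2)+1):
        if n % i == 0:
            return False
    return True

print(isPrime.__doc__.splitlines()[0])  # Returns whether n is prime or not.
print(inspect.getdoc(isPrime))          # docstring with indentation removed
```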
Since the Python parser parses through the documentation string, we can always
access it using the special member __doc__ of the function as shown below:
>>> isPrime.__doc__
'Returns whether n is prime or not.\n\n\tGiven a non-negative integer n, this function\n\treturns True if n is prime and False otherwise.\n\tThe argument n is assumed to be a positive integer\n\tgreater than 1 (since 1 is neither prime nor composite).\n\tNo argument validation is performed.\n\t'
Observation:
1. We are using the function reference (isPrime) to access the member
__doc__. Note that there are no parentheses after the function name,
preventing it from becoming a function call.
2. Note that every whitespace used for indentation is retained within the string
by the Python parser.
Not only that, we can access the documentation string using the built-in help()
function in the Python interpreter, as shown in the examples below:
>>> help(isPrime)
Help on function isPrime in module __main__:
isPrime(n)
Returns whether n is prime or not.
>>> help(range)
Help on class range in module builtins:
class range(object)
| range(stop) -> range object
| range(start, stop[, step]) -> range object
|
| Returns a virtual sequence of numbers from start to stop by step.
|
| Methods defined here:
|
| __contains__(...)
| x.__contains__(y) <==> y in x
|
| __eq__(...)
| x.__eq__(y) <==> x==y
|
| __ge__(...)
| x.__ge__(y) <==> x>=y
|
| __getattribute__(...)
| x.__getattribute__('name') <==> x.name
|
| __getitem__(...)
| x.__getitem__(y) <==> x[y]
|
| __gt__(...)
| x.__gt__(y) <==> x>y
|
| __hash__(...)
| x.__hash__() <==> hash(x)
|
| __iter__(...)
combination.py
1. #!/usr/bin/python
2.
3. # Program to find the combination of 2 integers using functions.
4.
5. from math import factorial
6.
7. def ncr(n,r):
8.     return factorial(n)//(factorial(r)*factorial(n-r))
9.
10. x,y = input("Enter 2 integers:").split(' ')
11. x,y = int(x),int(y)
12. print("The combination of {} and {} is {}".format(x,y,ncr(x,y)))
Observation:
1. Line 10 receives 2 values from the user in a single line, using the split()
function to split the line into 2 values at the space.
2. Line 11 converts the strings in x and y into integers and stores them back into
the same variables.
3. The combination is found using the ncr function defined in line 7. This
function uses the built-in factorial function from the math module.
4. The math.factorial function becomes directly available in the program
due to the import in line 5.
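Since Python 3.8, the math module also provides comb(n, k), which computes nCr directly with exact integer arithmetic. A sketch comparing it with the factorial formula, written with integer division (//) so that the arithmetic stays in integers:

```python
from math import comb, factorial

def ncr(n, r):
    return factorial(n) // (factorial(r) * factorial(n - r))

print(ncr(5, 2), comb(5, 2))        # 10 10
print(ncr(60, 30) == comb(60, 30))  # True
print(comb(3, 5))                   # 0: comb() returns 0 when k > n
```

Note that ncr(3, 5) would instead raise ValueError, because factorial() rejects the negative value n - r.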
10.17 Questions
1. List the advantages of functions.
2. Write short notes on positional arguments and default arguments.
3. Write short notes on function definition and function call in Python.
4. Write a short note on keyword arguments.
5. Explain how variable arguments can be processed by a function in Python.
6. What will be the return value of a function in Python if no return statement is
present in the function?
7. Explain 2 different ways in which a function can return multiple values back to
the caller.
8. Can a nested function be called from the global scope in Python? Elaborate.
9. What are lambda expressions in Python? Explain with syntax and examples.
10.18 Exercises
1. Write a program using functions to check if a given number is an ugly number.
Ugly numbers are positive numbers whose only prime factors are 2, 3 or 5.
2. Write a function in Python to calculate and return the average of a given list of
integers.
3. Write a function in Python called isIn() to check for the existence of a
substring within a string returning a Boolean value.
4. Write a function in Python to demonstrate keyword arguments.
5. Write a script in Python to demonstrate nested functions.
6. Write a lambda expression to convert a given angle from degrees to radians.
7. Write a function to find the distance between 2 points, where each point is
passed as a tuple to the function.
SUMMARY
11 PRACTICAL PYTHON
11.1 Implementing Stacks
A stack is a linear data structure that follows the LIFO (Last In First Out) principle.
Thus, the last element added into a stack is the first element to get deleted. The
insertion operation is called a push operation whereas the deletion operation is called
a pop operation.
There is no direct support for stacks in Python (meaning there is no class or data type
called stack), but stacks can be housed in suitable containers and operated upon.
Tuples and Lists are two containers that are linear, but tuples being immutable cannot
be used to store a stack as the stack contents should be able to change with time.
Lists therefore are the best choice to house a stack.
A stack has two ends: a bottom end which is fixed and from where the stack grows
and a top end which is dynamic and where push and pop operations take place.
This functionality can be implemented using a list in this manner:
1. The push operation can be performed using list.append() which appends
an element at the end of the list (the top end of the stack).
2. The pop operation can be performed using the list.pop() function which
deletes the last element in the list (from the top of the stack).
3. The peek operation, which provides the last element of the stack (the topmost
element) without deleting it from the stack, can be implemented as list[-1].
4. The number of elements in the stack can be found out using len(list).
This can also be used to find if the stack is empty or not.
5. The stack can be cleared using the list.clear() function.
6. More functionality is possible that typical stacks may not provide, like
searching for an element in the stack, appending stacks, etc.
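These operations can also be wrapped in a small class. A minimal sketch, where the class name Stack and the method names push, pop, peek and is_empty are chosen here for illustration (Python provides no such built-in class); it raises IndexError on an empty stack, mirroring list.pop():

```python
class Stack:
    """A minimal LIFO stack backed by a list (the top is the end of the list)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)      # add at the top

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()      # remove and return the top element

    def peek(self):
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]        # top element, not removed

    def __len__(self):
        return len(self._items)

    def is_empty(self):
        return len(self._items) == 0

s = Stack()
for value in (10, 20, 30):
    s.push(value)
print(s.pop(), s.peek(), len(s))      # 30 20 2
```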
Here is a simple program to demonstrate basic stack operations:
stackdemo.py
1. #!/usr/bin/python
2.
3. # Stack demonstration using lists
4.
5. S = [] # Creation of a stack S
6. S.append(10) # Push 10
7. S.append(20) # Push 20
8. S.append(30) # Push 30
9. print(S) # Display the stack
10.
11. print(S.pop()) # Pop 30
12. print(S) # Display the stack
13. print(S[-1]) # Peek
14.
15. print(len(S)) # Stack size
16. print(len(S)==0) # Stack empty?
Output:
[10, 20, 30]
30
[10, 20]
20
2
False
a queue. The index() function searches for the first occurrence of the given
element within the queue (or within a region of the queue).
Here is a simple program to demonstrate basic queue operations:
queuedemo.py
1. #!/usr/bin/python
2.
3. # Queue demonstration using collections.deque
4.
5. from collections import deque
6.
7. q = deque() # Creation of a queue
8. q.append(10) # Insert 10
9. q.append(20) # Insert 20
10. q.append(30) # Insert 30
11. print(q) # Display the queue
12.
13. print(q.popleft()) # Delete 10
14. print(q) # Display the queue
15. print(q[0]) # Peek
16.
17. print(len(q)) # Queue size
18. print(len(q)==0) # Queue empty?
Output:
deque([10, 20, 30])
10
deque([20, 30])
20
2
False
map(func,sequence)
Each element in sequence is sent to the function func as an argument, and the
return values from the function make up the output sequence, which map() returns
as an iterator (the values are produced on demand).
Here is an example of using map to multiply each element of a list by 2 and obtain a
new list with the corresponding results:
Observation:
1. Our input list is L, comprising the elements 1, 6, 4, 9 and 7.
2. We define a mapping function called f, which returns 2 times the value of the
original argument.
3. We use the map() function passing a reference to our mapping function(f),
as the first argument and our input list(L) as the second.
4. The map() function will map each element of L using f for individual element
conversion and returns the final result as an iterable object, which we convert
to a list for convenience.
5. We finally observe that the list M contains elements with values twice those of
the elements in the list L, and in the same order as found in L.
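A sketch of the example described in these observations; it also shows that map() returns a lazy iterator that can be consumed only once:

```python
L = [1, 6, 4, 9, 7]

def f(x):
    return 2 * x          # mapping function: double the element

result = map(f, L)        # a lazy map object; nothing is computed yet
M = list(result)          # converting to a list runs f on every element
print(M)                  # [2, 12, 8, 18, 14]
print(list(result))       # []: the iterator is now exhausted
```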
>>> L1=[1,2,3]
>>> L2=[4,5,6]
>>> L3=[7,8,9]
>>> def f(a,b,c): return a+b+c
...
>>> M=list(map(f,L1,L2,L3))
>>> M
[12, 15, 18]
Observation:
1. The map() function is receiving the mapping function reference (f) and 3 lists
(L1, L2 and L3).
2. The mapping function (f) therefore receives 3 arguments (a, b and c) – one
per list. The function f returns the sum of its arguments.
3. When all the input lists have the same number of elements, the output
sequence has that many elements too. If the input lists do not all have the same
number of elements, map() stops as soon as the shortest list is exhausted, and the
additional elements of the other lists are ignored. The length of the output list,
therefore, is the same as the length of the shortest input list.
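A quick check of the shortest-list rule, with f redefined as above and lists of unequal length:

```python
L1 = [1, 2, 3]
L2 = [4, 5]
L3 = [7, 8, 9, 10]

def f(a, b, c):
    return a + b + c

print(list(map(f, L1, L2, L3)))   # [12, 15]: stops after the 2 elements of L2
```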
filter(func,sequence)
>>> L=[1,6,4,9,7]
>>> def f(x): return x%2==0
...
>>> M=list(filter(f,L))
>>> M
[6, 4]
Observation:
1. Our input list is L, comprising the elements 1, 6, 4, 9 and 7.
2. We define a filtering function called f, which returns True only if its argument
is even.
3. We use the filter() function passing a reference to our filtering
function(f), as the first argument and our input list(L) as the second.
4. The filter() function will filter through the elements of L using f for each
element of L and returns the final result as an iterable object, which we
convert to a list for convenience.
5. We finally observe that the list M contains only the even elements that were
present in the list L, and in the same order as found in L.
>>> L=[1,0,6,4,0,9,7]
>>> M=list(filter(None,L))
>>> M
[1, 6, 4, 9, 7]
For integers, 0 is considered False whereas any other integer value is considered
True. Therefore, in the above example, we observe that the occurrences of 0 are
filtered out.
The second special case arises when we want the reverse: we want to keep only
those elements of the sequence for which the filtering function returns False (or,
when None is given, those elements that themselves evaluate to Boolean False),
filtering out the rest. This can be done using the itertools.filterfalse() function,
which is the negated form of filter() as demonstrated below:
Observation:
1. Unlike the filter() function, which is a built-in, the filterfalse() function
is present in the module itertools and hence needs to be imported.
2. When the filtering function reference (the first argument to filterfalse()) is
not None, this function filters out all those elements for which the filtering
function returns True and only allows those elements to pass for which the
filtering function returns False, behaving in a manner opposite to filter().
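Here is a minimal sketch of filterfalse() in use. It reuses the input list L and the even-number test from the filter() example above, and the list from the filter(None, L) example for the None case.

```python
from itertools import filterfalse

L = [1, 6, 4, 9, 7]

def f(x):
    return x % 2 == 0

# Keeps the elements for which f returns False (the odd ones)
M = list(filterfalse(f, L))
print(M)   # [1, 9, 7]

# With None, keeps only the elements that are themselves falsy
Z = list(filterfalse(None, [1, 0, 6, 4, 0, 9, 7]))
print(Z)   # [0, 0]
```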
reduce(func,sequence[,initial])
Observation:
1. The reduce() function is imported from the functools module.
2. The reducing function (f) receives 2 arguments – the accumulated value and
an element from the sequence – and returns their sum.
3. The reduce function accepts a reference to the reducing function (f) and the
sequence to operate on (L)
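4. Since no initial value is passed, reduce() uses the first element of the
sequence as the initial accumulated value, which is exactly what a sum needs.
(For an empty sequence, however, reduce() raises a TypeError unless an initial
value is supplied.)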
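The code for this example is not reproduced here. Below is a sketch consistent with the observations above, assuming the same list L as in the earlier examples.

```python
from functools import reduce

L = [1, 6, 4, 9, 7]

def f(acc, x):
    # acc is the value accumulated so far, x is the next element
    return acc + x

total = reduce(f, L)   # computes ((((1+6)+4)+9)+7)
print(total)           # 27
```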
Similarly, here is an example of using reduce to find the product of all elements in a
list:
Observation:
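1. This example is similar to the previous one, except that the reducing function
multiplies instead of adding, and a third argument is passed to reduce().
2. The third argument, 1, is the initial accumulated value. Since 1 is the
multiplicative identity, it does not change the product. Without it, reduce()
would start from the first element of the list, which gives the same product for a
non-empty list. The explicit initial value matters for an empty list: reduce() then
returns 1 instead of raising a TypeError.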
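Along the same lines, here is a sketch of the product example, again assuming the list L from earlier. It also shows what the initial value does for an empty list.

```python
from functools import reduce

L = [1, 6, 4, 9, 7]

def f(acc, x):
    return acc * x

product = reduce(f, L, 1)   # 1*1*6*4*9*7
empty = reduce(f, [], 1)    # no elements: the initial value is returned
print(product, empty)       # 1512 1
```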
>>> L=[1,6,4,9,7]
>>> M=list(map(lambda x: 2*x, L))
>>> M
[2, 12, 8, 18, 14]
>>> L=['Hello','World']
>>> M=list(map(lambda x: x.upper(), L))
>>> M
['HELLO', 'WORLD']
>>> L1=[1,2,3]
>>> L2=[4,5,6]
>>> L3=[7,8,9]
>>> M=list(map(lambda a,b,c: a+b+c, L1, L2, L3))
>>> M
[12, 15, 18]
To filter out the odd integers and allow only even integers of a list to pass through:
>>> L=[1,6,4,9,7]
>>> M=list(filter(lambda x: x%2==0, L))
>>> M
[6, 4]
1. #!/usr/bin/python
2.
3. # Program to analyse the marks of 'n' students in a class
4.
5. from functools import reduce
6.
7. marks = list(map(lambda x: int(x), input("Enter the marks of 'n' students: ").split()))
8. n = len(marks)
9.
10. maxMarks = max(marks)
11. minMarks = min(marks)
12. sumMarks = reduce(lambda x,y: x+y, marks)
13. avgMarks = sumMarks/n
14.
15. numPasses = len(list(filter(lambda x: x>35, marks)))
16. numFailures = n-numPasses
17. percentPasses = numPasses/n*100
18.
19. print("Analysed marks of {} students:".format(n))
20. print("Minimum marks: {} Maximum marks: {}".format(minMarks,maxMarks))
21. print("Average marks: {}".format(avgMarks))
22. print("Passes: {} Failures: {}".format(numPasses,numFailures))
23. print("Pass percentage: {}".format(percentPasses))
Observation:
1. This program uses all 3 functions: map(), filter() and reduce().
2. In line 7, we ask the user to enter all the marks, split the input on whitespace,
convert the split strings to integers using map(), convert the result to a list and
store it in marks.
3. Lines 10-11 employ the built-in functions max() and min() to find the
maximum and minimum marks respectively.
4. Line 12 uses the reduce() function to find the sum of the marks present in
the list marks.
5. Line 15 filters the list marks to search for those marks that indicate a pass
using the filter() function. Of course, we are only interested in the number
of passes, which is counted by the built-in len() function.
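To try out the analysis without typing at a prompt, the same map/filter/reduce steps can be wrapped in a function. analyse() is a hypothetical helper name, and the sample marks are made up for illustration.

```python
from functools import reduce

def analyse(marks):
    """Return (min, max, average, passes, failures, pass %) for a list of marks."""
    n = len(marks)
    total = reduce(lambda x, y: x + y, marks)
    passes = len(list(filter(lambda x: x > 35, marks)))   # same pass rule as line 15
    return min(marks), max(marks), total / n, passes, n - passes, passes / n * 100

# Simulates the user typing "40 20 75 90" at the prompt
marks = list(map(int, "40 20 75 90".split()))
print(analyse(marks))   # (20, 90, 56.25, 3, 1, 75.0)
```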
>>> L=[1,6,4,9,7]
>>> M=[2*x for x in L]
>>> M
[2, 12, 8, 18, 14]
>>> L=['Hello','World']
>>> L=[x.upper() for x in L]
>>> L
['HELLO', 'WORLD']
Here is an even simpler comprehension, which copies each element of L into the new list M without changing it:
>>> L=[1,6,4,9,7]
>>> M=[x for x in L]
>>> M
[1, 6, 4, 9, 7]
The previous example, though not very useful, paves the way for the next form of
list comprehension, which adds a condition! Here is the replacement for the filter()
function, to filter out the odd integers and allow only the even integers of a list to
pass through:
>>> L=[1,6,4,9,7]
>>> M=[x for x in L if x%2==0]
>>> M
[6, 4]
And now for the combination of filter() and map() using a list comprehension, to
produce the squares of the even numbers in a list:
>>> L=[1,6,4,9,7]
>>> M=[x*x for x in L if x%2==0]
>>> M
[36, 16]
How about using list comprehensions to produce a list of all primes less than 100?
Let us also add the constraint that we want to exclude those primes that end with the
digit 9:
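The listing for this example is not shown here. One way to write it is sketched below. It assumes that trial division up to the square root is an acceptable primality test, and math.isqrt requires Python 3.8 or later.

```python
import math

# n >= 2 is prime if no d in 2..isqrt(n) divides it
primes = [n for n in range(2, 100)
          if all(n % d != 0 for d in range(2, math.isqrt(n) + 1))
          and n % 10 != 9]          # drop primes whose last digit is 9
print(primes)
```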
Observation:
1. The parentheses and comma in (x,x) are enough for Python to understand
that we're dealing with a tuple. Each value generated in the list will be a tuple.
2. The first argument to range() is the starting number (we do not want to start
from 0).
3. The second argument to range() is always excluded, so we pass 11 because we
want 10 to be included.
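Below is a sketch of the comprehension these observations describe. It generates tuples of the form (x,x) for x from 1 to 10.

```python
# One tuple (x, x) per value of x; range(1, 11) yields 1 to 10 inclusive
pairs = [(x, x) for x in range(1, 11)]
print(pairs)
```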
Let us modify the previous example to produce a list containing tuples of the form
(x,x²):
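Here is a sketch of that change, assuming the same range of 1 to 10 for x.

```python
squares = [(x, x * x) for x in range(1, 11)]
print(squares)
```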
Let us exclude both odd integers as well as multiples of 6 from the list:
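This sketch adds a condition to the previous one, still assuming x runs from 1 to 10.

```python
# Keep x only if it is even and not a multiple of 6
filtered = [(x, x * x) for x in range(1, 11) if x % 2 == 0 and x % 6 != 0]
print(filtered)   # [(2, 4), (4, 16), (8, 64), (10, 100)]
```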
Let us work with multiple variables now: say we want to generate all ordered pairs of
coordinates of the form (x,y), with x and y being integers between 1 and 3 (both
inclusive):
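Here is a sketch with two for clauses. The first for clause is the outer loop, so x changes more slowly than y.

```python
coords = [(x, y) for x in range(1, 4) for y in range(1, 4)]
print(coords)
```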
In the above output, let's say we want to omit those coordinates that are of the form
(x,x):
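This is the same sketch with an if clause that drops the pairs where x equals y.

```python
noDiag = [(x, y) for x in range(1, 4) for y in range(1, 4) if x != y]
print(noDiag)   # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
```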
Let us modify this to produce a list of coordinates of the form (x,y) with x ranging
from 1 to 3 and y ranging from 5 to 9:
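Only the range for y changes in this sketch.

```python
wide = [(x, y) for x in range(1, 4) for y in range(5, 10)]
print(wide)
```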
How about generating 3D coordinates of the form (x,y,z), with x ranging from 1 to 3, y
always less than x and z always less than or equal to y?
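Here is one way to express this as a sketch. It assumes that y and z also start from 1, since the text does not state a lower bound.

```python
# Assumes y and z start from 1; y < x and z <= y
points = [(x, y, z) for x in range(1, 4)
                    for y in range(1, x)
                    for z in range(1, y + 1)]
print(points)   # [(2, 1, 1), (3, 1, 1), (3, 2, 1), (3, 2, 2)]
```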
While all the above examples were dealing with tuples, lists are no different – they just
need to be enclosed within square brackets ([]):
Observation:
1. There are 2 list comprehensions – one nested within another. The “inner” list
comprehension is used for generating the different column values within a
single row while the “outer” list comprehension is used for generating lists for
each row.
2. The inner list comprehension ([1 for j in range(4)]) generates a list
containing 4 elements of value 1 each. This is because each row of the matrix
should contain 4 columns of value 1 each.
3. The outer list comprehension ([[…] for i in range(3)]) generates a
list of 3 lists. This is because we want 3 rows and each row is a list of values.
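The code for this matrix of ones is not reproduced here. The sketch below matches the observations above: 3 rows, 4 columns, and every value 1.

```python
# Inner comprehension: one row of 4 ones; outer comprehension: 3 such rows
M = [[1 for j in range(4)] for i in range(3)]
for row in M:
    print(*row)
```

Each row is a separate list object, so changing M[0][0] later would not affect the other rows.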
Observation:
1. This example is similar to the previous one, except that we substitute the
column value inside the list instead of the row value (again using j+1 since
the range starts from 0 and we want to start from 1).
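Neither the previous example nor this one is reproduced here. Below are sketches of both, assuming the same 3x4 shape as before.

```python
rows = [[i + 1 for j in range(4)] for i in range(3)]   # each cell holds its row number
cols = [[j + 1 for j in range(4)] for i in range(3)]   # each cell holds its column number
for row in rows:
    print(*row)
for row in cols:
    print(*row)
```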
Observation:
1. It can be verified that the content of each cell can be obtained by the
arithmetic expression 4*i+j+1, where i is the row number starting from 0
and j is the column number starting from 0.
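This sketch builds the matrix from the formula above.

```python
M = [[4 * i + j + 1 for j in range(4)] for i in range(3)]
for row in M:
    print(*row)
```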
Let us extend the previous example to also find the transpose of the matrix using
nested list comprehension!
Observation:
1. The first part (creation of the matrix M and its display) is the previous example,
reused here for readability.
2. The second part creates a matrix T that is the transpose of the matrix M and
displays it.
3. The outer list comprehension for creation of T creates 4 lists (the transpose
will have 4 rows since the original matrix had 4 columns).
4. The inner list comprehension for creation of T creates each row of the
transpose. Each row here will have as many columns/elements as the
number of rows in the original matrix M. Furthermore, row i of the transpose
is made up of the elements in column i of each row of the original matrix.
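The transpose code is not reproduced here. Below is a sketch that follows the observations above. M is rebuilt so that the block runs on its own.

```python
M = [[4 * i + j + 1 for j in range(4)] for i in range(3)]   # 3 x 4 matrix

# Row i of T is column i of M, so T has 4 rows of 3 elements each
T = [[row[i] for row in M] for i in range(4)]
for row in T:
    print(*row)
```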
11.9 Matrices
Matrices are heavily used in mathematics and programming, and programmers
typically represent matrices as 2D arrays. In Python, a matrix can be easily
represented as nested lists, as we saw in the previous section.
There are a host of operations that we would like to perform on matrices, which will be
covered in the following sections, but right now we will focus on some basic aspects of
matrices that will be required in the following sections, such as reading in matrices
from the user, printing matrices and traversing through matrix elements. All these
operations will be demonstrated by the following program:
matrixdemo.py
1. #!/usr/bin/python
2.
3. # Matrix demonstration
4.
5. m,n = map(lambda x: int(x), input("Enter the order of the matrix: ").split())
6.
7. print("Enter the matrix elements rowwise:")
8. matrix = [list(map(lambda x: int(x),input().split())) for row in range(m)]
9.
10. print("Here is the matrix of order {}x{}:".format(m,n))
11. for i in range(m):
12.     for j in range(n):
13.         print("{:5}".format(matrix[i][j]),end='')
14.     print()
Observation:
1. The first step while accepting a matrix is to accept the order of the matrix. This
is done in line 5.
2. Each row of the matrix can be accepted as a single line, split into individual
elements on whitespace and converted to integers. This is done in line 8 and is
repeated as many times as the number of rows. All the elements are stored as a
list of lists in matrix.
3. The matrix can be displayed by iterating through all its elements using nested
loops, as shown in lines 11-14.
In fact, to make the code more Pythonic, lines 11-14 can be replaced by the following
single statement:
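The statement itself is not reproduced here. One possible version is sketched below, with a small sample matrix standing in for the one read in line 8. It is split into two steps only so that the formatted text can be inspected. The join expression can go directly inside print() as a single statement.

```python
matrix = [[1, 2, 3], [4, 5, 6]]   # stands in for the matrix read in line 8

# '{:5}' right-aligns each element in a width of 5, ''.join builds one row,
# and '\n'.join stacks the rows, reproducing the output of lines 11-14
text = '\n'.join(''.join('{:5}'.format(x) for x in row) for row in matrix)
print(text)
```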
Now that we know how to accept and display matrices and how to access their
individual elements, it is time to start performing operations on them.
1. #!/usr/bin/python
2.
3. # Program to find the transpose of a matrix
4.
5. m,n = map(lambda x: int(x), input("Enter the order of the matrix: ").split())
6.
7. print("Enter the matrix elements rowwise:")
8. matrix = [list(map(lambda x: int(x),input().split())) for row in range(m)]
9. transpose = [[None]*m for i in range(n)]
10.
11. for i in range(m):
12.     for j in range(n):
13.         transpose[j][i]=matrix[i][j]
14.
15. print("Here is the transpose of the matrix of order {}x{}:".format(m,n))
16. for i in range(n):
17.     for j in range(m):
18.         print("{} ".format(transpose[i][j]),end='')
19.     print()
Observation:
1. This program utilizes code from the previous program, matrixdemo.py.
2. Lines 5-8 are taken from matrixdemo.py, and help in reading in the matrix.
3. If the original matrix is of the order m x n, the transpose will be of the order n x
m. We need to create a matrix of this size, filled with placeholder values, so that
we can assign values to individual elements thereafter. This is done in line 9. The
part [None]*m creates a list of size m filled with the value None for each element.
The part for i in range(n) repeats this for n rows, creating a separate list for each
row (writing [[None]*m]*n instead would make all n rows refer to the same list).
4. Lines 11-12 are similar to the part for displaying the matrix in
matrixdemo.py, but line 13 copies an element from row i column j of the
original matrix to row j column i of the transpose matrix.
5. Lines 15-19 are similar to those for displaying a matrix in matrixdemo.py.
Lines 11-13 can be replaced by the following statement, as covered in section 11.8:
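The statement is not reproduced here. Below is a sketch of one such nested list comprehension, using a small sample matrix in place of user input.

```python
matrix = [[1, 2, 3], [4, 5, 6]]   # sample 2 x 3 matrix
m, n = 2, 3

# Row j of the transpose is column j of the original
transpose = [[matrix[i][j] for i in range(m)] for j in range(n)]
print(transpose)   # [[1, 4], [2, 5], [3, 6]]
```

With this form, line 9 is no longer needed, since the comprehension creates the transpose directly.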
matrix_diagonal.py
1. #!/usr/bin/python
2.
3. # Program to find the sum of all elements:
4. # 1. On the principal diagonal
5. # 2. Above the principal diagonal
6. # 3. Below the principal diagonal
7.
8. m = int(input("Enter the number of rows of the matrix: "))
9.
10. print("Enter the matrix elements rowwise:")
11. matrix = [list(map(lambda x: int(x),input().split())) for row in range(m)]
12.
13. sumAbove,sumBelow,trace = 0, 0, 0
14.
15. for i in range(m):
16.     for j in range(m):
17.         if i<j: sumAbove += matrix[i][j]
18.         elif i>j: sumBelow += matrix[i][j]
19.         else: trace += matrix[i][j]
20.
21. print("Sum of all elements:")
22. print("\tAbove the principal diagonal:",sumAbove)
23. print("\tOn the principal diagonal:",trace)
24. print("\tBelow the principal diagonal:",sumBelow)
Observation:
1. Line 8 accepts the number of rows of the matrix. Since the matrix is a square
matrix, the number of columns has to be the same as the number of rows.
2. Line 13 initializes the variables sumAbove, sumBelow and trace to 0. They
represent the sum of all elements above, below and on the principal diagonal
respectively.
3. Lines 15-16 are used to iterate through each element of the matrix.
4. Lines 17-19 utilize the fact that elements that lie on the principal diagonal
have the same value for row number (i) and column number (j). Elements
above the principal diagonal will have row number (i) less than column
number (j), and elements below the principal diagonal will exhibit the reverse
property.
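The same three sums can also be computed with generator expressions and the built-in sum(), using the i<j, i==j and i>j tests described above. The sketch uses a sample 3x3 matrix.

```python
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]   # sample 3 x 3 matrix
m = len(matrix)

trace = sum(matrix[i][i] for i in range(m))
sumAbove = sum(matrix[i][j] for i in range(m) for j in range(m) if i < j)
sumBelow = sum(matrix[i][j] for i in range(m) for j in range(m) if i > j)
print(sumAbove, trace, sumBelow)   # 11 15 19
```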
1. #!/usr/bin/python
2.
3. # Matrix Addition and Subtraction
4.
5. m,n = map(lambda x: int(x), input("Enter the order of the matrices: ").split())
6.
7. print("Enter the elements of the first matrix rowwise:")
8. matrix1 = [list(map(lambda x: int(x),input().split())) for row in range(m)]
9. print("Enter the elements of the second matrix rowwise:")
10. matrix2 = [list(map(lambda x: int(x),input().split())) for row in range(m)]
11.
12. matrixSum = list(map(lambda rowMatrix1, rowMatrix2:
list(map(lambda a, b: a+b, rowMatrix1, rowMatrix2)), matrix1,
matrix2))
13.
14. matrixDiff = list(map(lambda rowMatrix1, rowMatrix2:
list(map(lambda a, b: a-b, rowMatrix1, rowMatrix2)), matrix1,
matrix2))
15.
16. print("Sum of the matrices:")
Observation:
1. Line 5 stores the order of the matrices, just like our earlier programs on
matrices. Both the input matrices will have the same order.
2. Lines 7-8 receive the elements of the first matrix and lines 9-10 similarly
receive the elements of the second matrix.
3. Line 12 adds the 2 matrices (matrix1 and matrix2) and stores the result in
matrixSum using list operations. We use map() to pass corresponding elements
of both the matrices to a lambda function. Remember that each element of the
matrices is actually a list representing a complete row of the matrix. Each pair of
corresponding rows is then passed on to another mapping, again using a
lambda function, which adds the corresponding elements of the two given
rows and gives us a resultant row.
4. Line 14 performs a very similar operation to find the difference between the
matrices – the only difference being a subtraction instead of an addition.
5. The matrices are then displayed using lines 16-20 and 22-26.
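An alternative to lines 12 and 14 uses zip() to pair up rows and then elements, instead of nested map() calls. This is a swapped-in technique, not the program's own code, and the matrices are sample data.

```python
matrix1 = [[1, 2], [3, 4]]
matrix2 = [[10, 20], [30, 40]]

# zip pairs up corresponding rows, then corresponding elements within each row
matrixSum = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(matrix1, matrix2)]
matrixDiff = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(matrix1, matrix2)]
print(matrixSum)    # [[11, 22], [33, 44]]
print(matrixDiff)   # [[-9, -18], [-27, -36]]
```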
matrix_mul.py
1. #!/usr/bin/python
2.
3. # Matrix Multiplication
4. from functools import reduce
5. import sys
6. m,n = map(lambda x: int(x), input("Enter the order of the first matrix: ").split())
7. p,q = map(lambda x: int(x), input("Enter the order of the second matrix: ").split())
8.
9. if not n==p:
10.     print("Matrices are not multiplicable")
11.     sys.exit()
12.
13. print("Enter the elements of the first matrix rowwise:")
14. matrix1 = [list(map(lambda x: int(x),input().split())) for row in range(m)]
15. print("Enter the elements of the second matrix rowwise:")
16. matrix2 = [list(map(lambda x: int(x),input().split())) for row in range(p)]
17.
18. matrixProduct = [[reduce(lambda x,y: x+y, [matrix1[i][k]*matrix2[k][j] for k in range(n)])
        for j in range(q)] for i in range(m)]
19.
20. print("Product of the matrices:")
21. for i in range(m):
22.     for j in range(q):
23.         print("{} ".format(matrixProduct[i][j]),end='')
24.     print()
Observation:
1. Lines 6 and 7 accept the order of the matrices. Line 9 checks if the matrices
are multiplicable and terminates the program in line 11 if they aren't
multiplicable.
2. Lines 14 and 16 accept the elements of the 2 matrices, while line 18 performs
the actual matrix multiplication using nested list comprehensions.
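For comparison, the same multiplication can use sum() instead of reduce(), with zip(*B) to obtain the columns of the second matrix. This is an alternative technique, not the listing above, and it uses sample matrices.

```python
A = [[1, 2], [3, 4]]   # m x n
B = [[5, 6], [7, 8]]   # n x q

# zip(*B) yields the columns of B; each product element is a dot product
product = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]
print(product)   # [[19, 22], [43, 50]]
```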
The following checks can be made on infix arithmetic expressions to ensure they are
correct:
1. Operands are assumed to be single lowercase letters
2. Operators are assumed to be “+”, “-”, “*” and “/” only
3. All operators are assumed to be binary
4. Empty expressions are not allowed
5. Two operands cannot appear in succession (“ab” is incorrect)
6. Two operators cannot appear in succession (“a+*b” is incorrect), except
when one of them is a parenthesis (“a+(b)” and “(a)+b” are correct).
However, “(a+)” and “(*c)” are still incorrect
7. The expression cannot start with an operator, except when the operator is
an opening parenthesis (“*c” is incorrect)
8. The expression cannot end with an operator, except when the operator is
a closing parenthesis (“c*” is incorrect)
9. The number of opening parentheses should be the same as the number of
closing parentheses (“(a” is incorrect)
10. When scanning the expression from left to right, a closing parenthesis should
not appear before its corresponding opening parenthesis (“a)(” is incorrect
even though the parentheses are balanced)
infixChecker.py
1. #!/usr/bin/python
2.
3. # Infix Expression Checker
4.
5. import sys
6.
7. def error(msg):
8.     print(msg)
9.     sys.exit()
10.
11. infix = input("Enter an infix expression:")
12. mode = prevMode = ''
13. parentheses=0
14.
15. for i in infix:
16.     if i.isalpha():
17.         mode='operand'
18.         if prevMode == 'operand': error("Found two operands in succession:")
19.     elif i == '(':
20.         if prevMode == 'operand': error("Found '(' after operand")
21.         parentheses = parentheses+1
22.     elif i == ')':
23.         if prevMode == 'operator': error("Found ')' after operator")
24.         parentheses = parentheses-1
25.         if parentheses < 0: error("Improper nesting of parentheses")
26.     elif i in ('+','-','*','/'):
27.         mode='operator'
28.         if not prevMode == 'operand': error("Found operator in wrong place")
29.     else:
30.         error("Invalid character found!")
31.
32.     prevMode = mode
33.
34. if len(infix)==0: error("No expression")
35. if mode=='operator': error("Wrong termination of expression")
36. if not parentheses==0: error("Improper parentheses")
37.
38. print("Correct!")
Observation:
1. We accept the infix expression from the user in line 11 and finally print
“Correct!” if the infix expression was valid in line 38.
2. We iterate through each character of the given infix expression in line 15, and
test it for various possibilities within the loop, taking relevant action.
3. We have a notion of current mode (mode) and previous mode (prevMode).
The current mode is determined based on the character that is seen in the
current iteration, and this becomes the previous mode in the next iteration
(line 32). The values for mode (and prevMode) will either be "operand" or
"operator", as assigned in lines 17 and 27. Both these variables start as
empty strings in line 12. Parentheses do not change the mode, so after a
parenthesis the mode of the last operand or operator is carried forward.
4. Line 16 checks if the current character is alphabetic, implying that it is an
operand. We set mode to "operand" to reflect this observation in line 17. We
check for successive operands (which should generate an error) in line 18.
5. We have defined an error function (lines 7-9) to handle the display of suitable
messages and the termination of the script.
6. Line 26 checks for the supported operators. We set mode to "operator" on
encountering this. We check for successive operators in line 28, generating
an error if necessary.
7. We check for parentheses in lines 19 and 22. As far as balancing the
parentheses is concerned, we can maintain a count of the parentheses opened so
far and not yet closed (variable parentheses, initialised to 0 in line 13). Each
time we encounter an opening parenthesis, we increment this count (line 21),
and each time we encounter a closing parenthesis, we decrement it
(line 24). At no time should this count be negative (line 25), and at the end the
count must be 0 (line 36).
8. We also check for the correct placement of these parentheses. An opening
parenthesis cannot appear immediately after an operand (line 20), while a closing
parenthesis cannot appear immediately after an operator (line 23).
9. No other characters are expected, and this case is handled in lines 29-30.
10. Line 34 checks for the special case where no expression is provided.
11. Line 35 verifies that the expression does not end with an operator.
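The same checks can be packaged as a function that returns an error message, or None, instead of exiting. That makes them easy to try on the examples from the list of rules. check_infix is a hypothetical name, and the function is a sketch that mirrors the listing.

```python
def check_infix(infix):
    """Return an error message, or None if the expression passes the checks."""
    if len(infix) == 0:
        return "No expression"
    mode = prevMode = ''
    parentheses = 0
    for ch in infix:
        if ch.isalpha():
            mode = 'operand'
            if prevMode == 'operand':
                return "Found two operands in succession"
        elif ch == '(':
            if prevMode == 'operand':
                return "Found '(' after operand"
            parentheses += 1
        elif ch == ')':
            if prevMode == 'operator':
                return "Found ')' after operator"
            parentheses -= 1
            if parentheses < 0:
                return "Improper nesting of parentheses"
        elif ch in '+-*/':
            mode = 'operator'
            if prevMode != 'operand':
                return "Found operator in wrong place"
        else:
            return "Invalid character found!"
        prevMode = mode          # parentheses leave mode unchanged
    if mode == 'operator':
        return "Wrong termination of expression"
    if parentheses != 0:
        return "Improper parentheses"
    return None

for expr in ["a+(b)", "(a)+b", "ab", "a+*b", "(*c)", "(a+)", "a)(", "(a", "c*", ""]:
    print(repr(expr), check_infix(expr) or "Correct!")
```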
In the postfix notation, a binary operator always comes after its 2 operands, as
shown in the syntax below:
operand1 operand2 operator
Here are a few examples of infix arithmetic expressions and their postfix equivalents:
No.  Infix          Postfix
2    a+b            ab+
3    a+b*c          abc*+
4    a*b+c          ab*c+
5    a*(b+c)        abc+*
6    (a+b)*(c+d)    ab+cd+*
Observation:
1. As long as no operators are involved, both infix and postfix expressions look
the same, as illustrated in example 1. For longer expressions, note that the
only difference is in the placement of operators; the operands remain in the
same order within the expression!
2. Example 2 shows a basic arithmetic expression.
3. In the postfix expression of example 3, the * operator works on b and c, while
the + operator works on a and the result of b*c.
4. In the postfix expression of example 4, the * operator works on a and b, while
the + operator works on the result of a*b and c.
5. In the postfix expression of example 5, the + operator works on b and c, while
the * operator works on a and the result of b+c.
6. In the postfix expression of example 6, the first + operator works on a and b,
the second + operator works on c and d, while the * operator works on the
results of the previous 2 operations (a+b and c+d).
7. We observe that unlike infix expressions where we need to evaluate operators
based on their precedence (and associativity), in postfix expressions, once
framed we can evaluate easily without considering anything else.
8. We also observe, for the same reason mentioned above, that we don’t need
or use parentheses in postfix expressions.
Advantages of the postfix notation:
1. We need not consider operator precedence and associativity while evaluating;
we can simply evaluate from left to right.
2. In infix expressions, parentheses are frequently used to alter the order of
evaluation. In postfix expressions, we don't use parentheses at all, since the
order of evaluation is fixed by the position of the operators once the expression
has been correctly framed.
3. For the above reasons, we find it easier to write programs to evaluate postfix
expressions than infix expressions.
4. The above reasons also justify the use of prefix over infix, but what makes
postfix slightly more convenient than prefix is that a postfix expression can be
evaluated with a stack in a single left-to-right scan, in the order in which it is
read (as will be seen later in this section).
infix2postfix.py
1. #!/usr/bin/python
2.
3. # Infix to Postfix Converter
4.
5. infix = input("Enter an infix expression:")
6. postfix = ''
7.
8. precedence = {'(':0,'+':1,'-':1,'*':2,'/':2}
9. operators = []
10.
11. for i in infix:
12.     if i.isalpha(): postfix += i
13.     elif i == '(': operators.append(i)
14.     elif i == ')':
15.         while not operators[-1] == '(':
16.             postfix += operators.pop()
17.         operators.pop()
18.     else:
19.         while len(operators)>0 and precedence[operators[-1]]>=precedence[i]:
20.             postfix += operators.pop()
21.         operators.append(i)
22.
23. operators.reverse()
24. postfix += ''.join(operators)
25.
26. print(postfix)
Observation:
1. We accept the infix expression (infix) from the user in line 5 and initialise
the output string (postfix) in line 6.
2. We maintain a stack of operators (operators) in line 9.
3. The precedence of operators is recorded in the form of a dictionary
(precedence) in line 8. The key is the operator symbol and the value is the
numeric precedence, with higher values indicating higher precedence. Thus,
“*” and “/” have the same precedence, which is higher than the precedence
of “+” and “-”. The opening parenthesis has been given the lowest precedence
because it is also pushed on to the stack and must not be popped out by any
operator; only a closing parenthesis removes it.
4. We iterate through the characters within the infix string in line 11.
5. Operands are transferred to the output string in line 12.
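As a sketch, the same algorithm can be wrapped in a function and checked against the examples in the table above. infix_to_postfix is a hypothetical name. Popping the remaining operators one by one at the end does the same job as lines 23-24.

```python
def infix_to_postfix(infix):
    """Convert a valid infix expression with single-letter operands to postfix."""
    precedence = {'(': 0, '+': 1, '-': 1, '*': 2, '/': 2}
    operators = []                              # stack of pending operators and '('
    postfix = ''
    for ch in infix:
        if ch.isalpha():
            postfix += ch                       # operands go straight to the output
        elif ch == '(':
            operators.append(ch)
        elif ch == ')':
            while operators[-1] != '(':         # flush until the matching '('
                postfix += operators.pop()
            operators.pop()                     # discard the '(' itself
        else:
            # pop operators of greater or equal precedence (left associativity)
            while operators and precedence[operators[-1]] >= precedence[ch]:
                postfix += operators.pop()
            operators.append(ch)
    while operators:                            # flush the remaining operators
        postfix += operators.pop()
    return postfix

for expr in ["a+b", "a+b*c", "a*b+c", "a*(b+c)", "(a+b)*(c+d)"]:
    print(expr, infix_to_postfix(expr))
```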
postfixEval.py
1. #!/usr/bin/python
2.
3. # Postfix Expression Evaluator
4.
5. postfix = input("Enter a postfix expression:")
6. stack = []
7. operands = {}
8.
9. for i in postfix:
10.     if i.isalpha(): # Operand
11.         if i not in operands:
12.             operands[i] = int(input("Enter the value of {}: ".format(i)))
13.         stack.append(operands[i])
14.     elif i=='+': stack.append(stack.pop() + stack.pop())
15.     elif i=='*': stack.append(stack.pop() * stack.pop())
16.     elif i=='-': b = stack.pop(); stack.append(stack.pop() - b)
17.     elif i=='/': b = stack.pop(); stack.append(stack.pop() / b)
18.
19. print("Value:",stack.pop())
Observation:
1. We accept the postfix expression (postfix) in line 5.
2. We create a stack (stack) to store intermediate expression values in line 6.
3. We store all variables (operands) with their corresponding values in a
dictionary operands in line 7.
4. We iterate through the postfix expression character by character from left to
right in line 9.
5. Lines 10-13 deal with operands. If the operand is not present as a key in the
dictionary operands, we ask the user to enter its value, store the operand with
its associated value in operands, and finally push the value of the operand on
to the stack stack.
6. For each of the operators, we pop 2 values from the stack stack, operate
on them and push the result back. This is done in lines 14-17. The right-hand
operand was pushed last, so it is popped first; for the operators - and /, where
the order matters, lines 16-17 save it in b before popping the left-hand operand.
7. Finally, in line 19, we pop the result from the stack stack and display it.
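Below is a non-interactive sketch of the same evaluator. It takes the operand values as a dictionary instead of prompting for them. The name eval_postfix and the use of the standard operator module are choices made for this sketch, not part of the original program.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def eval_postfix(postfix, values):
    """Evaluate a postfix expression; values maps each operand letter to a number."""
    stack = []
    for ch in postfix:
        if ch.isalpha():
            stack.append(values[ch])
        elif ch in OPS:
            right = stack.pop()      # the right operand is on top of the stack
            left = stack.pop()
            stack.append(OPS[ch](left, right))
    return stack.pop()

print(eval_postfix("abc*+", {'a': 2, 'b': 3, 'c': 4}))   # 2 + 3*4 = 14
print(eval_postfix("ab-", {'a': 10, 'b': 4}))            # 10 - 4 = 6
```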
1. #!/usr/bin/python
2.
3. # Infix Expression Evaluator
4. import sys
5. def error(msg):
6.     print(msg)
7.     sys.exit()
8.
9. def infixChecker(infix):
10.     mode = prevMode = ''
11.     parentheses=0
12.
13.     for i in infix:
14.         if i.isalpha():
15.             mode='operand'
16.             if prevMode == 'operand': return "Found two operands in succession:"
17.         elif i == '(':
18.             if prevMode == 'operand': return "Found '(' after operand"
19.             parentheses = parentheses+1
20.         elif i == ')':
21.             if prevMode == 'operator': return "Found ')' after operator"
22.             parentheses = parentheses-1
23.             if parentheses < 0: return "Improper nesting of parentheses"
24.         elif i in ('+','-','*','/'):
25.             mode='operator'
26.             if not prevMode == 'operand': return "Found two operators in succession"
27.         else:
28.             return "Invalid character:"
29.
30.         prevMode = mode
31.
32.     if len(infix)==0: return "No expression"
33.     if mode=='operator' and not i[-1] == ')': return
74.
75. infix = input("Enter an infix expression:")
76.
77. msg = infixChecker(infix)
78. if msg is not None: error(msg)
79. postfix = infix2postfix(infix)
80. value = postfixEval(postfix)
81. print("Value:",value)
Observation:
1. We accept an infix arithmetic expression from the user in line 75.
2. We check for the validity of the infix expression in lines 77-78 using the
function infixChecker defined in line 9.
3. If it is valid, we proceed to convert the infix arithmetic expression into its
postfix equivalent in line 79, using the function infix2postfix defined in
line 37.
4. After conversion, we proceed to evaluate the postfix expression in line 80
using the postfixEval function defined in line 59.
5. We finally print the value of the expression in line 81.
can, however, write a Python script to check whether a purported solution is correct or
not, and that turns out to be quite simple to do in Python!
Here is the Pythonic logic for the solution that can get the job done in very few lines of
code:
1. We accept the input from the user (9 rows of 9 digits each) and store it as a
list of strings – with the list containing a string for each row and each string
containing the 9 digits of that row.
2. We split the logic into 3 parts – row check (checking that each row contains
digits 1-9), column check (checking that each column contains digits 1-9) and
grid check (checking that each 3x3 grid contains digits 1-9).
3. We use a Boolean variable to indicate whether the solution is correct so far.
The moment we realise that any check fails, we set the Boolean variable to
False. We therefore start with the value of the Boolean variable being True.
4. For row check, we take the string pertaining to each row, split it into its
individual characters, sort them in ascending ASCII order, and join them back
again. If the row’s contents had all digits from 1 to 9, the string would now be
“123456789”!
5. For column check, we employ a similar strategy, but have to extract the
corresponding column element from each row to obtain the column contents.
6. For grid check, we similarly have to extract the contents across multiple rows
and multiple columns to obtain the contents of a single grid and employ the
same logic as above.
sudokuChecker.py
1. #!/usr/bin/python
2.
3. # Sudoku Solution Checker
4. row = []
5.
6. for i in range(9): row.append(input())
7.
8. ok=True
9.
10. # Row Check
11. for i in range(9):
12.     if not ''.join(sorted(row[i])) == '123456789': ok=False
13.
14. # Column Check
15. for j in range(9):
Output:
142576389
796384251
538912674
974235168
251698743
863741925
427153896
389467512
615829437
Correct!
Output:
142576389
427153896
796384251
538912674
974235168
251698743
863741925
389467512
615829437
Wrong!
Observation:
1. First of all, do note that in the first output, the given input is indeed a valid
solution. In the second output, we have inserted the contents of row 7 at row
2, thereby making it an invalid solution due to grid check.
2. As suggested in the logic above, we accept the input from the user and store
it in a list row in lines 4-6, use a Boolean flag ok initialised to True in line 8
and then proceed with row check (lines 10-12), column check (lines 14-16)
and grid check (lines 18-20).
3. Finally, in lines 22-23, we print a suitable message depending on the value of
the Boolean variable ok.
4. Row check: In line 12, we take a single string, sort it (which ends up sorting
the characters within the string), join the characters together and check
whether it matches the string “123456789”.
5. Column check: In line 16, we use a list comprehension to create a list of all
those elements that lie in a particular column, sort the list using sorted, join
the characters together to form a string using join, and compare the string
against “123456789”.
6. Grid check: the elements of grid g are found in row[i][j], where
1. g, the grid number, ranges from 0 to 8
2. i ranges from g//3 * 3 up to, but excluding, g//3 * 3 + 3 (using integer
division, so only the quotient is considered)
3. j ranges from g%3 * 3 up to, but excluding, g%3 * 3 + 3 (considering
only the remainder)
7. In line 20, we use a list comprehension to create a list having all the elements
of grid g, sort them, join them to form a string and compare against
“123456789”.
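The complete listing is not reproduced here. Below is a sketch of the three checks as a single function, using the first sample input from the outputs above. check_sudoku is a hypothetical name, and the rows are passed as a list instead of being typed in.

```python
def check_sudoku(row):
    """row is a list of 9 strings of 9 digits each; True if the solution is valid."""
    digits = '123456789'
    # Row check: each row, sorted, must read 123456789
    rows_ok = all(''.join(sorted(r)) == digits for r in row)
    # Column check: collect row[i][j] for a fixed column j
    cols_ok = all(''.join(sorted(row[i][j] for i in range(9))) == digits
                  for j in range(9))
    # Grid check: grid g covers rows g//3*3 to g//3*3+2 and columns g%3*3 to g%3*3+2
    grids_ok = all(''.join(sorted(row[i][j]
                                  for i in range(g // 3 * 3, g // 3 * 3 + 3)
                                  for j in range(g % 3 * 3, g % 3 * 3 + 3))) == digits
                   for g in range(9))
    return rows_ok and cols_ok and grids_ok

solution = ["142576389", "796384251", "538912674", "974235168", "251698743",
            "863741925", "427153896", "389467512", "615829437"]
print("Correct!" if check_sudoku(solution) else "Wrong!")   # Correct!
```

Moving row 7 to position 2, as in the second output, keeps every row and column valid but breaks the first grid, so the function returns False for that input.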
11.11 Questions
1. Explain how stacks can be implemented in Python with examples to show the
various operations.
2. Explain how queues can be implemented in Python with examples to show
the various operations.
3. Write short notes on:
1. The map() function
2. The filter() function
3. The reduce() function
4. Explain how complex lists can be constructed easily using list comprehensions.
11.12 Exercises
1. Write a program to convert temperature from Celsius to Fahrenheit using map
function.
2. Write the above program using lambda expression.
3. Write a program to convert a tuple of angles into a list of tuples with each
tuple containing the sine and cosine of an angle.
4. Write a program to filter out the odd elements of the Fibonacci series for the
first n terms.
5. Write a program to find the highest number in a given list using reduce
function.
6. Write a program to find the sum of all the elements of a list using lambda
expression and reduce function.
7. Write a program to find the product of all the elements of a list using lambda
expression and reduce function.
8. Write a program to print the sum of all the numbers from 1 to 50 using lambda
expression and reduce function.
9. Write a program to create a list of numbers between 1 and 50 that are neither
divisible by 2 nor by 3 using filter.
10. Write a program to find all the palindromes from a list of strings.
11. Write a program to concatenate a list of strings to make a sentence using
reduce.
12. Write a program to generate the squares of the elements of a given list using
list comprehension.
13. Write a program to extract all vowels present in a given string using list
comprehension.
14. Write a program to flatten a matrix (convert rows of elements into a single list
of elements) using list comprehension.
15. Write a program to extract all the digits from a given string using list
comprehension.
16. Write a program to generate the transpose of a matrix using list
comprehension.
SUMMARY
12 OOP IN PYTHON
12.1 Overview of OOP Principles
Object Oriented Programming (OOP) is a programming paradigm – a way of looking
at problems and designing solutions. The main goal behind OOP is to model the real
world within the software so that we have a parallel reality, making it easier for us to
relate code to the real world. This programming paradigm essentially uses the
following 8 principles to achieve this goal:
1. Classes
2. Objects
3. Data Encapsulation
4. Data Hiding
5. Data Abstraction
6. Polymorphism
7. Inheritance
8. Message Passing
We will therefore briefly cover these 8 principles before proceeding with how OOP can
be used in Python.
12.1.1 Class
A class can be defined as a design according to which objects can be later
instantiated.
The starting point of Object Oriented Modelling is the class. The class is the design of
an entity that exists in the real world. This design comprises attributes (everything
an entity has) and behaviour (everything an entity can do).
As an example, here is a diagrammatic representation of a Date class (technically
using UML for the class representation) to represent a calendar date:
344 12. OOP in Python
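+--------------------+
| Date               |
+--------------------+
| - day              |
| - month            |
| - year             |
+--------------------+
| + setDate(d,m,y)   |
| + getDay()         |
| + getMonth()       |
| + getYear()        |
+--------------------+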
Note:
1. For now, it is sufficient to just note that Date is the name of a class, which has
3 attributes – day, month and year – and 4 member functions –
setDate(), getDay(), getMonth() and getYear().
2. Thus, the 3 pieces within the above UML diagram are: class name, data
members and member functions.
3. The “-” prefix denotes that we want to keep these members private (section
12.4.6.2).
4. The “+” prefix similarly denotes that we want to keep these members public
(again, section 12.4.6).
12.1.2 Object
An object is an instance of a class.
Once a class has been completely designed, multiple individual instances of the class
can be created, just as multiple cars can be manufactured from the same design once
the car has been designed. These objects are identical to each other in terms of their
design and yet independent of each other in terms of the values of their attributes.
Thus, the different cars that are manufactured from the same design can have
different values for attributes like their colour. The behaviour of all objects of the
same class, however, has to be identical. Thus, objects of a class typically have the
same attributes and behaviour, but can differ from each other in the values of their
attributes.
In our Date example, day, month and year are attributes whereas setDate(),
getDay(), getMonth() and getYear() are behaviour. Multiple Date objects can
have different values for day, month and year, thereby representing possibly
different dates, but will have identical functionality.
12.1.6 Polymorphism
The term polymorphism in general means multiple forms of the same entity. In OOP, it
means performing the same logical operation by choosing from multiple
implementations appropriately. The two forms of polymorphism that we are going to
see actively in Python are:
1. Operator overloading – performing the same logical operation using an
operator in different ways (using different implementation) depending on the
type of the invoking operand
2. Dynamic polymorphism – performing the same logical operation in different
ways (using different implementation) depending on the type of the invoking
object.
In the Date example, an example of operator overloading would be providing an
implementation for addition of days to a given date, resulting in a new Date object.
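Both forms can be previewed with a short sketch before they are treated properly later in the chapter. The classes Cat and Dog and their method speak() are hypothetical, chosen only for illustration; the first part relies on Python's built-in types, which already overload the + operator.

```python
# Operator overloading: the same operator + runs a different
# implementation depending on the type of its operands.
print(2 + 3)            # integer addition: 5
print("ab" + "cd")      # string concatenation: abcd
print([1, 2] + [3])     # list concatenation: [1, 2, 3]

# Dynamic polymorphism: the same call speak() runs a different
# implementation depending on the type of the invoking object.
class Cat:
    def speak(self):
        return "Meow"

class Dog:
    def speak(self):
        return "Woof"

for animal in [Cat(), Dog()]:
    print(animal.speak())   # Meow, then Woof
```

The loop never checks which class each object belongs to; the choice of implementation is made by the object itself.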
12.1.7 Inheritance
Inheritance is the mechanism wherein a class acquires all the features and properties
of another class (or classes). Inheritance has the following uses:
1. Re-usability – it helps reduce effort by reusing existing code.
2. Extensibility – it helps add new code or apply changes without tampering
with existing code.
3. Compartmentalisation – it helps manage code better by compartmentalising
classes.
2. Multi-level inheritance – when a class inherits from a class that inherits from
another class, thereby forming an inheritance chain.
[Figure: multi-level inheritance chain with the classes ANIMAL, MAMMAL and DOG]
[Figure: inheritance diagrams with the classes HERBIVOROUS, CARNIVOROUS and OMNIVOROUS]
[Figure: inheritance diagram with the classes ANIMAL, MAMMAL and REPTILE]
class className:
    statements
We will cover class variables in section 12.4.3 and class functions in section 12.4.4.
For now, let us start with an empty class. Since a class definition must contain at
least 1 statement (just as the bodies of conditions, loops and functions must), we can
use the pass statement, which has no effect, to satisfy this requirement:
class Date:
    pass
Just as a function definition can optionally start with a documentation string, so can
a class definition; the string is then accessible through the special class variable
__doc__:
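The class definition itself does not appear in this section. Judging from the docstring shown in the session output below, a minimal version would look like the following. Note that the docstring is itself a statement, so pass is no longer needed.

```python
class Date:
    '''This is our first implementation of the Date class'''

print(Date.__doc__)     # This is our first implementation of the Date class
print(Date().__doc__)   # the same string, reached through an object
```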
We will add more code into the class definition as we learn more features in Python.
var=ClassName()
>>> d=Date()
>>> d
<__main__.Date object at 0x7f592e29fb50>
>>> type(d)
<class '__main__.Date'>
>>> d.__doc__
'This is our first implementation of the Date class'
As can be seen, the __doc__ member of the class is accessible through the object d
too.
Python uses the Perl style, wherein the object can create whatever instance variables
it desires. In fact, it is possible for different objects of the same class to have different
instance variables, though doing so might not be a good idea! This creation of
instance variables is typically done using methods, as will be demonstrated in an
example soon.
Instance variables can also be deleted using the del statement!
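A short sketch of this behaviour, using a hypothetical class Point. For brevity the assignments are made from outside the class here, whereas the Date example makes them inside a method. vars() returns the instance variables an object currently has.

```python
class Point:
    pass

p = Point()
q = Point()
p.x = 1          # first assignment creates instance variable x on p only
q.y = 2          # q gets a different instance variable, y
print(vars(p))   # {'x': 1}
print(vars(q))   # {'y': 2}

del p.x          # deletes the instance variable x from p
print(hasattr(p, "x"))   # False
```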
In our Date example, we want the Date objects to have day, month and year as
instance variables – we want each Date object to have its own value for these. These
instance variables can be created in the setDate() method as shown below:
Observation:
1. setDate() is a method – a function of the class Date that requires an
invoking object to invoke it upon itself. In our example, the statement
d.setDate(1,2,2000) shows how the Date object d invokes the function
setDate() upon itself, passing 1, 2 and 2000 as arguments to the method.
2. A reference to the invoking object is passed implicitly as the first argument to
the method setDate(), and is received as the parameter self. While the
parameter name should not matter strictly speaking, as a convention it is
always named self. It would be wise to follow this convention.
3. Everything that belongs to the invoking object should be explicitly preceded
by the reference self. Thus, to access the instance variable day of the
invoking object, we would need to access self.day. The first time an
assignment is made to an instance variable that does not exist, the instance
variable is created.
4. The setDate() method copies the given parameters (d, m and y) to the
corresponding instance variables (self.day, self.month and self.year
respectively). We will improve this method later by adding validation support.
5. The above example shows that setDate() works as expected.
Observation:
1. We had earlier defined the method setDate(). Such methods that are used
to set the state of an object are called setters or setter methods.
2. We have now added the methods getDay(), getMonth() and getYear()
to obtain or extract the state of an object. Such methods are called getters or
getter methods. Since these methods have a single simple statement within
their bodies, they have been defined in a single line for brevity.
3. Observe that all methods (inclusive of and not limited to getters and setters)
use the first parameter (self) to reference the invoking object.
4. After the class definition is completed, we create an object d, set its date
using our setter setDate() and extract the values back using our getters
(getDay, getMonth and getYear) and print them to verify the whole
process.
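The listing these observations describe is not reproduced in this section. A minimal sketch consistent with them (before any validation is added) is given below; it is a reconstruction for illustration, not the book's exact listing.

```python
class Date:
    def setDate(self, d, m, y):
        # setter: the first assignment to self.day etc. creates the instance variables
        self.day, self.month, self.year = d, m, y

    # getters, each a single simple statement on one line
    def getDay(self): return self.day
    def getMonth(self): return self.month
    def getYear(self): return self.year

d = Date()
d.setDate(1, 2, 2000)
print(d.getDay(), d.getMonth(), d.getYear())   # 1 2 2000
```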
It would be a good idea to delegate the printing of the Date object to the object itself
via a method print():
Observation:
1. We have added a method print() to the Date class to print out the date.
This print() method has no connection to the global built-in print()
function. We can of course think of a different name than print() if required.
2. By delegating the responsibility of printing to the object, the usage of the
object becomes simpler!
3. We will see a better technique of achieving this objective of printing a string
representation of an object in section 12.8.2.2.
True if leap and False otherwise. The condition for a year to be leap is that it
must be divisible by 4, and if divisible by 100 then must also be divisible by
400.
4. We will see a better way of implementing the isValid() and isLeap()
methods in section 12.4.4.
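The leap-year rule can be checked in isolation. This stand-alone function uses the same expression as isLeap() in Date1.py below; the sample years are chosen so that each part of the rule is exercised.

```python
def isLeap(y):
    # divisible by 4, and if divisible by 100 then also by 400
    return y % 4 == 0 and (not y % 100 == 0 or y % 400 == 0)

for year in (1999, 2024, 1900, 2000):
    print(year, isLeap(year))
# 1999 False  (not divisible by 4)
# 2024 True   (divisible by 4, not by 100)
# 1900 False  (divisible by 100 but not by 400)
# 2000 True   (divisible by 400)
```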
Before proceeding to the next section, let's add some more functionality to our Date
class and store it as a program. We will add a mechanism to add days to a Date
object and obtain the date after the addition:
Date1.py
1. #!/usr/bin/python
2.
3. # Implementation of Date class
4.
5. class Date:
6.     def setDate(self,d,m,y):
7.         if self.isValid(d,m,y):
8.             self.day,self.month,self.year = d,m,y
9.         else:
10.             print("Invalid date!")
11.
12.     def getDay(self): return self.day
13.     def getMonth(self): return self.month
14.     def getYear(self): return self.year
15.
16.     def print(self):
17.         print("{}-{}-{}".format(self.getDay(),self.getMonth(),self.getYear()))
18.
19.     def isValid(self,d,m,y):
20.         daysInMonth=[0,31,28,31,30,31,30,31,31,30,31,30,31]
21.         if y<1 or y>9999: return False
22.         if self.isLeap(y): daysInMonth[2]=29
23.         if m<1 or m>12: return False
24.         if d<1 or d>daysInMonth[m]: return False
25.         return True
26.
27.     def isLeap(self,y): return y%4==0 and (not y%100==0 or y%400==0)
28.
29.     def addDays(self,days):
30.         d,m,y = self.day,self.month,self.year
31.         daysInMonth=[0,31,28,31,30,31,30,31,31,30,31,30,31]
Output:
11-5-2000
Observation:
1. We have introduced an addDays() method in line 29.
2. The addDays() method uses a loop (line 34) to increment the day (d) by 1
(line 35) as many times as the number of days to be added.
3. Whenever the day (d) crosses the number of days in that month (tested in line
36), we reset the day to 1 and increment the month.
4. Whenever the month crosses 12 (tested in line 39), we reset the month to 1
and increment the year.
5. Whenever the year changes, we recompute the number of days in February
depending on whether the year is leap or not (lines 42-43).
6. For simplicity, we have not checked if the year crosses 9999.
7. Finally, the method returns a Date object from the values in d, m and y.
8. From the output, we can confirm that the date 100 days after 1-2-2000 is
indeed 11-5-2000.
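The result can be cross-checked with Python's standard library. This is a different technique from the book's Date class: datetime.date objects support adding a datetime.timedelta directly, so no day-by-day loop is needed.

```python
from datetime import date, timedelta

start = date(2000, 2, 1)               # 1-2-2000
later = start + timedelta(days=100)    # add 100 days
print("{}-{}-{}".format(later.day, later.month, later.year))   # 11-5-2000
```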
>>> class A:
...     x=10
...
>>> A.x
10
We have created a class A with a class variable x initialized to 10. We can access this
using A.x, where A is the class object (class name for us) and x is the class variable.
Let us create objects now:
(continuation)
>>> a=A()
>>> b=A()
>>> a.x
10
>>> b.x
10
We have created 2 objects a and b, and can see that they both have access to the
class variable x, and provide us the same value 10. Let us attempt to make an
assignment to the class variable using an instance:
(continuation)
>>> a.x=20
>>> A.x
10
>>> a.x
20
>>> b.x
10
(continuation)
>>> A.x=30
>>> A.x
30
>>> a.x
20
>>> b.x
30
We can see that when we change the class variable using the class object, the
change is visible using the class object as well as all instances of that class, except
those instances that also have an instance variable with the same name (in our case,
the instance variable x in the instance a).
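One way to see what is happening is to inspect each object's own namespace with vars(). The sketch below repeats the session above and then deletes the instance variable, after which a.x once again finds the class variable.

```python
class A:
    x = 10

a = A()
b = A()
a.x = 20          # creates an instance variable x in a; A.x is untouched
A.x = 30          # changes the class variable

print(vars(a))    # {'x': 20}  (a has its own x)
print(vars(b))    # {}         (b has none, so b.x falls back to A.x)
print(a.x, b.x)   # 20 30

del a.x           # remove a's instance variable
print(a.x)        # 30 (the class variable is visible again)
```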
Let us apply this concept to our Date class in Date1.py. We see that the methods
isValid and addDays require the same array daysInMonth. They are specified
twice – in lines 20 and 31. Let us make this variable a class variable, accessible by
any object – and therefore any method within the object. We could of course have
made this an instance variable, but the fact that the values of this array do not
change from instance to instance justifies making this a class variable instead.
Date2.py
1. #!/usr/bin/python
2.
3. # Implementation of Date class
4.
5. class Date:
6.     daysInMonth=[0,31,28,31,30,31,30,31,31,30,31,30,31]
7.
8.     def setDate(self,d,m,y):
9.         if self.isValid(d,m,y):
10.             self.day,self.month,self.year = d,m,y
11.         else:
12.             print("Invalid date!")
13.
14.     def getDay(self): return self.day
15.     def getMonth(self): return self.month
16.     def getYear(self): return self.year
17.
18.     def print(self):
19.         print("{}-{}-{}".format(self.getDay(),self.getMonth(),self.getYear()))
20.
21.     def isValid(self,d,m,y):
22.         Date.daysInMonth[2]=28
23.         if y<1 or y>9999: return False
>>> class A:
...     x=10
...     def increment():
...         A.x=A.x+1
...
>>> A.x
10
>>> A.increment()
>>> A.x
11
(continuation)
>>> a=A()
>>> b=A()
>>> a.x
11
>>> b.x
11
We then create 2 objects of the class A (a and b) and observe that they too give us
the incremented value of the class variable x.
We cannot however invoke the class function increment() using any of the objects
a and b as class functions are not methods:
(continuation)
>>> a.increment()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: increment() takes 0 positional arguments but 1 was given
Class functions are preferred over methods when it is obvious that the function does
not depend on any particular instance of that class. In our Date example, we observe
that the functions isValid() and isLeap() are in no way connected to any specific
instance of the Date class, and are thus candidates for conversion into class
functions.
For beginners, a simple technique to determine when to convert instance methods to
class functions is when there is no usage of self within the function definition, except
for accessing other methods which are also candidates for conversion to class
functions. In our Date example, the method isValid() does use self to invoke the
method isLeap(). The isLeap() method does not use self in its definition and
hence can be converted to a class function. After this change, the call to isLeap()
within isValid() changes from self.isLeap() to Date.isLeap() and there is
no usage of self within isValid(), making it possible to convert isValid() also
from an instance method to a class function.
In case you are wondering why we should convert these instance methods into class
functions when the code is working perfectly fine, it is because invoking them through
an instance is artificial, and could also be misleading or meaningless. In the statement
self.isValid(d,m,y), self has absolutely no role to play! Similarly, in the statement
d1.isLeap(2000), d1 has no role to play and could be misleading at worst and
meaningless at best. It is far better to replace them with Date.isValid(d,m,y) and
Date.isLeap(2000) respectively. Also note that class functions can be invoked even
when the class has not been instantiated, while instance methods cannot be invoked
without an instance.
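A trimmed-down sketch of this change, keeping only the class variable and the two validation functions (the rest of the Date class is omitted, and the February adjustment is written as a single conditional expression). Both functions are defined without self and called through the class name, before any Date object exists.

```python
class Date:
    daysInMonth = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

    def isLeap(y):                    # class function: no self parameter
        return y % 4 == 0 and (not y % 100 == 0 or y % 400 == 0)

    def isValid(d, m, y):             # class function: no self parameter
        Date.daysInMonth[2] = 29 if Date.isLeap(y) else 28
        if y < 1 or y > 9999: return False
        if m < 1 or m > 12: return False
        if d < 1 or d > Date.daysInMonth[m]: return False
        return True

# Called through the class name; no Date object is needed.
print(Date.isLeap(2000))              # True
print(Date.isValid(29, 2, 2001))      # False
print(Date.isValid(29, 2, 2000))      # True

# Called through an instance, the instance is passed as an extra first
# argument, so the call fails.
try:
    Date().isLeap(2000)
except TypeError as e:
    print("TypeError:", e)
```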
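NOTE:
Python also provides the decorators @classmethod and @staticmethod (the
underlying built-ins classmethod() and staticmethod() exist since Python 2.2, and the
decorator syntax since Python 2.4), both of which are outside the purview of this book!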
Date3.py
1. #!/usr/bin/python
2.
3. # Implementation of Date class
4.
5. class Date:
6.     daysInMonth=[0,31,28,31,30,31,30,31,31,30,31,30,31]
7.
8.     def setDate(self,d,m,y):
9.         if Date.isValid(d,m,y):
10.             self.day,self.month,self.year = d,m,y
11.         else:
12.             print("Invalid date!")
13.
14.     def getDay(self): return self.day
Output:
11-5-2000
Observation:
1. This program is based on Date2.py. We have changed the instance
methods isValid() and isLeap() to class functions by eliminating self.
2. A result of this change is that all calls to isValid() and isLeap() are now
made through the class Date instead of through a Date instance.
>>> class A:
...     def f(self):
...         print("Hello")
...
>>> a=A()
>>> a.f()
Hello
>>> A.f(a)
Hello
Observation:
1. We have defined a class A with a method f that prints “Hello” when invoked.
We have an instance of this class whose reference is stored in a.
2. a.f() is a call to the method f of class A using a as the invoking object.
3. A.f(a) is a call to the class function f of the class A, passing a as an
argument that is received as self in f.
Thus, methods are special class functions!
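The link can be made visible: accessing a.f yields a bound method object that simply pairs the plain function A.f with the instance a, as this short check shows.

```python
class A:
    def f(self):
        print("Hello")

a = A()
m = a.f                    # a bound method
print(m.__func__ is A.f)   # True: the underlying function is the class function A.f
print(m.__self__ is a)     # True: the instance that will be passed as self
m()                        # Hello, exactly as A.f(a) would print
```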
_member
Since we have not dealt with inheritance yet, protected members will be demonstrated
in section 12.6.2.
__member
>>> class A:
...     def set(self,x,y):
...         self.x = x
...         self.__y = y
...     def print(self):
...         print("{},{}".format(self.x,self.__y))
...
>>> a=A()
>>> a.set(2,3)
>>> a.print()
2,3
>>> a.x
2
>>> a.y
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'y'
>>> a.__y
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute '__y'
>>> a._A__y
3
Observation:
1. We have defined a class A with a setter called set that assigns the
parameters x and y to the members x and __y respectively. The member x is
considered to be public whereas the member __y is considered to be private.
2. The print() method displays the values of members x and __y, both of
which are accessible because print() is a method of the same class.
3. From outside the class, only x is directly accessible; neither y nor __y is.
4. The private member __y is nevertheless accessible outside the class with the
name _A__y, because Python internally renames a private member by
prefixing it with an underscore and the class name.
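The renaming (called name mangling) is easy to observe by listing the instance's namespace with vars(). This sketch reuses the setter of class A from the session above.

```python
class A:
    def set(self, x, y):
        self.x = x
        self.__y = y          # stored under the mangled name _A__y

a = A()
a.set(2, 3)
print(vars(a))                # {'x': 2, '_A__y': 3}
print(a._A__y)                # 3 (works, but defeats the purpose of privacy)
```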
In our Date example, we can convert all class variables and instance variables to
private! In general, it is better to convert all class variables and instance variables to
private to prevent accidental changes from outsiders and thus implement data hiding.
Therefore, the instance variables day, month and year need to be renamed as
__day, __month and __year respectively. Also the class variable daysInMonth
will be replaced by __daysInMonth.
Date4.py
1. #!/usr/bin/python
2.
3. # Implementation of Date class
4.
5. class Date:
6.     __daysInMonth=[0,31,28,31,30,31,30,31,31,30,31,30,31]
7.
8.     def setDate(self,d,m,y):
9.         if Date.isValid(d,m,y):
10.             self.__day,self.__month,self.__year = d,m,y
11.         else:
12.             print("Invalid date!")
13.
14.     def getDay(self): return self.__day
15.     def getMonth(self): return self.__month
16.     def getYear(self): return self.__year
17.
18.     def print(self):
19.         print("{}-{}-{}".format(self.getDay(),self.getMonth(),self.getYear()))
20.
21.     def isValid(d,m,y):
22.         Date.__daysInMonth[2]=28
23.         if y<1 or y>9999: return False
24.         if Date.isLeap(y): Date.__daysInMonth[2]=29
25.         if m<1 or m>12: return False
26.         if d<1 or d>Date.__daysInMonth[m]: return False
27.         return True
28.
29.     def isLeap(y): return y%4==0 and (not y%100==0 or y%400==0)
30.
31.     def addDays(self,days):
32.         d,m,y = self.__day,self.__month,self.__year
33.         Date.__daysInMonth[2]=28
34.         if Date.isLeap(y): Date.__daysInMonth[2]=29
35.
36.         for i in range(days):
37.             d=d+1
38.             if d>Date.__daysInMonth[m]:
39.                 d=1
40.                 m=m+1
41.                 if m>12:
42.                     m=1
43.                     y=y+1
44.                     if Date.isLeap(y): Date.__daysInMonth[2]=29
45.                     else: Date.__daysInMonth[2]=28
46.         result = Date()
47.         result.setDate(d,m,y)
48.         return result
49.
50. d1 = Date()
51. d1.setDate(1,2,2000)
52. d2 = d1.addDays(100)
53. d2.print()
Output:
11-5-2000
Observation:
1. This program is based on Date3.py. The instance variables (day, month
and year) and class variables (daysInMonth) have been made private by
prefixing them with __.
2. The instance and class variables can no longer be directly accessed outside
the class.
3. Even class functions and instance methods can be made private by prefixing
them with __. Such functions/methods can then be used internally by the
class but are not directly callable from outside the class. In our Date example,
we can consider making the class functions isValid and isLeap private if
we feel that the outside world will have no interest in these.
4. Making class variables and instance variables private helps implement data
hiding. Making class functions and instance methods private helps implement
data abstraction.
12.5.1 Constructors
A constructor in Python is an instance method that is automatically invoked when an
instance is created and permits the programmer to perform any initialization required
to ensure that the instance is in a valid state.
A constructor is identified in Python by its special name: __init__. Note that the
leading double underscore does not make this instance method private: Python treats
a name as private only if it begins with at least 2 underscores and ends with at most 1
underscore, whereas __init__ ends with 2 underscores!
The following code snippet demonstrates how constructors can be designed and how
they are automatically invoked when objects are instantiated:
>>> class A:
...     def __init__(self):
...         print("Constructor called!")
...
>>> a=A()
Constructor called!
>>> b=A()
Constructor called!
Observation:
1. The constructor is always called __init__() and being an instance method
it will always receive self as the first parameter.
2. The constructor can be designed to receive any number of parameters in
addition to self, but since Python does not support function overloading,
only 1 constructor can exist in a class: if __init__ is defined more than once,
the last definition replaces the earlier ones.
3. It is a good programming practice to use the constructor to create all instance
variables that an object requires – all initialized to meaningful values.
4. In classes without explicitly defined constructors, one can imagine Python
adding a dummy constructor that does nothing, i.e. an empty constructor.
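Because a second definition of __init__ simply replaces the first, a common way to allow several ways of constructing an object is to give the single constructor default parameter values. The class Point and its defaults below are hypothetical, for illustration only.

```python
class Point:
    def __init__(self):
        self.x, self.y = 0, 0

    def __init__(self, x=0, y=0):   # replaces the definition above
        self.x, self.y = x, y

p = Point()          # uses both defaults
q = Point(3, 4)      # passes both values
r = Point(y=7)       # passes only y, by keyword
print(p.x, p.y, q.x, q.y, r.x, r.y)   # 0 0 3 4 0 7
```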
In our Date example, we can add a constructor that permits the construction of a
Date object by specifying the day, month and year. We need not worry about the
implementation as the setDate() method can be used for this purpose. Since the
objective of the constructor is to ensure that the object is in a valid state, we will
ensure this even if the given values of day, month and year are not valid.
Date5.py
1. #!/usr/bin/python
2.
3. # Implementation of Date class
4.
5. class Date:
6.     __daysInMonth=[0,31,28,31,30,31,30,31,31,30,31,30,31]
7.
8.     def __init__(self,d,m,y):
9.         self.setDate(1,1,1970)
10.         self.setDate(d,m,y)
11.
12.     def setDate(self,d,m,y):
13.         if Date.isValid(d,m,y):
14.             self.__day,self.__month,self.__year = d,m,y
15.         else:
16.             print("Invalid date!")
17.
18.     def getDay(self): return self.__day
19.     def getMonth(self): return self.__month
20.     def getYear(self): return self.__year
21.
22.     def print(self):
23.         print("{}-{}-{}".format(self.getDay(),self.getMonth(),self.getYear()))
24.
25.     def isValid(d,m,y):
26.         Date.__daysInMonth[2]=28
27.         if y<1 or y>9999: return False
28.         if Date.isLeap(y): Date.__daysInMonth[2]=29
29.         if m<1 or m>12: return False
30.         if d<1 or d>Date.__daysInMonth[m]: return False
31.         return True
32.
33.     def isLeap(y): return y%4==0 and (not y%100==0 or y%400==0)
34.
35.     def addDays(self,days):
36.         d,m,y = self.__day,self.__month,self.__year
37.         Date.__daysInMonth[2]=28
Output:
1-2-2000
Invalid date!
1-1-1970
Observation:
1. This program is based on Date4.py. We have introduced a constructor in
lines 8-10.
2. The constructor receives the day, month and year and uses the setDate()
method to perform the necessary validation and assignment. It is a good
practice to reuse functions as it reduces our work, makes the code more
robust and also makes it easy to change the program later on if required.
3. Since our current implementation of setDate() merely prints an error
message and continues if the given values of day, month and year do not
represent a valid date, our constructor first sets a valid date (1-1-1970 is
chosen primarily because it is a valid date, and more so because it is an
important reference date in computers, the Unix epoch) and only then calls
setDate() with the given values, which changes the date only if they are valid.
A better implementation will use exception handling, covered in section 13.
12.5.2 Destructors
A destructor in Python is an instance method that is automatically invoked when an
object is about to be destroyed, and permits the programmer to perform any desired
clean-up.
A destructor is identified in Python by its special name: __del__.
The following code snippet demonstrates the working of destructors in Python:
>>> class A:
...     def __del__(self):
...         print("Destructor called!")
...
>>> a=A()
>>> del a
Destructor called!
>>> b=A()
>>> del b
Destructor called!
Observation:
1. The destructor is an instance method that does not receive any parameters
apart from self.
2. Like constructors, one class can have only one destructor, and the absence of
a user-defined destructor results in a dummy destructor supplied by Python
that is empty.
3. While in this example the destructor merely prints a message, destructors can
do many meaningful operations when an object is going to be eliminated. If no
such requirement exists, then a destructor need not be defined in that class.
4. del is a statement (not a built-in function) that deletes a variable, and as a
result it can also end up deleting objects in memory.
>>> class A:
...     def __del__(self):
...         print("Destructor called!")
...
>>> a=A()
>>> b=a
>>> del a
>>> del b
Destructor called!
Observation:
1. We create an object of class A and store its reference in the variable a. The
reference count of this object is 1 (only 1 variable – a – is referring to the
object).
2. This reference is copied from a to b, ending up with 2 references to the
object. Therefore, the reference count of the object becomes 2.
3. When we delete the variable a, we also decrement the reference count of the
object referred to by a by 1. Therefore, the reference count of the object
decrements to 1. Since it has not reached 0, the object continues to exist
unaffected by the deletion of the variable a.
4. When we delete the variable b, we also decrement the reference count of the
object referred to by b by 1. The reference count now reaches 0. This is when
the object will be destroyed, and just prior to that the destructor gets
automatically called. (This immediate destruction is how CPython, the
standard interpreter, behaves; other Python implementations may call the
destructor later.)
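The same sequence can be traced in a program. In this sketch the destructor appends to a list instead of printing, so the order of events can be inspected afterwards. It assumes CPython's reference counting, under which the object is destroyed as soon as its last reference disappears.

```python
events = []

class A:
    def __del__(self):
        events.append("destroyed")

a = A()
b = a                 # second reference to the same object
del a                 # one reference remains, so nothing is destroyed yet
events.append("a deleted")
del b                 # last reference gone: the destructor runs (in CPython)
events.append("b deleted")
print(events)         # ['a deleted', 'destroyed', 'b deleted']
```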
Since our Date class does not really require a destructor, we will not attempt to
demonstrate the addition of the same.
12.6 Inheritance
Inheritance is one of the most important concepts in OOP and has these advantages:
1. It helps compartmentalise the code, thereby helping the programmer
organise the code better.
2. It allows reusability of code by allowing a new class to completely obtain the
functionality of other existing classes and add more functionality of its own.
3. It permits extensibility of code, wherein new code is added without having to
modify existing code and classes.
While the different structural forms of inheritance have been covered in section 12.1.7,
we will basically examine 2 forms that make all other forms possible:
1. Simple inheritance, wherein one class inherits from another
2. Multiple inheritance, wherein one class inherits from multiple classes
One point to be kept in mind as far as inheritance is concerned is that the derived
class contains all features of the base class. We can imagine that each instance of the
derived class also contains an instance of the base class. The private features of the
base class are also present in the derived class, but are not directly accessible.
class derived_class(base_class):
    class_definition
Here is a small code snippet to demonstrate simple inheritance with class B deriving
from class A:
>>> class A:
...     pass
...
>>> class B(A):
...     pass
...
>>> a=A()
>>> b=B()
>>> type(a)
<class '__main__.A'>
>>> type(b)
<class '__main__.B'>
Observation:
1. The class definitions are empty, but as Python requires at least 1 line in the
definition, we have used the pass statement.
2. The class B derives from class A: class B(A)
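Although type() reports only each object's own class, the built-ins isinstance() and issubclass() also take the base class into account, which gives a quick way to confirm the relationship just created:

```python
class A:
    pass

class B(A):
    pass

a = A()
b = B()
print(issubclass(B, A))   # True: B derives from A
print(isinstance(b, A))   # True: every B object is also an A object
print(isinstance(a, B))   # False: an A object is not a B object
```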
Inheritance1.py
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Public, Private and Protected Method access
5.
6. class A:
7.     def __f1(self): print("A.f1")
8.     def f2(self): print("A.f2")
9.     def _f3(self): print("A.f3")
10.
11. class B(A):
12.     def __g1(self): print("B.g1")
13.     def g2(self): print("B.g2")
14.     def _g3(self): print("B.g3")
15.
16. b = B()
Observation:
1. The above program does not produce any output. It has been given only for
us to analyse the program and draw conclusions. Therefore, there is no point
in running this program.
2. Class A is the base class and class B is the derived class. Class A contains 3
methods - __f1, f2 and _f3 – which are private, public and protected
respectively. Class B also contains 3 methods - __g1, g2 and _g3 – which
are private, public and protected respectively.
3. Recollect that public members are accessible both inside as well as outside
the class, private members are accessible only inside the class and protected
members are accessible only inside the class and inside its subclasses. Also
recollect that the interpreter does not prohibit direct access of protected
members from outside the class.
4. Line 16 creates an instance of the derived class B called b. We will now
analyse which methods can be invoked using this instance b.
5. b.g2() is the most obvious valid candidate – any public method of a class is
accessible from even outside the class.
6. b.f2() is the next obvious valid candidate – public methods of the base
class are inherited as public methods of the derived class, and public methods
of the derived class are accessible even outside the class.
7. b.__g1() is an invalid candidate – private methods of a class are not
accessible outside the class. We can of course invoke b.g2(), which in turn
could call self.__g1().
8. b._g3() is supposed to be an invalid candidate – protected members of a
class are not meant to be accessed outside the class, but the interpreter does
not specifically prohibit its access. We will only attempt to call _g3() through
some other route (like through g2(), for instance).
9. b.__f1() is an invalid candidate – private members of a class are
accessible only within that class and are inaccessible even in its subclasses.
We can follow other routes though, like calling __f1() from f2(), which can in
turn be called from g2(). Calling __f1() directly from g2() would fail.
10. b._f3() is supposed to be an invalid candidate – protected members of a
class are accessible within subclasses, but not outside of these subclasses. B
being a subclass of A does have access to _f3(), but we are not supposed
to access it from outside B.
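These conclusions can be checked by extending Inheritance1.py with a few calls. In this sketch the bodies of f2() and g2() are hypothetical extensions that exercise the routes suggested in points 7 to 10, and h() is a hypothetical extra method showing that B cannot reach A's private method by name. Failing calls are caught so the program runs to completion.

```python
class A:
    def __f1(self): print("A.f1")
    def f2(self):
        print("A.f2")
        self.__f1()        # written inside A, so Python reads it as self._A__f1()
    def _f3(self): print("A.f3")

class B(A):
    def __g1(self): print("B.g1")
    def g2(self):
        print("B.g2")
        self.__g1()        # written inside B, so Python reads it as self._B__g1()
        self._g3()         # B's own protected method, used inside B
        self._f3()         # A's protected method, used inside the subclass B
    def _g3(self): print("B.g3")
    def h(self):
        try:
            self.__f1()    # inside B this means self._B__f1(), which does not exist
        except AttributeError as e:
            print("AttributeError:", e)

b = B()
b.g2()     # prints B.g2, B.g1, B.g3, A.f3
b.f2()     # prints A.f2, A.f1 (public method inherited from A)
b.h()      # A's private method is not reachable by name from B
try:
    b.__g1()               # outside any class, so no renaming takes place
except AttributeError as e:
    print("AttributeError:", e)
```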
Inheritance2.py
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Function Overriding
5.
6. class A:
7.     def f(self):
8.         print("A.f")
9.
10. class B(A):
11.     def f(self):
12.         print("B.f")
13.
14. b = B()
15. b.f()
Output:
B.f
What if we want B's f to also invoke A's f as part of its functionality? We can use the
class function syntax to invoke the method (as illustrated in section 12.4.5), writing
A.f(self) to pass self as the reference to the instance. Alternatively, we can use the
super() built-in, which returns a proxy object through which the methods of the base
class can be called on the current object (identified by self):
Inheritance3.py
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Function Overriding
5.
6. class A:
7.     def f(self):
8.         print("A.f")
9.
10. class B(A):
11.     def f(self):
12.         print("B.f")
13.         super().f()
14.
15. b=B()
16. b.f()
Output:
B.f
A.f
Observation:
1. The statement A.f(self) explicitly invokes A's f on the current object. Writing
self.f() instead would not work here: self is referring to an instance of B, so
self.f() would result in a call to B.f(self) again, and the method would keep
calling itself until Python gives up with a RecursionError.
2. The statement super().f() achieves the same result without naming A:
super() starts the search for f in the base class of B rather than in B itself, so
f is called on the instance of A that is housed within the instance of B
currently referred to by self!
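The two ways of reaching A's f from B can be seen side by side. In this sketch the methods viaClass() and viaSuper() are hypothetical helpers, and the methods return strings instead of printing so the results can be compared. The method resolution order B.__mro__ shows where super() starts looking.

```python
class A:
    def f(self):
        return "A.f"

class B(A):
    def f(self):
        return "B.f"
    def viaClass(self):
        return A.f(self)      # class function syntax, self passed explicitly
    def viaSuper(self):
        return super().f()    # lookup starts after B in the MRO, i.e. at A

b = B()
print(b.f())          # B.f
print(b.viaClass())   # A.f
print(b.viaSuper())   # A.f
print([cls.__name__ for cls in B.__mro__])   # ['B', 'A', 'object']
```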
What if we want to directly invoke A's f without involving B's f using an instance of B?
Again, we can use the class function syntax as shown below:
Inheritance4.py
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Function Overriding
5.
6. class A:
7.     def f(self):
8.         print("A.f")
9.
10. class B(A):
11.     def f(self):
12.         print("B.f")
13.         super().f()
14.
15. b=B()
16. A.f(b)
Output:
A.f
Observation:
1. This time, we do not use b.f(), which will always give preference to B's f.
We directly invoke A.f passing b as the argument.
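As an alternative, the two-argument form of the super() built-in also works outside a class body: super(B, b) returns a proxy that starts the method lookup after B in the class hierarchy of b. The classes below mirror Inheritance4.py but return strings instead of printing, which is an assumption made for easy checking.

```python
class A:
    def f(self):
        return "A.f"


class B(A):
    def f(self):
        return "B.f, then " + super().f()


b = B()
print(b.f())            # B.f, then A.f
print(A.f(b))           # A.f  (class function syntax)
print(super(B, b).f())  # A.f  (two-argument super, outside the class)
```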
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Constructors and Destructors
5.
6. class A:
7.     def __init__(self):
8.         print("A constructed")
9.     def __del__(self):
10.         print("A destroyed")
11.
12. class B(A):
13.     def __init__(self):
14.         print("B constructed")
15.     def __del__(self):
16.         print("B destroyed")
17.
18. b=B()
19. del(b)
Output:
B constructed
B destroyed
Observation:
1. We observe that when we instantiate class B, class B's constructor is invoked,
but class A's constructor is not automatically invoked.
2. Similarly, when we destroy the instance of class B, the destructor of class B is
invoked, but the destructor of class A is not automatically invoked.
Ideally, we would want the constructors of both classes to be invoked during
construction and the destructors to be similarly automatically invoked upon
destruction. Since this is not automatically performed by Python, we need to change
the snippet as follows to obtain the desired result:
Inheritance6.py
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Constructors and Destructors
5.
6. class A:
7.     def __init__(self):
8.         print("A constructed")
9.     def __del__(self):
10.         print("A destroyed")
11.
12. class B(A):
13.     def __init__(self):
14.         super().__init__()
15.         print("B constructed")
16.     def __del__(self):
17.         print("B destroyed")
18.         super().__del__()
19.
20. b=B()
21. del(b)
Output:
A constructed
B constructed
B destroyed
A destroyed
Observation:
1. The derived class constructor calls the base class constructor before
executing any of its own code. This is ideally how derived class constructors
should be written!
2. The derived class destructor calls the base class destructor after executing
all of its own code. This is ideally how derived class destructors should be
written!
3. Python does not treat it as an error if such calls are omitted, but they are
generally required. If you are unsure whether these calls are needed in a
particular situation, add them anyway. In fact, you could make it a habit to
add these calls whenever you write classes involving inheritance, and
remove them only when you have a strong reason to do so.
4. When the base class constructor requires arguments, the derived class
constructor can receive them and pass them on in its call to the base class
constructor. The derived class constructor is free to accept more arguments
than the base class constructor requires, but those additional arguments are
meant for the derived class constructor itself and should not be passed to the
base class constructor.
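Observation 4 can be sketched as follows. The Person and Employee classes and their fields are hypothetical: the derived class constructor accepts one extra argument, forwards only the arguments the base class constructor needs, and keeps the extra one for itself.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class Employee(Person):
    def __init__(self, name, age, employee_id):
        # Forward only what the base class constructor expects
        super().__init__(name, age)
        # The extra argument is used by the derived class alone
        self.employee_id = employee_id


e = Employee("Asha", 30, "E042")
print(e.name, e.age, e.employee_id)  # Asha 30 E042
```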
Let us reimplement the Date example of section 12.5.1 using inheritance. Recall
that we were printing the Date using the print() method in UK format. What if we
wanted US format? Would it be a good idea to modify the code to print in US format?
No! That would break the current functionality, which might be required elsewhere,
and would force us to retest the entire code with the changes in place. Would it be a
good idea to write another Date class with a different name like USDate, copying the
code from the Date class and then making the changes? No! If we later change the
code of the Date class, we might have to make the same changes in the USDate
class, making maintenance difficult, changes complicated and bugs more likely. The
best technique is to derive USDate from the Date class, thereby automatically
inheriting all the functionality of the Date class, and to override the print() method
with a different implementation.
This is illustrated in the program below.
Date6.py
1. #!/usr/bin/python
2.
3. # Implementation of Date class and USDate class using inheritance
4.
5. class Date:
6.     __daysInMonth=[0,31,28,31,30,31,30,31,31,30,31,30,31]
7.
8.     def __init__(self,d,m,y):
9.         self.setDate(1,1,1970)
10.         self.setDate(d,m,y)
11.
12.     def setDate(self,d,m,y):
13.         if Date.isValid(d,m,y):
14.             self.__day,self.__month,self.__year = d,m,y
15.         else:
16.             print("Invalid date!")
17.
18.     def getDay(self): return self.__day
19.     def getMonth(self): return self.__month
20.     def getYear(self): return self.__year
21.
22.     def print(self):
23.         print("{}-{}-{}".format(self.getDay(),self.getMonth(),self.getYear()))
24.
25.     def isValid(d,m,y):
26.         Date.__daysInMonth[2]=28
27.         if y<1 or y>9999: return False
28.         if Date.isLeap(y): Date.__daysInMonth[2]=29
29.         if m<1 or m>12: return False
30.         if d<1 or d>Date.__daysInMonth[m]: return False
31.         return True
32.
33.     def isLeap(y): return y%4==0 and (not y%100==0 or y%400==0)
34.
35. class USDate(Date):
36.     def __init__(self,d,m,y):
37.         super().__init__(d,m,y)
38.
39.     def print(self):
40.         print("{}/{}/{}".format(self.getMonth(),self.getDay(),self.getYear()))
41.
42. d1 = Date(1,2,2000)
43. d1.print()
44. d2 = USDate(1,2,2000)
45. d2.print()
Output:
1-2-2000
2/1/2000
Observation:
1. This program is based on Date5.py covered in section 12.5.1, but we have
removed the addDays() method for simplicity. Lines 1-34 are from that
program.
2. We have added a class called USDate in line 35, which derives from the
Date class.
3. Line 36 provides a constructor for the derived class, which simply passes on
the parameters to the constructor of the base class. Our derived class
constructor has no other job to do. No destructor was present earlier and is
still not required here.
4. Line 39 provides a new definition for the print() method, which displays the
date in US format.
class className(baseClass1,baseClass2[,baseClass3...])
The above syntax shows at least 2 compulsory base classes, since with only one base
class this would not be multiple inheritance in the first place! There is no limit to how
many classes a class can inherit from. The order of inheritance is an important property
here and goes from left to right in the base class list. Thus, the first base class is
baseClass1, the second base class is baseClass2 and so on. The order of
inheritance matters when we use built-ins like super(), and professionals aim to
ensure that constructor calls are always made in the order of inheritance, whereas
destructor calls are made in the strict reverse order. These are demonstrated in the
following sections.
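The order of inheritance can be inspected directly. In this minimal sketch with hypothetical empty classes, __bases__ lists the direct base classes exactly as written, while __mro__ (the method resolution order) is the sequence of classes that attribute lookup and super() actually follow.

```python
class A: pass
class B: pass
class C(A, B): pass   # A is the first base class
class D(B, A): pass   # same bases, opposite order

print([k.__name__ for k in C.__bases__])  # ['A', 'B']
print([k.__name__ for k in C.__mro__])    # ['C', 'A', 'B', 'object']
print([k.__name__ for k in D.__mro__])    # ['D', 'B', 'A', 'object']
```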
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Multiple Inheritance
5.
6. class A:
7.     def fa(self):
8.         print("A called")
9.
10. class B:
11.     def fb(self):
12.         print("B called")
13.
14. class C(A,B):
15.     def fc(self):
16.         self.fa()
17.         self.fb()
18.         print("C called")
19.
20. c=C()
21. c.fc()
Output:
A called
B called
C called
Observation:
1. Class A is defined in line 6 and contains a method fa defined in line 7.
2. Class B is defined in line 10 and contains a method fb defined in line 11.
3. Class C is defined in line 14 and derives from class A and class B. The order
of inheritance is A followed by B, though in this example the order has no
visible effect on the result.
4. A method fc is defined in line 15 inside class C, which invokes methods fa
and fb on itself; these are available to it because of inheritance.
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Multiple Inheritance - Function Overriding
5.
6. class A:
7.     def f(self):
8.         print("A")
9.
10. class B:
11.     def f(self):
12.         print("B")
13.
14. class C(A,B):
15.     def f(self):
16.         print("C")
17.
18. c=C()
19. c.f()
Output:
C
Observation:
1. Class C, defined in line 14, derives from both class A and class B.
2. All the classes (A, B and C) define their own respective copies of the method f
(lines 7, 11 and 15).
3. When a call is made in line 19 using the statement c.f(), preference is
given to the method f in class C.
The reason we are revisiting function overriding with respect to multiple
inheritance is that things get interesting when we use the super() function, because
the question arises: when there are 2 (or more) base classes, which base class is
considered to be the super class? The answer is determined by the order of
inheritance, which was covered in section 12.6.5. To recall, the order of inheritance is
from left to right, and in this example, the first base class of C is therefore A. The first
preference is given to the method f in class A. Only if A (and its super classes, in the
order of inheritance) does not define method f will preference be given to class B (and
its super classes, in the order of inheritance). This is demonstrated by the following
program:
Inheritance9.py
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Multiple Inheritance - Function Overriding
5.
6. class A:
7.     def f(self):
8.         print("A")
9.
10. class B:
11.     def f(self):
12.         print("B")
13.
14. class C(A,B):
15.     def f(self):
16.         super().f()
17.         print("C")
18.
19. c=C()
20. c.f()
Output:
A
C
What if we want to call the method f of class B from within class C? We use the class
function syntax: B.f(self).
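A small sketch of this, using classes shaped like those in Inheritance9.py but returning strings (an assumption made so the result can be checked): super().f() reaches A's f by the order of inheritance, while B.f(self) has to be called explicitly because A.f does not pass the call on.

```python
class A:
    def f(self):
        return "A"


class B:
    def f(self):
        return "B"


class C(A, B):
    def f(self):
        # super().f() follows the order of inheritance and finds A.f;
        # A.f does not pass the call on, so B.f is called explicitly
        return " ".join([super().f(), B.f(self), "C"])


print(C().f())  # A B C
```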
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Multiple Inheritance - Constructors and Destructors
5.
6. class A:
7.     def __init__(self):
8.         print("A constructed")
9.     def __del__(self):
10.         print("A destroyed")
11.
12. class B:
13.     def __init__(self):
14.         print("B constructed")
15.     def __del__(self):
16.         print("B destroyed")
17.
18. class C(A,B):
19.     def __init__(self):
20.         A.__init__(self)
21.         B.__init__(self)
22.         print("C constructed")
23.     def __del__(self):
24.         print("C destroyed")
25.         B.__del__(self)
26.         A.__del__(self)
27.
28. c=C()
29. del(c)
Output:
A constructed
B constructed
C constructed
C destroyed
B destroyed
A destroyed
Observation:
1. The constructor of class C first passes control to the constructor of class A,
then to the constructor of class B, and then continues its own execution.
2. The destructor of class C first executes itself and finally calls the destructor of
class B followed by the destructor of class A.
3. As pointed out in section 12.6.4, this is done to honour the rule that base
classes can exist without derived classes but not the other way around.
4. The order of inheritance is respected: class A is the first base class of class C
and hence is created first but destroyed last.
1. #!/usr/bin/python
2.
3. # Inheritance Demo:
4. # Dynamic Polymorphism
5.
6. class Animal:
7.     def __init__(self,name):
8.         self.name = name
9.
10.     def speak(self):
11.         pass
12.
13. class Dog(Animal):
14.     def __init__(self):
15.         super().__init__("Dog")
16.
17.     def speak(self):
18.         print("Bow wow!")
19.
Output:
Observation:
1. Class Animal is defined in line 6. It contains a constructor and a method
called speak.
2. The constructor (defined in line 7) receives the name of the animal and stores
it in the attribute name.
3. The speak method (defined in line 10) does nothing and will be overridden by
the derived classes suitably.
4. The Dog class (defined in line 13) derives from the Animal class. Its
constructor receives nothing, but invokes the constructor of Animal passing
“Dog” as the name of the animal. Its speak method ends up printing “Bow
wow!”
5. The Cat class (defined in line 20) derives from the Animal class. Its
constructor receives nothing, but invokes the constructor of Animal passing
“Cat” as the name of the animal. Its speak method ends up printing “Meow!”
6. The introduce function (defined in line 28) accepts an animal, prints its
name and invokes its speak method to make the animal “speak”. While the
function is designed to receive an instance of the type Animal, it can also
receive an instance of any derived class of Animal because of the law of
substitutability introduced at the beginning of this section. If a Dog instance is
passed, preference is given to the speak method of Dog and if a Cat
instance is passed, preference is given to the speak method of Cat. If the Dog
or Cat class did not override the speak method of Animal, the speak method
of Animal, which does nothing, would be invoked instead.
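Only the first part of the program appears above. The sketch below completes it along the lines described in the observations, with a Cat class and an introduce() function. The exact message format of introduce() and the extra Fish class (which deliberately does not override speak()) are assumptions, and speak() returns a string rather than printing so that the behaviour can be checked.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return ""  # a generic animal says nothing


class Dog(Animal):
    def __init__(self):
        super().__init__("Dog")

    def speak(self):
        return "Bow wow!"


class Cat(Animal):
    def __init__(self):
        super().__init__("Cat")

    def speak(self):
        return "Meow!"


class Fish(Animal):
    # Hypothetical subclass that does not override speak()
    def __init__(self):
        super().__init__("Fish")


def introduce(animal):
    # The speak() that runs is chosen at call time from the
    # object's actual class (dynamic polymorphism)
    line = "{}: {}".format(animal.name, animal.speak())
    print(line)
    return line


for pet in (Dog(), Cat(), Fish()):
    introduce(pet)
```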
hasattr(object,attribute)
This is a Boolean function that returns True if the given object has the given
attribute (either in the instance itself or through its class) and False otherwise, as
shown in the code snippet below:
>>> class A:
...     def __init__(self):
...         self.x=0
...
>>> a=A()
>>> hasattr(a,'x')
True
>>> hasattr(a,'y')
False
Observation:
1. We define a class A that contains a constructor which initializes an attribute x
to 0 (thereby creating it). Since the constructor is invoked each time an
instance is created, it is guaranteed that all instances of A will have the
attribute x in them.
2. hasattr(a,'x') therefore returns True whereas hasattr(a,'y')
returns False since no attribute y was created in the instance a.
getattr(object,attribute[,default])
Note:
1. If the attribute attribute exists in the instance object, its value is
returned.
2. If the attribute attribute does not exist in the instance object, default
is returned.
3. If the attribute attribute does not exist in the instance object and no
default is provided, an AttributeError occurs.
Example:
>>> class A:
...     def __init__(self):
...         self.x=0
...
>>> a=A()
>>> getattr(a,'x',2)
0
>>> getattr(a,'y',2)
2
>>> getattr(a,'y')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'y'
Observation:
1. getattr(a,'x',2) returns the value of the attribute x in the instance a,
which is 0.
2. getattr(a,'y',2) returns the value 2 (the default) as there is no
attribute y in the instance a.
3. getattr(a,'y') generates an AttributeError as there is no attribute y
in the instance a and no default was provided either.
setattr(object,attribute,value)
Example:
>>> class A:
...     def __init__(self):
...         self.x=0
...
>>> a=A()
>>> setattr(a,'x',2)
>>> setattr(a,'y',3)
>>> a.x
2
>>> a.y
3
Observation:
1. An instance a is created with an attribute x having the value 0.
2. setattr(a,'x',2) overwrites the value of the attribute x in the instance a
to 2.
3. setattr(a,'y',3) creates an attribute y in the instance a and assigns a
value 3 to it.
delattr(object,attribute)
Note:
1. If the attribute attribute exists in the instance object, it is removed from
the instance.
2. If the attribute attribute does not exist in the instance object, an
AttributeError is generated.
Example:
>>> class A:
...     def __init__(self):
...         self.x=0
...
>>> a=A()
>>> a.x
0
>>> delattr(a,'x')
>>> a.x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'x'
>>> delattr(a,'y')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: y
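A typical use of these four built-ins is working with attribute names that are known only at run time, for example when copying settings from a dictionary onto an object. A minimal sketch; the Config class and the settings values are hypothetical:

```python
class Config:
    pass


settings = {"host": "localhost", "port": 8080}

cfg = Config()
for name, value in settings.items():
    setattr(cfg, name, value)          # same as cfg.host = ..., cfg.port = ...

print(hasattr(cfg, "host"))            # True
print(getattr(cfg, "port"))            # 8080
print(getattr(cfg, "timeout", 30))     # 30 (default, attribute missing)

delattr(cfg, "host")
print(hasattr(cfg, "host"))            # False
```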
Attribute Details
__name__ The name of the class
__bases__ A tuple containing the base classes of this class, in the order of
inheritance (see section 12.6.5)
__module__ The name of the module to which this class belongs (see
section 15)
1. #!/usr/bin/python
2.
3. # Standard Attributes Demo:
4.
5. class A:
6.     pass
7.
8. class B:
9.     pass
10.
11. class C(A,B):
12.     "A class to demonstrate standard attributes"
13.
14.     x=10
15.
16.     def __init__(self):
17.         pass
18.
19.     def f(self):
20.         pass
21.
22. print("__name__:",C.__name__)
23. print("__doc__:",C.__doc__)
24. print("__bases__:",C.__bases__)
25. print("__module__:",C.__module__)
26. print("__dict__:",C.__dict__)
Output:
__name__: C
__doc__: A class to demonstrate standard attributes
__bases__: (<class '__main__.A'>, <class '__main__.B'>)
__module__: __main__
__dict__: {'f': <function C.f at 0x7f92c18564d0>, '__module__':
'__main__', '__doc__': 'A class to demonstrate standard
attributes', '__init__': <function C.__init__ at
0x7f92c1856440>, 'x': 10}
Observation:
1. We have defined class A in line 5 and class B in line 8. Class C (defined in line
11) derives from class A and class B. Class C contains a constructor and a
method f. It also contains a class variable x.
2. We observe that the __name__ attribute correctly gives us the class name as
C.
3. We observe that the __doc__ attribute picks up the documentation string
from the class C.
4. We observe that the __bases__ attribute has identified class A and class B,
both in the module __main__, to be the base classes.
5. We observe that the module within which class C is present is __main__.
6. Finally, the attribute __dict__ is a dictionary containing standard attributes
as well as user-defined ones. Specifically, we observe that the dictionary
contains the method f, the constructor and the class variable x.
__name__
Observe in the examples above that the methods being called have names that both
start and end with double underscores, and that we never called them explicitly! Here
is a demonstration:
>>> class A:
...     def __init__(self): print("Created")
...     def __del__(self): print("Destroyed")
...
>>> a=A()
Created
>>> del(a)
Destroyed
Observation:
1. When we created an instance of class A using the statement a=A(), we
observe that the __init__() method was automatically invoked!
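2. When we deleted the variable a using the statement del(a), we observe that
the __del__ method was automatically invoked! (Strictly speaking, __del__
runs when the last reference to the instance disappears; here a was the only
reference, so in CPython this happens immediately.)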
12.8.2.2 Stringification
Since objects are often instances of user-defined classes that the Python interpreter
knew nothing about before your program started executing, it is convenient to give
such objects a string representation, especially when we want to display them to the
user or log them to a file. We use the term “stringification” to describe the conversion
of an object to a string, and we use the following syntax to perform stringification:
str(object)
Note:
1. This is the constructor of the string class (str) that we had discussed in
section 2.3.4.1.
2. For user-defined classes, the above statement works out of the box, but it
only produces a generic description of the object (its class name and memory
address) unless the class customises it using the concept explained below.
Any attempt to stringify an object will automatically call the magic method __str__ of
that object. If the object does not contain this method, it will use the version provided
by the (nearest) super class, calling this function of the object class in the worst
case. This is demonstrated in the example below:
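A minimal sketch of such an example, in script form: the empty class A relies entirely on the __str__ inherited from the object class. The memory address in the result differs from run to run, so only the general shape of the string can be predicted.

```python
class A:
    pass


a = A()
s = str(a)
print(s)                              # e.g. <__main__.A object at 0x7f...>
print("__str__" in A.__dict__)        # False: A does not define __str__
print(A.__str__ is object.__str__)    # True: inherited from object
```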
Observation:
1. We created an empty class A and created an instance of it, the reference to
which was stored in the variable a.
2. We see that though the class A did not provide an implementation for the
__str__ method, there was no error and we continue to get a string version
due to the implementation in the object class.
Let us add a method by name __str__ in class A to return a string and observe the
behaviour:
>>> class A:
...     def __str__(self):
...         return "Hi"
...
>>> a=A()
>>> str(a)
'Hi'
Observation:
1. This time, our class A contains an implementation for the __str__ method,
which is designed to return the string “Hi” when invoked.
2. We create an instance of class A and store its reference in the variable a.
3. We observe that str(a) ends up returning the string “Hi”, proving that
str(a) ended up making the call a.__str__().
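The same method is used implicitly elsewhere: print() converts its arguments with str(), and string formatting with an empty format specification, as in f-strings and str.format(), also ends up calling __str__ for such a class. A minimal sketch:

```python
class A:
    def __str__(self):
        return "Hi"


a = A()
print(a)                    # Hi
print(f"a is {a}")          # a is Hi
print("a is {}".format(a))  # a is Hi
```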
Some of the sample programs in section 12.10 will make use of this to provide
appropriate string representations of objects of user-defined classes.
Observation:
1. We have defined an empty class A.
2. The variable o refers to an instance of class A and the variable i refers to an
instance of int class with value 5.
3. Because we have not overloaded the + operator in class A, we see that the
operation o+i does not work.
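The failing case described in these observations can be reproduced with a short script. The names follow the observations, and the error is caught so that its message can be printed:

```python
class A:
    pass


o = A()
i = 5
msg = ""
try:
    o + i
except TypeError as err:
    msg = str(err)
print("TypeError:", msg)
# TypeError: unsupported operand type(s) for +: 'A' and 'int'
```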
>>> class A:
...     def __add__(self, x): pass
...
>>> o=A()
>>> i=5
>>> o+i
>>>
Observation:
1. This time we observe that o+i does not give any error. This is because o+i
results in the following call: o.__add__(i).
2. Our __add__() method did not do anything since this is a demonstration.
For the same reason, the method also did not return anything.
- __sub__
* __mul__
// __floordiv__
/ __truediv__
% __mod__
divmod() __divmod__
While we know that it is now possible to evaluate o+i since it maps onto
o.__add__(i), the question now is: what about evaluating expressions like i+o?
Since i refers to an instance of the built-in int class, which knows nothing about our
user-defined class, int's addition cannot handle an instance of A, as illustrated below:
>>> class A:
...     def __add__(self, x): pass
...
>>> o=A()
>>> i=5
>>> i+o
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'A'
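The solution is a set of additional magic methods that take the operands in reverse
order! When evaluating i+o, Python first tries i.__add__(o); since int does not know
how to add an instance of A, that attempt returns the special value NotImplemented,
and Python then tries o.__radd__(i), where __radd__ is the “reverse addition” magic
method, as illustrated below: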
>>> class A:
...     def __radd__(self, x): pass
...
>>> o=A()
>>> i=5
>>> i+o
>>>
Observation:
1. We observe that i+o works without errors since it maps onto
o.__radd__(i).
2. The __add__ and __radd__ magic methods are related, but neither relies
on the other and it is not mandatory to have both, though possible and in
many cases recommended.
3. Our __radd__() method did not do anything since this is a demonstration.
For the same reason, the method also did not return anything.
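A fuller sketch, using a hypothetical Money class: when an operation does not make sense for the other operand's type, returning the special value NotImplemented tells Python to try the other operand's reverse method (or to raise TypeError if that fails too). Implementing __radd__ also lets the built-in sum() work, because sum() starts from 0 and therefore evaluates 0 + first_item.

```python
class Money:
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        if isinstance(other, Money):
            return Money(self.amount + other.amount)
        if isinstance(other, int):
            return Money(self.amount + other)
        return NotImplemented        # let Python try other.__radd__

    def __radd__(self, other):
        # Addition is commutative here, so reuse __add__
        return self.__add__(other)

    def __str__(self):
        return "Money({})".format(self.amount)


m = Money(10)
print(m + 5)                                # Money(15)
print(5 + m)                                # Money(15), via m.__radd__(5)
print(sum([Money(1), Money(2), Money(3)]))  # Money(6)
```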
The table below lists the magic functions to perform reverse arithmetic operations:
- __rsub__
* __rmul__
// __rfloordiv__
/ __rtruediv__
% __rmod__
divmod() __rdivmod__
+ __pos__
abs() __abs__
NOTE:
Unlike the operators shown in section 12.8.3.1, reverse unary arithmetic operators
don’t exist since these operators, being unary, work on a single operand and that
operand is the instance of the class itself!
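Unary operators are overloaded the same way, except that the method receives only self. A minimal sketch with a hypothetical Offset class, including __neg__ for unary minus alongside the __pos__ and __abs__ methods listed above:

```python
class Offset:
    def __init__(self, value):
        self.value = value

    def __neg__(self):            # -x
        return Offset(-self.value)

    def __pos__(self):            # +x
        return Offset(+self.value)

    def __abs__(self):            # abs(x)
        return Offset(abs(self.value))


o = Offset(-7)
print((-o).value)      # 7
print((+o).value)      # -7
print(abs(o).value)    # 7
```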
float() __float__
bool() __bool__
complex() __complex__
str() __str__
bytes() __bytes__
Note:
1. The __str__ magic method was already covered formally in section
12.8.2.2.
2. The __index__ magic method is supposed to return the same value as
__int__, and the presence of __index__ should ideally also be
accompanied by __int__, though the presence of __int__ does not
mandate implementing __index__!
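3. The __index__ magic method is used when converting the integer
equivalent of an instance to other bases, as performed by the bin(), oct()
and hex() functions. All these functions use the return value of __index__
and perform the base conversions themselves. __index__ is also used
wherever Python needs a true integer, for example when the instance is
used as an index into a list or string.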
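A minimal sketch of these notes, using a hypothetical Level class whose __index__ returns the same value as __int__:

```python
class Level:
    def __init__(self, n):
        self.n = n

    def __int__(self):
        return self.n

    def __index__(self):
        # Same value as __int__, as recommended above
        return self.__int__()


lv = Level(10)
print(int(lv))             # 10
print(bin(lv))             # 0b1010
print(oct(lv))             # 0o12
print(hex(lv))             # 0xa
print("abcdefghijkl"[lv])  # k  (__index__ also allows use as an index)
```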
> __gt__
<= __le__
>= __ge__
== __eq__
!= __ne__
| __or__
^ __xor__
<< __lshift__
>> __rshift__
~ __invert__
If the first operand does not support the required bitwise operation, then the second
operand’s reverse magic method gets invoked as summarised in the table below:
| __ror__
^ __rxor__
<< __rlshift__
>> __rrshift__
Note:
1. If the second operand also does not support the reverse bitwise operation, an
error is reported.
2. The ~ operator, being unary, does not have a corresponding reverse operator.
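A minimal sketch of bitwise overloading with a hypothetical Perm class that wraps a few permission flag bits. It shows an ordinary method (__or__), a reverse method (__ror__) used when the int is on the left, and the unary __invert__, which has no reverse form:

```python
class Perm:
    def __init__(self, bits):
        self.bits = bits

    def __int__(self):
        return self.bits

    def __and__(self, other):       # self & other
        return Perm(self.bits & int(other))

    def __or__(self, other):        # self | other
        return Perm(self.bits | int(other))

    def __ror__(self, other):       # other | self, when other is an int
        return self.__or__(other)

    def __invert__(self):           # ~self (unary, so no reverse form)
        return Perm(~self.bits & 0b111)   # keep only the 3 flag bits


READ, WRITE, EXEC = 0b100, 0b010, 0b001

p = Perm(READ) | WRITE   # Perm(READ).__or__(WRITE)
q = EXEC | p             # int cannot handle Perm, so p.__ror__(EXEC) runs
print(bin(p.bits))           # 0b110
print(bin(q.bits))           # 0b111
print(bin((~p).bits))        # 0b1
print(bin((q & READ).bits))  # 0b100
```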
-= __isub__
*= __imul__
//= __ifloordiv__
/= __itruediv__
%= __imod__
**= __ipow__
&= __iand__
|= __ior__
^= __ixor__
<<= __ilshift__
>>= __irshift__
NOTE:
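These magic methods should ideally modify the invoking instance in place and return
a reference to it. Whatever the method returns is assigned back to the variable on the
left-hand side, so returning a new instance that represents the result is also allowed,
and is what effectively happens when only the ordinary (non-in-place) method is
defined.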
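Whatever an in-place magic method returns is what the variable on the left-hand side gets rebound to. The sketch below uses a hypothetical Bag class whose __iadd__ modifies the invoking object and returns it (the usual choice for mutable objects, and what Python's built-in lists do for +=), next to an ordinary __add__ that builds a new object, so that the difference is visible.

```python
class Bag:
    def __init__(self, items=None):
        self.items = list(items or [])

    def __add__(self, other):
        # Ordinary +: build and return a new Bag
        return Bag(self.items + list(other))

    def __iadd__(self, other):
        # In-place +=: modify this Bag and return it
        self.items.extend(other)
        return self


a = Bag([1])
alias = a
a += [2, 3]             # calls a.__iadd__([2, 3]) and rebinds a to the result
print(a.items)          # [1, 2, 3]
print(alias is a)       # True: the same object was modified

b = a + [4]             # calls a.__add__([4])
print(b.items, b is a)  # [1, 2, 3, 4] False
```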
class className:
    pass
This section covers some of the valid reasons why we might encounter empty
classes:
NOTE:
The example in this section involves exception handling which will be covered in
section 13. Readers are advised to read this section only after completing section
13.
1. #!/usr/bin/python
2.
3. # Implementation of a fixed-length Stack
4.
5. class MyStack:
6.
7.     def __init__(self,MAX_SIZE):
8.         self.MAX_SIZE = MAX_SIZE
9.         self.values=[]
10.
11.     def push(self,x):
12.         if len(self.values) == self.MAX_SIZE: raise StackOverflowException()
13.         self.values.append(x)
14.
15.     def pop(self):
16.         if len(self.values) == 0: raise StackUnderflowException()
Output:
1. Push
2. Pop
3. Display
4. Quit
Enter choice:3
1. Push
2. Pop
3. Display
4. Quit
Enter choice:1
Enter value to push:10
1. Push
2. Pop
3. Display
4. Quit
Enter choice:1
Enter value to push:20
1. Push
2. Pop
3. Display
4. Quit
Enter choice:1
Enter value to push:30
1. Push
2. Pop
3. Display
4. Quit
Enter choice:1
Enter value to push:40
Stack overflow!
1. Push
2. Pop
3. Display
4. Quit
Enter choice:3
10
20
30
1. Push
2. Pop
3. Display
4. Quit
Enter choice:2
Popped: 30
1. Push
2. Pop
3. Display
4. Quit
Enter choice:3
10
20
1. Push
2. Pop
3. Display
4. Quit
Enter choice:2
Popped: 20
1. Push
2. Pop
3. Display
4. Quit
Enter choice:2
Popped: 10
1. Push
2. Pop
3. Display
4. Quit
Enter choice:2
Stack underflow!
1. Push
2. Pop
3. Display
4. Quit
Enter choice:3
1. Push
2. Pop
3. Display
4. Quit
Enter choice:4
1. Push
2. Pop
3. Display
4. Quit
Enter choice:3
1. Push
2. Pop
3. Display
4. Quit
When we start with an empty stack, we first use option 3 to verify that we
indeed have an empty stack. Let us now add items to the stack:
Enter choice:1
Enter value to push:10
1. Push
2. Pop
3. Display
4. Quit
Enter choice:1
Enter value to push:20
1. Push
2. Pop
3. Display
4. Quit
Enter choice:1
Enter value to push:30
1. Push
2. Pop
3. Display
4. Quit
Enter choice:1
Enter value to push:40
Stack overflow!
1. Push
2. Pop
3. Display
4. Quit
As can be seen, we added 10, 20 and 30. But when we attempt to add 40, we get a
“Stack overflow!” message since the maximum size of our stack is 3! We will now
use option 3 to display the stack and observe the stack contents:
Enter choice:3
10
20
30
1. Push
2. Pop
3. Display
4. Quit
We see that 10, 20 and 30 are in the stack, but not the rejected value 40. Let us pop
a single item and display the stack contents to verify that the popped item is indeed
removed from the stack:
Enter choice:2
Popped: 30
1. Push
2. Pop
3. Display
4. Quit
Enter choice:3
10
20
1. Push
2. Pop
3. Display
4. Quit
The value 30 was popped out from the stack and is no longer in the stack. Let us pop
out all the remaining items one by one:
Enter choice:2
Popped: 20
1. Push
2. Pop
3. Display
4. Quit
Enter choice:2
Popped: 10
1. Push
2. Pop
3. Display
4. Quit
Enter choice:2
Stack underflow!
1. Push
2. Pop
3. Display
4. Quit
We get a “Stack underflow!” message when we try to pop out an item from an
empty stack. Let us verify that the stack is indeed empty:
Enter choice:3
1. Push
2. Pop
3. Display
4. Quit
Enter choice:4
Observation:
1. We have defined the MyStack class in line 5.
2. The constructor defined in line 7 accepts MAX_SIZE – the maximum size of
the stack – and stores it in the instance variable MAX_SIZE. The constructor
also creates an empty list to house the stack items and stores it in the
instance variable values.
3. The push() method defined in line 11 accepts a value to be pushed on to the
stack. It first checks whether the stack is already full, and if so raises
StackOverflowException instead. Otherwise it appends the given value
to the list of stack values.
4. The pop() method defined in line 15 is supposed to pop out an item from the
stack and return it. First the method checks if the stack is empty, raising
StackUnderflowException in that case, else it pops out an element from
the list of stack values and returns the element.
5. The display() method defined in line 19 displays the contents of the stack
by iterating through the list of stack values.
6. The exceptions being raised in lines 12 and 16 are instances of classes
StackOverflowException and StackUnderflowException, defined in
lines 22 and 23 respectively. Recall from section 13.6 that such classes need
to derive from the Exception class. Since no other functionality is required,
the classes are empty!
7. Lines 40 and 42 show why we needed to differentiate between these classes
for identification of the exception.
MyStack2.py
1. #!/usr/bin/python
2.
3. # Implementation of a fixed-length Stack
4.
5. class MyStack:
6.
7.     def __init__(self,MAX_SIZE):
8.         self.MAX_SIZE = MAX_SIZE
9.         self.values=[]
10.
11.     def push(self,x):
12.         if len(self.values) == self.MAX_SIZE: raise StackOverflowException()
13.         self.values.append(x)
14.
15.     def pop(self):
16.         if len(self.values) == 0: raise StackUnderflowException()
17.         return self.values.pop()
18.
19.     def display(self):
20.         for i in self.values: print(i)
21.
22. class StackException(Exception): pass
23. class StackOverflowException(StackException): pass
24. class StackUnderflowException(StackException): pass
25.
26. myStack = MyStack(3)
27.
28. while True:
29.     try:
30.         print("1. Push")
Observation:
1. This program is identical to MyStack.py, with changes in lines 22-24.
2. The output of the program is not shown as it is identical to the output of
MyStack.py.
3. We decided to derive StackOverflowException and
StackUnderflowException from StackException as we feel these 2
exception classes are logically related. The relation is established using the
common base class StackException.
4. Due to exception handling rules (section 13.6), we need to ensure that
StackException extends Exception. But apart from this, we have no
need for adding any other piece of code within class StackException.
5. We can now also catch both StackOverflowException and
StackUnderflowException using StackException in the except
clause, if needed!
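A compact, non-interactive sketch of the same idea. The stack class below is a trimmed-down stand-in for the one in MyStack2.py, and the attempt() helper is a hypothetical addition; its single except clause catches either error through the common base class StackException, and type() tells the two apart.

```python
class StackException(Exception): pass
class StackOverflowException(StackException): pass
class StackUnderflowException(StackException): pass


class MyStack:
    def __init__(self, MAX_SIZE):
        self.MAX_SIZE = MAX_SIZE
        self.values = []

    def push(self, x):
        if len(self.values) == self.MAX_SIZE:
            raise StackOverflowException()
        self.values.append(x)

    def pop(self):
        if len(self.values) == 0:
            raise StackUnderflowException()
        return self.values.pop()


def attempt(operation, *args):
    # Hypothetical helper: one except clause handles both subclasses
    try:
        operation(*args)
        return "ok"
    except StackException as err:
        return type(err).__name__


s = MyStack(1)
results = [attempt(s.push, 1), attempt(s.push, 2),
           attempt(s.pop), attempt(s.pop)]
print(results)
# ['ok', 'StackOverflowException', 'ok', 'StackUnderflowException']
```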
>>> class A:
...     pass
...
>>> a=A()
>>> a.x=10
>>> a.y=20
>>> a.x
10
In such cases, having an empty class does not mean that the instances are empty!
Counter.py
1. #!/usr/bin/python
2.
3. # Implementation of Counter Class
4.
5. class Counter:
6.     def __init__(self, count=0): self.set(count)
7.
8.     def set(self, count): self._count = count
9.     def get(self): return self._count
10.     def reset(self): self.set(0)
11.
12.     def increment(self, count=1): self.set(self.get()+count)
13.     def decrement(self, count=1): self.set(self.get()-count)
14.
15.     def __add__(self, x): return Counter(self.get()+x)
16.     def __radd__(self, x): return self.__add__(x)
17.     def __sub__(self, x): return Counter(self.get()-x)
18.     def __rsub__(self, x): return Counter(x-self.get())  # x-c, not c-x
19.     def __iadd__(self, x):
20.         self.increment(x)
21.         return self
22.     def __isub__(self, x):
23.         self.decrement(x)
24.         return self
25.
26.     def __str__(self): return str(self.get())
27.     def __int__(self): return self.get()
28.     def __index__(self): return self.__int__()
29.
30. # Basic counter setting/getting/resetting
31. c = Counter()
32. print(c.get())
33. c.set(9)
34. print(c.get())
35. c.reset()
36. print(c.get())
37.
38. # Basic counter increment and decrement
39. c = Counter(100)
40. print(c.get())
41. c.increment()
42. c.increment(5)
43. print(c.get())
44. c.decrement()
45. c.decrement(3)
46. print(c.get())
47.
48. # Basic counter operators
49. c = Counter()
50. c = c + 2
51. c = 3 + c
52. c += 5
53. print(c.get())
54. c = Counter()
55. c = c - 2
56. c = 3 - c
57. c -= 5
58. print(c.get())
59.
60. # Counter type conversions
61. c = Counter(12)
62. print(int(c))
63. print(str(c))
64. print(c)
Output:
0
9
0
100
106
102
10
0
12
12
12
Observation:
1. The Counter class is defined in line 5. The constructor in line 6 can
optionally receive a count and store it within an instance variable _count. If
this value is not given, it will be assumed to be 0. The instance variable is
protected, making it clear that the external world is not supposed to meddle
with it directly but instead use public methods to gain access to it. These
methods are basically get(), set() and reset().
2. The set() method in line 8 accepts a count and sets it within the instance
variable _count.
3. The get() method in line 9 returns the count associated with the invoking
object, stored in the instance variable _count.
4. The reset() method in line 10 resets the counter value back to 0.
5. We see that the constructor is calling the set() method in line 6. Similarly,
the reset() method is calling the set() method in line 10. We prefer reusing
methods this way, even at a small cost in efficiency, because it makes the code
more reliable and maintainable. For example, if we decide to change the
name of the instance variable _count to count or __count or any other
name, the change will be localised to set() (and get() too), but will have
no impact on the constructor and reset() methods! We will also make sure
that none of our other methods references the instance variable _count directly,
preferring to use the set() and get() methods instead to access it
indirectly.
6. The increment() method in line 12 increments the count of the invoking
object by the specified value (default 1). Again, this is done by calling the
set() method. The decrement() method in line 13 similarly decrements
the count of the invoking object by the specified value (default 1).
7. Basic operator overloading is implemented in lines 15-24. Again, the focus is
on reusability of methods rather than re-implementing within each method.
8. The __add__() method in line 15 is supposed to handle cases of the form
c+x, where c is a Counter object and x is an integer. It is supposed to return
the result of the expression, which we know should be a Counter instance.
9. The __radd__() method in line 16 is supposed to handle cases of the form
x+c, where c is a Counter object and x is an integer. It is supposed to be
identical to c+x in behaviour.
10. The __sub__() method in line 17 and __rsub__() method in line 18
similarly handle the cases c-x and x-c respectively where c is a Counter
object and x is an integer.
11. The __iadd__() method in line 19 handles the case c+=x, where c is a
Counter object and x is an integer. Unlike the __add__() method, which
returns a new object holding the result, here the result has to be stored within
the invoking object itself. The method must also return a reference to the invoking
object, because Python assigns the return value of __iadd__() back to c; if the
method returned nothing, c would end up bound to None.
12. The __isub__() method in line 22 similarly handles the case c-=x, where c
is a Counter object and x is an integer.
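13. The methods in lines 26-28 handle conversion to other types. The
__str__() method in line 26 helps convert Counter objects to strings. The
__int__() method in line 27 helps convert Counter objects to integers. The
__index__() method in line 28 lets a Counter be used wherever Python needs an
integer index, such as in list indexing, slicing, or the bin() and hex() functions.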
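The following sketch uses three small hypothetical classes (GoodCounter, BadCounter and IndexCounter, which are not part of the book's Counter) to show why __iadd__() should return the invoking object, and what defining __index__() makes possible. It assumes Python 3.

```python
# Hypothetical minimal classes, separate from the book's Counter class.

class GoodCounter:
    def __init__(self, count=0):
        self.count = count

    def __iadd__(self, x):
        self.count += x
        return self            # c += x rebinds c to this returned object

class BadCounter:
    def __init__(self, count=0):
        self.count = count

    def __iadd__(self, x):
        self.count += x        # no return statement, so None is returned

class IndexCounter:
    def __init__(self, count=0):
        self.count = count

    def __index__(self):
        return self.count      # used for indexing, slicing, bin(), hex(), range()

good = GoodCounter(10)
good += 5
print(good.count)              # 15

bad = BadCounter(10)
bad += 5
print(bad)                     # None: the name no longer refers to the counter

letters = ["a", "b", "c", "d"]
i = IndexCounter(2)
print(letters[i], bin(i), letters[:i])   # c 0b10 ['a', 'b']
```

The BadCounter case is the reason the book's __iadd__() and __isub__() both end with return self.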
1. #!/usr/bin/python
2.
3. # Implementation of Distance Class
4.
5. class Distance:
6.     _factors = {"m":1, "km":1000, "mi":1609.34, "yd":0.9144, "ft":0.3048}
7.     def __init__(self, distance=0, unit="m"):
8.         self.set(distance, unit)
9.
10.     def set(self, distance, unit="m"):
11.         self._distance = Distance._normalize(distance, unit)
12.
13.     def get(self, unit="m"):
14.         return Distance._externalize(self._distance, unit)
15.
16.     def _normalize(distance, unit):    # class function: no self, called via Distance
17.         if unit == "m": return distance
18.         return distance * Distance._factors[unit]
19.
20.     def _externalize(distance, unit):  # class function: converts metres to unit
21.         if unit == "m": return distance
22.         return distance/Distance._factors[unit]
23.
24.     def add(self, distance, unit="m"):
25.         self.set(self.get() + Distance._normalize(distance, unit))
26.
Output:
1000
1.0
2000
2.0
12.0
1100.0
0.1
500
500
500
Observation:
1. The Distance class is implemented in lines 5-31. The constructor defined in
12.11 Questions
1. Explain any 5 principles of OOP.
2. What does a class contain typically in Python?
3. Differentiate between class functions and instance methods.
4. Write a short note on constructors and destructors in Python.
5. Write a short note on magic methods.
6. Write a short note on attribute handling in Python.
7. Write a short note on overloading arithmetic operators in Python.
8. Write a short note on base conversion of objects in Python.
9. Explain inheritance with an example in Python.
10. Explain dynamic polymorphism with an example in Python.
11. How can empty classes be useful in Python? Explain with examples.
12.12 Exercises
1. Write a script in Python to implement the Queue data structure using OOP.
2. Write a script in Python to implement 3 different shapes as individual classes
and calculate their area using dynamic polymorphism.
3. Write a script in Python to demonstrate the overloading of the | (bitwise or) operator.
4. Write a class called Time that helps represent the time of day in 24-hour
format. Add these functionalities:
1. Creation of a Time object, given the hour, minute, second and
millisecond.
2. Extraction of the different pieces of the Time object.
3. Addition of the specified absolute number of milliseconds to a Time
object.
4. Comparison of two Time objects to find if they are identical.
5. Comparison of two Time objects to find which comes earlier.
SUMMARY
➢ Class variables are defined directly within the class and can be
accessed by all instances of the class or by explicitly using the
class name to gain access to them.
➢ Class functions are functions defined within the class that are
not dependent on the invoking object.
13 EXCEPTION HANDLING
13.1 Errors vs. Exceptions
Every programmer has written programs that didn't work as
expected. Sometimes, we programmers violate the syntactic rules of the language,
resulting in syntax errors, and at other times our programs don't work correctly due to
various runtime anomalies. While syntax errors are definitely the programmer's fault,
runtime anomalies can arise for reasons outside the instructions in
the program. For example, the program may run out of memory, be unable to open a
file, be unable to connect to a server, or receive invalid input from the user. In these
situations, it would be wrong to blame the programmer for the issue, though a good
programmer always anticipates these issues and handles them gracefully in the
program. Object Oriented programming languages like Python provide a very
well-defined mechanism for dealing with these issues.
Do note, however, that some of Python's exception classes have names ending in
Error, but are technically exceptions! For instance, ZeroDivisionError is the name
of an exception that is raised when we divide a number by 0. On the other
hand, SyntaxError, which describes an error, is derived from the Exception class and is
treated as an exception. Some things are certainly strange in Python!
try:
    ...
except exceptionName:
    ...
Let us understand the flow of control in the above syntax. Since the flow of control
depends on whether or not an exception is raised in the try block, and on whether it
is handled, we will consider each situation separately:
1. If no exception is raised in the try block, the statements in the try block are
executed sequentially and the entire except block is ignored. This is what we
normally expect most of the time, as exceptions are supposed to be raised rarely.
2. If an exception is raised anywhere within the try block during the normal
sequential execution, control immediately exits the try block and starts examining the
except block(s). If it finds an except block capable of handling the exception that
was raised, then that except block is executed and control resumes after the
try-except construct. We will cover more of this in the next section, where we see how
to handle exceptions of multiple types. In the syntax above, the except block is
capable of handling an exception of type exceptionName (and also of its derived
classes, as we learnt in section 12.7).
3. If an exception is raised anywhere within the try block during the normal
sequential execution, and that exception is not handled by the except block(s), then
control exits the entire try-except construct and the exception propagates. This
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # Simple try-except block
5.
6. try:
7.     n = int(input("Enter an integer:"))
8.     quotient = int(100/n)
9.     print("100/{} = {}".format(n,quotient))
10. except ZeroDivisionError:
11.     print("Sorry! Division by zero is not permitted!")
12.
13. print("Thanks for using this program!")
Output:
Enter an integer:5
100/5 = 20
Thanks for using this program!
Observation:
1. This program basically accepts an integer from the user, divides 100 by the
given integer and displays the quotient.
2. We however need to be aware that if the user provides 0 as the input, then
the division will result in a ZeroDivisionError being raised. We need to
handle that case gracefully.
3. The try block extends from line 7 to line 9. The statements within these are
monitored for any exception that may be raised. We are currently expecting
ZeroDivisionError to be raised when the user input is 0. Technically, lines
7 and 9 need not be inside the try block as they are not candidates for
ZeroDivisionError to be raised. We have still kept them inside the try
block for convenience and because the subsequent demo programs build
upon the same program and we will require these statements also to be
monitored for exceptions.
4. The except block contains line 11. This is our response to
ZeroDivisionError.
5. Line 7 accepts input from the user, line 8 divides 100 by the input and line 9
prints the result. As can be seen from the output, if the input is 5, the output is
20. Since no exception is raised, after execution of the try block, control
resumes outside the try-except block, in line 12.
We will now consider how the program behaves when the input given is 0.
Output:
Enter an integer:0
Sorry! Division by zero is not permitted!
Thanks for using this program!
Observation:
1. When the input received in line 7 is 0, it results in a ZeroDivisionError in
line 8. This interrupts the flow of control in the try block. Note that line 9 did
not get executed.
2. Line 10 is the beginning of an except block that is capable of handling
ZeroDivisionError, which has occurred. Control therefore resumes from
line 11, within the except block.
3. After execution of the except block, control comes out of the try-except
block and resumes execution from line 12. Thus, line 13 is also executed.
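To try both paths without typing input each time, we can wrap the same logic in a function that receives n directly. The name divide_100 is hypothetical and the function returns the message instead of printing it; this is a sketch of the flow described above, not a replacement for the book's program.

```python
def divide_100(n):
    """Return the message ExceptionHandlingDemo1.py would print for n.
    divide_100 is a hypothetical helper used only for this sketch."""
    try:
        quotient = int(100/n)
        return "100/{} = {}".format(n, quotient)
    except ZeroDivisionError:
        # Reached only when 100/n raised; the return in the try block was skipped
        return "Sorry! Division by zero is not permitted!"

print(divide_100(5))   # 100/5 = 20
print(divide_100(0))   # Sorry! Division by zero is not permitted!
```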
try:
    ... # try block
except ExceptionName1:
    ... # Exception handler for ExceptionName1
except ExceptionName2:
    ... # Exception handler for ExceptionName2
except ExceptionName3:
    ... # Exception handler for ExceptionName3
...
From this syntax, it is evident that a single try block can be followed by any number
of except blocks, and each except block can handle a specific type of exception.
• If no exception occurs, only the try block is executed and control comes
out of the entire construct.
• If an exception is raised by any statement in the try block, control
immediately comes out of the try block and starts checking the except
blocks sequentially. The first except block that can handle the exception that
was raised is then executed, after which control comes out of the entire
construct, ignoring all other except blocks.
• If none of the except blocks can handle the exception that was raised, then
the exception propagates to the next higher level, which we will examine in
detail in section 13.4.
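Because the except blocks are checked in order and only the first match runs, a handler for a base class placed early can hide a more specific handler placed after it. The sketch below (with hypothetical function names) contrasts a sensible order with one where except Exception comes first.

```python
def classify(text):
    try:
        return 100 // int(text)
    except ValueError:            # int() could not parse the text
        return "ValueError handler"
    except ZeroDivisionError:     # the text was "0"
        return "ZeroDivisionError handler"

def classify_wrong_order(text):
    try:
        return 100 // int(text)
    except Exception:             # matches both exceptions, so it always wins
        return "Exception handler"
    except ZeroDivisionError:     # never reached: the clause above matches first
        return "ZeroDivisionError handler"

print(classify("5"), classify("0"), classify("hi"))  # 20 ZeroDivisionError handler ValueError handler
print(classify_wrong_order("0"))                    # Exception handler
```

Python does not warn about the unreachable handler in classify_wrong_order, so the ordering is the programmer's responsibility.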
Let us revisit ExceptionHandlingDemo1.py: if the user does not give us a valid
integer as input, then an attempt to convert such a string to an integer using the
int() function results in ValueError being raised. Let us now handle both
ValueError and ZeroDivisionError.
ExceptionHandlingDemo2.py
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # Separate handling of each exception
5.
6. try:
7.     n = int(input("Enter an integer:"))
8.     quotient = int(100/n)
9.     print("100/{} = {}".format(n,quotient))
10. except ValueError:
11.     print("Invalid input! Please enter an integer.")
12. except ZeroDivisionError:
13.     print("Sorry! Division by zero is not permitted!")
14.
15. print("Thanks for using this program!")
Output:
Enter an integer:5
100/5 = 20
Thanks for using this program!
Observation:
1. When the input is 5, no exceptions are raised. Therefore, only the try block
is executed and all the except blocks are ignored.
Output:
Enter an integer:0
Sorry! Division by zero is not permitted!
Thanks for using this program!
Observation:
1. When the input is 0, as seen in the case of ExceptionHandlingDemo1.py,
a ZeroDivisionError is raised in the try block and control immediately
comes out of the try block and the except blocks are examined
sequentially.
2. Line 10 is an except block that can handle ValueError but not
ZeroDivisionError and is therefore skipped.
Output:
Enter an integer:hi
Invalid input! Please enter an integer.
Thanks for using this program!
Observation:
1. When the input is hi, the int() function is unable to convert the string to an
integer and raises a ValueError. Control immediately comes out of the try
block and starts examining the except blocks sequentially.
2. Line 10 is an except block that can handle ValueError and is hence
executed, after which control resumes outside the try-except block in line
14.
try:
    ... # try block
except (ExceptionName1, ExceptionName2 [,ExceptionName3...]):
    ... # except block
We observe in the syntax above that we can give a tuple of exception types in the
except clause instead of a single exception type!
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # Common handling of multiple exceptions
5.
6. try:
7.     n = int(input("Enter an integer:"))
8.     quotient = int(100/n)
9.     print("100/{} = {}".format(n,quotient))
10. except (ValueError, ZeroDivisionError):
11.     print("Oops! Unable to calculate!")
12.
13. print("Thanks for using this program!")
Output:
Enter an integer:5
100/5 = 20
Thanks for using this program!
Output:
Enter an integer:0
Oops! Unable to calculate!
Thanks for using this program!
Output:
Enter an integer:hi
Oops! Unable to calculate!
Thanks for using this program!
Observation:
1. Line 10 now handles a tuple of exceptions! If the exception raised in the try
block is ValueError or ZeroDivisionError (or any of their derived
classes), then control will enter line 10. The response to both is identical.
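The except clause evaluates an ordinary expression, so the tuple does not have to be written inline; it can be built once and reused. In this sketch the names CALCULATION_ERRORS and calculate are hypothetical.

```python
# A tuple of exception classes stored under a hypothetical name
CALCULATION_ERRORS = (ValueError, ZeroDivisionError)

def calculate(text):
    try:
        n = int(text)
        return "100/{} = {}".format(n, int(100/n))
    except CALCULATION_ERRORS:     # same as except (ValueError, ZeroDivisionError)
        return "Oops! Unable to calculate!"

for text in ("5", "0", "hi"):
    print(calculate(text))
```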
Before we proceed further, do note that we can combine except blocks that handle
individual exceptions with except blocks that handle several exceptions in a common
manner, as shown in the sample syntax below:
try:
    ... # try block
except ExceptionName1:
    ... # except block
except (ExceptionName2, ExceptionName3, ExceptionName4):
    ... # except block
except (ExceptionName5, ExceptionName6):
    ... # except block
except ExceptionName7:
    ... # except block
try:
    ... # try block
except ExceptionName1:
    ... # Single exception handler
except (ExceptionName2, ExceptionName3):
    ... # Multiple exception handler
except:
    ... # Handles all exceptions not handled above
Observation:
1. As can be seen from the syntax above, the last except block does not
specify any particular exception and is thus considered to mean any and all
exceptions, including KeyboardInterrupt and SystemExit, which derive from
BaseException rather than Exception.
2. Such a catch-all handler should be the last except clause, if present at all.
The reason is that no except handlers below it could ever get executed,
since the except blocks are tried sequentially and the catch-all handler can
handle any exception. Python in fact rejects such code with the SyntaxError
"default 'except:' must be last".
3. In the above syntax, any exception raised in the try block that is not
ExceptionName1, ExceptionName2 or ExceptionName3 (or one of their derived
classes) will be handled in the catch-all handler.
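The difference between a bare except and except Exception matters for SystemExit, which sys.exit() raises. The sketch below (hypothetical function names) shows that the bare form swallows it while except Exception lets it pass through to the caller.

```python
import sys

def with_bare_except():
    try:
        sys.exit(1)               # sys.exit() works by raising SystemExit
    except:                       # a bare except catches SystemExit too
        return "caught by bare except"

def with_except_exception():
    try:
        sys.exit(1)
    except Exception:             # SystemExit does not derive from Exception
        return "caught by except Exception"

print(with_bare_except())         # caught by bare except
print(issubclass(SystemExit, Exception), issubclass(SystemExit, BaseException))  # False True

try:
    with_except_exception()
except SystemExit:
    print("SystemExit passed straight through except Exception")
```

This is one reason a bare except can surprise us: it can stop a script from exiting, or ignore Ctrl+C (KeyboardInterrupt).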
Do note that the catch-all handler does not require any previous except blocks. The
following syntax is therefore valid, possible and useful:
try:
    ... # try block
except:
    ... # Handle all exceptions here
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # Handling all exceptions
5.
6. try:
7.     n = int(input("Enter an integer:"))
8.     quotient = int(100/n)
9.     print("100/{} = {}".format(n,quotient))
10. except:
11.     print("Oops! Unable to calculate!")
12.
13. print("Thanks for using this program!")
Output:
Enter an integer:5
100/5 = 20
Thanks for using this program!
Enter an integer:0
Oops! Unable to calculate!
Thanks for using this program!
Enter an integer:hi
Oops! Unable to calculate!
Thanks for using this program!
Observation:
1. Any exception generated in the try block will now be handled in the except
block in line 10.
2. We expect ValueError and ZeroDivisionError, both of which will be
handled in line 10.
try:
    ... # try block
except ExceptionName1 as e:
    ... # Handle ExceptionName1 with instance as e
except (ExceptionName2, ExceptionName3) as e:
    ... # Handle both these exceptions with instance as e
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # Exception instances
5.
6. import sys
7.
8. try:
9.     n = int(input("Enter an integer:"))
10.     quotient = int(100/n)
11.     print("100/{} = {}".format(n,quotient))
12. except Exception as e:
13.     print("Oops! Unable to calculate!")
14.     print("Details:",e)
15.
16. print("Thanks for using this program!")
Output:
Enter an integer:0
Oops! Unable to calculate!
Details: division by zero
Thanks for using this program!
Enter an integer:hi
Oops! Unable to calculate!
Details: invalid literal for int() with base 10: 'hi'
Thanks for using this program!
Observation:
1. This program is based on ExceptionHandlingDemo4.py, but there are 2
changes in line 12 that are important.
2. The first change is that we are responding to exceptions of type Exception.
Since we want to respond to both ValueError as well as
ZeroDivisionError in the same manner, we would have preferred to use
the catch-all handler (section 13.2.2.3). However, the catch-all handler syntax
has no way of binding the exception instance to a name, and hence we have to
think of an alternative. We utilize the fact that both ValueError and
ZeroDivisionError derive from Exception, hence handling Exception
will allow us to handle them both.
3. The second change in line 12 is that we have used the additional clause “as
e”. This allows us to access the exception instance using e. Of course, e is
just a variable name and we could use any name for accessing the instance.
4. Line 14 prints e – it actually ends up stringifying e (section 12.8.2.2) and
prints that string. We expect the string version of all built-in exceptions to be a
human readable message that makes us understand the cause of the
exception.
5. We can do more with exception instances, but what we can do depends on
the exception. We will cover this aspect in greater detail once we learn to
make our own exception classes in section 13.6.
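Two small things we can already do with an exception instance are reading its class name through type(e).__name__ and its constructor arguments through e.args. And although a bare except cannot use "as e", one way for it to reach the instance is sys.exc_info(), which returns a (type, value, traceback) tuple for the exception currently being handled. The function names below are hypothetical.

```python
import sys

def describe(text):
    """Divide 100 by int(text); on failure, describe the exception instance."""
    try:
        return int(100/int(text))
    except Exception as e:
        # type(e).__name__ is the class name; str(e) is the readable message
        return "{}: {}".format(type(e).__name__, e)

def describe_bare(text):
    """Same idea with a bare except, fetching the instance via sys.exc_info()."""
    try:
        return int(100/int(text))
    except:
        exc_type, exc_value, _ = sys.exc_info()
        return "{}: {}".format(exc_type.__name__, exc_value)

print(describe("hi"))        # ValueError: invalid literal for int() with base 10: 'hi'
print(describe_bare("0"))    # ZeroDivisionError: division by zero
print(ValueError("bad input", 42).args)   # ('bad input', 42)
```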
– a part that is monitored for exceptions (within the try block) and a part that is
excluded from this monitoring (in the else block).
The syntax for implementing this is shown below:
try:
    ... # try block, monitored for exceptions
except:
    ... # except block, to respond to exceptions
else:
    ... # else block, not monitored for exceptions
Observation:
1. The normal code is now split into 2 parts – one part lies within the try block
and the other part lies within the else block.
2. If any exception occurs within the try block, the suitable except block gets
executed, but the else block is skipped.
3. Only if no exception is raised in the try block are all the except blocks
skipped and the else block executed.
4. If any exceptions are raised within the else block, they are not handled by
this piece of code (but can be handled elsewhere where the exception
propagates, as covered in section 13.4).
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # The else clause
5.
6. try:
7.     n = int(input("Enter an integer:"))
8.     quotient = int(100/n)
9. except:
10.     print("Oops! Unable to calculate!")
11. else:
12.     print("100/{} = {}".format(n,quotient))
13.
14. print("Thanks for using this program!")
Output:
Enter an integer:5
100/5 = 20
Thanks for using this program!
Enter an integer:0
Oops! Unable to calculate!
Thanks for using this program!
Enter an integer:hi
Oops! Unable to calculate!
Thanks for using this program!
Observation:
1. This program is based on ExceptionHandlingDemo4.py, with line 12 now
being inside the else block instead of being within the try block.
2. The reason why we have separated line 12 from the rest of the try block is
because we feel that that line has no connection with the exceptions we are
expecting – ValueError and ZeroDivisionError. We do feel that it is
part of the normal code of operation and hence it lies in the else block as an
extension of the try block if there were no exceptions.
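3. As can be seen from the various outputs, the behaviour is similar to that of
ExceptionHandlingDemo4.py. The behaviour would have been different,
however, if the print() statement were to raise an exception: in
ExceptionHandlingDemo4.py the catch-all handler would catch it, whereas here
it would propagate out of the construct.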
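To see that difference concretely, the sketch below adds a hypothetical second division by divisor. In in_try it sits inside the try block and is handled; in in_else it sits in the else block, so the same ZeroDivisionError escapes to the caller. Both function names and the divisor parameter are assumptions made for this illustration.

```python
def in_try(n, divisor):
    try:
        quotient = int(100/n)
        return int(quotient/divisor)   # monitored: a ZeroDivisionError here is handled
    except ZeroDivisionError:
        return "handled"

def in_else(n, divisor):
    try:
        quotient = int(100/n)
    except ZeroDivisionError:
        return "handled"
    else:
        return int(quotient/divisor)   # not monitored by the except clause above

print(in_try(5, 0))    # handled
print(in_else(0, 1))   # handled
print(in_else(5, 2))   # 10
try:
    in_else(5, 0)
except ZeroDivisionError:
    print("ZeroDivisionError escaped from the else block")
```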
try:
    ... # try block
except:
    ... # except block
else:
    ... # else block
finally:
    ... # finally block
The flow of control through this construct is shown in the flowchart below. Let us
revisit this once again in detail:
1. The exception monitoring block is the try block. The exception response is
the except block. The else block is a continuation of the try block but
without exception monitoring. The finally block is a continuation of both the
else block (or the try block if the else block is not present) and the
except block.
2. When this construct is executed, control passes on sequentially through all
statements in the try block. If any of these statements raise an exception,
control comes out of the try block and examines the except block(s).
3. If it finds a suitable except block, then that except block gets executed,
ignoring all other except blocks and control then resumes in the finally
block, after which control comes out of this construct.
4. If it doesn't find a suitable except block capable of handling the exception
raised, then the exception has to be propagated to the outer level (covered in
section 13.4 in detail). But before propagation, the finally block is
executed.
5. If no exception is raised in the try block, control jumps to the else block and
continues execution.
6. If any exception is raised in the else block, the exception propagates to the
outer level (covered in section 13.4 in detail), but only after executing the
finally block.
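7. If no exceptions were raised in the else block either, then control resumes in
the finally block before coming out of the construct.
8. If any exceptions are raised in the finally block, they are propagated to the
outer level (covered in section 13.4 in detail).
Note that terminating the script with sys.exit() (section 3.4.4) does not skip the
finally block, because sys.exit() works by raising the SystemExit exception. The
finally block is skipped only when the interpreter is stopped abruptly, for example
by os._exit() or by the process being killed.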
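The flow described in the points above can be checked with a sketch that records which blocks run. The function names and the steps/log lists are hypothetical; the last part confirms that a finally block still runs when sys.exit() raises SystemExit.

```python
import sys

def trace(n):
    """Record which blocks of a try/except/else/finally construct run."""
    steps = []
    try:
        steps.append("try")
        int(100/n)
    except ZeroDivisionError:
        steps.append("except")
    else:
        steps.append("else")
    finally:
        steps.append("finally")
    return steps

print(trace(5))   # ['try', 'else', 'finally']
print(trace(0))   # ['try', 'except', 'finally']

def trace_unhandled(steps):
    try:
        steps.append("try")
        int("hi")                  # ValueError: no matching except below
    except ZeroDivisionError:
        steps.append("except")
    finally:
        steps.append("finally")    # runs before the ValueError propagates

log = []
try:
    trace_unhandled(log)
except ValueError:
    log.append("caller")
print(log)        # ['try', 'finally', 'caller']

# sys.exit() raises SystemExit, so the finally block still runs
log2 = []
try:
    try:
        sys.exit(0)
    finally:
        log2.append("finally")
except SystemExit:
    log2.append("SystemExit")
print(log2)       # ['finally', 'SystemExit']
```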
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # The finally clause
5.
6. try:
7.     n = int(input("Enter an integer:"))
8.     quotient = int(100/n)
9. except:
10.     print("Oops! Unable to calculate!")
11. else:
12.     print("100/{} = {}".format(n,quotient))
13. finally:
14.     print("Thanks for using this program!")
Output:
Enter an integer:5
100/5 = 20
Thanks for using this program!
Enter an integer:0
Oops! Unable to calculate!
Thanks for using this program!
Enter an integer:hi
Oops! Unable to calculate!
Thanks for using this program!
Observation:
1. This program is based on ExceptionHandlingDemo6.py and also
produces the same output.
2. Line 14, which is inside the finally block, will be executed regardless of
whether an exception was raised in the try block or not.
raise exceptionInstance
Example:
>>> e=ZeroDivisionError()
>>> raise e
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # The raise statement
5.
6. try:
7.     n = int(input("Enter an integer:"))
8.     if n>100: raise ValueError()
9.     quotient = int(100/n)
10.     print("100/{} = {}".format(n,quotient))
11. except ValueError:
12.     print("Please enter an integer less than 100!")
13. except ZeroDivisionError:
14.     print("Sorry! Division by zero is not permitted!")
15.
16. print("Thanks for using this program!")
Output:
Enter an integer:5
100/5 = 20
Thanks for using this program!
Enter an integer:0
Sorry! Division by zero is not permitted!
Thanks for using this program!
Enter an integer:hi
Please enter an integer less than 100!
Thanks for using this program!
Enter an integer:200
Please enter an integer less than 100!
Thanks for using this program!
Observation:
1. This program requires an input less than 100. We wish to generate a
ValueError if this is not the case and handle it appropriately.
2. Since Python is unaware of this requirement of ours and has no built-in
way of checking it, we need to add checks of our own. In line 8, we
check if n is greater than 100, and if so we raise ValueError.
3. Lines 11-12 handle this and print a suitable message on ValueError. Note
that ValueError can occur in 2 cases in our program – either the input is not
a valid integer or the input is a valid integer greater than 100. In both cases,
our response is the same.
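The exception we raise can also carry a message, which a handler can read through "as e". Python additionally accepts a class after raise and creates the instance for us with no arguments. The helper check_limit and its limit parameter below are hypothetical.

```python
def check_limit(n, limit=100):
    """Hypothetical helper: raise ValueError with a message when n > limit."""
    if n > limit:
        raise ValueError("{} is greater than {}".format(n, limit))
    return n

try:
    check_limit(200)
except ValueError as e:
    message = str(e)
    print("Rejected:", message)      # Rejected: 200 is greater than 100

try:
    raise ValueError                 # a class: Python instantiates ValueError()
except ValueError as e:
    empty_args = e.args
    print(type(e).__name__, empty_args)  # ValueError ()
```

With a message attached, the handler could distinguish "not an integer" from "too large" even though both raise ValueError.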
ExceptionHandlingDemo9.py
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # Exception propagation through nested try-except blocks
5.
6. try:
7.     try:
8.         n = int(input("Enter an integer:"))
9.         quotient = int(100/n)
10.         print("100/{} = {}".format(n,quotient))
11.     except ValueError:
12.         print("Please enter an integer!")
13. except ValueError:
14.     print("Invalid value!")
15. except ZeroDivisionError:
16.     print("Sorry! Division by zero is not permitted!")
17.
Output:
Enter an integer:5
100/5 = 20
Enter an integer:0
Sorry! Division by zero is not permitted!
Enter an integer:hi
Please enter an integer!
Observation:
1. We are expecting 2 different exceptions – ValueError and
ZeroDivisionError - just like ExceptionHandlingDemo2.py. But in
this program, instead of handling the exceptions using 2 except clauses of
the same try block, we have put a try-except block to handle
ValueError (lines 7-12), and put this entire block inside a try block that
monitors and handles ValueError and ZeroDivisionError (lines 6-16).
2. If a ValueError is raised in lines 8-10, then it will be handled in lines 11-12.
No exceptions are reported in the outer try block of line 6, and hence none
of its except blocks are executed. Note that for this reason, the except
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # Exception propagation through functions
5.
6. def divideAndPrint():
7.     try:
8.         n = int(input("Enter an integer:"))
9.         quotient = int(100/n)
10.         print("100/{} = {}".format(n,quotient))
11.     except ValueError:
12.         print("Please enter an integer!")
13.
14. try:
15.     divideAndPrint()
16. except ValueError:
17.     print("Invalid value!")
18. except ZeroDivisionError:
19.     print("Sorry! Division by zero is not permitted!")
20.
21.
Output:
Enter an integer:5
100/5 = 20
Enter an integer:0
Sorry! Division by zero is not permitted!
Enter an integer:hi
Please enter an integer!
Observation:
1. We have modified ExceptionHandlingDemo9.py and put the entire inner
try-except block into a function called divideAndPrint.
2. The try block of line 7 monitors its contents for ValueError and
ZeroDivisionError, but is capable of handling only ValueError. Any
occurrence of ZeroDivisionError will propagate out of the function and
will be reported to its caller instead.
3. The function divideAndPrint is called from line 15. This is the place where
ZeroDivisionError is reported, if at all it is raised within
divideAndPrint. This statement is within a try block and can handle
ZeroDivisionError in lines 18-19.
4. Just like in ExceptionHandlingDemo9.py, the except block for ValueError in
lines 16-17 never executes, because ValueError is already handled inside
divideAndPrint; keeping that block there is harmless and is not an error.
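Propagation works the same way through any number of function calls: an exception travels up the chain of callers until it reaches a try block with a matching except clause. In this sketch the functions level1, level2 and level3 are hypothetical, and level2 handles only TypeError.

```python
def level3(n):
    return int(100/n)            # may raise ZeroDivisionError or TypeError

def level2(n):
    try:
        return level3(n)
    except TypeError:            # handles only TypeError
        return "level2 handled TypeError"

def level1(n):
    try:
        return level2(n)
    except ZeroDivisionError:    # ZeroDivisionError passes through level2 to here
        return "level1 handled ZeroDivisionError"

print(level1(5))      # 20
print(level1(0))      # level1 handled ZeroDivisionError
print(level1("x"))    # level2 handled TypeError
```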
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # Unhandled Exceptions
5.
6. def divideAndPrint():
7.     try:
8.         n = int(input("Enter an integer:"))
9.         quotient = int(100/n)
10.         print("100/{} = {}".format(n,quotient))
11.     except ValueError:
12.         print("Please enter an integer!")
13.
14. divideAndPrint()
Output:
Enter an integer:5
100/5 = 20
Enter an integer:hi
Please enter an integer!
Enter an integer:0
Traceback (most recent call last):
File "ExceptionHandlingDemo11.py", line 14, in <module>
divideAndPrint()
File "ExceptionHandlingDemo11.py", line 9, in divideAndPrint
quotient = int(100/n)
ZeroDivisionError: division by zero
Observation:
1. The program is based on ExceptionHandlingDemo10.py and produces
the same output whenever ValueError is raised, as it is handled in exactly
the same way (in line 11).
2. When ZeroDivisionError is raised, however, there is no code
to handle it. The exception that is raised in the divideAndPrint function
propagates to the main program (line 14), but since it is not handled there either,
it propagates beyond the program: the interpreter prints the traceback shown
above and terminates the script.
3. Professional code never allows such exceptions to propagate beyond the
program. Any expected exception must be handled appropriately.
raise
Note:
1. Observe that this differs from the normal syntax of raise, which requires an
exception object. This syntax is understood to mean that we wish to raise the
same exception that we are currently handling.
2. This special form of raise is meant for use within an except block, where it
re-raises the same exception object that we are currently handling. If it is
executed when no exception is being handled, Python raises a RuntimeError instead.
Let us modify ExceptionHandlingDemo10.py to incorporate this feature:
ExceptionHandlingDemo12.py
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # Re-raising Exceptions
5.
6. def divideAndPrint():
7.     try:
8.         n = int(input("Enter an integer:"))
9.         quotient = int(100/n)
10.         print("100/{} = {}".format(n,quotient))
11.     except ValueError:
12.         print("Please enter an integer!")
13.         raise
14.
15. try:
16.     divideAndPrint()
17. except ValueError:
18.     print("Invalid value!")
19. except ZeroDivisionError:
20.     print("Sorry! Division by zero is not permitted!")
21.
22.
Output:
Enter an integer:5
100/5 = 20
Enter an integer:0
Sorry! Division by zero is not permitted!
Enter an integer:hi
Please enter an integer!
Invalid value!
Observation:
1. This program is based on ExceptionHandlingDemo10.py, with only line
13 being an additional line, where the exception is being re-raised.
2. Any occurrence of ZeroDivisionError is handled in exactly the same way
as in ExceptionHandlingDemo10.py and hence will not be discussed
further here.
3. If a ValueError is raised within the try block in lines 8-10, it is handled in the
except block in lines 11-13. However, the raise statement in line 13 results
in the same instance of ValueError being raised again, which propagates
beyond the function divideAndPrint and is reported in the try block in
line 16. This occurrence of ValueError is then handled in the except block
in lines 17-18.
4. Thus, the same exception instance of ValueError is handled both in lines
11-13 and in lines 17-18. Recall that earlier, in ExceptionHandlingDemo10.py, the
corresponding except block (lines 16-17 there) would never get executed.
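The sketch below checks the claim that a bare raise re-raises the very same instance, by comparing the object seen in the function with the one the caller receives. It also shows the RuntimeError that a bare raise produces when no exception is being handled. The names inner, handled and same are hypothetical.

```python
def inner(handled):
    try:
        int("hi")                 # raises ValueError
    except ValueError as e:
        handled.append(e)         # remember the instance handled here
        raise                     # re-raise that very same instance

handled = []
try:
    inner(handled)
except ValueError as outer:
    same = outer is handled[0]
    print(same)                   # True

try:
    raise                         # no exception is being handled at this point
except RuntimeError as e:
    bare_raise_error = type(e).__name__
    print(bare_raise_error)       # RuntimeError
```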
def f():
    try:
        ... # try block
    except ExceptionName:
        ... # Handle the exception
        raise # Report to caller that this exception occurred

try:
    f() # Call the function f
    ... # Do other stuff if the call to f() was exception-free
except ExceptionName:
    ... # Respond to the exception
Let us revisit ExceptionHandlingDemo8.py and add our own exceptions for the
following cases:
1. A value greater than 100 should result in OverflowException
2. A negative value should result in NegativeInputException
3. Division by 0 will continue to give rise to ZeroDivisionError and a non-
numeric input will continue to give rise to ValueError
ExceptionHandlingDemo13.py
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # User-defined exceptions
5.
6. class OverflowException(Exception):
7.     pass
8.
9. class NegativeInputException(Exception):
10.     pass
11.
12. def check(n):
13.     if n>100: raise OverflowException()
14.     if n<0: raise NegativeInputException()
15.
16. try:
17.     n = int(input("Enter an integer:"))
18.     check(n)
19.     quotient = int(100/n)
20.     print("100/{} = {}".format(n,quotient))
21. except OverflowException:
22.     print("Please enter an integer less than 100!")
23. except NegativeInputException:
24.     print("Please do not enter a negative integer!")
25. except ValueError:
26.     print("Please enter an integer!")
27. except ZeroDivisionError:
28.     print("Sorry! Division by zero is not permitted!")
Output:
Enter an integer:5
100/5 = 20
Enter an integer:0
Sorry! Division by zero is not permitted!
Enter an integer:hi
Please enter an integer!
Enter an integer:200
Please enter an integer less than 100!
Enter an integer:-100
Please do not enter a negative integer!
Observation:
1. We have defined classes OverflowException (line 6) and
NegativeInputException (line 9), meant to be raised when the input is
greater than 100 and when the input is negative respectively.
2. These exception classes of ours derive from the built-in Exception class.
They can also be derived from any of the classes derived from Exception,
for example, the ValueError class.
3. The classes are empty as we do not find the need to add any functionality in
them. We are currently content with the fact that we are able to differentiate
between these exceptions. Soon, we will see what functionality can be added
to these classes to make them more useful.
4. Line 18 makes a call to our check function (defined in line 12) that scrutinizes
the given input and raises OverflowException or
NegativeInputException as required.
5. We handle OverflowException and NegativeInputException in lines
21 and 23 respectively.
The previous example showed how we can define our own exception classes to suit
our requirement. Apart from playing the role of identifying the exception that was
generated, the classes have no other role to play and were therefore empty. Let us
now improve upon this and add support for storing the input that was
responsible for the exception. This input can then be queried in the except block and
used as required. For this, we will have to make a couple of changes:
1. At the time of raising the exception object, we will create the exception object
passing the input as an argument so that it can be stored within the exception
object.
2. To support point 1 above, we need to add a constructor to our classes that
receives the input as an argument and stores it in an attribute within the
object.
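3. We prefer the attribute to be treated as private to the class in order to support
data hiding. Python does not enforce privacy, so by convention we give the attribute
a name starting with an underscore, and we will provide a public method that gives
access to this data.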
ExceptionHandlingDemo14.py
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # User-defined exceptions with arguments
5.
6. class OverflowException(Exception):
7.     def __init__(self,value):
8.         self._value = value
9.
10.     def getValue(self):
11.         return self._value
12.
13. class NegativeInputException(Exception):
14.     def __init__(self,value):
15.         self._value = value
16.
17.     def getValue(self):
18.         return self._value
19.
20. def check(n):
21.     if n>100: raise OverflowException(n)
22.     if n<0: raise NegativeInputException(n)
23.
24. try:
25.     n = int(input("Enter an integer:"))
26.     check(n)
27.     quotient = int(100/n)
28.     print("100/{} = {}".format(n,quotient))
29. except OverflowException as e:
30.     print("The given input({}) is too large!".format(e.getValue()))
31.     print("Please enter an integer less than 100!")
32. except NegativeInputException as e:
33.     print("The given input({}) is negative!".format(e.getValue()))
34.     print("Please do not enter a negative integer!")
35. except ValueError:
36.     print("Please enter an integer!")
37. except ZeroDivisionError:
38.     print("Sorry! Division by zero is not permitted!")
Output:
Enter an integer:5
100/5 = 20
Enter an integer:0
Sorry! Division by zero is not permitted!
Enter an integer:hi
Please enter an integer!
Enter an integer:200
The given input(200) is too large!
Please enter an integer less than 100!
Enter an integer:-5
The given input(-5) is negative!
Please do not enter a negative integer!
Observation:
1. In lines 21 and 22, when an exception object is being created, the value
responsible for the exception is also passed as an argument to the
constructor.
2. The constructors in lines 7-8 and 14-15 copy the given value into an attribute
_value.
3. The getValue methods in lines 10-11 and 17-18 return the value stored
within the object in the attribute _value.
4. This makes it possible for us to access and print the value in lines 30 and 33.
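A common variation is sketched below as one possible design, not as the approach used in this book. It also passes a human-readable message to the Exception base class through super().__init__(). With that change, str(e) and e.args carry useful information, while getValue() still returns the raw input. The message wording is hypothetical.

```python
class OverflowException(Exception):
    def __init__(self, value):
        # Give the base class a readable message; it becomes str(e) and e.args[0]
        super().__init__("input {} is greater than 100".format(value))
        self._value = value

    def getValue(self):
        return self._value


try:
    raise OverflowException(200)
except OverflowException as e:
    message = str(e)      # the message passed to the base class
    value = e.getValue()  # the raw input that caused the exception
    print(message, value)
```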
As a final example, let us create exception classes that do not inherit from
Exception directly, but derive from it indirectly by deriving from one of its
subclasses instead. Let us design our OverflowException and
NegativeInputException classes to derive from ValueError instead of
Exception. If there is no other change in the program, then the behaviour and output
of the program will remain the same as the previous one. We will however use this
program to demonstrate one more concept that has been mentioned earlier – that the
except clauses can handle the specified exceptions as well as their derived classes.
We will therefore handle OverflowException, NegativeInputException and
ValueError all in the same manner in an except clause that handles
ValueError. Furthermore, since we are no longer interested in the value that
caused the exception to occur, we will modify ExceptionHandlingDemo13.py
instead of ExceptionHandlingDemo14.py:
ExceptionHandlingDemo15.py
1. #!/usr/bin/python
2.
3. # Exception Handling Demo
4. # User-defined exceptions - deriving indirectly from Exception
5.
6. class OverflowException(ValueError):
7.     pass
8.
9. class NegativeInputException(ValueError):
10.     pass
11.
12. def check(n):
13.     if n>100: raise OverflowException()
14.     if n<0: raise NegativeInputException()
15.
16. try:
17.     n = int(input("Enter an integer:"))
18.     check(n)
19.     quotient = int(100/n)
20.     print("100/{} = {}".format(n,quotient))
21. except ValueError:
22.     print("Please enter an integer between 0 and 100!")
23. except ZeroDivisionError:
24.     print("Sorry! Division by zero is not permitted!")
Output:
Enter an integer:5
100/5 = 20
Enter an integer:0
Sorry! Division by zero is not permitted!
Enter an integer:hi
Please enter an integer between 0 and 100!
Enter an integer:200
Please enter an integer between 0 and 100!
Enter an integer:-5
Please enter an integer between 0 and 100!
Observation:
1. Our classes OverflowException (line 6) and NegativeInputException
(line 9) now derive from ValueError instead of Exception.
2. This allows us to handle them both as well as ValueError in line 21.
3. If we want to handle these exceptions separately, we can do so using the
same approach as the one we used in ExceptionHandlingDemo13.py.
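Care should be taken that exceptions are handled in order from derived class to
base class, because the first matching except clause wins. Thus, ValueError should
not be handled before OverflowException, for instance; otherwise the ValueError
clause would also catch every OverflowException, and the OverflowException
clause would never execute.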
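The effect of clause order can be seen in the following minimal sketch. The two functions derived_first and base_first are hypothetical and differ only in the order of their except clauses. Both raise the same OverflowException, which derives from ValueError as in ExceptionHandlingDemo15.py.

```python
class OverflowException(ValueError):
    pass


def derived_first(n):
    try:
        if n > 100:
            raise OverflowException()
        return "ok"
    except OverflowException:   # derived class handled first
        return "overflow"
    except ValueError:          # base class handled afterwards
        return "bad value"


def base_first(n):
    try:
        if n > 100:
            raise OverflowException()
        return "ok"
    except ValueError:          # also matches OverflowException, so it wins
        return "bad value"
    except OverflowException:   # never reached for OverflowException
        return "overflow"


print(derived_first(200), base_first(200))
```

Python does not warn about the unreachable clause in base_first, so the ordering has to be checked by the programmer.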
13.7 Questions
1. List the advantages of exception handling.
2. Write the complete syntax for exception handling block in Python and explain
the same.
3. Write a short note on raising and re-raising exceptions in Python.
4. Write a short note on exception propagation.
5. Write a short note on user defined exceptions in Python?
13.8 Exercises
1. Write a Python script to demonstrate ValueError exception.
2. Write a Python script to demonstrate user defined exceptions.
3. Write a Python script to demonstrate exception propagation.
SUMMARY
➢ An except block can also handle multiple exception types, identified
by a tuple of exception class names.
➢ An except block that does not specify the exception type it can
handle will end up handling all exceptions. Such an except block
cannot be followed by any other except blocks.
➢ Exceptions that are not handled at the level where they are raised
propagate to outer levels, returning from functions if required,
till they are handled somewhere or are reported as unhandled
exceptions.
14 FILE HANDLING
Open files in the desired mode of operation and close them when
done processing.
Seek within files and determine the current position within the file.
460 14. File Handling
FILE HANDLING
14.1 Introduction to File Handling
Extension Format
.txt Text file
Extension Format
.doc, .docx, .odt Document
Do not get confused by the names ‘text’ and ‘binary’ - both of them are ultimately
stored in binary format on a storage device. The distinction is not based on how they
are stored, but on whether or not their contents are human readable when the
individual characters of the file are displayed.
Apart from this basic difference between text files and binary files, there are technical
differences that emerge on how these files are to be accessed and processed by
programs. These are listed below:
1. A text file is typically made up of lines (as we humans are comfortable with
dealing with a line as a unit). These lines need specific characters within the
file to denote their end, and these differ between operating systems.
Windows, for example, terminates lines using 2 characters – the carriage
return (\r) and the linefeed (\n). Linux on the other hand, uses just the
linefeed character to achieve the same purpose. Since Python runs on all of
these platforms, it has to support both conventions. Python therefore handles
end-of-line characters in a platform-specific manner behind the scenes, but to
the programmer it gives the impression that only the newline character (\n)
marks the end of a line. Binary files are not organised in the form of lines; any
carriage return and linefeed characters found in them are probably not meant to
convey end-of-line information at all and are therefore not interpreted in any
special manner.
2. Numbers in a text file need to be stored in a human readable format. They are
therefore converted into digits and each digit is stored as a separate
character. Thus, the number 12345 is stored in the file as 5 ASCII characters
(assuming ASCII encoding) - ‘1’, ‘2’, ‘3’, ‘4’ and ‘5’ - and will therefore occupy
5 bytes. The amount of space required to store a number within a text file
therefore depends on the number of digits it has. A binary file stores
numbers in their direct binary form, which has a fixed size independent of the
number of digits in the number (provided the number is not unreasonably
large).
3. Seeking within text files (covered in detail in section 14.5.4) is restricted
because of point 1 above: Python must deal with end-of-line characters in a
platform-dependent manner while remaining platform-independent for the
programmer. Seeking within binary files does not have any such restriction.
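Points 1 and 2 can be observed with a short sketch that uses only the standard library. It writes a text file with Windows-style line endings, which are forced here through the newline argument of open(). Reading the file back in binary mode shows the stored \r\n pairs, while reading it in text mode yields plain \n characters. The sketch then compares the size of 12345 stored as text digits with its size when packed by struct as a 4-byte binary integer. The temporary file name demo.txt is arbitrary.

```python
import os
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Point 1: force Windows-style line endings when writing
with open(path, "w", newline="\r\n") as f:
    f.write("line one\nline two\n")

with open(path, "rb") as f:   # binary mode: bytes exactly as stored
    raw = f.read()
with open(path) as f:         # text mode: line endings translated back to \n
    text = f.read()

print(raw)        # b'line one\r\nline two\r\n'
print(repr(text)) # 'line one\nline two\n'

# Point 2: the same number as text digits versus a fixed-size binary integer
as_text = str(12345).encode("ascii")   # b'12345', one byte per digit
as_binary = struct.pack("<i", 12345)   # 4-byte little-endian signed integer
print(len(as_text), len(as_binary))    # 5 4
```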
open(pathName [,mode])
Note:
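1. The file is identified in the filesystem by its pathname, which could be relative
or absolute.
2. If the file could not be opened, an OSError is raised. The exact subclass
indicates the reason: for example, FileNotFoundError if the file does not exist,
or PermissionError if the required permissions are missing.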
The permissible modes are listed in the table below:
Mode          Meaning
'r', 'rb'     Open the file for reading.
              If the file exists and has read permissions, you will be permitted
              to read its contents using the file object returned.
              If the file does not exist or does not have read permissions, this
              operation will fail.
'r+', 'r+b'   Open the file for both reading and writing using the same file object.
              If the file exists and has read and write permissions, you will be
              permitted to read from and write to the file, but any writes will
              overwrite the data already present at that position within the file.
              To read back what you have written, seek (covered in section 14.5)
              to the desired position.
              If the file does not exist, this operation will fail; unlike 'w+' and
              'a+', this mode never creates a new file.
Note:
1. In the modes listed in the above table, a suffix of ‘b’ indicates that the file
should be treated as a binary file, and the absence of the suffix indicates that
the file should be treated as a text file. Thus, the mode ‘r’ opens a text file for
reading whereas the mode ‘rb’ opens a binary file for reading.
2. The default mode is ‘r’. Thus, if a file is opened using open without specifying
the mode, it will be opened for reading as a text file.
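The behaviour of 'r+' described in the table can be checked with a small sketch. The file names notes.txt and missing.txt are arbitrary, and the with statement is used only so that each file is closed automatically. The sketch overwrites the first five characters of an existing file in place, seeks back, and reads the result. It also shows that 'r+' refuses to open a file that does not exist.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")

with open(path, "w") as f:         # create the file with some initial text
    f.write("hello world\n")

with open(path, "r+") as f:        # read and write through one file object
    f.write("HELLO")               # overwrites the first 5 characters in place
    f.seek(0)                      # go back to the beginning to read
    updated = f.read()

print(repr(updated))               # 'HELLO world\n'

r_plus_failed = False
try:
    open(os.path.join(folder, "missing.txt"), "r+")
except FileNotFoundError:
    r_plus_failed = True           # 'r+' never creates a missing file
print("r+ on a missing file failed:", r_plus_failed)
```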
>>> f=open("fib.py",'r')
>>> f
<_io.TextIOWrapper name='fib.py' mode='r' encoding='UTF-8'>
Observation:
1. We are opening the Python script fib.py that was previously saved in the
current directory. We are opening this file in read mode. Note that Python
scripts are text files.
2. On successful opening of the file, a file object is returned and is stored in the
variable f in our example.
Here is an interactive session to show what happens if the file could not be opened:
>>> f=open("blah",'r')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'blah'
fileObject.close()
Note:
1. The fileObject used must be one returned by the open() function only.
2. Once a file object is closed, it cannot be used to perform any further
operations on the file; any attempt to do so raises ValueError. You may of course
call open() again to open any file, or continue using other file objects that refer
to files that are still open.
3. When the Python script terminates, all opened files are automatically closed.
>>> f=open("fib.py",'r')
>>> f
<_io.TextIOWrapper name='fib.py' mode='r' encoding='UTF-8'>
>>> f.closed
False
>>> f.close()
>>> f
<_io.TextIOWrapper name='fib.py' mode='r' encoding='UTF-8'>
>>> f.closed
True
Observation:
1. Even after a file is closed, its file object keeps track of the file name and
access mode, but does not permit operations on the file.
2. The closed attribute can be used at any time to find out whether the file
linked with a file object is currently closed or still open.
3. Closing a file that is already closed is harmless (it has no effect), though it is
obviously unnecessary.
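The points above can be verified with the following sketch. It also uses the with statement, which closes the file automatically when the block ends; this statement is not otherwise covered in this section. The file name sample.txt is arbitrary. The exact wording of the ValueError message comes from Python itself and is not fixed by this book.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

with open(path, "w") as f:   # the with statement closes f when the block ends
    f.write("some text\n")
print(f.closed)              # True

g = open(path)
g.close()
g.close()                    # closing an already closed file has no effect

try:
    g.read()                 # any operation on a closed file fails
except ValueError as e:
    error_message = str(e)
    print("ValueError:", error_message)
```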
fileObject.read()
This method will read the entire file contents and will return it as a single string.
Let us write a Python script to display the contents of a file whose filename is entered
by the user (the UNIX cat command does something similar):
cat1.py
1. #!/usr/bin/python
2.
3. # Implementation of cat command to display the contents of a file
4.
5. filename = input("Enter filename:")
6.
7. try:
8.     file = open(filename)
9.     contents = file.read()
10.     print(contents)
11. except Exception as e:
12.     print("Unable to open file: {}".format(filename))
13.     print("Reason: {}".format(str(e)))
Output:
Enter filename:cat1.py
#!/usr/bin/python

# Implementation of cat command to display the contents of a file

filename = input("Enter filename:")

try:
    file = open(filename)
    contents = file.read()
    print(contents)
except Exception as e:
    print("Unable to open file: {}".format(filename))
    print("Reason: {}".format(str(e)))
Observation:
1. We accept the filename from the user in line 5 and store it in the variable
filename.
2. We then open the file in line 8 and, on success, store the file object in the
variable file.
3. If the file could not be opened successfully, the exception handling logic in
lines 11-13 prints a suitable message.
4. Once the file has been opened, we use the file object obtained to read its
contents and store them in the variable contents in line 9.
5. The contents of the variable contents are then printed in line 10.
6. Note that the entire file contents will be present in the variable contents and
the individual lines will be separated by the newline character.
7. If the file contains a newline character at the very end, our variable contents
will also have a terminating newline character.
8. During execution, we have given the filename of our own Python script, and
hence our program ends up printing itself! You may of course give any other
filename, but do remember that if the file is not in the current directory, you’ll
need to provide the pathname to the file.
We will now see how the program behaves when it is unable to open the file specified:
Output:
Enter filename:blah
Unable to open file: blah
Reason: [Errno 2] No such file or directory: 'blah'
Observation:
1. In the above execution, we have given the name of a non-existent file and
hence the program is unable to open the same for reading.
2. There are other reasons for being unable to open a file, including insufficient
permissions, insufficient memory and too many files already being open.
cat2.py
1. #!/usr/bin/python
2.
3. # Implementation of cat command to display the contents of a file
4.
5. filename = input("Enter filename:")
6.
7. try:
8.     file = open(filename)
9.     contents = list(file)
10.     for line in contents: print(line,end='')
11. except Exception as e:
12.     print("Unable to open file: {}".format(filename))
13.     print("Reason: {}".format(str(e)))
Output:
Enter filename:cat2.py
#!/usr/bin/python

# Implementation of cat command to display the contents of a file

filename = input("Enter filename:")

try:
    file = open(filename)
    contents = list(file)
    for line in contents: print(line,end='')
except Exception as e:
    print("Unable to open file: {}".format(filename))
    print("Reason: {}".format(str(e)))
Enter filename:blah
Unable to open file: blah
Reason: [Errno 2] No such file or directory: 'blah'
Observation:
1. This program is very similar to the previous one with changes only in lines 9-10.
2. A file object is a sequence of lines (it can be iterated over line by line) and
hence can be passed to the constructor of list, which converts each item of the
sequence (a line in the file) into an item in the list. This is done in line 9. Note
that list(file.read()) would not behave this way: file.read() returns a single
string, which list() would split into individual characters.
3. Line 10 then iterates through the list and prints each item of the list. Note that
each line is terminated by the newline character and hence we need to ask
print not to print a newline character of its own. The last line of the file may or
may not have a terminating newline character, and this will be reflected
accordingly in the last item of the list.
fileObject.readlines([sizeHint])
Note:
1. The readlines() method reads the entire file and returns a list of strings
with each item being a line of the file.
2. Each string item of the list will have a terminating newline character, except
perhaps the last line which would depend on whether or not there was a
terminating newline character at the end of the file.
3. The optional sizeHint parameter can be used to limit the amount of data read:
no more lines are read once the total size of the lines read so far exceeds
sizeHint. When omitted (which is what we would be doing almost always), it
means that we are interested in reading the entire file contents.
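The notes on readlines() are illustrated by the sketch below, which uses a throwaway file named sample.txt (an arbitrary name). The file's last line deliberately has no terminating newline. A second call shows sizeHint at work: the first line has 6 characters, which does not exceed 7, so reading continues; after the second line the total of 13 exceeds 7, so reading stops there.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("first\nsecond\nthird")   # note: no newline after the last line

with open(path) as f:
    lines = f.readlines()             # read all lines

with open(path) as f:
    some_lines = f.readlines(7)       # stop once more than 7 characters are read

print(lines)       # ['first\n', 'second\n', 'third']
print(some_lines)  # ['first\n', 'second\n']
```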
cat3.py
1. #!/usr/bin/python
2.
3. # Implementation of cat command to display the contents of a file
4.
5. filename = input("Enter filename:")
6.
7. try:
8.     file = open(filename)
9.     for line in file: print(line,end='')
10. except Exception as e:
11.     print("Unable to open file: {}".format(filename))
12.     print("Reason: {}".format(str(e)))
Output:
Enter filename:cat3.py
#!/usr/bin/python

# Implementation of cat command to display the contents of a file

filename = input("Enter filename:")

try:
    file = open(filename)
    for line in file: print(line,end='')
except Exception as e:
    print("Unable to open file: {}".format(filename))
    print("Reason: {}".format(str(e)))
Enter filename:blah
Unable to open file: blah
Reason: [Errno 2] No such file or directory: 'blah'
Observation:
1. This program is similar to the previous program (cat2.py), with the change
highlighted in line 9.
2. While the previous program read the entire file into a list and then iterated
through the list, this program iterates through the file directly, giving scope for
better memory handling.
3. The loop in line 9 could have been replaced by a statement like for line
in list(file) or for line in file.readlines(), but these would
load the entire file into memory, resulting in poorer memory management.
fileObject.readline([size])
Note:
1. This method reads and returns a single line from the file identified by the file
object fileObject.
2. The optional size parameter can be used to specify the maximum number of
characters to read from the file. When omitted, it means read till the end of
line (till the newline character is read) or till the end of file.
3. If the read line terminated with a newline character (which would be the case
for all lines except possibly the last line in the file), the terminating newline
character is retained as the last character of the string that is returned.
4. If already at the end of file, an empty string ('') is returned.
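Note 4 is what allows a loop to stop at the end of the file without stopping at blank lines, and the sketch below shows the difference. It uses a throwaway file named gaps.txt (an arbitrary name). A blank line comes back from readline() as '\n', which is a non-empty, and therefore true, string. Only the end of the file produces the empty string ''.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "gaps.txt")
with open(path, "w") as f:
    f.write("alpha\n\nbeta\n")     # the second line is blank

results = []
with open(path) as f:
    while True:
        line = f.readline()
        results.append(line)
        if not line:               # only end of file gives an empty string
            break

print(results)   # ['alpha\n', '\n', 'beta\n', '']
```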
cat4.py
1. #!/usr/bin/python
2.
3. # Implementation of cat command to display the contents of a file
4.
5. filename = input("Enter filename:")
6.
7. try:
8.     file = open(filename)
9.     while 1:
10.         line = file.readline()
11.         if not line: break
12.         print(line,end='')
13. except Exception as e:
14.     print("Unable to open file: {}".format(filename))
15.     print("Reason: {}".format(str(e)))
Output:
Enter filename:cat4.py
#!/usr/bin/python

# Implementation of cat command to display the contents of a file

filename = input("Enter filename:")

try:
    file = open(filename)
    while 1:
        line = file.readline()
        if not line: break
        print(line,end='')
except Exception as e:
    print("Unable to open file: {}".format(filename))
    print("Reason: {}".format(str(e)))
Observation:
1. We read a line at a time in line 10 using readline() and store the line as a
string in the variable line.
2. On encountering end of file, the string returned by readline() will be empty,
which is tested in line 11.
3. Lines 9-12 constitute an infinite loop that terminates only when line 11 detects
the end of file.
We revisit the read method covered in section 14.3.1.1 for this purpose, with its
complete syntax:
fileObject.read([bytes])
Note:
1. This method reads up to bytes number of characters from the file identified by
fileObject and returns the result as a string. (For a file opened in binary mode,
the count is in bytes and the result is a bytes object.)
2. On encountering end of file, this method immediately returns the data read so
far, and could therefore return fewer than bytes characters.
3. If already at the end of file at the time of making the call, this method returns an
empty string.
4. This method does not stop reading on encountering the newline character.
Any newline character found in the file is merely considered to be a character.
5. If bytes is not provided or is negative (such as -1), this will end up reading all
the remaining characters of the file, as covered in section 14.3.1.1.
Here is a re-implementation of the previous program using read() to read 100
characters at a time:
cat5.py
1. #!/usr/bin/python
2.
3. # Implementation of cat command to display the contents of a file
4.
5. filename = input("Enter filename:")
6.
7. try:
8.     file = open(filename)
9.     while 1:
10.         data = file.read(100)
11.         if not data: break
12.         print(data,end='')
13. except Exception as e:
14.     print("Unable to open file: {}".format(filename))
15.     print("Reason: {}".format(str(e)))
Output:
Enter filename:cat5.py
#!/usr/bin/python

# Implementation of cat command to display the contents of a file

filename = input("Enter filename:")

try:
    file = open(filename)
    while 1:
        data = file.read(100)
        if not data: break
        print(data,end='')
except Exception as e:
    print("Unable to open file: {}".format(filename))
    print("Reason: {}".format(str(e)))
Observation:
1. This program is very similar to cat4.py with changes in lines 10-12.
2. This program reads up to 100 characters at a time and prints them.
3. As an example, if the file contains 250 characters, then line 10 of this program
ends up reading 100, 100, 50 and 0 characters over 4 iterations, and the loop is
terminated by line 11.
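The arithmetic in point 3 can be confirmed with a sketch that records the length of each chunk returned by read(100). It uses a throwaway 250-character file named sample.txt (an arbitrary name).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("x" * 250)            # 250 characters, no newlines

sizes = []
with open(path) as f:
    while True:
        data = f.read(100)
        sizes.append(len(data))   # record how much each call returned
        if not data:              # empty string: end of file
            break

print(sizes)   # [100, 100, 50, 0]
```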
cat6.py
1. #!/usr/bin/python
2.
3. # Implementation of cat command to display the contents of a file
4.
5. filename = input("Enter filename:")
6.
7. try:
8.     file = open(filename)
9.     while 1:
10.         data = file.read(1)
11.         if not data: break
12.         print(data,end='')
13. except Exception as e:
14.     print("Unable to open file: {}".format(filename))
15.     print("Reason: {}".format(str(e)))
Output:
Enter filename:cat6.py
#!/usr/bin/python

# Implementation of cat command to display the contents of a file

filename = input("Enter filename:")

try:
    file = open(filename)
    while 1:
        data = file.read(1)
        if not data: break
        print(data,end='')
except Exception as e:
    print("Unable to open file: {}".format(filename))
    print("Reason: {}".format(str(e)))
Observation:
1. This program is very similar to cat5.py, the only change being that this
program reads 1 character at a time in line 10 instead of 100.
2. This program will run slower than cat5.py simply due to the larger number of
iterations of the loop and the statements within it. The speed difference
cannot be easily (visibly) detected on small files.
fileObject.write(string)
Note:
1. This method writes the contents of the given string (string) to the file
specified by the invoking file object (fileObject).
2. This method does not add a terminating newline character of its own. If the
string terminates with a newline character, it is also written, else no newline is
automatically added to the file.
3. The file identified by the file object (fileObject) must have been opened in
a mode that supports writing, else the operation fails.
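A quick check of note 2 follows, using a throwaway file named out.txt (an arbitrary name). Two consecutive write() calls produce "abcdef" with nothing in between, so any newline must be written explicitly. The sketch also captures the return value of write(), which for a text file is the number of characters written.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w") as f:
    count = f.write("abc")       # write() returns the number of characters written
    f.write("def")               # no newline is added between the two writes
    f.write("\n")                # add one explicitly when needed

with open(path) as f:
    contents = f.read()

print(count, repr(contents))     # 3 'abcdef\n'
```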
Here is a program that generates the factorial of all integers from 1 to 10 and stores
them in a file called factorials.txt:
factorials.py
1. #!/usr/bin/python
2.
3. # Storing factorials in a text file
4.
5. from math import factorial
6.
7. try:
8.     file = open("factorials.txt","w")
9.     for i in range(1,11):
10.         file.write("The factorial of {} is {}\n".format(i,factorial(i)))
11. except Exception as e:
12.     print("Unable to write to file: factorials.txt")
13.     print("Reason: {}".format(str(e)))
14. else:
15.     print("Factorials successfully written to file factorials.txt")
Output:
Factorials successfully written to file factorials.txt
Contents of factorials.txt after the program runs:
The factorial of 1 is 1
The factorial of 2 is 2
The factorial of 3 is 6
The factorial of 4 is 24
The factorial of 5 is 120
The factorial of 6 is 720
The factorial of 7 is 5040
The factorial of 8 is 40320
The factorial of 9 is 362880
The factorial of 10 is 3628800
Observation:
1. Line 8 opens the file factorials.txt in write mode ("w"). Instead of hard-
coding the filename, we could also accept the filename from the user.
2. Line 10 uses the write() method to write a formatted string into the file.
Note that we have taken care to terminate the string with a newline so that the
file contents will be well formatted and readable.
3. Line 15 is executed only if there were no exceptions and the strings were
written to the file successfully.
When a file is opened for reading or writing, the current position in that file will be byte
0 (the beginning of the file). This means that the next read or write will start from position
0 and will increment the position by the number of bytes successfully read or written.
(An exception to this is writes to files that have been opened in append modes, in
which case data will always be written at the end of the file instead of at the current
position). As we continue reading or writing, the position keeps incrementing, ensuring
sequential access of the file contents.
While writing to the file, it is possible to go past the end of the file, in which case the
file grows with the writes. While reading from a file, it is not possible to go past the end
of the file.
Any particular position in the file can be selected by means of an offset applied from
one of the standard positions:
1. From the beginning of the file (identified by the standard position SEEK_SET)
2. From the current position within the file (identified by the standard position
SEEK_CUR)
3. From the end of the file (identified by the standard position SEEK_END)
A positive offset implies those many bytes after the selected standard position
whereas a negative offset implies those many bytes before the selected standard
position. An offset value of 0 implies exactly at the selected standard position.
The table below shows some examples of how these offsets combined with standard
positions can help locate a position in a file:
Offset  Standard position  Resulting position
10      SEEK_CUR           10 bytes after the current position in the file (useful for
                           skipping over the next 10 bytes in the file)
10      SEEK_END           10 bytes after the end of the file (meaningful only if the
                           file has been opened for writing: the file grows to include
                           this position when data is next written there)
fileObject.seek(offset [,whence])
Note:
1. The offset parameter specifies an offset relative to a standard position that
is selected in the second parameter, whence.
2. The whence parameter selects the standard position relative to which the
offset is applied. Its value has to be one of os.SEEK_SET (0), os.SEEK_CUR (1)
or os.SEEK_END (2). If this parameter is not provided, the default standard
position assumed is os.SEEK_SET.
3. For text files, the offset must be 0 or a value that was previously returned by the
tell() method (covered in section 14.5.3), and a non-zero offset may only be used
with os.SEEK_SET. For binary files, the offset can be any integer, provided the
resulting position is not before the beginning of the file.
fileObject.tell()
Note:
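1. This method returns the current position within the file represented by the file
object (fileObject). For binary files, this is the number of bytes from the
beginning of the file; for text files, it is an opaque number that is meaningful only
when passed back to seek().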
2. The value returned by the tell() method can be used directly as an offset in
the seek() method with the standard position as SEEK_SET to return to this
position whenever required irrespective of where the current position is later
on during the execution of the script.
3. For text files, the value returned by the tell() method and the special offset
of 0 are the only valid values that can be used as offset in the seek()
method.
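The sketch below exercises seek() and tell() on two throwaway files named digits.bin and lines.txt (arbitrary names). In the binary file, all three standard positions are used with byte offsets. In the text file, the sketch follows the rule above: it seeks only to a position previously obtained from tell().

```python
import os
import tempfile

folder = tempfile.mkdtemp()

# Binary file: offsets are plain byte counts
bin_path = os.path.join(folder, "digits.bin")
with open(bin_path, "wb") as f:
    f.write(b"0123456789")

with open(bin_path, "rb") as f:
    f.seek(3)                     # whence defaults to os.SEEK_SET: byte 3
    a = f.read(2)                 # b'34'; position is now 5
    f.seek(2, os.SEEK_CUR)        # skip 2 bytes: position 7
    b = f.read(1)                 # b'7'; position is now 8
    f.seek(-2, os.SEEK_END)       # 2 bytes before the end: position 8
    c = f.read()                  # b'89'
    end = f.tell()                # 10, the size of the file

# Text file: only 0 or a value obtained from tell() is used as an offset
txt_path = os.path.join(folder, "lines.txt")
with open(txt_path, "w") as f:
    f.write("first\nsecond\nthird\n")

with open(txt_path) as f:
    f.readline()
    mark = f.tell()               # remember the start of the second line
    second = f.readline()
    f.readline()                  # move further ahead
    f.seek(mark)                  # return to the remembered position
    again = f.readline()

print(a, b, c, end, second == again)   # b'34' b'7' b'89' 10 True
```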
In order to write an object to a binary file, the dump() function can be used, which has
the following syntax:
pickle.dump(object,file)
Note:
1. In the above syntax, pickle is the name of the module and dump is the
name of a function in that module. There are other ways of pickling an object,
but we will stick to this syntax.
2. The parameter object is any object you wish to store in the file and the
parameter file is the file object identifying the file to which the object needs
to be written.
3. The file is expected to be opened in a mode that supports writing in binary.
Here is a program that uses a class called Employee to represent an employee, and
allows the user to add one such employee record to a binary file called
employees.dat:
add_employee.py
1. #!/usr/bin/python
2.
3. # Program to write employee records to a file
4.
5. import pickle
6.
7. class Employee:
8.     def __init__(self,name,id,designation):
9.         self.name = name
10.         self.id = id
11.         self.designation = designation
12.
13. try:
14.     file = open("employees.dat","ab")
15.     name = input("Enter employee name:")
16.     id = int(input("Enter employee ID:"))
17.     designation = input("Enter employee designation:")
18.
19.     pickle.dump(Employee(name,id,designation),file)
20. except Exception as e:
21.     print("Unable to open file!")
22.     print("Reason: {}".format(str(e)))
23. else:
24.     print("Employee record successfully added to file!")
Output:
Observation:
1. We import the pickle module in line 5. We define our Employee class in
line 7.
2. The Employee class contains a constructor (line 8) that accepts the name, ID
and designation of the employee and stores these details within the object.
3. We open the file employees.dat in append binary (ab) mode in line 14.
Since we are planning to add records to the file without overwriting any
existing records, the append mode is necessary. Since we plan to write
objects, the file should be binary in nature. Note that this mode creates the file
if it is not already present.
4. We accept the employee details from the user in lines 15-17.
5. We construct an Employee object in line 19 using the details the user had
given, and write it to the file (through the file object file) using the pickle.dump
function.
6. Any errors are handled in lines 21-22. On success, a message is printed by
line 24.
pickle.load(file)
Note:
1. The parameter file identifies the file from which an object has to be loaded
from the current position in the file. The file is expected to be binary and
should have been opened in a mode that supports reading.
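2. The object that is successfully loaded from the file is returned by this function.
3. If there are no more objects to be loaded (the end of the file has been reached),
EOFError is raised.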
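The round trip below is a minimal sketch of pickle.dump and pickle.load working together. The file name records.dat and the two sample records are made up for illustration. Plain dictionaries are used instead of a class, so no class definition is needed when loading. Several objects are appended one after another, then loaded back in the same order until EOFError signals that none remain.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "records.dat")

# Hypothetical sample records
records = [{"name": "Asha", "id": 1}, {"name": "Ravi", "id": 2}]

with open(path, "ab") as f:       # binary append mode
    for record in records:
        pickle.dump(record, f)    # each call appends one pickled object

loaded = []
with open(path, "rb") as f:
    while True:
        try:
            loaded.append(pickle.load(f))  # load the next object
        except EOFError:          # no more objects left in the file
            break

print(loaded == records)          # True
```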
We will now write a program that lists all employee records in the file
employees.dat to which records were written by the previous program,
add_employee.py.
list_employees.py
1. #!/usr/bin/python
2.
3. # Program to list employee records from a file
4.
5. import pickle
6.
7. class Employee:
8.     def __init__(self,name,id,designation):
9.         self.name = name
10.         self.id = id
11.         self.designation = designation
12.
13. try:
14.     file = open("employees.dat","rb")
15.
16.     while 1:
17.         employee = pickle.load(file)
18.         print("Name: {} ID: {} Designation: {}".format(employee.name,employee.id,employee.designation))
19.
20. except EOFError: pass
21. except Exception as e:
22.     print("Unable to open file!")
23.     print("Reason: {}".format(str(e)))
Output:
1. #!/usr/bin/python
2.
3. # Implementation of head command to display the first 10 lines of a file
4.
5. filename = input("Enter filename:")
6.
7. try:
8.     file = open(filename)
9.     for i in range(10):
10.         data = file.readline()
11.         if not data: break
12.         print(data,end='')
13. except Exception as e:
14.     print("Unable to open file: {}".format(filename))
15.     print("Reason: {}".format(str(e)))
Output:
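Enter filename:head.py
#!/usr/bin/python

# Implementation of head command to display the first 10 lines of a file

filename = input("Enter filename:")

try:
    file = open(filename)
    for i in range(10):
        data = file.readline()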
Observation:
1. We ask the user for the filename in line 5 and open the file in line 8. Errors are
handled in lines 14-15.
2. We run a loop that executes 10 times in line 9. In each iteration, we read 1
line from the file (line 10), check whether we have reached the end of file (line
11) and print the line read if successful (line 12). Since each line read already
ends with a newline, end='' stops print() from adding a second one.
3. If the file contains fewer than 10 lines, readline() returns an empty string
once the end of file is reached. The break in line 11 then ends the loop early,
and we display only as many lines as are present.
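As an alternative, the standard library's itertools.islice() can take the first n lines directly from a file object, because iterating over a file yields one line at a time. The sketch below shows this approach. The function name head and the throwaway 15-line sample file are assumptions made for illustration. The with statement closes the file automatically.

```python
import os
import tempfile
from itertools import islice

def head(path, n=10):
    """Return the first n lines of a text file without reading past them."""
    with open(path) as file:
        # islice stops after n lines, or earlier if the file is shorter
        return list(islice(file, n))

# Demonstration on a throwaway 15-line file (a stand-in for head.py)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("line {}\n".format(i) for i in range(1, 16)))

first_lines = head(tmp.name)
print("".join(first_lines), end="")
os.remove(tmp.name)
```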
1. #!/usr/bin/python
2.
3. # Implementation of tail command to display the last 10 lines of a file
4.
5. filename = input("Enter filename:")
6.
7. try:
8.     file = open(filename)
9.     lines = file.readlines()
10.     if len(lines)>10: lines = lines[-10:]
11.     print("".join(lines),end='')
12. except Exception as e:
13.     print("Unable to open file: {}".format(filename))
14.     print("Reason: {}".format(str(e)))
Output:
Enter filename:tail.py
filename = input("Enter filename:")

try:
    file = open(filename)
    lines = file.readlines()
    if len(lines)>10: lines = lines[-10:]
    print("".join(lines),end='')
except Exception as e:
    print("Unable to open file: {}".format(filename))
    print("Reason: {}".format(str(e)))
Observation:
1. We accept the filename from the user in line 5 and open the file in line 8.
Errors are handled in lines 13-14.
2. We load the entire file into a list in line 9. We need the last 10 entries of this
list.
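3. If there are fewer than 10 lines in the file, they should all be displayed as is.
Line 10 makes this check and, if there are more than 10 lines, keeps only the
last 10 lines of the list. (Strictly speaking, the check is optional: the slice
lines[-10:] already returns the whole list when it has fewer than 10 elements.)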
4. The list contents are converted into a single string and printed in line 11.
5. This approach is simple but inefficient, as the entire file is loaded into memory.
For big files especially, it is better to read a single line at a time and keep
at most 10 lines in memory.
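The line-at-a-time approach from point 5 can be sketched with collections.deque. A deque created with maxlen=10 automatically discards its oldest entry when a new one arrives, so no more than 10 lines are ever held in memory. The function name tail and the sample file are illustrative assumptions.

```python
import os
import tempfile
from collections import deque

def tail(path, n=10):
    """Return the last n lines of a text file, holding at most n lines in memory."""
    with open(path) as file:
        # Iterating over the file yields one line at a time; the bounded deque
        # drops the oldest line each time a new one is appended beyond n
        return list(deque(file, maxlen=n))

# Demonstration on a throwaway 15-line file (a stand-in for tail.py)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("line {}\n".format(i) for i in range(1, 16)))

last_lines = tail(tmp.name)
print("".join(last_lines), end="")
os.remove(tmp.name)
```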
1. #!/usr/bin/python
2.
3. # Implementation of head and tail command combination to display consecutive lines of a file
4.
5. filename = input("Enter filename:")
6. start = int(input("Enter starting line:"))
7. length = int(input("Enter number of lines:"))
8. try:
9.     file = open(filename)
10.     lines = file.readlines()
11.     lines = lines[start-1:start+length-1]
12.     print("".join(lines),end='')
13. except Exception as e:
14.     print("Unable to open file: {}".format(filename))
15.     print("Reason: {}".format(str(e)))
Output:
Enter filename:headtail.py
Enter starting line:6
Enter number of lines:4
start = int(input("Enter starting line:"))
length = int(input("Enter number of lines:"))
try:
    file = open(filename)
Observation:
1. We accept the filename (line 5), the starting line number (line 6) and the number
of lines to be displayed (line 7) from the user. We then open the file in line 9
and read all its contents into a list in line 10.
2. Since the user numbers lines from 1 while list indices start from 0, the first
line required from the list is at index start-1. We extract the required lines
from the list in line 11 and display them in line 12 after joining them into a
single string.
3. This program is simple but inefficient, as the entire file contents are loaded
into memory. A more memory-efficient alternative is to read the file line by line,
keep track of the current line number, and display only the lines that fall
within the requested range.
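The memory-efficient alternative from point 3 can be sketched with itertools.islice(). The sketch skips the first start-1 lines and stops after length more lines, so the file is read only up to the end of the requested range. The function name line_range and the sample file are hypothetical. The sketch assumes start is at least 1, because islice() rejects negative indices.

```python
import os
import tempfile
from itertools import islice

def line_range(path, start, length):
    """Return `length` lines beginning at 1-based line number `start` (start >= 1)."""
    with open(path) as file:
        # islice(iterable, start, stop) uses 0-based indices, hence start - 1
        return list(islice(file, start - 1, start - 1 + length))

# Demonstration on a throwaway 15-line file (a stand-in for headtail.py)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("line {}\n".format(i) for i in range(1, 16)))

chosen = line_range(tmp.name, 6, 4)
print("".join(chosen), end="")
os.remove(tmp.name)
```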
1. #!/usr/bin/python
2.
3. # Implementation of wc command to display
4. # the number of characters, words and lines in a file.
5.
6. filename = input("Enter filename:")
7. chars,words,lines = 0,0,0
8.
9. try:
10.     file = open(filename)
11.     while 1:
12.         line = file.readline()
13.         if not line: break
14.         lines += 1
Output:
Enter filename:wc.py
Number of characters : 613
Number of words : 73
Number of lines : 23
Observation:
1. We obtain the filename from the user in line 6 and open the file in line 10.
Errors are handled in lines 18-19.
2. We read the file contents a line at a time (line 12). Each time we read a line,
we increment the number of lines read (lines) in line 14.
3. We add the number of characters found in that line to the variable chars in
line 15.
4. We use the split() function (section 8.3.2) to split the line into a list of
words, count the words using len(), and add that count to words in line 16.
5. The results are displayed in lines 21-23 if there are no errors.
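Only lines 1-14 of wc.py survive above. The following is therefore a self-contained sketch of the counting logic the observations describe, not the original listing. The function name wc and the sample text are hypothetical. Unlike the Unix `wc -c`, len(line) counts characters rather than bytes.

```python
import os
import tempfile

def wc(path):
    """Return (characters, words, lines) for a text file, reading one line at a time."""
    chars = words = lines = 0
    with open(path) as file:
        for line in file:
            lines += 1
            chars += len(line)          # includes the trailing newline
            words += len(line.split())  # split() with no argument splits on any whitespace
    return chars, words, lines

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello world\nthis is python\n")

counts = wc(tmp.name)
print("Number of characters : {}".format(counts[0]))
print("Number of words : {}".format(counts[1]))
print("Number of lines : {}".format(counts[2]))
os.remove(tmp.name)
```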
1. #!/usr/bin/python
2.
3. # Implementation of cp command to copy files
4.
Output:
Observation:
1. We obtain the source filename and destination filename from the user in lines
5-6.
2. We open the source file for reading in binary mode in line 9. The binary mode
ensures that our program works for all files – whether textual or not. Similarly,
we open the destination file for writing in binary mode in line 10. All errors are
handled in lines 16-17.
3. We read 1 byte at a time from the source file (line 12) and write the byte read
into the destination file (line 14). Line 13 ensures that this loop is broken on
encountering the end of file on the source file.
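Copying 1 byte at a time works, but it makes one read() and one write() call per byte. The sketch below copies in fixed-size chunks instead. The 64 KB chunk size and the function name copy_file are arbitrary choices for illustration. The standard library's shutil.copyfileobj() and shutil.copyfile() provide the same service ready-made.

```python
import os
import shutil
import tempfile

def copy_file(source, destination, chunk_size=64 * 1024):
    """Copy any file (text or binary) in chunks of at most chunk_size bytes."""
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:              # b"" signals end of file
                break
            dst.write(chunk)

# Demonstration with throwaway files
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.bin")
dst_path = os.path.join(workdir, "copy.bin")
with open(src_path, "wb") as f:
    f.write(bytes(range(256)) * 1000)  # 256,000 bytes of binary data

copy_file(src_path, dst_path)
with open(src_path, "rb") as a, open(dst_path, "rb") as b:
    identical = a.read() == b.read()
print("Copies identical:", identical)
shutil.rmtree(workdir)
```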
cmp.py
1. #!/usr/bin/python
2.
3. # Implementation of cmp command to compare files
4.
5. filename1 = input("Enter first filename:")
6. filename2 = input("Enter second filename:")
7. byte,line = 1,1
8.
9. try:
10.     file1 = open(filename1)
11.     file2 = open(filename2)
12.     while 1:
13.         char1 = file1.read(1)
14.         char2 = file2.read(1)
15.         if not char1 and not char2: break
16.         elif not char1 or not char2 or char1 != char2:
17.             print("{} {} differ: byte {}, line {}".format(filename1,filename2,byte,line))
18.             break
19.         byte += 1
20.         if char1 == '\n': line += 1
21. except Exception as e:
22.     print("Unable to open file")
23.     print("Reason: {}".format(str(e)))
Output:
Observation:
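1. We accept the 2 filenames from the user in lines 5-6. Error handling is
performed by lines 22-23.
2. We open both files for reading in lines 10-11. We have assumed the files to
be textual, so in this version the byte counter actually counts characters. To
support binary files, we can change the mode to "rb". read(1) then returns bytes
objects, so line 20 must compare char1 with b'\n' instead of '\n'.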
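Here is a sketch of the binary variant mentioned in point 2. It assumes both files are opened in "rb" mode. Because read(1) now returns bytes objects, the newline test compares against b"\n". The function name cmp_files and the sample file contents are hypothetical.

```python
import os
import shutil
import tempfile

def cmp_files(path1, path2):
    """Return None if the files are identical, else (byte, line) of the first difference.
    Both counters start at 1, as in cmp.py."""
    byte, line = 1, 1
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            b1, b2 = f1.read(1), f2.read(1)
            if not b1 and not b2:   # both files ended at the same point
                return None
            if b1 != b2:            # a mismatch, or one file ended early
                return byte, line
            byte += 1
            if b1 == b"\n":         # binary reads return bytes, so compare with b"\n"
                line += 1

workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "a.txt")
second = os.path.join(workdir, "b.txt")
with open(first, "wb") as f:
    f.write(b"abc\ndef\n")
with open(second, "wb") as f:
    f.write(b"abc\ndxf\n")

difference = cmp_files(first, second)   # (6, 2): the 'e'/'x' byte on line 2
same = cmp_files(first, first)          # None
if difference:
    print("{} {} differ: byte {}, line {}".format(first, second, *difference))
shutil.rmtree(workdir)
```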
1. #!/usr/bin/python
2.
3. # Implementation of cut command to extract vertical slices from files
4.
5. filename = input("Enter filename:")
6. start = int(input("Enter starting column:"))
7. end = int(input("Enter ending column:"))
8.
9. try:
10.     file = open(filename)
11.     while 1:
12.         line = file.readline()
13.         if not line: break
14.         print(line.rstrip('\n')[start-1:end])  # drop the newline so print() adds only one
15. except Exception as e:
16.     print("Unable to open file: {}".format(filename))
17.     print("Reason: {}".format(str(e)))
Output:
Enter filename:cut.py
Enter starting column:4
Enter ending column:8
usr/b
mplem
ename
rt =
= in
file
whil
ept E
prin
prin
Observation:
1. We accept the filename and the starting and ending column numbers from the user
in lines 5-7.
2. We open the file for reading in line 10. Error handling is performed by lines
16-17.
3. We read line by line from the file till the end of file using lines 11-13.
4. We display only a selected portion of the line (line 14). We use start-1 as
the index starts from 0 whereas the user uses a 1-based numbering.
5. This program works only with a simple character (column) range. The actual cut
command also supports extraction based on fields, along with a choice of field
delimiter. The same functionality can be implemented by using the split()
function to break the line read into fields and then displaying the required
field range.
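Here is a sketch of the field-based extraction described in point 5. The function name cut_fields, its parameters and the comma-separated sample rows are assumptions for illustration. Fields are numbered from 1. As with cut (without its -s option), a line that does not contain the delimiter is passed through whole.

```python
def cut_fields(lines, first, last, delimiter="\t"):
    """Return fields first..last (1-based, inclusive) of each line, rejoined with the delimiter."""
    result = []
    for line in lines:
        text = line.rstrip("\n")
        if delimiter not in text:      # no fields to cut: pass the line through unchanged
            result.append(text)
            continue
        fields = text.split(delimiter)
        result.append(delimiter.join(fields[first - 1:last]))
    return result

rows = ["Ram,1,Manager\n", "Sham,2,Manager\n", "Anthony,5,Team Lead\n"]
selected = cut_fields(rows, 1, 2, delimiter=",")
for row in selected:
    print(row)
```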
1. #!/usr/bin/python
2.
3. # Implementation of an employee management system
4.
5. import os
6. import pickle
7.
8. class Employee:
9.     def __init__(self,name,id,designation):
10.         self.name = name
11.         self.id = id
12.         self.designation = designation
13.
14. def menu():
15.     print("Employee Management System")
16.     print("==========================")
17.     print("1. Add Employee")
18.     print("2. List Employees")
19.     print("3. Search Employee")
20.     print("4. Delete Employee")
21.     print("5. Quit")
22.     return int(input("Enter choice:"))
23.
24. def do_add_employee(file):
25.     name = input("Enter employee name:")
26.     id = int(input("Enter employee ID:"))
27.     designation = input("Enter employee designation:")
28.
29.     file.seek(0,os.SEEK_END)
30.     pickle.dump(Employee(name,id,designation),file)
31.
32.
33. def do_list_employees(file):
34.     try:
35.         file.seek(0)
36.         while 1:
37.             employee = pickle.load(file)
38.             print("Name: {} ID: {} Designation: {}".format(employee.name,employee.id,employee.designation))
39.     except EOFError: pass
40.
41. def do_search_employee(file):
42.     id = int(input("Enter ID to search:"))
43.     try:
44.         file.seek(0)
45.         while 1:
46.             employee = pickle.load(file)
47.             if id == employee.id:
48.                 print("Name: {} ID: {} Designation: {}".format(employee.name,employee.id,employee.designation))
49.                 break
50.     except EOFError: print("Record not found!")
51.
52.
53. def do_delete_employee(file):
54.     id = int(input("Enter ID to delete:"))
55.     file.seek(0)
56.     tempfile = open("temp.dat","w+b")
57.
58.     try:
59.         while 1:
60.             employee = pickle.load(file)
61.             if not employee.id == id:
62.                 pickle.dump(employee,tempfile)
63.     except EOFError: pass
64.
65.     file.seek(0)
66.     tempfile.seek(0)
67.     try:
68.         while 1:
69.             employee = pickle.load(tempfile)
70.             pickle.dump(employee,file)
71.     except EOFError:
72.         file.flush()
73.         file.truncate()
74.         tempfile.close()
75.         os.remove("temp.dat")
76.
77. filename="employees.dat"
78. try:
79.     if os.path.isfile(filename): mode="r+b"
80.     else: mode="w+b"
81.     file = open(filename,mode)
82.     while 1:
83.         choice = menu()
84.         if choice == 1: do_add_employee(file)
85.         elif choice == 2: do_list_employees(file)
86.         elif choice == 3: do_search_employee(file)
87.         elif choice == 4: do_delete_employee(file)
88.         elif choice == 5: break
89.         else: print("Invalid choice!")
90. except Exception as e:
91.     print("Unable to open file: {}".format(filename))
92.     print("Reason: {}".format(str(e)))
Output:
Enter choice:2
Name: Ram ID: 1 Designation: Manager
Name: Sham ID: 2 Designation: Manager
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:1
Enter employee name:Anthony
Enter employee ID:5
Enter employee designation:Team Lead
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:2
Name: Ram ID: 1 Designation: Manager
Name: Sham ID: 2 Designation: Manager
Name: Anthony ID: 5 Designation: Team Lead
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:3
Enter ID to search:2
Name: Sham ID: 2 Designation: Manager
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:3
Enter ID to search:3
Record not found!
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:4
Enter ID to delete:4
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:2
Name: Ram ID: 1 Designation: Manager
Name: Sham ID: 2 Designation: Manager
Name: Anthony ID: 5 Designation: Team Lead
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:4
Enter ID to delete:2
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:2
Name: Ram ID: 1 Designation: Manager
Name: Anthony ID: 5 Designation: Team Lead
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:4
Enter ID to delete:5
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:2
Name: Ram ID: 1 Designation: Manager
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:4
Enter ID to delete:1
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:2
Employee Management System
==========================
1. Add Employee
2. List Employees
3. Search Employee
4. Delete Employee
5. Quit
Enter choice:5
Notes on Output:
1. This program uses the same data file that was used by the programs
add_employee.py (section 14.6) and list_employees.py (section 14.7).
This means that all the 3 programs can work with each other if necessary.
2. Since there are multiple operations permissible, the sample output
demonstrates many of these operations. We will examine the output piece by
piece.
3. We assume that this program is executed after what we had demonstrated in
add_employee.py (section 14.6). The file employees.dat therefore
contains 1 employee record (of the employee named Ram).
4. We will begin by listing out the file contents. We expect the employee details
of “Ram” to be listed.
5. We will now use the “Add Employee” option to add our second employee to
the file.
6. Let us now verify that the employee details of “Sham” are indeed present in
the file.
7. We will now similarly add another employee record and verify that it has been
successfully saved.
8. Now that we are convinced that the “Add Employee” and “List Employees”
options work perfectly fine, let us explore the “Search Employee” option and
search for the employee details of the employee whose ID is 2.
10. Now let us explore deletion. We will start by attempting to delete a non-
existent employee and verify that the file contents are not disturbed.
11. Despite the attempted deletion, we continue to get the same employee details
on listing. We will now delete the employee with ID 2.
12. We observe that the employee record did get deleted! Deletion is the most
complex operation in this program, so let us verify that we can indeed delete
any record. The previous deletion removed a record from the middle of the
file. Let us now verify deletion of the last record.
13. Worked! Now finally, let us delete the first record (which is now the only record in the file).
14. Now the file is empty and we have verified all operations. Let us terminate the
program.
Now that we’re convinced the program works correctly, let us analyse the code.
Observation:
1. In lines 5-6, we import the os module (for the os.SEEK_END constant used for
seeking, the os.path.isfile() function and the os.remove() function) and the
pickle module, to be able to read and write objects from/to binary files.
2. The same Employee class that was used in earlier programs is defined in
lines 8-12.
3. The menu() function in lines 14-22 is used for displaying the program menu,
waiting for the user to enter the menu choice and returning the choice made
by the user.
4. The do_add_employee() function in lines 24-30 is used to add an
employee record into the file by taking input from the user.
5. The do_list_employees() function in lines 33-39 is used to list all
employee records from the file.
6. The do_search_employee() function in lines 41-50 searches for an
employee with the ID specified by the user and displays it if present, or an
error message otherwise.
7. The do_delete_employee() function in lines 53-75 deletes an employee
record based on an ID that is provided by the user. For simplicity, if the record
is not found, no message is displayed!
8. We will examine each of these functions in subsequent points, but let us first
focus on the main program code in lines 77-92.
9. Line 77 specifies the name of the file we will be using in the program. Since
many different functions have to work on the same file, possibly in different
modes, we prefer to open the file once, in a binary mode that allows both
reading and writing, and share that file object across all the functions. This
file object is created in line 81. If the file exists, its contents should not be
disturbed; if it does not exist, it should be created. Line 79 checks whether the
file exists using the os.path.isfile() function and chooses mode "r+b" if it
does; otherwise line 80 chooses "w+b", which creates the file.
10. Lines 82-89 provide a continuous menu system, using the menu() function
to display the menu and allow the user to make a choice, and calling other
suitable functions based on the user’s choice.
11. The open() function in line 81 can fail. That exception, along with any other
(such as a ValueError when the user enters a non-numeric menu choice), is
handled in lines 90-92. Since the try block encloses the entire menu loop, any
such exception ends the program after the error message is printed.
12. Let us make observations on each of the functions by providing their code
again for quick reference.
1. def menu():
2.     print("Employee Management System")
3.     print("==========================")
4.     print("1. Add Employee")
5.     print("2. List Employees")
6.     print("3. Search Employee")
7.     print("4. Delete Employee")
8.     print("5. Quit")
9.     return int(input("Enter choice:"))
1. The menu() function prints the menu in lines 2-8.
2. In line 9, it waits for the user to enter a choice, converts the choice to an
integer and returns it.
1. def do_add_employee(file):
2.     name = input("Enter employee name:")
3.     id = int(input("Enter employee ID:"))
1. The do_add_employee() function accepts input from the user for a single
employee record in lines 2-4.
2. Since we need to add this employee record to the given file and appending is
the simplest way of adding to a file, we prefer to seek to the end of the file in
line 6.
3. We then create an Employee object using the input the user provided and
write it to the file in line 7.
1. def do_list_employees(file):
2.     try:
3.         file.seek(0)
4.         while 1:
5.             employee = pickle.load(file)
6.             print("Name: {} ID: {} Designation: {}".format(employee.name,employee.id,employee.designation))
7.     except EOFError: pass
1. def do_search_employee(file):
2.     id = int(input("Enter ID to search:"))
3.     try:
4.         file.seek(0)
5.         while 1:
6.             employee = pickle.load(file)
7.             if id == employee.id:
1. def do_delete_employee(file):
2.     id = int(input("Enter ID to delete:"))
3.     file.seek(0)
4.     tempfile = open("temp.dat","w+b")
5.
6.     try:
7.         while 1:
8.             employee = pickle.load(file)
9.             if not employee.id == id:
10.                 pickle.dump(employee,tempfile)
11.     except EOFError: pass
12.
13.     file.seek(0)
14.     tempfile.seek(0)
15.     try:
16.         while 1:
17.             employee = pickle.load(tempfile)
18.             pickle.dump(employee,file)
19.     except EOFError:
20.         file.flush()
21.         file.truncate()
22.         tempfile.close()
23.         os.remove("temp.dat")
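The temporary file in do_delete_employee() avoids holding every record in memory at once. For small data files, a simpler alternative is to load the records to keep into a list and then rewrite the same file in place, truncating whatever is left over. The sketch below shows that alternative. It assumes plain dictionaries as records (instead of Employee objects) and an io.BytesIO buffer in place of employees.dat. The function name delete_record is hypothetical.

```python
import io
import pickle

def delete_record(file, record_id):
    """Hypothetical alternative to do_delete_employee(): keep every record whose
    "id" differs from record_id, then rewrite the file in place."""
    file.seek(0)
    kept = []
    while True:
        try:
            record = pickle.load(file)
        except EOFError:
            break
        if record["id"] != record_id:
            kept.append(record)
    file.seek(0)                 # overwrite from the start
    for record in kept:
        pickle.dump(record, file)
    file.truncate()              # drop leftover bytes beyond the new end of data
    return len(kept)

data = io.BytesIO()              # stands in for employees.dat opened in "r+b" mode
for rec in ({"name": "Ram", "id": 1}, {"name": "Sham", "id": 2}, {"name": "Anthony", "id": 5}):
    pickle.dump(rec, data)

remaining = delete_record(data, 2)

data.seek(0)
names = []
while True:
    try:
        names.append(pickle.load(data)["name"])
    except EOFError:
        break
print(names)
```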
14.9 Questions
1. What are the 3 logical steps involved in accessing a file?
2. How would you differentiate between text & binary files? Give examples of
text & binary file formats.
3. List and explain all the access modes used to operate on files.
4. How can we confirm whether a file has been closed or not?
5. Is it advisable at all times to read the entire contents of a file in one go? If not,
why would you avoid it and how?
6. How do we read a file’s contents a line at a time?
7. Which function in Python would you use to read a few bytes from a file? Give its
syntax along with an example.
14.10 Exercises
1. Accept details of all your friends from standard input (e.g. name, date of
birth, hobbies, occupation, residence address) and store them in a binary file
as individual records. Allow the user to list out all the details of a friend,
given his/her name.
2. Create a text file with content as a short summary highlighting your profile
and the interesting events that took place in your lifetime. Write a Python
program that lists out all repeated words (e.g. 'I' in the sentence “I think
I will go”).
3. Write a Python script that lists out each line of a file prefixed with its line
number.
SUMMARY
➢ The open() function opens the specified file in the specified mode
(read mode by default) and returns a file object on success. This
file object can then be used to interact with the file contents.
➢ When done interacting with a file’s contents, the file object
can be used to invoke the close() method to close the connection
with the file. Strictly speaking, doing so is optional, as open files
are eventually closed when the script ends or when the file object
is no longer in use by the program. However, explicitly closing files
as soon as they are no longer needed is good practice: it flushes
any buffered writes and releases the file promptly.
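Python's with statement closes a file automatically at the end of the block, even if an exception occurs, so an explicit close() call is not needed. The following is a minimal sketch using a throwaway file inside tempfile.TemporaryDirectory(). The file name notes.txt is arbitrary. The closed attribute of the file object confirms the result.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "notes.txt")   # throwaway file for the demo

    with open(path, "w") as file:     # close() is called automatically when this block ends
        file.write("first line\n")
    closed_after_block = file.closed  # True, although close() was never called explicitly

    with open(path) as file:
        contents = file.read()

print(closed_after_block, repr(contents))
```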
15 MODULES
15.1 Need for Modules
So far, we have seen 2 ways of executing Python code:
1. By typing Python instructions directly into the interpreter and getting it
executed immediately
2. By storing Python instructions in a file and running it as a script
While the former approach has the advantage of quickly getting the output for any
instruction, when it comes to writing larger pieces of code, the latter approach is
definitely preferred. In the former approach, all code and data (functions and
variables) introduced are forgotten the moment the user quits the interpreter. In the
latter approach, the program runs the same way each time the user runs it.
When we write bigger pieces of code, we observe the following:
1. Our program might become too long and we want a way to organise it into
manageable chunks (files)
2. We might have multiple Python scripts with various functions already defined
that we wish to use in another piece of work
The solution to both the problems above is modules! A module is a reusable piece of
Python code that can be used by other Python programs (and modules). In fact, when
we run a Python script, it runs as a module named __main__. The name of the
current module is available in the variable __name__.
>>> print(__name__)
__main__
1. #!/usr/bin/python
2.
3. # Module to find the factorial of an integer
4.
5. def factorial(n):
6.     if n==0: return 1
7.     return n*factorial(n-1)
Observation:
1. This is a simple Python script, even though we call it a module. No special
changes were needed to turn this script into a module.
2. This module is named fact because the filename is fact.py. Thus, module
names are derived from filenames.
3. This module defines a function called factorial that returns the factorial of
the given integer.
4. When this script is executed, there is no output, as the script only defines a
function and never calls it. This can be changed, as covered in section 15.3.3,
but for the moment we are happy with the way this module is coded.
import moduleName
15. Modules 515
In our example, since fact is the name of the module, this is how we'd import it
before using the factorial function:
1. #!/usr/bin/python
2.
3. # Program to find the combination of 2 integers using modules.
4.
5. import fact
6.
7. def ncr(n,r):
8.     return fact.factorial(n)//(fact.factorial(r)*fact.factorial(n-r))
9.
9.
Output:
Enter 2 integers:5 3
The combination of 5 and 3 is 10
Observation:
1. Line 5 imports the fact module, making its contents available to us via the
identifier fact.
2. Line 8 uses the factorial function of the fact module using the name
fact.factorial().
Python treats built-in modules and user-defined modules alike in this respect. You
might recall that the built-in math module also defines a factorial function.
However, we can access it only after importing the math module, and even then only
through the module name (math.factorial), as shown below:
>>> factorial(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'factorial' is not defined
>>> import math
>>> factorial(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'factorial' is not defined
>>> math.factorial(5)
120
from moduleName import identifier
This will copy the identifier identifier from the symbol table of module
moduleName into our current symbol table, making it possible for us to access
identifier directly, as shown below for the factorial function of the math module:
While this has the advantage of letting us use the identifier (factorial) without the
module name each time, it also has a disadvantage: it adds identifiers to the current
symbol table, so clashes become more likely as more identifiers from more modules
are imported this way. Though convenient, this practice is therefore best used
sparingly, especially in larger programs.
The question arises: what will happen if we import identifiers with the same name from
different modules? The answer is that each import silently overwrites any existing
identifier of the same name in the symbol table, so the most recent import takes
precedence. This is illustrated below:
Observation:
1. We first imported the factorial function from the math module and verified
that it worked correctly.
2. We then imported the factorial function from the fact module and
verified that we still get the correct output. The fact that there is a clash in the
identifier name factorial did not result in any error. The fact that both
math.factorial and fact.factorial give us the same output might
make it a little difficult for us to figure out which of the two was invoked.
3. Recollect from section 10.15 that functions can have documentation strings.
We would expect math.factorial to have one, whereas our
fact.factorial does not. We can use this difference to figure out
which module's factorial function was called.
4. From the output above, we can see that each import overwrites the previous
factorial with a new copy taken from the specified module.
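The same overwriting can be seen without the fact module. The sketch below uses two standard modules, math and cmath, which both define sqrt. Because cmath.sqrt() accepts negative numbers while math.sqrt() raises ValueError for them, the results reveal which function the name currently refers to.

```python
import cmath

from math import sqrt
real_root = sqrt(4)            # math.sqrt: returns 2.0

from cmath import sqrt         # rebinds the name sqrt in the current symbol table
complex_root = sqrt(-1)        # cmath.sqrt accepts negatives: returns 1j

print(real_root, complex_root, sqrt is cmath.sqrt)   # prints 2.0 1j True
```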
1. #!/usr/bin/python
2.
3. # Program to find the combination of 2 integers using modules.
4.
5. from fact import factorial
6.
7. def ncr(n,r):
8.     return factorial(n)//(factorial(r)*factorial(n-r))
9.
10. x,y = input("Enter 2 integers:").split()
11. x,y = int(x),int(y)
12. print("The combination of {} and {} is {}".format(x,y,ncr(x,y)))
Output:
Enter 2 integers:5 3
The combination of 5 and 3 is 10
Observation:
1. This program is based on combination2.py.
2. We have changed line 5 to import only the identifier factorial and make it
directly available to us.
3. Line 8 uses the factorial function directly without using the module name
fact to identify it.
Before we conclude this section, let us introduce other related forms of importing as
well:
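It is possible to import multiple identifiers into the current symbol table by giving a
comma-separated list of identifiers (which may also be enclosed in parentheses, like a
tuple), as shown in the following syntax:
from moduleName import identifier1, identifier2, ...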
NOTE:
We can import as many identifiers from a module as required. This syntax reduces
the number of import statements required to achieve this goal.
As a special case, we can import all permissible identifiers from a module into the
current symbol table using the following syntax:
from moduleName import *
NOTE:
While this statement should ideally import all identifiers from the module
moduleName into the current symbol table, the module actually controls which
identifiers are imported when * is used: it can list the permissible names in a
variable called __all__. If the module does not define __all__, the default rule
is to import all identifiers that do not begin with an underscore! Also note that
this syntax of importing is frowned upon by professionals as it pollutes the
importer’s namespace!
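The following sketch shows how __all__ and the underscore rule decide what `from moduleName import *` brings in. It writes two hypothetical modules, demo_with_all and demo_without_all, into a temporary directory. The helper star_import is also hypothetical: it runs the import inside exec() with a fresh dictionary as the namespace, so the names the import adds can be inspected.

```python
import importlib
import os
import sys
import tempfile

# Two hypothetical modules, identical except that the first one defines __all__
modules = {
    "demo_with_all": '__all__ = ["greet"]\n'
                     'def greet(): return "hi"\n'
                     'def shout(): return "HI"\n'
                     'def _secret(): return "psst"\n',
    "demo_without_all": 'def greet(): return "hi"\n'
                        'def shout(): return "HI"\n'
                        'def _secret(): return "psst"\n',
}
workdir = tempfile.mkdtemp()
for name, source in modules.items():
    with open(os.path.join(workdir, name + ".py"), "w") as f:
        f.write(source)
sys.path.insert(0, workdir)
importlib.invalidate_caches()

def star_import(module_name):
    """Run 'from module_name import *' in a fresh namespace and return the names it added."""
    namespace = {}
    exec("from {} import *".format(module_name), namespace)
    return sorted(name for name in namespace if name != "__builtins__")

with_all = star_import("demo_with_all")        # only the names listed in __all__
without_all = star_import("demo_without_all")  # every name without a leading underscore
print(with_all, without_all)
```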
1. #!/usr/bin/python
2.
3. # Module to find the factorial of an integer
4. # Demonstration of module execution
5.
6. def factorial(n):
7.     if n==0: return 1
8.     return n*factorial(n-1)
9.
10. print("Module loaded successfully!")
Output:
Observation:
1. This program is based on fact.py, with a single addition: line 10.
2. Line 10 prints a message when executed, which we can use as proof that the
module's statements are executed.
We get the above output when we directly run the Python script as a program. The
point is that we get the same output when this module is imported (the first time it
is imported in a session, since Python caches modules it has already imported), as
shown in the interactive session below:
Let us now finally rewrite the previous module so that the output is different for direct
and indirect execution.
fact3.py
1. #!/usr/bin/python
2.
3. # Module to find the factorial of an integer
4. # Demonstration of module execution
5.
6. def factorial(n):
7.     if n==0: return 1
8.     return n*factorial(n-1)
9.
10. if __name__ == "__main__":
11.     # Direct execution
12.     print("Script executed successfully!")
13. else:
14.     # Indirect execution
15.     print("Module loaded successfully!")
Output:
Interactive session:
Observation:
1. Line 10 checks whether the execution is direct (__name__ == "__main__") or
indirect (__name__ would then contain the module name, fact3).
2. If the script is directly executed, lines 11-12 will be executed, as can be seen
from the output above.
3. If the script is indirectly executed by importing it, lines 14-15 will be executed,
as can be seen from the interactive session above.
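A common refinement of fact3.py moves the direct-execution code into a function, named main by convention, and calls it only when __name__ is "__main__". The sketch below shows this idiom; it is not the book's listing. With this arrangement, importing the module defines factorial without running the self-test.

```python
def factorial(n):
    """Recursive factorial, as in fact.py."""
    if n == 0:
        return 1
    return n * factorial(n - 1)

def main():
    # Hypothetical self-test, run only on direct execution
    print("5! =", factorial(5))

if __name__ == "__main__":
    main()          # direct execution, e.g. python fact3.py
# On "import fact3", __name__ is "fact3", so main() is not called
```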
dir([moduleName])
Forms:
1. dir()
2. dir(moduleName)
Form #1: dir()
This function returns a sorted list of all the identifiers in the current local scope.
Built-in identifiers (such as print or len) are not included, as shown below:
>>> x=10
>>> def f(x): return x
...
>>> import fact
>>> dir()
['__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'f', 'fact', 'x']
Observation:
1. We have created a variable x, defined a function f and imported a module
fact.
2. We see that the dir() function returns a list of strings with each string
naming an identifier in our current namespace. This list includes names of
variables (like x), names of functions (like f) and names of modules (like
fact).
3. Certain standard identifiers are always included – like __builtins__ for
example!
NOTE:
If you want to see the list of all built-in identifiers, import the module builtins
(import builtins) and pass builtins as an argument to dir() using the 2nd
form (dir(builtins)).
NOTE:
The second form actually accepts any object, not just a module! Given a reference
to an object, for instance, it returns a sorted list of the attributes of that object,
including those it gets from its class!
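The sketch below applies the second form to a module and to an ordinary object. The Employee class mirrors the one from chapter 14. Filtering out names that start with an underscore hides the many special attributes every object inherits.

```python
import math

class Employee:
    def __init__(self, name, id, designation):
        self.name = name
        self.id = id
        self.designation = designation

e = Employee("Ram", 1, "Manager")

# Form 2 with a module: the names defined in math
math_names = dir(math)
print("factorial" in math_names)

# Form 2 with an ordinary object: its attributes, plus those from its class
public = [name for name in dir(e) if not name.startswith("_")]
print(public)
```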
15.4 Packages
If a module is a collection of reusable code, what is a collection of modules called?
Package!
A package is a logical collection of modules (and as will be covered in the next
section, a package can also contain subpackages). A package makes it possible to
logically group related modules so that managing and distributing them becomes
easier. Physically in the file system, a package is a directory and the modules are files
within that directory.
If a directory is treated as a package, all the Python scripts within it are treated as
modules. We therefore need a mechanism to tell Python which directories are to be
considered packages and which are merely ordinary directories. This is done through
a special file called __init__.py. The presence of this file in a directory indicates to
Python that the directory is to be considered a package. The file could well be empty,
but being a Python script, it could also contain executable Python code that is
executed when the package is loaded for the first time (just as modules are executed
when imported for the first time). Note that since Python 3.3, a directory without
__init__.py can still be imported as a so-called namespace package (PEP 420);
however, the packages in this chapter are regular packages, which are identified by
their __init__.py file.
Let us create some sample files to demonstrate packages and accessing its modules
and their content. We will create a package called pkg1 that contains 2 modules
named mod1 and mod2 respectively, each containing a function called f. We will then
see various ways of invoking these functions.
pkg1/__init__.py
(empty file)
pkg1/mod1.py
#!/usr/bin/python

# Demonstration of modules within packages
# Module mod1 containing function f

def f():
    print("This is mod1's f")
pkg1/mod2.py
#!/usr/bin/python

# Demonstration of modules within packages
# Module mod2 containing function f

def f():
    print("This is mod2's f")
Interactive session:
Observation:
1. The file pkg1/__init__.py is empty. We will see what code could be
added later on.
2. The statement import pkg1 makes the symbol pkg1 available in the
program. This statement will also be responsible for executing the file
pkg1/__init__.py.
3. The statement from pkg1 import mod1, mod2 makes the symbols mod1
and mod2 from within pkg1 available to us directly. This statement is also
responsible for executing the modules mod1 and mod2 within the package
pkg1.
4. The statement mod1.f() invokes the function f of module mod1 of package
pkg1. Similarly, the statement mod2.f() invokes the function f of module
mod2 of package pkg1.
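The interactive session described above can be reproduced as a script. This sketch makes two assumptions so that it runs on its own: pkg1 is recreated in a temporary directory that is put at the front of sys.path, and f returns its message instead of printing it, so the result can be checked.

```python
import importlib
import os
import sys
import tempfile

# Recreate pkg1 (empty __init__.py, mod1.py, mod2.py) in a temporary directory
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pkg1")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()      # empty file
for name in ("mod1", "mod2"):
    with open(os.path.join(pkg_dir, name + ".py"), "w") as fh:
        # f returns its message (instead of printing it) so it can be checked
        fh.write('def f():\n    return "This is %s\'s f"\n' % name)

sys.path.insert(0, root)          # make the directory holding pkg1 searchable
importlib.invalidate_caches()

import pkg1                       # executes pkg1/__init__.py
from pkg1 import mod1, mod2       # executes pkg1/mod1.py and pkg1/mod2.py

print(mod1.f())                   # This is mod1's f
print(mod2.f())                   # This is mod2's f
```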
In the above setup, if our goal is to invoke the function f of module mod1 of package
pkg1, here are a few ways of doing it:
Observation:
1. Note that each of the examples given here is run in a separate interactive
session, since in a single session all previous imports are remembered!
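2. The statement import pkg1.mod1 imports the module mod1 of package pkg1
and binds the name pkg1 in our namespace, so the module has to be referred
to by its full name pkg1.mod1. It is understood from this syntax that pkg1 is
the name of a package whereas mod1 is the name of a module.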
3. The statement pkg1.mod1.f() invokes the function f of the imported
symbol pkg1.mod1, which is nothing but the module mod1 within the
package pkg1.
Observation:
1. The statement from pkg1 import mod1 imports the symbol mod1 from
within pkg1, making mod1 directly accessible.
2. The statement mod1.f() then invokes the function f using the module name
that was imported.
Observation:
1. The statement from pkg1.mod1 import f imports the symbol f from the
module mod1 within the package pkg1.
2. The function f can now be directly invoked.
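The three approaches above can be tried in a single script, as sketched below under a few assumptions: pkg1 is recreated in a temporary directory that is put on sys.path, and f returns its message instead of printing it. Because everything runs in one session, mod1 is loaded only once, and all three routes lead to the same module and function objects.

```python
import importlib
import os
import sys
import tempfile

# Recreate pkg1 with an empty __init__.py and a mod1.py
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "pkg1"))
open(os.path.join(root, "pkg1", "__init__.py"), "w").close()
with open(os.path.join(root, "pkg1", "mod1.py"), "w") as fh:
    fh.write('def f():\n    return "This is mod1\'s f"\n')
sys.path.insert(0, root)
importlib.invalidate_caches()

# Way 1: import the module through its full dotted name
import pkg1.mod1
way1 = pkg1.mod1.f()

# Way 2: import the module from the package
from pkg1 import mod1
way2 = mod1.f()

# Way 3: import the function itself from the module
from pkg1.mod1 import f
way3 = f()

# The module was executed only once, so all names refer to the same objects
print(pkg1.mod1 is mod1, mod1.f is f)    # True True
```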
packageName.moduleName
packageName.moduleName.functionName
packageName.moduleName.variableName
pkg2/__init__.py
#!/usr/bin/python

# Demonstration of packages
# Package execution

print("Package loaded successfully!")
pkg2/mod1.py
#!/usr/bin/python

# Demonstration of modules within packages
# Module mod1 containing function f

def f():
    print("This is mod1's f")

print("Module mod1 loaded successfully!")
pkg2/mod2.py
#!/usr/bin/python

# Demonstration of modules within packages
# Module mod2 containing function f

def f():
    print("This is mod2's f")

print("Module mod2 loaded successfully!")
Observation:
1. The filenames mod1.py and mod2.py do not clash with what we had done
previously as they are in a different directory pkg2. For the same reason, the
modules mod1 and mod2 of package pkg2 will not clash with the modules of
the same name in the package pkg1.
2. We have added print statements in these 3 files to find out when they
execute.
The same interactive sessions given for the previous package demo are presented
here for this new package:
Observation:
1. When a package (or any of its modules) is imported for the first time, the
package gets executed first (by which we mean that the file __init__.py
within that package gets executed).
2. A package always gets executed before any of its modules.
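The execution order can be observed by capturing what the print statements write. The sketch below recreates pkg2/__init__.py and pkg2/mod1.py in a temporary directory (an assumption made so the example is self-contained) and uses contextlib.redirect_stdout to record the output of a first import and of a repeated import.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "pkg2")
os.mkdir(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as fh:
    fh.write('print("Package loaded successfully!")\n')
with open(os.path.join(pkg, "mod1.py"), "w") as fh:
    fh.write('def f():\n    print("This is mod1\'s f")\n\n'
             'print("Module mod1 loaded successfully!")\n')
sys.path.insert(0, root)
importlib.invalidate_caches()

first = io.StringIO()
with contextlib.redirect_stdout(first):
    from pkg2 import mod1          # first import: package first, then the module

second = io.StringIO()
with contextlib.redirect_stdout(second):
    from pkg2 import mod1          # already imported: nothing is executed again

print(first.getvalue().splitlines())
print(repr(second.getvalue()))     # ''
```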
15.5 Subpackages
A package can not only contain modules, but can also contain subpackages (which
recursively can contain subpackages and modules). This concept is identical to the
physical concept in a filesystem that a directory can not only contain files, but can also
contain subdirectories (which recursively can contain subdirectories and files).
Thus, some more valid syntaxes that now become possible are:
packageName.subpackageName
packageName.subpackageName.moduleName
packageName.subpackageName.moduleName.functionName
packageName.subpackageName.moduleName.variableName
packageName.subpackageName.subpackageName.moduleName
pkg3/__init__.py
#!/usr/bin/python

# Demonstration of packages
# Subpackage demo

print("Package pkg3 loaded successfully!")
pkg3/subpkg/__init__.py
#!/usr/bin/python

# Demonstration of packages
# Subpackage demo

print("Subpackage subpkg loaded successfully!")
pkg3/subpkg/mod.py
#!/usr/bin/python

# Demonstration of modules within subpackages
# Module mod containing function f

def f():
    print("This is mod's f")

print("Module mod loaded successfully!")
Interactive session:
Observation:
1. Subpackages are also packages and must contain the file __init__.py.
2. Packages are always executed before their subpackages.
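The order pkg3, then subpkg, then mod can be checked in a script. This sketch recreates the three files in a temporary directory (an assumption for self-containment, with f printing "This is mod's f") and captures the output of a single import pkg3.subpkg.mod followed by a call to f.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

root = tempfile.mkdtemp()
files = {
    ("pkg3", "__init__.py"): 'print("Package pkg3 loaded successfully!")\n',
    ("pkg3", "subpkg", "__init__.py"): 'print("Subpackage subpkg loaded successfully!")\n',
    ("pkg3", "subpkg", "mod.py"): ('def f():\n    print("This is mod\'s f")\n\n'
                                   'print("Module mod loaded successfully!")\n'),
}
for parts, text in files.items():
    path = os.path.join(root, *parts)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(text)
sys.path.insert(0, root)
importlib.invalidate_caches()

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import pkg3.subpkg.mod          # package, then subpackage, then module
    pkg3.subpkg.mod.f()

lines = captured.getvalue().splitlines()
for line in lines:
    print(line)
```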
Observation:
1. Any package/module imported will be searched for in the above directories, in
sequential order, until it is found. Thus, for example, preference is given to the
directory /usr/lib64/python35.zip over /usr/lib64/python3.5. If the
package/module is not found in any of the directories listed in the output
above, an ImportError is raised (from Python 3.6 onwards, its subclass
ModuleNotFoundError).
2. In the interactive session above, the first entry in sys.path is an empty string,
which represents the current working directory. This means that packages and
modules in the current working directory are given preference over the library
modules! As a consequence, a local module can replace a library package or
module of the same name!
The next question is: how is sys.path built? The contents of sys.path are loaded
as follows:
1. The first entry is the directory containing the script being run or, when Python
is run interactively, an empty string representing the current working directory.
2. The second set of entries are picked up from the environment variable
PYTHONPATH, if present. This variable is used just like PATH: it is a list of
directories separated by a delimiter. The delimiter is ‘:’ in UNIX/Linux and ‘;’ in
Windows. This is a good place to store the locations of packages and modules
without affecting the source code.
3. The standard library locations are then appended to sys.path. These depend
on the installation.
We can also change the contents of sys.path from within our script. Modifying
sys.path directly in a script is an alternative to setting PYTHONPATH. Let us remove
the first entry in sys.path (referring to the current directory) and see the impact on
importing our packages/modules stored in the current directory:
As can be seen, Python is now unable to import the package pkg1 from our current
directory as the current directory is no longer searched in. Let us start a new session
and add pkg1 to sys.path and see how we can load the module mod1:
Adding pkg1 in sys.path makes Python search for packages and modules within
the directory pkg1 also. This makes it possible to directly load the module mod1 from
within the directory pkg1.
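The effect of changing sys.path can be sketched in one script. Here a temporary directory stands in for pkg1 and holds a module mod1 (its f returns a string, an assumption for testability). The import fails until that directory is appended to sys.path; sys.path.insert(0, directory) would instead give it top priority.

```python
import importlib
import os
import sys
import tempfile

libdir = tempfile.mkdtemp()                  # stands in for the pkg1 directory
with open(os.path.join(libdir, "mod1.py"), "w") as fh:
    fh.write('def f():\n    return "This is mod1\'s f"\n')

try:
    import mod1                              # libdir is not searched yet
    found_before = True
except ModuleNotFoundError:
    found_before = False

sys.path.append(libdir)                      # search libdir after the existing entries
importlib.invalidate_caches()
import mod1                                  # now found in libdir

print(found_before, mod1.f())                # False This is mod1's f
```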
As our last demo in this section, let us add pkg1 to PYTHONPATH and import mod1
from pkg1 directly.
The following statement is being typed in a Linux shell (not in Python interpreter):
export PYTHONPATH="pkg1"
As can be seen, we were able to locate mod1 within pkg1. As further proof that
sys.path was modified with the inputs taken from PYTHONPATH, here is a
continuation of the same session:
While the first entry is for the current working directory, the second entry was taken
from PYTHONPATH. The rest of the entries are as before.
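Because PYTHONPATH is read when the interpreter starts, it cannot be changed for a session that is already running. The sketch below therefore starts a fresh interpreter with subprocess.run(), setting PYTHONPATH to pkg1 in the child's environment; this plays the role of export PYTHONPATH="pkg1". The temporary directory and the file contents are assumptions made for the demo.

```python
import os
import subprocess
import sys
import tempfile

root = tempfile.mkdtemp()                      # plays the role of the current directory
os.mkdir(os.path.join(root, "pkg1"))
with open(os.path.join(root, "pkg1", "mod1.py"), "w") as fh:
    fh.write('def f():\n    print("This is mod1\'s f")\n')

env = dict(os.environ)
env["PYTHONPATH"] = "pkg1"                     # like: export PYTHONPATH="pkg1"
# Several directories would be joined with os.pathsep (':' on UNIX/Linux, ';' on Windows)

child = subprocess.run(
    [sys.executable, "-c", "import mod1; mod1.f()"],
    cwd=root, env=env, capture_output=True, text=True, check=True,
)
print(child.stdout)                            # This is mod1's f
```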
python -m pip install packageName
Observation:
1. We are running the Python interpreter (python) explicitly. If there are multiple
Python interpreters installed, we can choose the appropriate one to be used
for package installation.
2. The -m option specifies that we are interested in running a named module
(pip) as a script.
3. The argument install is an argument for the PIP script that denotes that we
wish to install a package.
4. Finally, the packageName argument specifies the name of the package to be
downloaded and installed.
15.8 Questions
1. How are modules helpful in managing large projects?
2. Write a short note on how a module can be created and used in Python.
3. Explain the various ways of importing module contents with examples.
4. How does Python know where to search for modules and packages? Explain
various techniques to help Python locate custom modules.
5. How are packages different from modules?
15.9 Exercises
1. Define a module called prime that contains a function isPrime() that
returns whether the passed argument is prime or not. Using this module and
function, write another program containing a function printPrimes() that
prints the first n prime numbers.
2. Define a module called factorial that contains a function to find the
factorial of the given integer. Using this function, find the permutation and
combination of the given inputs.
3. Create a package P with module M that contains a function F. Demonstrate
various ways of calling the function F using different programs.
SUMMARY
➢ Modules get executed when imported for the first time! The
special symbol __name__ will contain the module name when
executed due to an import, and will have the value “__main__”
instead when being executed as a script.
➢ The dir() function can be used to list out any namespace content.
The next few sections will deal with DDL, DML and DQL in brief. We will not be
considering DCL, as access rights are not a critical issue when simple databases are
created using SQLite.
Observation:
1. SQL is not case sensitive and therefore we can use any case we wish, but we
will stick to upper-case for keywords and lower-case for identifier names as a
convention. This convention is followed by many.
2. This Employee table will have 3 columns or fields: name, id and email.
3. TEXT and INTEGER are standard data types. TEXT can help represent any
textual content of varying length whereas INTEGER can help represent any
normal integer.
It is possible to create a table conditionally only if it does not already exist. This uses a
slightly different syntax:
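A minimal sketch, run through Python's sqlite3 module (covered later in this chapter) on an in-memory database, of the plain CREATE TABLE statement and its conditional IF NOT EXISTS form. The column list name TEXT, id INTEGER, email TEXT follows the observations above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")     # throwaway in-memory database
cur = conn.cursor()

# Keywords in upper case, identifiers in lower case (a convention only)
cur.execute("CREATE TABLE Employee (name TEXT, id INTEGER, email TEXT)")

# Creating the same table again fails ...
error_message = None
try:
    cur.execute("CREATE TABLE Employee (name TEXT, id INTEGER, email TEXT)")
except sqlite3.OperationalError as err:
    error_message = str(err)           # reports that the table already exists

# ... whereas IF NOT EXISTS silently skips the creation
cur.execute("CREATE TABLE IF NOT EXISTS Employee (name TEXT, id INTEGER, email TEXT)")

# List the tables stored in the database
tables = [row[0] for row in cur.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)                          # ['Employee']
conn.close()
```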
NOTE:
Dropping a table will not only remove the table, but also all the data in it! Executing
such a statement accidentally can therefore be disastrous!
It is possible to drop a table conditionally, only if it exists. Otherwise, any attempt to
drop a table that does not exist results in an error (from Python, an
sqlite3.OperationalError reporting "no such table"). Conditional dropping of a table
is demonstrated below:
The above statement will set the email address of every record whose id is 1 to the
new address given in the statement.
540 16. Working with Databases
As a special case, in order to delete all records of the table, the following DELETE
statement can be used:
NOTE:
This is not the same as DROP TABLE Employee, which not only deletes the rows
but also removes the table!
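A short sketch of UPDATE, DELETE and DROP TABLE (including DROP TABLE IF EXISTS), run through the sqlite3 module on an in-memory database. The email addresses are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employee (name TEXT, id INTEGER, email TEXT)")
cur.execute("INSERT INTO Employee VALUES ('Ram', 1, 'ram@example.com'), "
            "('Sham', 2, 'sham@example.com')")

# UPDATE: every record whose id is 1 gets a new (made-up) email address
cur.execute("UPDATE Employee SET email = 'ram.new@example.com' WHERE id = 1")
updated = cur.execute("SELECT email FROM Employee WHERE id = 1").fetchone()[0]

# DELETE without WHERE: all rows go, but the (now empty) table remains
cur.execute("DELETE FROM Employee")
remaining = cur.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

# DROP TABLE: the table itself goes
cur.execute("DROP TABLE Employee")
try:
    cur.execute("DROP TABLE Employee")           # dropping it again fails ...
    drop_error = None
except sqlite3.OperationalError as err:
    drop_error = str(err)
cur.execute("DROP TABLE IF EXISTS Employee")     # ... unless IF EXISTS is used
conn.close()

print(updated, remaining, drop_error)
```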
The above statement fetches all records from the table Employee. If we wish to filter
the results, we could do so as shown in the following example:
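A sketch of an unfiltered SELECT next to one filtered with WHERE, again on a throwaway in-memory table with made-up rows. ORDER BY id is added so that the order of the results is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employee (name TEXT, id INTEGER, email TEXT)")
cur.execute("INSERT INTO Employee VALUES ('Ram', 1, 'ram@example.com'), "
            "('Sham', 2, 'sham@example.com'), ('Sita', 4, 'sita@example.com')")

# All records
everyone = cur.execute("SELECT * FROM Employee ORDER BY id").fetchall()

# WHERE keeps only matching rows; selecting one column gives 1-tuples
filtered = cur.execute("SELECT name FROM Employee WHERE id > 1 ORDER BY id").fetchall()
print(filtered)    # [('Sham',), ('Sita',)]
conn.close()
```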
might rarely want to create large databases and store huge amounts of data). The
installation process is simple and direct and so are the databases. Each database is
stored locally as a file. In fact, if required, a database can be temporarily created
directly in memory! This of course will have the limitation that when the application
terminates, the database vanishes too!
16.2 Installation
SQLite support comes built into Python 3 (through the standard library module
sqlite3) and hence requires no installation. More information, along with the SQLite
source and documentation, can be obtained at https://www.sqlite.org/.
sqlite3.connect(databaseName)
Recall from section 16.1.5 that SQLite stores databases as files. The databaseName
parameter shown in the syntax above is the filename (or pathname if the file is not in
the current directory) of the database file to be used. If the file does not exist, it will be
created.
NOTE:
As a special case, if the databaseName parameter is “:memory:”, the database is
created in memory!
The connect() function returns a Connection object that will be used further in
section 16.4, but for now let us also see how to close a database connection using the
close() method, whose syntax is shown below:
connection.close()
From the time a database connection is opened till the time it is closed, there could be
any number of read/write operations on the database. With the default settings,
changes made by statements such as INSERT, UPDATE and DELETE are not committed
to the database without an explicit call to the commit() method! Since close() does
not call commit() automatically, uncommitted changes are lost when the connection is
closed, so it is good practice to commit before closing the database connection!
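The sketch below shows connect(), commit() and close() working together on a database file in a temporary directory (the file location is an assumption). It uses Connection.execute(), a shortcut that creates a Cursor internally; cursors are introduced next. The second INSERT is never committed, so it does not survive closing the connection.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "employee.db")   # assumed location

conn = sqlite3.connect(db_path)        # the file is created since it does not exist
conn.execute("CREATE TABLE Employee (name TEXT, id INTEGER, email TEXT)")
conn.commit()

conn.execute("INSERT INTO Employee VALUES ('Ram', 1, 'ram@example.com')")
conn.commit()                          # this record is saved

conn.execute("INSERT INTO Employee VALUES ('Sham', 2, 'sham@example.com')")
conn.close()                           # closed WITHOUT commit: this insert is lost

conn = sqlite3.connect(db_path)        # reopen the same file
names = [row[0] for row in conn.execute("SELECT name FROM Employee")]
conn.close()
print(names)                           # ['Ram']
```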
connection.commit()
connection.cursor()
We can create any number of Cursor objects from a single connection, and with each of
them we can execute any number of queries through the connection. While it is not
necessary to create multiple Cursor objects to execute multiple queries, do bear in
mind that if we execute the next query on a Cursor before extracting the complete
result of its previous query, the remaining results of the previous query are lost.
There are therefore situations wherein we may have to create multiple Cursor
objects.
Retrieving results from Cursor objects will be dealt with later in section 16.5, but for
now let us see how to execute queries that don’t produce results (DML queries).
cursor.execute(query)
We will first demonstrate how to create the Employee table discussed in section
16.1.2.1 using this method:
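A minimal sketch of cursor.execute() creating the Employee table and inserting one hard-coded record. An in-memory database is assumed here instead of the employee.db file used in the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")     # the book uses a file such as employee.db
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS Employee (name TEXT, id INTEGER, email TEXT)")
cur.execute("INSERT INTO Employee VALUES ('Ram', 1, 'ram@example.com')")
inserted = cur.rowcount                # rows affected by the last INSERT: 1
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(inserted, count)                 # 1 1
conn.close()
```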
1. We have framed the query only once without hard-coding the values, and
used it twice with different values.
2. The execute() method now expects a second argument – a tuple of values
– to be assigned on a one-to-one basis for each placeholder found in the
query.
3. In case you are wondering whether str.format() could have been used
instead, here’s the difference: using placeholders is far safer, as it protects
you from SQL injection! Without going into the details, here’s the short
conclusion: it is not safe to use str.format() as a replacement for
placeholders!
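The placeholder style can be sketched as follows. The last insert passes a value that looks like SQL; because it is bound as a parameter, it is stored as plain text rather than executed. The record values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employee (name TEXT, id INTEGER, email TEXT)")

query = "INSERT INTO Employee VALUES (?, ?, ?)"     # one ? per value
cur.execute(query, ("Sham", 2, "sham@example.com"))
cur.execute(query, ("Balram", 3, "balram@example.com"))

# A value containing SQL syntax is treated purely as data
tricky = "x'); DROP TABLE Employee; --"
cur.execute(query, (tricky, 99, "none"))
conn.commit()

names = [row[0] for row in cur.execute("SELECT name FROM Employee ORDER BY id")]
print(names)        # the table still exists and holds all three names
conn.close()
```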
When using placeholders, we use a slightly different syntax of execute() as shown
below:
cursor.execute(query_with_placeholders, tuple_of_values)
cursor.executemany(query_with_placeholders, seq_of_tuples)
Here’s a demonstration of adding 2 more records into our Employee database using
this concept:
Observation:
1. We have used the same query template as before.
2. We have framed a list of tuples, with each tuple representing 1 Employee
record. This need not be a list – it can be any sequence, for example another
tuple.
3. We have passed this sequence as an argument to executemany() the
same way we pass a single tuple as an argument to execute().
4. If we have executed all the above code snippets, we should now have 5
records in total in our database. The employee names in our database would
be Ram, Sham, Balram, Sita and Gita.
5. So far, we haven’t verified whether our data is indeed present in the
database. The next section will demonstrate how we can extract data from
the database and verify the working of all the code we executed so far.
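A sketch of executemany() with a list of tuples, on an in-memory copy of the table (an assumption so the example runs on its own). A tuple of tuples would work just as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employee (name TEXT, id INTEGER, email TEXT)")

query = "INSERT INTO Employee VALUES (?, ?, ?)"
cur.execute(query, ("Ram", 1, "ram@example.com"))

records = [                                # one tuple per Employee record
    ("Sita", 4, "sita@example.com"),
    ("Gita", 5, "gita@example.com"),
]
cur.executemany(query, records)            # same template, run once per tuple
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(count)                               # 3
conn.close()
```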
Observation:
1. We have connected to the same database – employee.db – and are trying
to list out all records in the database.
2. Each row, when printed, is printed as a tuple of column values.
3. We iterate row by row till the fetchone() method returns None.
4. Since we have not written anything into the database, there is no need to
invoke commit().
Observation:
1. This is a continuation of the previous session and hence we don’t have to re-
establish connection with the database or create a new Cursor object.
2. We need to execute the query again so as to obtain the results. We can either
reuse the same Cursor object as has been done, or can create a new
Cursor object for this (which is unnecessary).
3. We merely iterate over the Cursor object itself, which iterates over the
result rows. This is perhaps more readable and simpler than using
fetchone().
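Both ways of reading results, the fetchone() loop and iterating over the Cursor, are sketched below on a small in-memory table with made-up rows. The two produce the same list of tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employee (name TEXT, id INTEGER, email TEXT)")
cur.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [("Ram", 1, "ram@example.com"), ("Sham", 2, "sham@example.com")])

# Style 1: call fetchone() until it returns None
cur.execute("SELECT * FROM Employee ORDER BY id")
via_fetchone = []
row = cur.fetchone()
while row is not None:
    via_fetchone.append(row)           # each row is a tuple of column values
    row = cur.fetchone()

# Style 2: execute the query again and iterate over the Cursor itself
cur.execute("SELECT * FROM Employee ORDER BY id")
via_iteration = [row for row in cur]

print(via_fetchone == via_iteration)   # True
conn.close()
```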
Cursor.fetchmany([size])
Observation:
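1. The optional size argument specifies the number of result rows we wish to
fetch. If fewer rows are available, only the available rows are returned.
2. If size is omitted, the number of rows fetched is taken from the Cursor
object's arraysize attribute, which defaults to 1. The Python documentation
notes that for best performance it is usually best to rely on arraysize, and
that if size is passed, it is best to keep the same value from one
fetchmany() call to the next.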
Let us continue the Python session and fetch all result rows using fetchmany() this
time:
Observation:
1. Again, this is a continuation of the previous session and we have executed
the query again using the same Cursor object.
2. We are attempting to read 3 rows at a time. We break out of the loop when
the number of rows returned is 0.
3. Since there were 5 rows in the database, we see that the first iteration
produced the first 3 rows, the second iteration produced the remaining 2 rows
and the third iteration produced 0 rows (and that’s when we break out of the
loop).
Observation:
1. As we can see, all the result rows have been returned at one go. In the
special case that there were no result rows at all, we would have obtained an
empty list.
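The fetchmany() loop and fetchall() described above can be sketched as follows. The table here is a simplified two-column version (an assumption) holding the five names used in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Simplified two-column table (assumption) holding the five names of this chapter
cur.execute("CREATE TABLE Employee (name TEXT, id INTEGER)")
names = ["Ram", "Sham", "Balram", "Sita", "Gita"]
cur.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(name, i) for i, name in enumerate(names, start=1)])

# fetchmany(): read at most 3 rows per call until an empty list comes back
cur.execute("SELECT name FROM Employee ORDER BY id")
batches = []
while True:
    rows = cur.fetchmany(3)
    if len(rows) == 0:
        break
    batches.append([row[0] for row in rows])
print(batches)        # [['Ram', 'Sham', 'Balram'], ['Sita', 'Gita']]

# fetchall(): every remaining row at once, as a list of tuples
cur.execute("SELECT name FROM Employee ORDER BY id")
everything = cur.fetchall()

# No matching rows: fetchall() returns an empty list
cur.execute("SELECT name FROM Employee WHERE id > 100")
nothing = cur.fetchall()

default_size = cur.arraysize        # used by fetchmany() when size is omitted
print(len(everything), nothing, default_size)   # 5 [] 1
conn.close()
```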
In order to deal with rows as Row objects instead of tuples, the following statement
must be executed before we attempt to create the Cursor object from the
Connection object:
>>> conn.row_factory=sqlite3.Row
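Once row_factory is set, each row can be read by index or by column name, or converted to a tuple or a dictionary. The sketch below assumes the same three-column Employee table, held in memory, with one made-up record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row         # set before creating the cursor
cur = conn.cursor()
cur.execute("CREATE TABLE Employee (name TEXT, id INTEGER, email TEXT)")
cur.execute("INSERT INTO Employee VALUES ('Ram', 1, 'ram@example.com')")

cur.execute("SELECT * FROM Employee")
row = cur.fetchone()

print(row[0], row["name"], row["NAME"])   # by index, by name; names are case-insensitive
print(row.keys())                         # ['name', 'id', 'email']
print(tuple(row))                         # ('Ram', 1, 'ram@example.com')
print(dict(row))                          # {'name': 'Ram', 'id': 1, 'email': 'ram@example.com'}
conn.close()
```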
16.7 Questions
1. Explain how SQL queries can be executed in Python using SQLite.
2. Explain the usage of placeholders in SQL query execution using examples.
3. Write a short note on extracting SQL query results as rows in Python.
4. Write a short note on extracting column values from within an SQLite Row
object.
16.8 Exercises
1. Write a program to create an SQLite database in the file employee.db that
contains a table called Employee, with fields ID, name, department,
designation and city.
2. Write a program that allows the user to add multiple records into the
employee.db file created earlier. After every record, the user should be
asked whether he/she wants to add another record.
3. Write a program that allows the user to edit an entry present in the
employee.db file created earlier. The program should ask for the ID of the
employee, and should take in the new details for that employee.
4. Write a program that allows the user to delete 1 or more records from the file
employee.db created earlier. The input is a single line containing the ID of
the employees to be deleted, separated by spaces.
5. Write a program that displays all the records present in the file employee.db
created earlier in a formatted manner.
6. Write a menu-driven program that works with the employee.db file created
earlier and provides the following options:
1. Searching for a particular employee by ID
2. Listing all employees belonging to a particular department
3. Listing all employees belonging to a particular city
4. Listing the number of people in a particular city having a particular
designation
5. Listing the total number of employees in each city
SUMMARY
➢ We can also deal with query results using Row objects instead of
the default tuples. Row objects can behave both as tuples and as
dictionaries.
17. Parsing HTML 553
17 PARSING HTML
Search for tags, attributes and content within an HTML parse tree
17.1 Introduction
HyperText Markup Language (HTML) is the language of the web. Whenever content
has to be made available across the web, the preferred form is HTML output. This
chapter concentrates on how HTML can be parsed in Python and how it can be
created.
Python 3 comes with a built-in HTML parser. We will first explore how to parse HTML
content using this. Thereafter, we will explore another library called BeautifulSoup
(which can internally use Python’s HTML parser) that provides a more powerful and
easier-to-use interface to HTML documents.
HTMLParser.handle_starttag(self, tag, attrs)
While the first parameter has to be self (as this is an instance method), the second
parameter (tag) identifies the name of the tag found, converted to lower case, and the
third parameter (attrs) is a list containing all the attributes present with the tag, with
each attribute-value pair being stored as a tuple.
Here is some sample code to demonstrate how this works:
Observation:
1. We imported the HTMLParser class from the html.parser module, defined
our class (Parser) that derives from it, overrode the method
handle_starttag and instantiated our class.
2. We used the feed() method to feed in HTML data. We will explore this in
detail later in section 17.2.2.
3. We are merely printing each tag name and its attributes as and when the
parser encounters them.
4. We observe that parsing commences immediately after feeding in data.
5. We observe that we encounter the tags in the same order as they are present
in the HTML content.
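As a variation on the book's Parser, the sketch below collects the start tags in a list instead of printing them, which makes the arguments easy to inspect. The class name TagRecorder is hypothetical, and the upper-case P in the data is deliberate: HTMLParser reports tag names in lower case. The self-closing br also reaches handle_starttag here, through the default handle_startendtag of HTMLParser.

```python
from html.parser import HTMLParser

class TagRecorder(HTMLParser):
    """Collects (tag, attrs) pairs in document order instead of printing them."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        self.found.append((tag, attrs))

data = '<html><body><P class="para text" id="1">This is a <b>test</b><p><br/></body></html>'
parser = TagRecorder()
parser.feed(data)

for tag, attrs in parser.found:
    print(tag, attrs)
# html []
# body []
# p [('class', 'para text'), ('id', '1')]
# b []
# p []
# br []
```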
HTMLParser.handle_endtag(self, tag)
Note that the end tag cannot contain any attributes and therefore we don’t have the
additional attrs parameter that we had in handle_starttag.
We will now demonstrate the handling of end tags in addition to start tags. Since we
are reusing the code written earlier, it would be easier for us to write this as a program
instead:
htmlDemo1.py
#!/usr/bin/python

# Program to demonstrate handling of start tags
# and end tags in HTML content


from html.parser import HTMLParser

class Parser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Found tag:", tag)
        print(" Attributes:", attrs)

    def handle_endtag(self, tag):
        print("End tag:", tag)

data='<html><body><p class="para text" id="1">This is a <b>test</b><p><br/></body></html>'
parser=Parser()
parser.feed(data)
Output:
Observation:
1. In addition to handle_starttag, we have now also defined
handle_endtag.
2. The start tags and end tags are printed in the order they occur in the HTML
content.
Tags written in the self-closing form, such as <br/>, are reported through the
handle_startendtag method, which has the following syntax:
HTMLParser.handle_startendtag(self, tag, attrs)
NOTE:
If we do not override handle_startendtag, the default implementation of this in
HTMLParser class is invoked, which ends up calling both handle_starttag and
handle_endtag for this tag!
1. #!/usr/bin/python
2.
3. # Program to demonstrate handling of start tags,
4. # end tags and unpaired tags in HTML content
5.
6.
7. from html.parser import HTMLParser
8.
9. class Parser(HTMLParser):
10.     def handle_starttag(self, tag, attrs):
11.         print("Found tag:", tag)
12.         print(" Attributes:", attrs)
13.
14.     def handle_endtag(self, tag):
15.         print("End tag:", tag)
16.
17.     def handle_startendtag(self, tag, attrs):
18.         print("Found unpaired tag:", tag)
19.         print(" Attributes:", attrs)
20.
21.
22. data='<html><body><p class="para text" id="1">This is a <b>test</b><p><br/></body></html>'
23. parser=Parser()
24. parser.feed(data)
Output:
Observation:
1. This program is pretty much like the previous one, except for the addition of
the handle_startendtag method in line 17.
2. From the output, it is now evident that the HTML content <br/> did not
invoke handle_starttag and handle_endtag, but instead only invoked
handle_startendtag.
HTMLParser.handle_data(self, data)
The actual text content obtained is sent via the data parameter.
Let us rewrite our program to also handle text:
htmlDemo3.py
1. #!/usr/bin/python
2.
3. # Program to demonstrate handling of tags
4. # and text content in HTML content
5.
6.
7. from html.parser import HTMLParser
8.
9. class Parser(HTMLParser):
10.     def handle_starttag(self, tag, attrs):
11.         print("Found tag:", tag)
12.         print(" Attributes:", attrs)
13.
14.     def handle_endtag(self, tag):
15.         print("End tag:", tag)
16.
17.     def handle_startendtag(self, tag, attrs):
18.         print("Found unpaired tag:", tag)
19.         print(" Attributes:", attrs)
20.
21.     def handle_data(self, data):
22.         print("Found data:", data)
23.
24. data='<html><body><p class="para text" id="1">This is a <b>test</b><p><br/></body></html>'
25. parser=Parser()
26. parser.feed(data)
Output:
Observation:
1. We have added the handle_data method in this program in line 21.
2. We see that even the textual content is being reported now.
There are additional methods to handle comments, entity and character references,
declarations and processing instructions. Their syntax is shown below:
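HTMLParser.handle_comment(self, commentText)
HTMLParser.handle_entityref(self, entityName)
HTMLParser.handle_charref(self, entityCode)
HTMLParser.handle_decl(self, declarationText)
HTMLParser.unknown_decl(self, declarationText)
HTMLParser.handle_pi(self, processingInstruction)
Note that since Python 3.5, the convert_charrefs argument of the HTMLParser
constructor defaults to True. In that case, entity and character references such as
&amp; are converted automatically and delivered through handle_data(), and
handle_entityref() and handle_charref() are never called.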
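A sketch with a few of these handlers overridden; the class name EventParser and the sample markup are illustrative. Each event is recorded as a (kind, text) pair. With the default convert_charrefs=True, the &amp; in the text arrives in handle_data() already converted.

```python
from html.parser import HTMLParser

class EventParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_decl(self, decl):
        self.events.append(("decl", decl))         # e.g. the DOCTYPE

    def handle_comment(self, data):
        self.events.append(("comment", data))      # text between <!-- and -->

    def handle_data(self, data):
        self.events.append(("data", data))

parser = EventParser()
parser.feed("<!DOCTYPE html><!-- a note --><p>Fish &amp; Chips</p>")
parser.close()
print(parser.events)
```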
HTMLParser.feed(self, data)
This method feeds in the given data to the parser and that results in an immediate
execution of suitable methods. Data can be “added” to the parser by calling this
method several times. Technically, data is buffered internally since this method can be
called any time with more content. This means that text content can remain in the
buffer, unreported via any of the methods discussed. The buffer is guaranteed to be
flushed out when we invoke the close() method on the parser object:
HTMLParser.close(self)
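Buffering can be observed by splitting a tag across two feed() calls. In this sketch (the class name Recorder is hypothetical), the incomplete <b at the end of the first chunk is held back until the rest of the tag arrives with the second chunk; close() is called at the end to flush whatever may still be buffered.

```python
from html.parser import HTMLParser

class Recorder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append(tag)

parser = Recorder()
parser.feed("<p>Hello <b")               # the <b tag is incomplete, so it stays buffered
after_first = list(parser.starttags)     # ['p']
parser.feed(">world</b></p>")            # the rest of the tag arrives
after_second = list(parser.starttags)    # ['p', 'b']
parser.close()                           # flush anything still buffered
print(after_first, after_second)
```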
BeautifulSoup(htmlString [,parser])
The parser argument can be either “lxml” to select the lxml HTML parser or
“html.parser” to select the HTMLParser covered in section 17.2 before. If omitted,
it will automatically pick a parser from amongst the available parsers in the system.
Here is a demonstration:
BeautifulSoup(YOUR_MARKUP})
to this:
BeautifulSoup(YOUR_MARKUP, "lxml")
markup_type=markup_type))
>>> soup
<html><body><p class="para text" id="1">This is a
<b>test</b></p><p><br/></p></body></html>
Observation:
1. We imported the BeautifulSoup class from the bs4 module (bs4 stands
for BeautifulSoup 4).
2. We used the same data that was used earlier in section 17.2.1.1.
3. When we constructed the BeautifulSoup object, we got a warning because
we had not explicitly selected the backend HTML parser. We see from the
message that it has selected the lxml parser in this case because it was
already installed and available. It also displays the message that if we
explicitly pass the parser (“lxml” or “html.parser”) then we won’t get this
message. This is not a grave issue and we can ignore it for the moment.
4. We see that the constructed object, soup, does indeed have the HTML
content we had passed for parsing.
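A minimal sketch that passes the parser name explicitly, which avoids the warning discussed in point 3. It assumes the third-party beautifulsoup4 package is installed (it is imported as bs4) and uses the built-in "html.parser" backend so that lxml is not required. Different parsers may repair invalid HTML differently, so this sample markup is kept well-formed.

```python
from bs4 import BeautifulSoup   # third-party package: beautifulsoup4

data = ('<html><head><title>HTML Demonstration</title></head>'
        '<body><p class="para text" id="1">This is a <b>test</b></p></body></html>')

# Naming the parser explicitly avoids the "no parser was explicitly specified" warning
soup = BeautifulSoup(data, "html.parser")

print(soup.title)          # <title>HTML Demonstration</title>
print(soup.title.name)     # title
print(soup.b.string)       # test
```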
For larger HTML content, we can store the contents in a file and pass on the file
handle to the constructor as shown in the syntax below:
BeautifulSoup(fileHandle [,parser])
Observation:
1. We open the file “test.html” and pass the file handle to BeautifulSoup.
2. If the file “test.html” is in a different directory, we can always specify the
pathname of the file.
3. We will see in later sections how to extract various parts of this HTML
document.
It is also possible to prettify this output using the prettify method as follows:
The prettify method formats the HTML content and returns it as a string. The
complete syntax is given below:
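BeautifulSoup.prettify([formatter])
The formatter argument mainly controls how special characters in text and attribute
values are converted to entities in the output; for example, “minimal” converts only
what is needed to produce valid HTML, such as & and <. The default value for
formatter is “minimal”.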
In order to demonstrate all these capabilities, we will use some sample HTML content
like this:
[Figure: tree of the Tag objects in the sample document. document → html; html → head, body; body → div; div → p, p, p]
Observation:
1. This tree is a collection of nodes. For readability, only the Tag objects in
the tree are shown here.
2. What is specifically missing in this tree is all the text nodes. Text nodes are
not tag objects, but are nodes in the tree nevertheless. Though not shown in
the tree, we emphasize that they exist and will show in the following sections
how to extract them as well.
3. The root of this hierarchy is the “document” object, maintained internally by
BeautifulSoup.
4. The single child of the “document” object is the html tag, which is the root of
the HTML content.
5. The html tag contains a head tag and body tag as children.
6. The body tag contains a div tag as its child, which in turn contains 3 p tags.
All further demonstration code is a continuation of the same session. We know that
an HTML page generally contains a title tag. Let us extract that tag:
Observation:
1. The expression soup.title, where soup refers to the BeautifulSoup object
that has parsed through some HTML content, returns a reference to a Tag
object that represents the title tag in the HTML content.
2. We see that the string representation of a Tag object is the tag along with its
content; in this example, “<title>HTML Demonstration</title>”.
3. We can verify that the type of the Tag object is indeed bs4.element.Tag.
Given a Tag object, as we had extracted above, we can obtain the following
information:
1. The name of the tag
2. The attributes of the tag
3. The string content of the tag
4. The child tags within the tag
5. The complete recursive string content within the tag
Observation:
1. The div tag had 2 attributes: id and class.
2. While the id attribute had a single value, the class attribute had 2 values. In
HTML, multiple values for an attribute are space-separated. BeautifulSoup
gives us a list of values instead!
3. Do note, however, that BeautifulSoup gives us a list only for attributes that
HTML defines as multi-valued, such as class (and it does so even when there
is just a single value). For any other attribute, such as id, BeautifulSoup
retains the value as a single string, spaces included, and does not return a
list!
An easier and more direct way of interacting with known attributes of a tag is by
simply using the attribute name as a subscript on the Tag object as demonstrated
below:
Observation:
1. The above approach of course only works when we know the name of the
attribute we are expecting.
2. While tag['class'] gives us the value of the attribute class in the form of a
list, tag['class'][0] gives us the first value of the attribute class.
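The difference between multi-valued and single-valued attributes can be sketched as below (beautifulsoup4 assumed installed; the markup is made up). get() is the dictionary-style alternative to subscripting: it returns None instead of raising KeyError for a missing attribute.

```python
from bs4 import BeautifulSoup

html = '<div id="main box" class="content wide"><p class="note">Hi</p></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.div

print(div.attrs)          # {'id': 'main box', 'class': ['content', 'wide']}
print(div["class"])       # ['content', 'wide']  class is multi-valued: a list
print(div["id"])          # main box             id is not: the space is kept
print(soup.p["class"])    # ['note']             still a list with a single value
print(div.get("title"))   # None                 missing attribute, no KeyError
```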
Observation:
1. The string attribute provides the text content within a Tag. The type of the
text content is NavigableString.
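2. This text content is available only if the Tag has a single child that is text
(or a single child tag that itself has such a string). If the Tag has more than
one child, string is None! Section 17.3.4.5 shows how to extract text from
within the child nodes recursively.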
3. The string equivalent of a NavigableString object is the text content.
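The rules for the string attribute can be sketched as follows (beautifulsoup4 assumed installed; the markup is made up). The sketch also confirms that NavigableString is a subclass of str.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

soup = BeautifulSoup(
    "<p>Only text</p><div><b>Bold</b></div><span>Mixed <i>text</i></span>",
    "html.parser",
)

text = soup.p.string
print(type(text).__name__)      # NavigableString
print(isinstance(text, str))    # True: NavigableString is a subclass of str
print(soup.div.string)          # Bold  (single child tag: its string is used)
print(soup.span.string)         # None  (more than one child)
```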
If the given Tag contains multiple child text nodes, then we can iterate over all the
NavigableString nodes of the given Tag using the strings attribute as follows:
Observation:
1. We are iterating over the text content of the div Tag.
2. We see that this iterates not only over the text within the div Tag, but also within its children!
3. We display the string equivalent of each child, enclosed in square brackets for
readability.
4. We explicitly convert each object to a string using the str() function before
concatenating it with the square brackets. Strictly speaking, this is not
required, since NavigableString is itself a subclass of str; the explicit
conversion merely makes the intent obvious. We never found the need to
convert these objects to strings earlier, as in all those examples the
conversion to string was implicit!
The newline characters after the HTML tags are also treated as text content, and this
makes the output less readable. We can use stripped_strings instead of strings to
iterate over strings that have leading and trailing whitespace stripped (strings made
up entirely of whitespace are skipped altogether):
Observation:
1. We no longer display the square brackets as there is no need!
2. Since we are anyway not performing string concatenation, we don’t need to
explicitly convert the object to a string.
3. We observe that we don’t obtain the blank lines we had obtained earlier.
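Side by side, the difference between the two attributes looks like this. The div below is a trimmed, made-up version of the running example; both results are converted to plain str lists for display:

```python
from bs4 import BeautifulSoup

html = "<div>\n<p>This is <b>Bold</b></p>\n<p>This is <i>Italics</i></p>\n</div>"
div = BeautifulSoup(html, "html.parser").div

raw = [str(s) for s in div.strings]             # every text node, newlines included
clean = [str(s) for s in div.stripped_strings]  # stripped, whitespace-only nodes dropped
print(raw)    # ['\n', 'This is ', 'Bold', '\n', 'This is ', 'Italics', '\n']
print(clean)  # ['This is', 'Bold', 'This is', 'Italics']
```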
Even comments are considered to be a special kind of string! They belong to the type
Comment, which is a subclass of PreformattedString, which in turn is a subclass of
NavigableString, as demonstrated below:
Observation:
1. We create a Comment object by parsing an HTML comment.
2. We observe that the string equivalent of the comment is the comment text.
3. The type of a comment object is Comment, which is derived from
PreformattedString, which is derived from NavigableString.
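A comment can also be reached through an ordinary navigation attribute. In this sketch (the comment text is made up), the comment is the only child of the p tag, so p.string returns it:

```python
from bs4 import BeautifulSoup, Comment, NavigableString

soup = BeautifulSoup("<p><!-- Note for developers --></p>", "html.parser")
comment = soup.p.string        # the only child of <p> is the comment

print(type(comment).__name__)                 # Comment
print(repr(str(comment)))                     # the text between <!-- and -->, spaces included
print(isinstance(comment, NavigableString))   # True
print([cls.__name__ for cls in type(comment).__mro__[:3]])
```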
Observation:
1. We are trying to extract the body tag from within the html tag.
2. We print tag.name to confirm that the correct tag has been extracted.
We need not, strictly speaking, use the proper hierarchy to locate the body tag. Thus,
the following code will also work:
Observation:
1. These techniques for extracting a child tag work only when there is a single
child tag with the given name. If there are multiple tags with the given name,
only the first one is returned.
2. Section 17.3.6 will cover how to extract all tags with a given name.
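A compact sketch of both points, using a made-up document with two p tags: the full hierarchy and the shortcut reach the same body tag, and a name lookup returns only the first match:

```python
from bs4 import BeautifulSoup

html = "<html><body><div><p>One</p><p>Two</p></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.html.body.name)    # body (following the full hierarchy)
print(soup.body.name)         # body (the hierarchy can be skipped)
print(soup.p.string)          # One (only the first <p> is returned)
print(soup.div.p is soup.p)   # True: both reach the same first <p> Tag
```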
This is Bold
This is Italics
This is Underlined
>>>
Observation:
1. We have extracted the div tag.
2. We see that the div tag contains 3 p tags inside it, which in turn contain text.
In addition, there are newlines.
3. The get_text() method has concatenated and returned all the strings within
the div tag, including the text of each p tag and the newlines between them.
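get_text() also accepts a separator and a strip flag, which helps when the newlines get in the way. A sketch on a trimmed version of the example div (the separator " " is our choice):

```python
from bs4 import BeautifulSoup

html = "<div>\n<p>This is <b>Bold</b></p>\n<p>This is <i>Italics</i></p>\n</div>"
div = BeautifulSoup(html, "html.parser").div

print(repr(div.get_text()))            # all text joined as-is, newlines included
print(div.get_text(" ", strip=True))   # This is Bold This is Italics
```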
The following sections will cover all these possibilities using the following attributes of
the Tag class:
Tag.contents
Tag.children
Tag.descendants
Tag.parent
Tag.parents
Tag.next_sibling
Tag.previous_sibling
Tag.next_siblings
Tag.previous_siblings
Tag.next_element
Tag.previous_element
Tag.next_elements
Tag.previous_elements
Observation:
1. The child nodes of the div Tag include string nodes and other Tag nodes.
2. The string nodes merely contain the text that was present within the Tag,
whereas the Tag nodes can contain additional content within them.
3. Though the output looks like a list of string objects, in reality it is a list of
NavigableString and Tag objects.
4. The order of the elements in the list is the same as the order in which they
appear in the HTML content parsed.
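Printing the type of each entry makes the mix of node types visible. This sketch uses a shortened, made-up div:

```python
from bs4 import BeautifulSoup

div = BeautifulSoup("<div>\n<p>This is <b>Bold</b></p>\n</div>", "html.parser").div

# Text nodes are NavigableString objects; elements are Tag objects
kinds = [type(node).__name__ for node in div.contents]
for node in div.contents:
    print(type(node).__name__, repr(str(node)))
print(kinds)   # ['NavigableString', 'Tag', 'NavigableString']
```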
If the goal is to iterate over the children, the children attribute can be used instead,
which will prove more efficient:
Observation:
1. We iterate over the children of the given Tag object.
2. Code that uses children is slightly more efficient than similar code written
using contents, though the output is the same, since we do not need a list of
the child elements here.
While the previous example showed how to visit the immediate children using the
children attribute, it is also possible to similarly visit all the descendants (children
recursively) using the descendants attribute as shown below:
[This is ]
[<i>Italics</i>]
[Italics]
[
]
[<p>This is <u>Underlined</u></p>]
[This is ]
[<u>Underlined</u>]
[Underlined]
[
]
Observation:
1. This not only shows the immediate children, but their children too!
2. Each nested tag is first shown as a single entry (with all of its content), followed
by separate entries for each of its own descendants.
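The difference between children and descendants shows up even on a single made-up paragraph: the b tag is listed once as a whole, and then its text appears again as a separate descendant:

```python
from bs4 import BeautifulSoup

p = BeautifulSoup("<p>This is <b>Bold</b></p>", "html.parser").p

children = [str(node) for node in p.children]
descendants = [str(node) for node in p.descendants]
print(children)     # ['This is ', '<b>Bold</b>']
print(descendants)  # ['This is ', '<b>Bold</b>', 'Bold']
```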
Observation:
1. The parent of the div Tag is the body Tag.
2. The parent of the body Tag is the html Tag.
3. The parent of the html Tag is the document element (the BeautifulSoup object itself).
4. The parent of the document element is None (not demonstrated here).
It is possible to iterate over the ancestors of a Tag object using the parents attribute:
Observation:
1. We see the ancestors of the div Tag in order: body, html and document.
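The same chain of ancestors can be collected into a list. In this sketch (made-up document), the last ancestor is the BeautifulSoup object, whose name is '[document]', and its own parent is None:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><div><p>Hi</p></div></body></html>", "html.parser")
div = soup.div

print(div.parent.name)                          # body
ancestors = [tag.name for tag in div.parents]
print(ancestors)                                # ['body', 'html', '[document]']
print(soup.parent)                              # None
```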
Observation:
1. We are starting with the first p tag within the div tag.
2. Its next sibling is a text node with content “\n”.
3. The sibling after that is the next p tag.
4. The previous sibling to the first p tag is a text node with content “\n” - this is
the newline after the <div> tag!
Once again, we can choose to iterate over the siblings using next_siblings and
previous_siblings:
<p>This is <i>Italics</i></p>
<p>This is <u>Underlined</u></p>
>>>
Observation:
1. We start with the first p tag and iterate over the siblings ahead.
2. This gives us all the text nodes and the other p tags as well.
3. The blank lines we see are due to the text nodes that contain “\n” as their
content.
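Since the newline text nodes are siblings too, a common pattern is to keep only the sibling tags. This sketch relies on text nodes having a name of None (the HTML is made up):

```python
from bs4 import BeautifulSoup

div = BeautifulSoup("<div>\n<p>One</p>\n<p>Two</p>\n</div>", "html.parser").div
first = div.p

print(repr(str(first.next_sibling)))       # '\n'
print(first.next_sibling.next_sibling)     # <p>Two</p>
# Text nodes have name None, so this keeps only the sibling tags
tags_after = [node.name for node in first.next_siblings if node.name]
print(tags_after)                          # ['p']
```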
Finally, just as we accessed the next and previous siblings, we can also access
the next and previous elements (nodes) of the parse tree using next_element and
previous_element:
Observation:
1. The next element in the parse tree need not be the next sibling, as can be
seen in the output above.
2. The previous element in the parse tree need not be the previous sibling (but is
indeed the previous sibling in the output above).
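A minimal illustration of the difference, on a made-up div: next_sibling stays at the same level of the tree, while next_element follows parse order and therefore steps into the first p tag's own text:

```python
from bs4 import BeautifulSoup

html = "<div><p>This is <b>Bold</b></p><p>Next</p></div>"
first = BeautifulSoup(html, "html.parser").p

print(first.next_sibling)          # <p>Next</p> (next node at the same level)
print(first.next_element)          # This is (next node in parse order)
print(first.b.previous_element)    # This is (here also the previous sibling of <b>)
```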
We can iterate through all the elements following a particular one using next_elements,
up to the end of the parse tree (the last leaf node), and can iterate over all previous
elements back to the document element using previous_elements. Here is a
demonstration of next_elements:
<p>This is <i>Italics</i></p>
This is
<i>Italics</i>
Italics
<p>This is <u>Underlined</u></p>
This is
<u>Underlined</u>
Underlined
>>>
Observation:
1. We see that we get all the HTML content that appeared after the p tag.
2. The blank lines we see are the text content (newlines) that exist after every
tag in the HTML content.
<body>
<div class="main content" id="main">
<p>This is <b>Bold</b></p>
<p>This is <i>Italics</i></p>
<p>This is <u>Underlined</u></p>
</div>
</body>
HTML Demonstration
<title>HTML Demonstration</title>
<head>
<title>HTML Demonstration</title>
</head>
<html>
<head>
<title>HTML Demonstration</title>
</head>
<body>
<div class="main content" id="main">
<p>This is <b>Bold</b></p>
<p>This is <i>Italics</i></p>
<p>This is <u>Underlined</u></p>
</div>
</body>
</html>
Test file
html
Observation:
1. We see all the HTML content before the p tag, from the div tag all the way up to
the document element!
2. Note that when a previous element is accessed and printed, it prints all the
HTML content contained within it and its children recursively.
NOTE:
All the arguments of find_all() are optional! We frequently pass only the arguments
we need, as keyword arguments.
Instead of invoking this method on the BeautifulSoup object and searching through
the entire parse tree, it is also possible to invoke this method on any Tag object
and search through that sub-tree.
Forms:
1. BeautifulSoup.find_all()
2. BeautifulSoup.find_all(name)
3. BeautifulSoup.find_all(name, attrs)
4. BeautifulSoup.find_all(name, attrs, recursive)
5. BeautifulSoup.find_all(name, attrs, recursive,
string)
6. BeautifulSoup.find_all(name, attrs, recursive,
string, limit)
7. BeautifulSoup.find_all(name, attrs, recursive,
string, limit, **kwargs)
Observation:
1. We are iterating through all the tags of the HTML content and displaying the
name of the tag.
2. We observe that we get all the tags in the same order as present in the HTML
content.
Observation:
1. This lists out all the p tags in the HTML content.
2. We had 3 such p tags and each is listed.
Instead of searching for tags with one specific name, we can search for tags whose
name matches any of the names in a given list, as demonstrated below:
Observation:
1. This lists out all p, b and u tags.
2. We observe that this lists out the tags in the same order as they are found in
the HTML content.
3. We observe that this does not print the i tag as that was not specified in the
list of tags.
It is also possible to use a regular expression for specifying the tags to match, as
demonstrated below:
Observation:
1. We have imported the re module in order to deal with regular expressions.
2. The regex we have given is one that matches a single character; we are
thus searching for tags whose names are a single character.
3. We compile the regular expression and pass the regex object as the first
argument to the find_all() method.
4. We observe that this lists out all those tags whose names are made up of a
single character!
Finally, it is possible to pass a function object as the first argument, where the function
is used for filtering as demonstrated below:
Observation:
1. We are passing a function name without the parentheses (i.e. a function
object) as the first argument to the find_all() method.
2. This function (filter_tag) will receive each tag found in the HTML content
and has to return True or False for each tag received. If the function returns
True, the tag will be listed by find_all(), else it will be skipped.
3. The filtering condition we have given in the filter_tag function is that it will
return True only when the given tag is either b or u.
4. We therefore observe that only b and u tags are listed.
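The four kinds of name filter can be compared on one made-up document. One assumption worth flagging: BeautifulSoup applies a regex with search(), so we anchor the pattern as ^.$ to match only one-character names (a bare "." would match every tag):

```python
import re
from bs4 import BeautifulSoup

html = ("<div><p>This is <b>Bold</b></p><p>This is <i>Italics</i></p>"
        "<p>This is <u>Underlined</u></p></div>")
soup = BeautifulSoup(html, "html.parser")

def names(tags):
    return [tag.name for tag in tags]

by_string = names(soup.find_all("p"))                     # one name
by_list = names(soup.find_all(["p", "b", "u"]))           # any name in the list
by_regex = names(soup.find_all(re.compile("^.$")))        # one-character names
by_function = names(soup.find_all(lambda tag: tag.name in ("b", "u")))
print(by_string, by_list, by_regex, by_function, sep="\n")
```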
Observation:
1. We have passed only the attrs argument in the above example and have
not passed the name argument. From the previous forms, we know that when
the name is not provided, this will give a list of all the tags encountered in the
HTML content. However, we are filtering by attributes now. Only those tags
that have an “id” attribute with the value “main” will be selected.
2. Verify from the HTML content assigned in section 17.3.4 that the div tag is
the only one that has an “id” attribute with value “main”, and hence gets
processed.
Observation:
1. We are trying to find all the “b” tags within the “div” tag. In the first case, the
div tag contains a p tag that in turn contains a b tag, and therefore we see it
being listed.
2. In the second case where we pass recursive=False, we are confining the
search to the immediate children of the div tag and hence do not get to see
the b tag.
Observation:
1. We are searching for any text node whose content is exactly “Bold”.
2. Note that in previous versions of BeautifulSoup, the string argument was
called text.
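A plain string must match the whole text node; a compiled regex finds text nodes that merely contain the pattern. A sketch on made-up content:

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>This is <b>Bold</b></p><p>Bold text</p>", "html.parser")

exact = [str(s) for s in soup.find_all(string="Bold")]               # whole text equals "Bold"
partial = [str(s) for s in soup.find_all(string=re.compile("Bold"))] # text containing "Bold"
print(exact)     # ['Bold']
print(partial)   # ['Bold', 'Bold text']
```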
Observation:
1. The first example attempts to list out all the p tags within the div tag – there
are 3 of them.
2. In the second example, we use limit=2 to limit the tag list to at most 2
entries and observe that only the first 2 p tags within the div tag are reported.
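The recursive and limit arguments can be combined with any name filter. A short sketch on a made-up div with three paragraphs:

```python
from bs4 import BeautifulSoup

html = "<div><p>One <b>Bold</b></p><p>Two</p><p>Three</p></div>"
div = BeautifulSoup(html, "html.parser").div

nested_b = div.find_all("b")                    # searches the whole sub-tree
direct_b = div.find_all("b", recursive=False)   # direct children of div only
first_two = div.find_all("p", limit=2)          # stops after two matches
print(len(nested_b), len(direct_b), len(first_two))   # 1 0 2
```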
Observation:
1. We are recursively searching for all those tags that contain an attribute id
with value main.
2. The div tag is the only one that contains such an attribute and is hence
listed.
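The attrs dictionary and keyword arguments are two ways of writing the same attribute filter. The sketch below (made-up HTML) also uses href=True to match any tag that has the attribute, and class_ with a trailing underscore, since class is a reserved word in Python:

```python
from bs4 import BeautifulSoup

html = ('<div id="main" class="main content"><p class="note">Hi</p>'
        '<a>No link</a><a href="/x">Link</a></div>')
soup = BeautifulSoup(html, "html.parser")

by_attrs = [t.name for t in soup.find_all(attrs={"id": "main"})]
by_keyword = [t.name for t in soup.find_all(id="main")]
with_href = soup.find_all("a", href=True)          # only tags that have an href
by_class = [t.name for t in soup.find_all(class_="content")]
print(by_attrs, by_keyword, len(with_href), by_class)   # ['div'] ['div'] 1 ['div']
```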
The find_all() method is perhaps the single method we need to perform any
search in the parse tree. Despite this, there are other possibilities when it comes to
searching. Let us have a quick look at these before we conclude this section.
As a short-cut for the find_all() method, we can directly use parentheses on the
BeautifulSoup document or Tag object, as shown below:
Observation:
1. We see that the output is the same in both the above examples.
2. Some people find the parentheses syntax simpler as it is briefer; others prefer
the find_all() method for readability. The choice, of course, is left to you
since there is no difference in their behaviour or efficiency.
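The equivalence can be checked directly. In this sketch (made-up HTML), calling the document or a Tag gives the same result as find_all() with the same arguments:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>One</p><p>Two</p></div>", "html.parser")

# Calling the object is shorthand for find_all() with the same arguments
same_on_soup = soup("p") == soup.find_all("p")
same_on_tag = soup.div("p", limit=1) == soup.div.find_all("p", limit=1)
print(same_on_soup, same_on_tag)   # True True
```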
If we are only interested in searching one tag that matches a particular requirement,
we can either use limit=1 as shown in form #6, or can use the find() method
instead, which is identical to the find_all() method except for these differences:
1. The find() method returns only the first matching tag (or None if no tag
matches).
2. Even when there is a single match, the find_all() method returns a list
containing that single tag whereas the find() method never returns a list.
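A quick comparison of the two return values on a made-up div, including the case where nothing matches:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>One</p><p>Two</p></div>", "html.parser")

print(soup.find_all("p", limit=1))   # a list with one Tag, even for one match
print(soup.find("p"))                # <p>One</p> (the first match itself)
print(soup.find("table"))            # None when nothing matches
print(soup.find_all("table"))        # [] when nothing matches
```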
The syntax of the find() method is the same as the find_all() method:
Observation:
1. The urllib package comprises multiple modules like request, error,
parse and robotparser. We are interested in the request module here.
2. We use the urlopen() function to establish a connection with the Wikipedia
page for Python!
3. We obtain an HTTPResponse object in return, which is stored in the variable
response in the example.
This was pretty simple, but we still haven’t managed to extract any content from the
web page. We can read from the response object the same way we read from
files (covered in section 14.3).
Observation:
1. The output of these statements is not shown here as it is large, does not add
value and can change with changes to the Wikipedia page! Nevertheless,
when you run these statements you will definitely see the HTML page content.
2. We are iterating through each line of the HTML page.
3. Each line is obtained as a bytes object. It is better to convert the same into a
str object using the decode() method. The default character set used for
this conversion is UTF-8.
4. The strip() method removes any excess whitespace at the beginning and
end of each line, giving rise to a cleaner output.
5. If we attempt to repeat this loop once again, we will not get the same output
again as all data from the response object would have already been read
out by this loop.
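Because the response behaves like a binary file, the loop can be sketched against any binary file-like object. Below, io.BytesIO stands in for the object returned by urlopen() so the example runs without network access; the helper name page_lines is our own:

```python
import io

def page_lines(stream):
    """Yield each line of a binary stream as a stripped str."""
    for raw_line in stream:                  # each raw_line is a bytes object
        yield raw_line.decode("utf-8").strip()

# Stand-in for urllib.request.urlopen(url)
fake_response = io.BytesIO(b"<html>\n  <body>Hi</body>\n</html>\n")
lines = list(page_lines(fake_response))
again = list(page_lines(fake_response))      # the stream is already exhausted
print(lines)   # ['<html>', '<body>Hi</body>', '</html>']
print(again)   # []
```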
Observation:
1. The read() method returns the entire HTML data. The output of these
statements is not shown here as it is large, does not add value and can
change with changes to the Wikipedia page! Nevertheless, when you print the
value of data, you will definitely see the HTML page content.
2. We observe that the length of the data loaded is 355637 bytes.
Reading a line at a time might not be very time efficient (though convenient) and
reading the entire content at once may not be very memory efficient. There is a third
possibility: reading data a block at a time using the read() method as shown below:
Observation:
1. Remember that we need to call urlopen() once again here, since once we have
read out the content from the response object, we won’t be able to extract it
again using the same object.
2. We are reading the content 10000 bytes at a time. This is an arbitrary number
that can be controlled by us as required.
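The block-by-block loop can be wrapped in a small generator. This is a sketch with a hypothetical helper name, read_in_blocks, tested against io.BytesIO rather than a live connection; read() returns an empty bytes object once everything has been consumed, which ends the loop:

```python
import io

def read_in_blocks(stream, block_size=10000):
    """Yield successive chunks of at most block_size bytes until EOF."""
    while True:
        block = stream.read(block_size)
        if not block:              # b"" means there is nothing left to read
            break
        yield block

fake_response = io.BytesIO(b"x" * 25000)   # stand-in for an HTTPResponse
sizes = [len(block) for block in read_in_blocks(fake_response)]
print(sizes)   # [10000, 10000, 5000]
```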
Here is a program that accepts a URL and a filename from the user, downloads the
HTML content from the URL and saves it to the given file:
htmlDownloader.py
1. #!/usr/bin/python
2.
3. # Program to download HTML content from a URL
4.
5. import urllib.request
6.
7. url = input("Enter URL:")
8. filename = input("Enter filename:")
9.
10. try:
11.     response = urllib.request.urlopen(url)
12.     fileHandle = open(filename, "w", encoding="utf-8")
13.     fileHandle.write(response.read().decode().strip())
14.     fileHandle.close()
15. except Exception as e:
16.     print("Error:", e)
Output:
Enter URL:https://en.wikipedia.org/wiki/Python_(programming_language)
Enter filename:Python.wiki
Observation:
1. We accept the URL to read from and the filename to write to in lines 7-8.
2. We open the connection to the URL and obtain the response in line 11.
3. We open the output file in write mode in line 12.
4. We read the HTML content, convert it to string from bytes, remove leading
and trailing whitespaces and write the content to the file in line 13.
5. We have added exception handling logic to handle any errors during the
process.
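A variant of the same idea using with statements, so that the connection and the file are closed even if an error occurs part-way. The function name download is ours, and the usage line passes a data: URL (which urllib.request decodes locally, without any network access) purely to keep the sketch self-contained:

```python
import os
import tempfile
import urllib.request

def download(url, filename):
    """Save the decoded, stripped text found at url into filename."""
    with urllib.request.urlopen(url) as response:
        with open(filename, "w", encoding="utf-8") as file_handle:
            file_handle.write(response.read().decode("utf-8").strip())

# Usage with a data: URL holding " <p>Hello</p>\n" (percent-encoded)
target = os.path.join(tempfile.mkdtemp(), "page.html")
download("data:text/html,%20%3Cp%3EHello%3C/p%3E%0A", target)
with open(target, encoding="utf-8") as f:
    saved = f.read()
print(saved)   # <p>Hello</p>
```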
1. #!/usr/bin/python
2.
3. # Program to list all hyperlinks referenced from a URL
4.
5. import urllib.request
6. from bs4 import BeautifulSoup
7.
8. url = input("Enter URL:")
9.
10. try:
11.     response = urllib.request.urlopen(url)
12.     soup = BeautifulSoup(response, "html.parser")
13.     hyperlinks = set()
14.
15.     for tag in soup("a"):
16.         if "href" in tag.attrs:
17.             hyperlinks.add(tag["href"])
18.
19.     for hyperlink in hyperlinks:
20.         print(hyperlink)
21.
22. except Exception as e:
23.     print("Error:", e)
Observation:
1. You can enter any valid URL – perhaps you would like to try the URL that we
gave as input to the previous program:
https://en.wikipedia.org/wiki/Python_(programming_language). We have not
produced the output here as it can be too large.
2. We accept the URL from the user in line 8.
3. We obtain the contents from that URL in line 11 and construct a BeautifulSoup
object based on that in line 12.
4. Since hyperlinks can be repeated in the HTML content, we construct a set of
hyperlinks in line 13. Remember that sets contain unique elements and
duplicates are ignored.
5. We iterate through each of the a tags in line 15.
6. Not all a tags will have the href attribute (some might just have the id
attribute). We therefore check if the given a tag contains an href attribute in
line 16, and if so, we add it to the set of hyperlinks in line 17.
7. We then iterate through the set of hyperlinks in line 19 and print them in line
20.
8. We process all exceptions by printing them in line 23.
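The core of the program can be separated from the network part and tested on a string. In this sketch, collect_hyperlinks is a hypothetical helper; find_all("a", href=True) replaces the explicit "href" check, and resolving relative links with urllib.parse.urljoin is our own addition, not part of the program above:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def collect_hyperlinks(html, base_url):
    """Return the sorted, de-duplicated absolute URLs of all <a href=...> tags."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually have an href attribute
    return sorted({urljoin(base_url, tag["href"]) for tag in soup.find_all("a", href=True)})

page = '<a href="/wiki/Guido">Guido</a><a id="top"></a><a href="/wiki/Guido">again</a>'
links = collect_hyperlinks(page, "https://en.wikipedia.org/wiki/Python")
print(links)   # ['https://en.wikipedia.org/wiki/Guido']
```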
1. #!/usr/bin/python
2.
3. # Program to download an image file from a URL
4.
5. import urllib.request
6.
7. url = input("Enter URL:")
8.
9. try:
10.     response = urllib.request.urlopen(url)
11.
12.     index = url.rfind("/")
13.     filename = url[index+1:]
14.     fileHandle = open(filename, "wb")
15.     fileHandle.write(response.read())
16.     fileHandle.close()
17.
18. except Exception as e:
19.     print("Error:", e)
Observation:
1. This program asks the user to enter the URL of the image to download.
Perhaps you can try with the URL of the Python logo image in Wikipedia’s
site:
https://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/Python_logo_and_wordmark.svg/200px-Python_logo_and_wordmark.svg.png
2. After taking the input from the user in line 7, we obtain a connection in line 10.
3. We wish to create a local file to store the image. We would prefer to have a
file with the same name as the image file being downloaded, but without the
leading host and path details. We therefore extract only the part after the last
“/” character in the URL. This is done in lines 12 and 13.
4. We open the local file in line 14 for writing in binary mode. The binary mode is
important here as image data is not textual.
5. We read the image data from the URL HTTP response in line 15 and write it
as it is to the local file created.
6. The file is closed in line 16. Exceptions are handled and reported as a text
message in line 19.
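One limitation of the rfind() approach is that a query string stays in the filename. A sketch of an alternative using urllib.parse.urlsplit to keep only the path; the helper name local_filename and the example URL are hypothetical:

```python
import posixpath
from urllib.parse import urlsplit

def local_filename(url):
    """Return the last path segment of url, ignoring any query string or fragment."""
    return posixpath.basename(urlsplit(url).path)

url = "https://example.com/images/logo.png?width=200"
naive = url[url.rfind("/") + 1:]
print(naive)                 # logo.png?width=200 (the rfind() approach)
print(local_filename(url))   # logo.png
```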
17.5 Questions
1. Explain how a simple HTML parser can be built in Python using the
HTMLParser class.
2. List out some of the advantages of using the BeautifulSoup library over
HTMLParser.
3. Explain the services provided by the Tag class of BeautifulSoup.
4. Write a short note on navigation through the parse tree using
BeautifulSoup.
5. Write a short note on searching in the parse tree using BeautifulSoup.
6. Explain how HTML files can be downloaded and parsed in Python with an
example.
7. Explain how an image can be downloaded over the Internet with an example.
17.6 Exercises
1. Write a Python script to load an HTML file from the filesystem and convert the
BODY contents into text by discarding all tags.
2. Rewrite the above program to download HTML content from a given URL,
printing the textual content in the BODY.
3. Write a Python script to extract all important keywords within an HTML
document (assuming that such keywords are within B, STRONG, I or EM tags).
4. Write a Python script that downloads all images referenced within an HTML
page given its URL.
5. Write a Python script that parses through an HTML file and lists out all the
unique CSS class references.
6. Write a Python script that parses through an HTML file and identifies the DIV
Tag that contains the largest number of descendant tags.
SUMMARY
➢ The Tag object has various attributes of use like name, attrs,
string, etc.
18 PARSING XML
18.1 Introduction
XML (eXtensible Markup Language) is the mother of many languages (including
HTML covered in section 17) used for representation of data. This chapter shows how
XML data can be parsed in Python.
Python 3 comes with a built-in XML parser called ElementTree to parse XML
documents, but BeautifulSoup covered in section 17.3 can also be used if the lxml
HTML/XML parser is installed. This chapter focuses on how to use the
ElementTree XML API.
Observation:
1. HTML documents that are well-formed (every tag closed and properly nested,
and every attribute value quoted) can also be parsed as XML documents.
2. We import the ElementTree module from the xml.etree package.
3. The ElementTree.parse function can parse through XML content from a
file, given the pathname of the file or a file object. It returns an ElementTree
object that represents the parsed XML document.
4. Note that a well-formed and valid HTML document is a form of XML
document! We are merely reusing examples discussed earlier in chapter ***.
5. We extract the root node of this parse tree using the getroot() method of
ElementTree, which returns the root node of the XML parse tree as an
Element object.
6. The root node of the XML parse tree is the html tag.
Observation:
1. Instead of using the parse() function, we are now using the fromstring()
function and passing the XML string instead.
2. Once again, remember that HTML content can also be considered to be valid
XML content if it is well-formed and valid.
3. Unlike the parse() function that returns an ElementTree object from which
we can extract the root node, the fromstring() function directly returns the
root node!
4. In this example, once again, the root node is the html tag.
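Both entry points can be compared on the same content. This sketch uses a shortened, made-up version of the HTML example and passes an io.StringIO object to parse(), since parse() accepts a file object as well as a pathname:

```python
import io
import xml.etree.ElementTree as ET

xml_text = ("<html><head><title>HTML Demonstration</title></head>"
            "<body><div id='main'><p>One</p><p>Two</p></div></body></html>")

root = ET.fromstring(xml_text)             # returns the root Element directly
tree = ET.parse(io.StringIO(xml_text))     # returns an ElementTree object
print(type(tree).__name__, tree.getroot().tag)   # ElementTree html
print(root.tag)                                  # html
```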
Observation:
1. The root node is the html tag.
2. The expression root[0] would have given us the head tag. We have used
root[1] which gave us the body tag.
3. The expression root[1][0] gives us the first child of root[1], where
root[1] represents the body tag. This first child of the body tag is the div
tag.
Observation:
1. The expression root[1][0] gives us an Element object that represents the
div tag.
2. The div tag has 3 children and all of them are p tags.
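An Element behaves like a list of its children, so len() and a for loop work on it directly. A sketch with a made-up document shaped like the running example:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<html><head><title>T</title></head>"
                     "<body><div><p>One</p><p>Two</p><p>Three</p></div></body></html>")
div = root[1][0]                        # body is root[1]; its first child is the div

child_tags = [child.tag for child in div]
print(div.tag, len(div))                # div 3
print(child_tags)                       # ['p', 'p', 'p']
```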
Observation:
1. The previous section showed us that root[1][0] gives us an Element
object representing the div tag.
2. The attrib attribute is a dictionary containing the attributes of the div tag
and their corresponding values.
We can also access an individual attribute’s value from an Element object using its
get() method as shown below:
Observation:
1. The expression root[1][0] gives us an Element object representing the
div tag.
2. The div tag has an attribute called id whose value is main.
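Unlike BeautifulSoup, ElementTree does not split multi-valued attributes: class stays a single string. A sketch on a made-up document, also showing get() with a default for a missing attribute:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<html><body><div id='main' class='main content'/></body></html>")
div = root[0][0]                        # no head here, so root[0] is the body

print(div.attrib)                       # {'id': 'main', 'class': 'main content'}
print(div.get("id"))                    # main
print(div.get("title", "n/a"))          # n/a: default for a missing attribute
```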
Observation:
1. The expression root[0] gives us an Element representing the head tag.
2. The expression root[0][0] gives us an Element representing the title
tag.
3. The text attribute of this Element object gives us the string content within
that tag.
Let’s take another example – this time we will extract the text content of the div tag
as shown below:
Observation:
1. The expression root[1][0] gives us an Element object that represents the
div tag.
2. The text content of the div tag is reported as merely newlines and spaces,
without considering the text content within its children.
If a tag contains multiple child tags that contain text, we can iterate through all the
nested text content as demonstrated below:
This is
Bold
This is
Italics
This is
Underlined
Observation:
1. The itertext() method allows us to iterate through the nested text content.
2. The strip() method has been used to remove the additional whitespaces
(especially the newline characters) in the beginning and end of each of the
text content.
3. We see that this lists out even the text content present within the p tags (and
their child tags) that are present within the div tag.
Observation:
1. Recall from the previous example that root[1][0] represents the div tag
and itertext() allows us to iterate through all its nested text nodes.
2. The join() method will combine all those pieces of text together (using the
empty string "" as the separator).
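The contrast between text and itertext() can be seen on a trimmed, made-up version of the div. Note that itertext() also yields the tail text (the newlines) that follows each child tag:

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("<div>\n<p>This is <b>Bold</b></p>\n<p>This is <i>Italics</i></p>\n</div>")

print(repr(div.text))                  # '\n': only the text before the first child
pieces = [piece.strip() for piece in div.itertext() if piece.strip()]
combined = "".join(div.itertext())
print(pieces)                          # ['This is', 'Bold', 'This is', 'Italics']
print(repr(combined))
```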
Observation:
1. We are iterating over all p tags that are the direct children of the div tag.
2. We are combining all text within the p tags and printing them using the
technique covered in section 18.2.3.5.
NOTE:
The findall() method also supports a limited subset of XPath, as well as namespaces,
both of which are outside the scope of this chapter.
The same sample code above is repeated here using find() instead of
findall():
Observation:
1. Since we are not expecting a list of Element objects, we are not using a loop.
We instead directly work with the Element object returned.
2. As can be seen from the output, this returns the first p tag’s contents instead
of any or all of the p tags within the div tag.
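A sketch contrasting the search methods on a made-up document: a plain tag name in findall() only looks at direct children, while the XPath-style prefix .// searches at any depth:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<html><body><div><p>One <b>Bold</b></p><p>Two</p></div>"
                     "<p>Outside</p></body></html>")
div = root[0][0]

direct = [p.text for p in div.findall("p")]   # direct children of div only
first = div.find("p").text                    # first match only
anywhere = len(root.findall(".//p"))          # './/' searches at any depth
print(direct, repr(first), anywhere)          # ['One ', 'Two'] 'One ' 3
print(div.find("table"))                      # None when nothing matches
```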
18.3 Questions
1. Compare the ElementTree module with BeautifulSoup for parsing XML
documents.
2. Explain with examples how ElementTree can be used to parse through XML
content:
1. From a file
2. From a string
3. Write a short note on extracting the contents of a Tag using ElementTree.
4. Write a short note on searching within the XML parse tree using
ElementTree.
18.4 Exercises
1. Write a Python script to verify if the given XML content is actually HTML by
maintaining a small list of sample standard HTML tags. Any tag found in the
XML content that is not present in the list of standard tags should result in the
conclusion that the content is not valid HTML.
2. Write a Python script to convert an XML file into HTML using the following
rules:
1. The <head> section of the HTML content has to be created by the script.
2. Every <employee> tag in XML should become <div
class="employee"> in HTML.
3. Every <name> and <id> tag within <employee> in XML should become
<p class="name"> and <p class="id"> respectively in HTML.
4. Any other child tag found within <employee> in XML should become a
<p> tag in HTML.
3. Write a Python script that analyses an XML file and displays the following:
1. The total number of tags encountered
2. The most frequently occurring tag
3. The maximum level of nesting of tags
4. Write a Python script that reads an XML file and lists out the names of tags
that contain the string “Python” in their text content.
5. Write a Python script that reads an XML file and lists out all tags that contain
the class attribute with value “test”. Note that a single tag can contain
multiple values for the class attribute. It would be required to check if “test”
is present as a value for the class attribute.
SUMMARY
19 PARSING JSON
Parse JSON strings and extract data in the form of Python objects
19.1 Introduction
The JavaScript Object Notation (JSON) has become a popular textual
representation of data. Like XML, JSON too is textual and structured, but differs from
XML by requiring significantly less metadata within the document! Originally
developed as a textual representation of JavaScript objects that can be transmitted
through streams, it is now an effective replacement for XML in many cases!
A JSON document basically consists of collections and individual values. The
supported collections are objects and arrays that map on to Python’s dict and list
types respectively. The scalar values could be numbers or strings, mapping to
Python’s int/float and str types. In addition, JSON also supports Boolean literals
(true and false) and a special null literal, which map on to Python's bool and
None respectively.
JSON objects and Python objects can be inter-converted using the built-in json
module. The interface of this module is very similar to that of the pickle module
covered in sections 14.6 & 14.7, where we had used the functions dump() and load().
A significant difference, however, is that multiple JSON objects written one after
another to a single file do not make it a JSON file! A JSON file is supposed to contain
only a single JSON object (which in turn can contain anything!). Multiple JSON objects
written sequentially into a file do have their applications, but such a file is technically
not a single JSON file.
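The difference is easy to demonstrate with json.loads(). In this sketch (made-up records), two objects written one after another are rejected as a single document, while reading them one line at a time works; this one-object-per-line layout is a convention, not part of the JSON format itself:

```python
import json

# Two JSON documents written one after another, one per line
text = json.dumps({"name": "Ram"}) + "\n" + json.dumps({"name": "Sham"}) + "\n"

try:
    json.loads(text)                 # not a single JSON document
    error_message = None
except json.JSONDecodeError as error:
    error_message = error.msg
print(error_message)                 # Extra data

records = [json.loads(line) for line in text.splitlines()]
print(records)                       # [{'name': 'Ram'}, {'name': 'Sham'}]
```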
Python            JSON
dict              object
list, tuple       array
str               string
int, float        number
True              true
False             false
None              null
In order to convert an existing Python object into its equivalent JSON representation,
we use the json.dumps() method (which stands for “dump string”) that has the
following syntax:
json.dumps(obj)
Observation:
1. We have imported the json module.
2. We see that the JSON output matches the Python representation pretty well.
Dictionaries become objects, lists and tuples become arrays, int and float
objects become numbers, str objects become strings, True and False become
true and false respectively, and None becomes null.
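All of these conversions can be seen in a single call. The data below is made up; note that the tuple becomes a JSON array and that the key order of the dictionary is preserved:

```python
import json

data = {"name": "Ram", "age": 20, "height": 5.8, "hobbies": ("chess", "music"),
        "married": False, "spouse": None}
text = json.dumps(data)
print(text)
# {"name": "Ram", "age": 20, "height": 5.8, "hobbies": ["chess", "music"], "married": false, "spouse": null}
```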
Here is a more complex example:
Observation:
1. We have a dictionary whose keys are “Ram” and “Sham”.
2. The values of this dictionary are again dictionaries with keys “age” and
“hobbies”.
3. The values corresponding to key “hobbies” are arrays of strings.
Observation:
1. This is the same example as before, but we have used an indentation level of
4 (indent=4).
2. This indentation results in 4 spaces per level of indentation.
3. We also see that the string now contains newline characters to support pretty-
printing.
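A smaller sketch of the indent argument (the values are made up), so the exact layout can be seen: each nesting level adds 4 spaces, and items are separated by a comma followed by a newline:

```python
import json

people = {"Ram": {"age": 20, "hobbies": ["chess"]}}
pretty = json.dumps(people, indent=4)
print(pretty)
```

This prints:

    {
        "Ram": {
            "age": 20,
            "hobbies": [
                "chess"
            ]
        }
    }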
json.dump(obj, file)
Here is a simple example that writes a list into a file in JSON format:
Observation:
1. We import the json module. We open the file “test.json” in (textual) write
mode and obtain a file handle that is stored in the variable f.
2. We dump the contents of a list to the file represented by the file handle f.
3. We close the file to be sure that the data is indeed flushed and written to the
file.
The file “test.json” will contain the following text:
[1, 3, 5, 7]
We will see how to read back the JSON content in section 19.3.2.
JSON              Python
object            dict
array             list
string            str
number (integer)  int
number (real)     float
true              True
false             False
null              None
In order to convert a JSON string into its equivalent Python object representation, we
use the json.loads() method (which stands for “load string”) that has the following
syntax:
json.loads(jsonString)
Here are some examples to convert various JSON strings into Python objects:
Observation:
1. We have imported the json module.
2. We see that the Python output matches the JSON representation pretty well.
Objects become dictionaries, arrays become lists, integers and strings
become int and str objects, true and false become True and False
respectively and null becomes None (which of course, is not visible in the
output).
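A sketch of json.loads() on a made-up string covering the main JSON types. It also shows that a tuple does not survive a round trip, since JSON only has arrays and those always load as lists:

```python
import json

obj = json.loads('{"name": "Sham", "scores": [1, 2.5], "active": true, "manager": null}')
print(obj)   # {'name': 'Sham', 'scores': [1, 2.5], 'active': True, 'manager': None}
print(type(obj["scores"][0]).__name__, type(obj["scores"][1]).__name__)   # int float

round_trip = json.loads(json.dumps((1, 2)))
print(round_trip)   # [1, 2]
```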
json.load(file)
Here is a simple example that reads the previously written list from the file
test.json (in section 19.2.3):
Observation:
1. We import the json module. We open the file “test.json” for reading and
store the file handle in the variable f.
2. We load an object from the file identified by the file handle f.
3. On printing the object, we see the same list we had stored in the JSON file.
4. Closing the file is less critical here as we have opened it in read mode, but it is
still good practice, since it releases the file handle.
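The write and read steps can be combined into one round trip using with statements, which close the file automatically. To avoid cluttering the current directory, this sketch writes test.json into a temporary directory (our choice, not the book's):

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.json")   # a throwaway location

with open(path, "w", encoding="utf-8") as f:   # closed (and flushed) on exit
    json.dump([1, 3, 5, 7], f)

with open(path, encoding="utf-8") as f:
    text = f.read()
    f.seek(0)                                  # rewind to load from the start
    obj = json.load(f)
print(text)   # [1, 3, 5, 7]
print(obj)    # [1, 3, 5, 7]
```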
raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1
column 7 (char 6)
Observation:
1. We have imported the json module. We are attempting to convert a JSON
string into its Python equivalent.
2. The JSON string is malformed as the open square bracket does not have a
matching close square bracket.
3. We see the JSONDecodeError exception being raised, which identifies the
error and its location within the string.
4. In programs, we might be interested in processing this exception in the
except block suitably.
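A sketch of such an except block, using a hypothetical helper parse_or_report. JSONDecodeError carries the message and the location as the attributes msg, lineno and colno, so they can be reported without the full traceback:

```python
import json

def parse_or_report(text):
    """Return the parsed object, or None after reporting where parsing failed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as error:
        print(f"Bad JSON at line {error.lineno}, column {error.colno}: {error.msg}")
        return None

result = parse_or_report("[1,2,3")   # Bad JSON at line 1, column 7: Expecting ',' delimiter
valid = parse_or_report("[1, 2, 3]")
print(result, valid)                 # None [1, 2, 3]
```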
19.4 Questions
1. Explain the following functions of the json module in Python:
1. dump()
2. dumps()
2. Explain the following functions of the json module in Python:
1. load()
2. loads()
19.5 Exercises
1. Write a Python script to demonstrate the saving and loading of data in JSON
format using the json module.
2. Write a Python script to load XML data of a known format and write it to a
JSON file.
3. Write a Python script to load JSON data of a known format and write it to an
XML file.
SUMMARY
20 MISCELLANEOUS
Convert a data item from one built-in type to another and know
how the conversion takes place.
20.1 Data Types Revisited
The diagram below shows the various built-in numeric data types available in
Python:
builtins.object
  numbers.Number
    numbers.Complex
      complex
      numbers.Real
        float
        numbers.Rational
          fractions.Fraction
          numbers.Integral
            int
              bool
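The hierarchy can be checked with issubclass(). Note that the numbers classes are abstract base classes with which the built-in types are registered, so this is a quick sanity check rather than a view of each type’s own __mro__:

```python
import numbers
from fractions import Fraction

# Each line mirrors one edge of the diagram above
print(issubclass(bool, int))                        # True
print(issubclass(int, numbers.Integral))            # True
print(issubclass(Fraction, numbers.Rational))       # True
print(issubclass(float, numbers.Real))              # True
print(issubclass(complex, numbers.Complex))         # True
print(issubclass(numbers.Complex, numbers.Number))  # True
```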
620 20. Miscellaneous
Objects of type complex cannot be directly converted to int – their real and imag
parts can be!
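A short sketch of this rule; the complex value is an illustrative choice:

```python
z = 2.7 + 3.9j

try:
    int(z)
except TypeError as e:
    print("TypeError:", e)

# The real and imag attributes are floats, which int() truncates toward zero
print(int(z.real), int(z.imag))  # 2 3
```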
If x is an instance of a user-defined class, its __int__() method is invoked to
determine the integer equivalent, as covered in section 12.8.3.3. If this method is not
implemented, a TypeError occurs!
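A minimal sketch using two hypothetical classes, one that implements __int__() and one that does not:

```python
class Temperature:
    """Hypothetical class used only to illustrate __int__()."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __int__(self):
        # int() calls this method and uses its return value
        return int(self.degrees)

class Plain:
    """Hypothetical class that does not implement __int__()."""
    pass

print(int(Temperature(36.6)))  # 36

try:
    int(Plain())
except TypeError as e:
    print("TypeError:", e)
```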
>>> str(True)
'True'
>>> str(12)
'12'
>>> str(12.5)
'12.5'
>>> str(2+3j)
'(2+3j)'
>>> str(None)
'None'
>>> str([2,3,4])
'[2, 3, 4]'
>>> str((2,3,4))
'(2, 3, 4)'
>>> str({1:2,3:4})
'{1: 2, 3: 4}'
>>> str({1,2,3,4})
'{1, 2, 3, 4}'
6. If the class does not implement __bool__() above but implements the
magic method __len__(), that method’s return value is used to determine
the Boolean value (the method must return an integer: 0 is evaluated as False
and any other value as True).
7. If all the above checks fail, the object is evaluated as True!
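A minimal sketch of rules 6 and 7 using two hypothetical classes:

```python
class Box:
    """Hypothetical container: no __bool__(), so __len__() decides."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

class Empty:
    """Hypothetical class defining neither __bool__() nor __len__()."""
    pass

print(bool(Box([])))      # False, because the length is 0
print(bool(Box([1, 2])))  # True
print(bool(Empty()))      # True, the default
```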
For the sake of providing common, consistent examples for the following sections, we
are going to use these two statements:
>>> x=0b1100
>>> y=0b0110
We use these values as they help demonstrate all possible cases in the truth table!
Observe the left-most bits of x and y: they are 1 and 0 respectively. The next bit in
each of them is 1 and 1 respectively. The next bit in each of them is 0 and 1
respectively. And finally, the last bit in each of them is 0 and 0 respectively. Thus, we
have all 4 combinations of 0s and 1s we want.
Just before we proceed, let us print the decimal equivalent of these:
>>> x
12
>>> y
6
NOTE:
Conversion from binary to decimal (and vice versa) is beyond the scope of this
book.
>>> x & y
4
Let us convert the result 4 to binary and see the bit pattern:
>>> bin(4)
'0b100'
NOTE:
The binary value 0b100 is the same as 0b0100, which we shall use to understand
how the operation took place.
The table below summarises how the operation took place. Do note that the operation
took place column by column:
x 1 1 0 0
y 0 1 1 0
x & y 0 1 0 0
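A short sketch that reproduces the result with format() (padding to 4 binary digits so it lines up with the table) and shows a common use of AND, testing whether a bit is set; the mask value is an illustrative choice:

```python
x = 0b1100
y = 0b0110

result = x & y
# A column is 1 only where both x and y have a 1
print(format(result, "04b"))  # 0100

# A mask with a single 1 bit tests whether that bit is set in x
mask = 0b0100
print(bool(x & mask))         # True
```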
20.2.2 Bitwise OR
The bitwise OR operator (|) works very similarly to the logical OR operator (or), except
for these two differences:
>>> x | y
14
Let us convert the result 14 to binary and see the bit pattern:
>>> bin(14)
'0b1110'
The table below summarises how the operation took place. Do note that the operation
took place column by column:
x 1 1 0 0
y 0 1 1 0
x | y 1 1 1 0
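The same result in code, plus a typical use of OR, switching a bit on without disturbing the others (the bit chosen is illustrative):

```python
x = 0b1100
y = 0b0110

result = x | y
# A column is 1 wherever x or y (or both) has a 1
print(format(result, "04b"))      # 1110

# OR with a single 1 bit sets that bit and leaves the rest unchanged
print(format(x | 0b0001, "04b"))  # 1101
```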
>>> x ^ y
10
Let us convert the result 10 to binary and see the bit pattern:
>>> bin(10)
'0b1010'
The table below summarises how the operation took place. Do note that the operation
took place column by column:
x 1 1 0 0
y 0 1 1 0
x ^ y 1 0 1 0
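A short sketch of the XOR result, along with one useful property: applying the same XOR twice restores the original value:

```python
x = 0b1100
y = 0b0110

result = x ^ y
# A column is 1 only where x and y differ
print(format(result, "04b"))  # 1010

# XOR-ing with y again undoes the first XOR
print(result ^ y == x)        # True
```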
>>> ~x
-13
~x 0 0 1 1
NOTE:
To understand the binary pattern of the result and how the result is -13, a
knowledge of the 2’s complement system is required, which is outside the scope of
this book!
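Without working through 2’s complement by hand, one way to see where -13 comes from is that Python defines ~x as -(x + 1) for its integers. Masking the result with 0b1111 recovers the 4-bit pattern shown in the table above:

```python
x = 0b1100

print(~x)              # -13
print(~x == -(x + 1))  # True

# Keep only the lowest 4 bits to see the inverted pattern
print(format(~x & 0b1111, "04b"))  # 0011
```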
NOTE:
The variable x is not affected by this operation unless the expression is x <<= y!
Example:
>>> x=10
>>> bin(x)
'0b1010'
>>> x<<2
40
>>> bin(40)
'0b101000'
Observation:
1. The binary pattern of the integer value 10 is 1010.
2. When this pattern is shifted left 2 places, with 0s being inserted from the right
end, the pattern becomes 101000.
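Since each left shift appends a 0 bit, shifting left by n places multiplies the value by 2 to the power n. A quick sketch:

```python
x = 10

# Shifting left by n places is the same as multiplying by 2**n
for n in range(4):
    print(n, x << n, format(x << n, "b"))
# 0 10 1010
# 1 20 10100
# 2 40 101000
# 3 80 1010000
```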
For negative integers, each time all bits are shifted right, the rightmost bit is dropped
and the leftmost (sign) bit is re-inserted on the left, as shown in the diagram.
NOTE:
The variable x is not affected by this operation unless the expression is x >>= y!
Example:
>>> x=10
>>> bin(x)
'0b1010'
>>> x>>2
2
>>> bin(2)
'0b10'
Observation:
1. The binary pattern of the integer value 10 is 1010.
2. When this pattern is shifted right 2 places, with 0s being inserted from the left,
the pattern becomes 10.
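Shifting right by n places behaves like floor division by 2 to the power n. Because the sign bit is preserved for negative numbers, the result rounds down rather than toward zero:

```python
x = 10

print(x >> 2, x // 4)      # 2 2

# For negative numbers, >> also matches floor division
print(-10 >> 2, -10 // 4)  # -3 -3
```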
>>> b'Hello'
b'Hello'
>>> b"Hello"
b'Hello'
>>> b'''Hello'''
b'Hello'
>>> b"""Hello"""
b'Hello'
Objects of type bytes can also be created using the bytes() constructor. Called
without arguments, the constructor creates an empty bytes object:
>>> bytes()
b''
An integer argument can be provided to create a bytes sequence of that length, filled with null bytes (ASCII 0):
>>> bytes(5)
b'\x00\x00\x00\x00\x00'
More commonly, a string can be passed as the first argument to construct the
equivalent bytes sequence, with a second argument specifying the encoding, as
shown in the example below:
>>> bytes("Hello","utf-8")
b'Hello'
Since each byte can be specified using 2 hexadecimal digits, it is also possible to
build a bytes object using the fromhex() class method, passing a string of
hexadecimal digits with 2 hex digits per byte and optional spaces between bytes for
readability:
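A minimal sketch using the three byte values discussed in the observation below:

```python
data = bytes.fromhex("10 a9 6d")

print(data)        # b'\x10\xa9m'
print(list(data))  # [16, 169, 109]
print(data.hex())  # 10a96d
```

The hex() method performs the reverse conversion, producing the hex digits without spaces.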
Observation:
1. Being a class method, fromhex() is invoked using the class name (bytes).
2. We have specified hex values for 3 bytes – 10, a9 and 6d. These are
represented as \x10, \xa9 and m respectively. (6d is the hex ASCII code of
the character m!)
20.4 Questions
1. Explain the hierarchy of built-in basic numeric data types with the help of a
diagram.
2. Write a short note on conversion to the bool type.
3. Explain bitwise operations in Python with examples.
4. Explain the bytes class of Python.
20.5 Exercises
1. Write a menu-driven Python program that supports the following operations
on a given integer and displays the result after each operation in binary:
1. Setting a particular bit of an integer
2. Resetting a particular bit of an integer
3. Toggling a particular bit of an integer
4. Extracting a particular bit of an integer
SUMMARY
21 APPENDIX – PYTHON 2 VS. PYTHON 3
Understand how to convert Python 2.x code into Python 3.x code or
vice versa.
Python 2 Python 3
In Python 2, a trailing comma at the end of a print statement suppresses the newline
character at the end. This has been replaced by the end keyword argument of the
print() function in Python 3:
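A minimal Python 3 sketch of the end keyword argument. To make the result easy to inspect, it prints into an io.StringIO buffer through print()’s file argument instead of the screen:

```python
import io

out = io.StringIO()   # collects the printed text

# end="" replaces the default "\n", like Python 2's trailing comma
print("Hello", end="", file=out)
print(" World", file=out)

print(repr(out.getvalue()))  # 'Hello World\n'
```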
Python 2 Python 3
NOTE:
In Python 2, the interpreter will display the prompt in the next line even when the
print statement did not print a newline. This newline character will not be printed
when executed as a script, however!
To print an empty line, Python 3 uses print() whereas Python 2 uses print:
Python 2 Python 3
>>> >>>
Since print() is a function in Python 3, it accepts keyword arguments like end.
Similarly, we can now provide a separator between items using the sep keyword
argument (section 2.7.1.3) as follows:
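A short sketch of sep; as before, an io.StringIO buffer captures the output so it can be inspected, and the values printed are illustrative:

```python
import io

out = io.StringIO()

# sep is placed between the items instead of the default single space
print(2021, 6, 15, sep="-", file=out)
print("a", "b", "c", sep="", file=out)

print(repr(out.getvalue()))  # '2021-6-15\nabc\n'
```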
Python 2 Python 3
Python 2 Python 3
In Python 3, the range() function (section 3.3.5) returns a lazy range object, which produces its values on demand, instead of a list:
Python 2 Python 3
Python 2 provided the xrange() function for the same reason (lazy generation of
values); xrange() no longer exists in Python 3.
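A quick Python 3 sketch showing that range() builds its values lazily and that list() is needed to get an actual list:

```python
r = range(5)

print(r)           # range(0, 5)
print(list(r))     # [0, 1, 2, 3, 4]

# A huge range costs no extra memory, since values are computed on demand
big = range(10**12)
print(len(big), big[-1])  # 1000000000000 999999999999
```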
21. Appendix – Python 2 Vs. Python 3 637
21.2.3 Comparisons
In Python 3, the ordering comparison operators (<, <=, >=, >) raise a TypeError
exception when the operands don’t have a meaningful natural ordering:
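A Python 3 sketch of this rule, using a hypothetical helper can_order() that reports whether a < b is allowed:

```python
def can_order(a, b):
    """Hypothetical helper: True if a < b works, False if it raises TypeError."""
    try:
        a < b
        return True
    except TypeError:
        return False

print(can_order(1, 2.5))  # True, int and float have a natural ordering
print(can_order(1, "a"))  # False, Python 3 raises TypeError
print(1 == "a")           # False, equality between unrelated types still works
```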
Python 2 Python 3
Furthermore, the cmp() function (and the underlying __cmp__() magic method) have
been removed in Python 3:
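A common Python 3 replacement for cmp(a, b) is the expression (a > b) - (a < b). The sketch below wraps it in a user-defined function (named cmp here purely for illustration) and uses functools.cmp_to_key() to sort with an old-style comparison function:

```python
from functools import cmp_to_key

def cmp(a, b):
    """User-defined stand-in for Python 2's cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)

print(cmp(1, 2), cmp(2, 2), cmp(3, 2))  # -1 0 1

# Sort words by length using a comparison function
words = ["pear", "fig", "banana"]
print(sorted(words, key=cmp_to_key(lambda a, b: cmp(len(a), len(b)))))
# ['fig', 'pear', 'banana']
```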
Python 2 Python 3
21.2.4 Integers
The long data type has been renamed to int in Python 3. Regardless of the “size” of
the integer, the data type in Python 3 is int, which can handle arbitrarily large
integers (section 2.3.1):
Python 2 Python 3
The sys.maxint constant was removed, since there is no longer a limit to the value
of integers.
The repr() of a long integer no longer includes the trailing L, so code that
unconditionally strips that character will chop off the last digit instead. Use str()
to obtain the string representation of an integer in Python 3.
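A short Python 3 sketch of these points; sys.maxsize (the largest container index on the platform) still exists, and adding to it simply produces a larger int:

```python
import sys

big = 2 ** 100
print(type(big).__name__)  # int
print(repr(big))           # 1267650600228229401496703205376 (no trailing L)

print(hasattr(sys, "maxint"))         # False
print(sys.maxsize + 1 > sys.maxsize)  # True, no overflow
```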
In Python 3, the division operator (/) performs true division and always returns a float
when both operands are integers. For integer (floor) division, the // operator can be used:
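A quick Python 3 sketch of both operators:

```python
print(7 / 2)    # 3.5
print(6 / 2)    # 3.0, still a float
print(7 // 2)   # 3
print(-7 // 2)  # -4, floor division rounds down
```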
Python 2 Python 3
Octal literals in Python 3 no longer use the prefix “0”; the prefix “0o” must be used instead:
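A short Python 3 sketch; eval() is used here only to show that the old-style literal is rejected:

```python
print(0o17)          # 15
print(oct(15))       # 0o17
print(int("17", 8))  # 15

try:
    eval("017")      # a Python 2 style octal literal
except SyntaxError:
    print("017 is rejected in Python 3")
```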
Python 2 Python 3
Python 2 Python 3
Python 2 Python 3
In Python 3, nonlocal is a reserved word. Using nonlocal x you can now assign
directly to a variable in an outer (but non-global) scope.
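A minimal sketch of nonlocal, using a closure-based counter as an illustrative example:

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebinds count in make_counter's scope
        count += 1
        return count

    return increment

counter = make_counter()
print(counter(), counter(), counter())  # 1 2 3
```

Without the nonlocal statement, count += 1 would raise UnboundLocalError, since the assignment would make count local to increment().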
Tuple parameter unpacking has been removed in Python 3. You can no longer write
def f(a, (b, c)): ... Instead, write def f(a, b_c): and unpack with b, c = b_c inside the function body:
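A minimal Python 3 sketch of the replacement pattern; the function body is illustrative:

```python
def f(a, b_c):
    # Unpack the tuple inside the body instead of in the parameter list
    b, c = b_c
    return a + b * c

print(f(1, (2, 3)))  # 7
```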
Python 2 Python 3
Python 2 Python 3
Python 2                               Python 3
Traceback (most recent call last):     Traceback (most recent call last):
  File "<stdin>", line 1, in <module>    File "<stdin>", line 1, in <module>
ValueError: Invalid Value              ValueError: Invalid Value
Also, Python 3 requires the usage of the as keyword in the except block:
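A minimal Python 3 sketch of raising and catching an exception with the as keyword. Note that the name bound by as is deleted at the end of the except block, so the message is copied to another variable:

```python
try:
    raise ValueError("Invalid Value")
except ValueError as e:  # Python 2 also accepted: except ValueError, e:
    message = str(e)
    print(type(e).__name__, message)  # ValueError Invalid Value
```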
Python 2           Python 3
ConfigParser       configparser
copy_reg           copyreg
Queue              queue
SocketServer       socketserver
markupbase         _markupbase
repr               reprlib
test.test_support  test.support
Some related modules have been grouped into packages and the submodule names
have been simplified. The resulting new packages are:
• dbm (anydbm, dbhash, dbm, dumbdbm, gdbm, whichdb).
• html (HTMLParser, htmlentitydefs).
• http (httplib, BaseHTTPServer, CGIHTTPServer,
SimpleHTTPServer, Cookie, cookielib).
• tkinter (all Tkinter related modules except turtle).
• urllib (urllib, urllib2, urlparse, robotparser).
• xmlrpc (xmlrpclib, DocXMLRPCServer, SimpleXMLRPCServer).
Python 2        Python 3
func_code       __code__
func_defaults   __defaults__
func_dict       __dict__
func_doc        __doc__
func_globals    __globals__
func_name       __name__
NOTE:
More information on the 2to3 tool can be found at
https://docs.python.org/2/library/2to3.html.
More information on the 3to2 tool can be found at
https://pypi.python.org/pypi/3to2.
Python 2 Python 3
To make the above code work with both Python 2 and Python 3, we could use the six
module as follows:
Another demonstration of the six module is with respect to a few constants that come in
handy as replacements for certain objects that were removed or renamed in Python 3:
six.class_types
six.integer_types
six.string_types
six.text_type
six.binary_type
Observation:
1. In Python 2, a Unicode string is derived from basestring, but is not an
instance of str.
2. A Unicode string in Python 2 nevertheless qualifies as an instance of
six.string_types.
Observation:
1. In Python 3, all strings are Unicode strings and are thus instances of the str
class.
2. There is no base class called basestring in Python 3.
3. However, Unicode strings in Python 3 still qualify as being instances of
six.string_types.
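The Python 3 side of these observations can be checked without six; the string value below is illustrative:

```python
import builtins

s = "héllo"
print(isinstance(s, str))                    # True, every text string is a str
print(hasattr(builtins, "basestring"))       # False, basestring no longer exists
print(isinstance(s.encode("utf-8"), bytes))  # True, encoded data is bytes
```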
NOTE:
More information on the six module can be found at
https://pypi.python.org/pypi/six.
SUMMARY