Python Notes 3rd Mca
UNIT – I
1. Outline the different types of statements in python with proper syntax and
examples. 10M
Any instruction that the Python interpreter can execute (carry out) is called a statement.
An instruction is a command that a program gives to the processor to perform a mathematical or logical operation.
In simple words: a statement is the smallest executable unit of code that has an effect, such as creating a variable or
displaying a value. Most lines of a Python program are statements, and the interpreter executes them one after another.
Example: 1
x=3
print(x)
Output: 3
The first line is an assignment statement that gives a value to x.
The second line calls the print() function to display the value of x. (print was a statement in Python 2; in Python 3 it
is a function call written as an expression statement.)
When you type a statement, the interpreter executes it, which means that it does whatever the statement says.
Some other kinds of statements in Python are the if statement, while statement, for statement, import statement, etc.
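The kinds of statement named above can be seen together in one short program. This is a minimal sketch; the variable names and values are illustrative and not taken from the notes.

```python
import math  # import statement

total = 0  # assignment statement
for n in [1, 2, 3, 4]:  # for statement
    if n % 2 == 0:  # if statement
        total += n  # augmented assignment statement
    else:
        pass  # pass statement: does nothing

while total < 10:  # while statement
    total += 1

root = math.sqrt(16)  # assignment whose right-hand side is a function call
print(total, root)  # expression statement (a function call)
```

Simple statements (assignment, import, pass) fit on one line, while compound statements (if, for, while) control an indented block of other statements.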
2. Briefly discuss about the looping techniques in Python with suitable examples.
10M
The Python for loop is used for sequential traversal, i.e. for iterating over an iterable such as a string, tuple, or list.
It falls under the category of definite iteration, which means the number of repetitions is fixed in advance by the
iterable. Python has no C-style for loop such as for (i=0; i<n; i++). Instead it has the "for in" loop, which
is similar to the for-each loop in other languages. Let us learn how to use the for in loop for sequential traversals.
Syntax:
for var in iterable:
    # statements
# Python program to illustrate iterating over a list
print("List Iteration")
l = ["geeks", "for", "geeks"]
for i in l:
    print(i)
Loop control statements change execution from its normal sequence. Python supports the following loop control
statements: break, continue, and pass.
Continue :
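The explanation of continue is missing from these notes. As a minimal sketch: continue skips the rest of the current iteration and moves to the next one, while break (shown for contrast) leaves the loop entirely. The variable names are illustrative.

```python
# continue: skip the rest of the current iteration
odds = []
for i in range(1, 8):
    if i % 2 == 0:
        continue  # even numbers are skipped
    odds.append(i)
print(odds)

# break: leave the loop as soon as the condition is met
first_big = None
for i in range(1, 8):
    if i > 4:
        first_big = i
        break
print(first_big)
```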
Python range() is a built-in function that is used when a user needs to perform an action a specific number of
times. range() in Python 3.x behaves like xrange() in Python 2.x: it produces its numbers on demand instead of
building a list. The range() function is used to generate a sequence of numbers. Depending on how many arguments
the user passes, the user can decide where the series of numbers begins and ends, as well as how big the difference
is between one number and the next. range() takes up to three arguments.
• start: integer from which the sequence of integers starts (optional, default 0)
• stop: integer before which the sequence of integers ends.
The range of integers ends at stop – 1.
• step: integer value which determines the increment between each integer in the sequence (optional, default 1)
# printing the numbers 0 to 9
for i in range(10):
    print(i, end=" ")
print()
Output: 0 1 2 3 4 5 6 7 8 9
Output : 10,20,30,40
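The code that produced the output 10, 20, 30, 40 is missing from these notes. One call that yields the same numbers, shown here only as an assumption about the lost example, uses all three arguments of range():

```python
# Assumed reconstruction: start=10, stop=50 (excluded), step=10
values = list(range(10, 50, 10))
print(values)

# A negative step counts down; stop is still excluded
countdown = list(range(5, 0, -1))
print(countdown)
```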
In most programming languages (C/C++, Java, etc.), the else clause can be used only with if statements.
Python also allows an else clause on a for loop. The else block runs only when the loop finishes without
executing a break statement.
Output : 1,2,3
No break
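The code for the for-else output above is missing. A minimal sketch of the construct, using a hypothetical helper named find, shows that the else block runs only when the loop ends without break:

```python
def find(items, target):
    """Return a message describing how the loop ended."""
    for item in items:
        if item == target:
            result = "found"
            break
    else:
        # runs only when the loop finished without hitting break
        result = "No break"
    return result

print(find([1, 2, 3], 2))
print(find([1, 2, 3], 9))
```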
A while loop is used to execute a block of statements repeatedly as long as a given condition is true. When the
condition becomes false, the line immediately after the loop in the program is executed. The while loop falls under
the category of indefinite iteration. Indefinite iteration means that the number of times the loop is executed isn't
specified explicitly in advance.
count = 0
while (count < 3):
    count = count + 1
    print("Hello Geek")
Arithmetic Operators
Arithmetic operators are used to perform mathematical operations like addition, subtraction, multiplication,
and division.
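A minimal sketch of the arithmetic operators, with illustrative operand values that are not from the notes:

```python
a, b = 7, 2
print(a + b)   # addition
print(a - b)   # subtraction
print(a * b)   # multiplication
print(a / b)   # true division, always gives a float
print(a // b)  # floor division
print(a % b)   # remainder (modulus)
print(a ** b)  # exponentiation
```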
Comparison Operators
These operators compare the values on either side of them and decide the relation between them. They are also
called relational operators.
Assume variable a holds 10 and variable b holds 20, then:
Operator | Description | Example
== | If the values of the two operands are equal, then the condition becomes true. | (a == b) is not true.
!= | If the values of the two operands are not equal, then the condition becomes true. | (a != b) is true.
<> | Same as !=. Available in Python 2 only; it was removed in Python 3. | (a <> b) is true.
> | If the value of the left operand is greater than the value of the right operand, then the condition becomes true. | (a > b) is not true.
< | If the value of the left operand is less than the value of the right operand, then the condition becomes true. | (a < b) is true.
>= | If the value of the left operand is greater than or equal to the value of the right operand, then the condition becomes true. | (a >= b) is not true.
<= | If the value of the left operand is less than or equal to the value of the right operand, then the condition becomes true. | (a <= b) is true.
Example
a = 21
b = 10
if a == b:
    print("a is equal to b")
else:
    print("a is not equal to b")
a = 5
b = 20
if a <= b:
    print("a is either less than or equal to b")
else:
    print("a is neither less than nor equal to b")
if b >= a:
    print("b is either greater than or equal to a")
else:
    print("b is neither greater than nor equal to a")
Bitwise operators :
In Python, bitwise operators are used to perform bitwise calculations on integers. The integers are treated
as binary numbers and the operations are performed bit by bit, hence the name bitwise operators. The
result is returned as an ordinary (decimal) integer.
a = 10
b=4
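The example above breaks off after the two assignments. A minimal sketch that continues from the same values of a and b:

```python
a = 10  # binary 1010
b = 4   # binary 0100

print(a & b)   # bitwise AND
print(a | b)   # bitwise OR
print(a ^ b)   # bitwise XOR
print(~a)      # bitwise NOT, equal to -(a + 1)
print(a >> 2)  # right shift by 2 bits
print(a << 2)  # left shift by 2 bits
```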
INPUT Statements
Sometimes a developer might want to take user input at some point in the program. To do this Python provides
an input() function.
Syntax:
input("prompt")
Example 1: Python get user input with a message
# Taking input from the user
name = input("Enter your name: ")
# Output
print("Hello, " + name)
print(type(name))
Output Statement
Python provides the print() function to display output to the standard output devices.
Syntax:
print(value(s), sep=' ', end='\n', file=file, flush=flush)
Parameters:
value(s) : Any value, and as many as you like. Each is converted to a string before being printed.
sep='separator' : (Optional) Specifies how to separate the objects, if there is more than one. Default: ' '
end='end' : (Optional) Specifies what to print at the end. Default: '\n'
file : (Optional) An object with a write method. Default: sys.stdout
flush : (Optional) A Boolean, specifying if the output is flushed (True) or buffered (False). Default: False
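A minimal sketch of the sep, end, and file parameters. The strings are illustrative, and io.StringIO is used only as a stand-in for a real output file:

```python
import io

print("2024", "01", "15", sep="-")   # custom separator between values
print("Loading", end="...")          # no newline after this call
print("done")

# file= redirects the output to any object that has a write() method
buffer = io.StringIO()
print("a", "b", sep=",", end="!", file=buffer)
print(repr(buffer.getvalue()))
```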
5. List and explain the basic types of Arguments in python with example.
6. Interpret the following in python functions with suitable code snippet:
i)Positional arguments ii) Keyword arguments iii) Variable length arguments
2. Keyword Arguments:
Functions can also be called using keyword arguments of the form kwarg=value. During a function call, values
passed through arguments need not be in the order of parameters in the function definition. This can be achieved
by keyword arguments. But all the keyword arguments should match the parameters in the function definition
Example:
def add(a, b, c):
    return (a + b + c)
print(add(b=10, a=5, c=20))  # 35
print(add(a=10, b=5, c=40))  # 55
3. Positional Arguments
During a function call, values passed through arguments should be in the order of parameters in the function
definition. These are called positional arguments. If both kinds are used in one call, keyword arguments must come
after the positional arguments.
Example:
def add(a, b, c):
    return (a + b + c)
print(add(10, 5, 20))  # 35
print(add(10, 5, 40))  # 55
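Variable length arguments, asked for in question 6 (iii), are not covered in the notes above. A minimal sketch, using hypothetical functions total and describe: a parameter written as *name collects extra positional arguments into a tuple, and **name collects extra keyword arguments into a dictionary.

```python
def total(*numbers):
    """Accept any number of positional arguments as a tuple."""
    return sum(numbers)

def describe(**details):
    """Accept any number of keyword arguments as a dict."""
    return ", ".join(f"{key}={value}" for key, value in details.items())

print(total(1, 2, 3))
print(total())
print(describe(name="Asha", course="MCA"))
```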
Tuple
A tuple is similar to the list in many ways. Like lists, tuples also contain the collection of the items of different
data types. The items of the tuple are separated with a comma (,) and enclosed in parentheses (). A tuple is a read-
only data structure as we can't modify the size and value of the items of a tuple.
Example :
tup = ("hi", "Python", 2)
print(type(tup))   # Checking type of tup
print(tup)         # Printing the tuple
print(tup[1:])     # Tuple slicing
print(tup[0:1])    # Tuple slicing
print(tup + tup)   # Tuple concatenation using + operator
print(tup * 3)     # Tuple repetition using * operator
tup[2] = "hi"      # Assigning to an item of tup. It will throw an error.
Output:
<class 'tuple'>
('hi', 'Python', 2)
('Python', 2)
('hi',)
('hi', 'Python', 2, 'hi', 'Python', 2)
('hi', 'Python', 2, 'hi', 'Python', 2, 'hi', 'Python', 2)
Traceback (most recent call last):
  File "main.py", line 14, in <module>
    tup[2] = "hi"
TypeError: 'tuple' object does not support item assignment
Dictionary
A dictionary is a collection of key-value pairs. It is like an associative array or a hash table where
each key stores a specific value. A key can be any hashable (immutable) object such as a number, string, or tuple,
whereas a value is an arbitrary Python object. Since Python 3.7, dictionaries preserve insertion order.
The items in the dictionary are separated with commas (,) and enclosed in curly braces {}.
Example :
d = {1:'Jimmy', 2:'Alex', 3:'john', 4:'mike'}
print (d) # Printing dictionary
print("1st name is "+d[1]) # Accessing value using keys
print("2nd name is "+ d[4])
print (d.keys()) #printing keys
print (d.values()) #printing values
Output
{1: 'Jimmy', 2: 'Alex', 3: 'john', 4: 'mike'}
1st name is Jimmy
2nd name is mike
dict_keys([1, 2, 3, 4])
dict_values(['Jimmy', 'Alex', 'john', 'mike'])
Boolean
The Boolean type provides two built-in values, True and False, which represent the truth value of an expression.
It is denoted by the class bool. In a Boolean context any non-zero number is treated as true and 0 is treated as
false. The names are case-sensitive, so true and false (or 'T' and 'F') are not Boolean values.
Example :
print(type(True))
print(type(False))
print(false)
Output:
<class 'bool'>
<class 'bool'>
NameError: name 'false' is not defined
Set
A Python set is an unordered collection of items. It is iterable, mutable (can be modified after creation), and has
unique elements. The order of the elements in a set is undefined, so printing a set may show them in a different order.
A set is created by using the built-in function set(), or by placing a sequence of elements inside curly braces,
separated by commas. It can contain values of various (hashable) types.
Example :
set1 = set() # Creating Empty set
set2 = {'James', 2, 3,'Python'} # Creating set with values
print(set2) #Printing Set value
set2.add(10) #Adding element to the set
print(set2) # Printing set after adding
set2.remove(2) # Removing element from the set
print(set2) # Printing set after removing
9. Define Function and give general form for defining and calling Functions
in Python with examples.
A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide
better modularity for your application and a high degree of code reusing.
As you already know, Python gives you many built-in functions like print(), etc. but you can also create your own
functions. These functions are called user-defined functions.
Formal parameters are mentioned in the function definition.
Actual parameters(arguments) are passed during a function call.
Defining a Function
You can define functions to provide the required functionality. Here are simple rules to define a function in
Python.
• Function blocks begin with the keyword def followed by the function name and parentheses ( ( ) ).
• Any input parameters or arguments should be placed within these parentheses. You can also define
parameters inside these parentheses.
• The first statement of a function can be an optional statement - the documentation string of the function
or docstring.
• The code block within every function starts with a colon (:) and is indented.
• The statement return [expression] exits a function, optionally passing back an expression to the caller. A
return statement with no arguments is the same as return None.
Syntax
def functionname( parameters ):
    stmts
    return [expression]
By default, parameters have positional behavior, so arguments must be passed in the same order in which the
parameters were defined.
Example
The following function takes a string as input parameter and prints it on standard screen.
def printme( str ):
    print(str)
    return
Calling a Function
Defining a function only gives it a name, specifies the parameters that are to be included in the function and
structures the blocks of code.
Once the basic structure of a function is finalized, you can execute it by calling it from another function or directly
from the Python prompt. Following is the example to call printme() function −
def printme( str ):
    print(str)
    return
printme("I'm first call to user defined function!")
printme("Again second call to the same function")
Output :
I'm first call to user defined function!
Again second call to the same function
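Lambda Functions
A lambda function is a small anonymous function defined with the lambda keyword.
# Program to show the use of a lambda function
double = lambda x: x * 2
print(double(5))
Output
10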
In the above program, lambda x: x * 2 is the lambda function. Here x is the argument and x * 2 is the expression
that gets evaluated and returned.
This function has no name. It returns a function object which is assigned to the identifier double. We can now
call it as a normal function. The statement
double = lambda x: x * 2
is nearly the same as:
def double(x):
    return x * 2
Use of Lambda Function in python
We use lambda functions when we require a nameless function for a short period of time.
In Python, we generally use it as an argument to a higher-order function (a function that takes in other functions
as arguments). Lambda functions are used along with built-in functions like filter(), map() etc.
Example use with filter()
The filter() function in Python takes in a function and an iterable (such as a list) as arguments.
The function is called with each item, and filter() returns an iterator over the items for which the
function evaluates to True. Wrapping the result in list() turns it into a new list.
Here is an example use of filter() function to filter out only even numbers from a list.
# Program to filter out only the even items from a list
my_list = [1, 5, 4, 6, 8, 11, 3, 12]
new_list = list(filter(lambda x: (x%2 == 0) , my_list))
print(new_list)
Output
[4, 6, 8, 12]
Example use with map()
The map() function in Python takes in a function and an iterable (such as a list).
The function is called with each item, and map() returns an iterator over the values returned by that
function. Wrapping the result in list() turns it into a new list. Here is an example use of the map() function
to double all the items in a list.
# Program to double each item in a list using map()
my_list = [1, 5, 4, 6, 8, 11, 3, 12]
new_list = list(map(lambda x: x * 2 , my_list))
print(new_list)
Output
[2, 10, 8, 12, 16, 22, 6, 24]
Unit-02
1. Illustrate the basic operations of List in depth. 10M
The list is a most versatile datatype available in Python which can be written as a list of comma-separated values
(items) between square brackets. Important thing about a list is that items in a list need not be of the same type.
Creating a list is as simple as putting different comma-separated values between square brackets.
For example –
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]
Accessing Values in Lists
To access values in lists, use square brackets with an index, or with a slice of indices, to obtain the values
available at those positions.
For example –
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5, 6, 7]
print("list1[0]: ", list1[0])
print("list2[1:5]: ", list2[1:5])
Updating Lists
You can update single or multiple elements of lists by giving the slice on the left-hand side of the assignment
operator, and you can add to elements in a list with the append() method.
For example –
list1 = ['physics', 'chemistry', 1997, 2000]
print("Value available at index 2 : ")
print(list1[2])
list1[2] = 2001
print("New value available at index 2 : ")
print(list1[2])
Deleting List Elements
To remove a list element, you can use either the del statement if you know exactly which index you are deleting,
or the remove() method if you know the value but not its index.
For example –
list1 = ['physics', 'chemistry', 1997, 2000]
print(list1)
del list1[2]
print("After deleting value at index 2 : ")
print(list1)
As their outputs show, each of the numbered examples below starts from myList = [1, 2, 3, 'EduCBA', 'makes learning fun!'].
1. append() : The append() method is used to add elements at the end of the list. This method can only add a
single element at a time. To add multiple elements, the append() method can be used inside a loop.
Code:
myList.append(4)
myList.append(5)
myList.append(6)
for i in range(7, 9):
    myList.append(i)
print(myList)
Output:
[1, 2, 3, 'EduCBA', 'makes learning fun!', 4, 5, 6, 7, 8]
2. extend() : The extend() method adds every element of an iterable to the end of the list. Unlike append(), it can
add more than one element in a single call, but like append(), it always adds them at the end of the list.
Code:
myList.extend([4, 5, 6])
for i in range(7, 9):
    myList.append(i)
print(myList)
Output:
[1, 2, 3, 'EduCBA', 'makes learning fun!', 4, 5, 6, 7, 8]
3. insert() : The insert() method can add an element at a given position in the list. Thus, unlike append(), it can
add elements at any position, but like append(), it can add only one element at a time. This method takes two
arguments. The first argument specifies the position, and the second argument specifies the element to be inserted.
Code:
myList.insert(3, 4)
myList.insert(4, 5)
myList.insert(5, 6)
print(myList)
Output:
[1, 2, 3, 4, 5, 6, 'EduCBA', 'makes learning fun!']
4. remove() : The remove() method is used to remove an element from the list. In the case of multiple occurrences
of the same element, only the first occurrence is removed.
Code:
myList.remove('makes learning fun!')
myList.insert(4, 'makes')
myList.insert(5, 'learning')
myList.insert(6, 'so much fun!')
print(myList)
Output:
[1, 2, 3, 'EduCBA', 'makes', 'learning', 'so much fun!']
5. pop() : The method pop() can remove an element from any position in the list. The parameter supplied to this
method is the index of the element to be removed.
Code:
myList.pop(4)
myList.insert(4, 'makes')
myList.insert(5, 'learning')
myList.insert(6, 'so much fun!')
print(myList)
Output:
[1, 2, 3, 'EduCBA', 'makes', 'learning', 'so much fun!']
6. slice : The slice operation is used to print a section of the list. The slice operation returns a specific range of
elements. It does not modify the original list.
Code:
print(myList[:4]) # prints from beginning to end index
print(myList[2:]) # prints from start index to end of list
print(myList[2:4]) # prints from start index to end index
print(myList[:]) # prints from beginning to end of list
Output:
[1, 2, 3, 'EduCBA']
[3, 'EduCBA', 'makes learning fun!']
[3, 'EduCBA']
[1, 2, 3, 'EduCBA', 'makes learning fun!']
7. reverse() : The reverse() method reverses the elements of the list. This method modifies the original
list. To get a reversed copy without modifying the original one, we use the slice operation with a negative step.
A step of -1 walks the list from the rear end to the front end.
Code:
print(myList[::-1]) # does not modify the original list
myList.reverse() # modifies the original list
print(myList)
Output:
['makes learning fun!', 'EduCBA', 3, 2, 1]
['makes learning fun!', 'EduCBA', 3, 2, 1]
8. len() : The built-in len() function returns the length of the list, i.e. the number of elements in the list.
Code:
print(len(myList))
Output:
5
9. min() & max() : The min() function returns the minimum value in the list and the max() function returns the
maximum value. Both work only when the elements can be compared with each other, e.g. a list whose elements are
all numbers or all strings.
Code:
print(min(myList))
Output:
TypeError: '<' not supported between instances of 'str' and 'int'
Code:
print(min([1, 2, 3]))
print(max([1, 2, 3]))
Output:
1
3
10. count() : The function count() returns the number of occurrences of a given element in the list.
Code:
print(myList.count(3))
Output:
1
11. concatenate The concatenate operation is used to merge two lists and return a single list. The + sign is used
to perform the concatenation. Note that the individual lists are not modified, and a new combined list is returned.
Code:
yourList = [4, 5, 'Python', 'is fun!']
print(myList+yourList)
Output:
[1, 2, 3, 'EduCBA', 'makes learning fun!', 4, 5, 'Python', 'is fun!']
12. multiply : Python also allows multiplying the list n times. The resultant list is the original list iterated n times.
Code:
print(myList*2)
Output:
[1, 2, 3, 'EduCBA', 'makes learning fun!', 1, 2, 3, 'EduCBA', 'makes learning fun!']
13. index() : The index() method returns the position of the first occurrence of the given element. It takes two
optional parameters – the begin index and the end index. These parameters define the start and end position of
the search area on the list. When supplied, the element is searched only in the sub-list bound by the begin and
end indices. When not supplied, the element is searched in the whole list.
Code:
print(myList.index('EduCBA')) # searches in the whole list
print(myList.index('EduCBA', 0, 2)) # searches indices 0 and 1 only (the end index is excluded)
Output:
3
ValueError: 'EduCBA' is not in list
14. sort() : The sort() method sorts the list in place in ascending order. This operation can only be performed on
lists whose elements can be compared with each other, e.g. lists having elements of similar type.
Code:
yourList = [4, 2, 6, 5, 0, 1]
yourList.sort()
print(yourList)
Output:
[0, 1, 2, 4, 5, 6]
15. clear() : This method removes all the elements from the list, leaving it empty.
Code:
myList.clear()
print(myList)
Output:
[]
2. Implement a python program to simulate stack. 10M
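No answer to this question appears in the notes. A minimal sketch of one common approach is given here: a hypothetical Stack class that wraps a Python list and treats the end of the list as the top of the stack. The class and method names are illustrative.

```python
class Stack:
    """A last-in, first-out (LIFO) stack backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)  # the top of the stack is the end of the list

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items

    def size(self):
        return len(self._items)


stack = Stack()
for value in (10, 20, 30):
    stack.push(value)
print(stack.pop())    # removes and returns the most recently pushed item
print(stack.peek())   # looks at the new top without removing it
print(stack.size())
```

Because list.append() and list.pop() both work at the end of the list, push and pop each take constant time on average.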
3. Show the slicing and indexing methods of strings with examples 10M
Indexing : Indexing means referring to an element of an iterable by its position within the iterable. Each of a
string's characters corresponds to an index number, and each character can be accessed using its index number.
We can access characters in a string in two ways :
1. Accessing Characters by Positive Index Number
2. Accessing Characters by Negative Index Number
Accessing Characters by Positive Index Number: In this type of indexing, we pass a positive index (the position we
want to access) in square brackets. The index numbers start from 0 (which denotes the first character
of a string).
# declaring the string
str = "Geeks for Geeks !"
# accessing the character of str at 0th index
print(str[0])
# accessing the character of str at 6th index
print(str[6])
# accessing the character of str at 10th index
print(str[10])
Accessing Characters by Negative Index Number : In this type of Indexing, we pass the Negative index(which
we want to access) in square brackets. Here the index number starts from index number -1 (which denotes the
last character of a string).
# declaring the string
str = "Geeks for Geeks !"
# accessing the character of str at last index
print(str[-1])
# accessing the character of str at 5th index from the last
print(str[-5])
# accessing the character of str at 10th index from the last
print(str[-10])
Slicing
Slicing in Python is a feature that enables accessing parts of sequence. In slicing string, we create a substring,
which is essentially a string that exists within another string. We use slicing when we require a part of string and
not the complete string.
Syntax :
string[start : end : step]
start : We provide the starting index.
end : We provide the end index (this is not included in the substring).
step : It is an optional argument that determines the increment between successive indices of the slice.
Example:
# declaring the string
str ="Geeks for Geeks !"
# slicing using indexing sequence
print(str[: 3])
print(str[1 : 5 : 2])
print(str[-1 : -12 : -2])
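The indexing and slicing calls above can be checked with a short sketch. It uses the same string but stores it in a variable named text (an illustrative name) so that the built-in str type is not shadowed.

```python
text = "Geeks for Geeks !"

print(text[0], text[6], text[-1])  # single characters by index
print(text[:3])          # first three characters
print(text[1:5:2])       # indices 1 and 3
print(text[-1:-12:-2])   # backwards from the last character, every 2nd one
print(text[::-1])        # a step of -1 with no bounds reverses the string
```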
A Python string is a sequence of Unicode characters enclosed in quotation marks. This section
discusses the built-in methods that Python provides to operate on strings.
# Python3 program to show the
# working of upper(), lower() and title()
text = 'geeKs For geEkS'
# upper() function to convert
# string to upper case
print("\nConverted String:")
print(text.upper())
# lower() function to convert
# string to lower case
print("\nConverted String:")
print(text.lower())
# title() converts the first character of each word
# to upper case and the rest to lower case
print("\nConverted String:")
print(text.title())
# original string never changes
print("\nOriginal String")
print(text)
Function Name | Description
capitalize() | Converts the first character of the string to a capital (uppercase) letter
expandtabs() | Replaces each "\t" in the string with spaces, using the given tab size
isalnum() | Checks whether all the characters in a given string are alphanumeric or not
isnumeric() | Returns "True" if all characters in the string are numeric characters
isprintable() | Returns "True" if all characters in the string are printable or the string is empty
isspace() | Returns "True" if all characters in the string are whitespace characters
rindex() | Returns the highest index of the substring inside the string
rsplit() | Splits the string from the right by the specified separator
strip() | Returns the string with both leading and trailing characters removed (whitespace by default)
zfill() | Returns a copy of the string with '0' characters padded to the left side of the string
5. With a example explain the working of dictionary in python. 5M
A dictionary in Python is a collection of data values stored like a map. Unlike other data types that hold only
a single value as an element, a dictionary holds key:value pairs. Since Python 3.7, dictionaries preserve insertion order.
Storing data as key-value pairs makes looking up a value by its key fast.
Creating a Dictionary
In Python, a dictionary can be created by placing a sequence of key:value pairs within curly braces {}, separated by
commas. Each pair consists of a key and its corresponding value. Values in a dictionary can be of any data type
and can be duplicated, whereas keys can't be repeated and must be immutable.
# Creating a Dictionary
# with Integer Keys
Dict = {1: 'Geeks', 2: 'For', 3: 'Geeks'}
print("\nDictionary with the use of Integer Keys: ")
print(Dict)
# Creating a Dictionary
# with Mixed keys
Dict = {'Name': 'Geeks', 1: [1, 2, 3, 4]}
print("\nDictionary with the use of Mixed Keys: ")
print(Dict)
# Creating a Dictionary
Dict = {1: 'Geeks', 'name': 'For', 3: 'Geeks'}
# Initial Dictionary
Dict = {5: 'Welcome', 6: 'To', 7: 'Geeks',
        'A': {1: 'Geeks', 2: 'For', 3: 'Geeks'},
        'B': {1: 'Geeks', 2: 'Life'}}
print("Initial Dictionary: ")
print(Dict)
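The working of a dictionary also includes accessing, updating, and deleting entries. A minimal sketch, using a hypothetical marks dictionary that is not part of the notes:

```python
marks = {"maths": 82, "physics": 74}

marks["chemistry"] = 91          # add a new key-value pair
marks["physics"] = 78            # update the value of an existing key
print(marks["maths"])            # access by key (raises KeyError if missing)
print(marks.get("biology", 0))   # get() returns a default instead of raising

removed = marks.pop("maths")     # remove a key and return its value
del marks["physics"]             # remove a key with the del statement

for subject, score in marks.items():   # iterate over key-value pairs
    print(subject, score)
```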
6. With an example explain the file handling methods of python with examples 10M
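The answer to this question is missing from the notes. As a minimal sketch, the basic file handling methods are open(), write(), writelines(), readline(), and readlines(), together with the modes "w" (write), "a" (append), and "r" (read). The file name notes.txt and the temporary directory are assumptions made so the example can run anywhere.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# "w" creates or truncates the file; the with block closes it automatically
with open(path, "w") as f:
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])

# "a" appends to the end of an existing file
with open(path, "a") as f:
    f.write("fourth line\n")

# "r" opens the file for reading
with open(path, "r") as f:
    first = f.readline()      # one line, including its newline
    rest = f.readlines()      # the remaining lines as a list

print(first.strip())
print(len(rest))
os.remove(path)               # delete the file when finished
```

Using with open(...) is generally preferred over calling close() by hand, because the file is closed even if an error occurs inside the block.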
except:
    print ("An error occurred")
Example:
# Program to handle multiple errors with
# separate except clauses
# Python 3
def fun(a):
    if a < 4:
        # raises ZeroDivisionError for a = 3
        b = a/(a-3)
    # raises UnboundLocalError (a subclass of NameError) if a >= 4,
    # because b was never assigned
    print("Value of b = ", b)
try:
    fun(3)
    fun(5)
# fun(3) raises first, so fun(5) is never reached;
# swap the two calls to see the NameError handler run
except ZeroDivisionError:
    print("ZeroDivisionError Occurred and Handled")
except NameError:
    print("NameError Occurred and Handled")
Example:
# Python program to demonstrate finally
# Exception raised in try block
try:
    k = 5//0  # raises divide by zero exception.
    print(k)
# handles zerodivision exception
except ZeroDivisionError:
    print("Can't divide by zero")
finally:
    # this block is always executed
    # regardless of exception generation.
    print('This is always executed')
Raising Exception
The raise statement allows the programmer to force a specific exception to occur. The sole argument in raise
indicates the exception to be raised. This must be either an exception instance or an exception class (a class that
derives from Exception).
# Program to depict Raising Exception
try:
    raise NameError("Hi there")  # Raise Error
except NameError:
    print("An exception")
    raise  # Re-raise the caught exception so that it propagates to the caller
Inheritance is the capability of one class to derive or inherit the properties from another class. The benefits of
inheritance are:
1. It represents real-world relationships well.
2. It provides reusability of a code. We don’t have to write the same code again and again. Also, it allows
us to add more features to a class without modifying it.
3. It is transitive in nature, which means that if class B inherits from another class A, then all the subclasses
of B would automatically inherit from class A.
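A minimal sketch of these points, using hypothetical classes Animal, Dog, and Puppy: the subclass reuses the constructor of the base class, overrides one method, and the third class shows the transitive nature of inheritance.

```python
class Animal:                      # base class (class A in the text)
    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"


class Dog(Animal):                 # class B inherits from class A
    def speak(self):               # overrides the inherited method
        return f"{self.name} barks"


class Puppy(Dog):                  # subclass of B, so it also inherits from A
    pass


pup = Puppy("Rodger")
print(pup.speak())                 # method found in Dog
print(isinstance(pup, Animal))     # a Puppy is also an Animal
```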
default constructor: The default constructor is a simple constructor which doesn’t accept any arguments. Its
definition has only one argument which is a reference to the instance being constructed.
parameterized constructor: constructor with parameters is known as parameterized constructor. The
parameterized constructor takes its first argument as a reference to the instance being constructed known as self
and the rest of the arguments are provided by the programmer.
class Addition:
    first = 0
    second = 0
    answer = 0
    # parameterized constructor
    def __init__(self, f, s):
        self.first = f
        self.second = s
    def display(self):
        print("First number = " + str(self.first))
        print("Second number = " + str(self.second))
        print("Addition of two numbers = " + str(self.answer))
    def calculate(self):
        self.answer = self.first + self.second
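The two kinds of constructor can be compared side by side. This is a minimal sketch: the Counter class is hypothetical, and the Addition class is a shortened version of the one above, with illustrative argument values.

```python
class Counter:
    # default constructor: takes only self
    def __init__(self):
        self.value = 0


class Addition:
    # parameterized constructor: self plus arguments supplied by the caller
    def __init__(self, f, s):
        self.first = f
        self.second = s
        self.answer = 0

    def calculate(self):
        self.answer = self.first + self.second


c = Counter()                  # no arguments are passed
obj = Addition(1000, 2000)     # arguments are passed to __init__
obj.calculate()
print(c.value, obj.answer)
```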
11.Illustrate the creation of class and object with proper syntax and example.
A class is a user-defined blueprint or prototype from which objects are created. Classes provide a means of
bundling data and functionality together. Creating a new class creates a new type of object, allowing new
instances of that type to be made. Each class instance can have attributes attached to it for maintaining its state.
Class instances can also have methods (defined by their class) for modifying their state.
To understand the need for creating a class let’s consider an example, let’s say you wanted to track the number
of dogs that may have different attributes like breed, age. If a list is used, the first element could be the dog’s
breed while the second element could represent its age.
Let’s suppose there are 100 different dogs, then how would you know which element is supposed to be which?
What if you wanted to add other properties to these dogs? This lacks organization and it’s the exact need for
classes.
Class creates a user-defined data structure, which holds its own data members and member functions, which can
be accessed and used by creating an instance of that class.
A class is like a blueprint for an object.
class ClassName:
    # Statement-1
    .
    .
    .
    # Statement-N
Example:
# Python3 program to
# demonstrate defining
# a class
class Dog:
    pass
Class Object
An Object is an instance of a Class. A class is like a blueprint while an instance is a copy of the class with actual
values. It’s not an idea anymore, it’s an actual dog, like a dog of breed pug who’s seven years old. You can have
many dogs to create many different instances, but without the class as a guide, you would be lost, not knowing
what information is required.
An object consists of :
• State: It is represented by the attributes of an object. It also reflects the properties of an object.
• Behavior: It is represented by the methods of an object. It also reflects the response of an object to other
objects.
• Identity: It gives a unique name to an object and enables one object to interact with other objects.
Example:
# Python3 program to
# demonstrate instantiating
# a class
class Dog:
    # A simple class
    # attribute
    attr1 = "mammal"
    attr2 = "dog"
    # A sample method
    def fun(self):
        print("I'm a", self.attr1)
        print("I'm a", self.attr2)
# Driver code
# Object instantiation
Rodger = Dog()
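The example above stops after creating the object. A minimal sketch of how state, behavior, and identity show up in practice is given below; the __init__ method, the describe method, and the breed and age values are illustrative additions, not part of the original class.

```python
class Dog:
    # class attributes shared by every instance
    attr1 = "mammal"
    attr2 = "dog"

    def __init__(self, breed, age):
        # instance attributes hold each object's own state
        self.breed = breed
        self.age = age

    def describe(self):
        # behavior: a method that uses the object's state
        return f"{self.breed}, {self.age} years old"


rodger = Dog("pug", 7)
tommy = Dog("beagle", 3)
print(rodger.attr1)        # class attribute reached through the instance
print(rodger.describe())
print(rodger is tommy)     # identity: two separate objects
```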
A CSV (Comma Separated Values) file is a plain-text file format used to store tabular data such as a spreadsheet or
a database table. It stores tabular data made up of numbers and text as plain text. Most
online services let users export data from the website in CSV format. CSV files generally
open in Excel, and nearly all databases have tools that allow them to be imported.
Every line of the file is called a record. Each record consists of fields separated by a delimiter. The comma
is the default delimiter; others include the pipe (|) and the semicolon (;). Given below is
the structure of a normal CSV file separated by commas, using a Titanic CSV file.
CSV is a plain-text format, which makes data interchange easier and makes the data easy to import into spreadsheet or
database storage. For example, you might export the results of a statistical analysis to a CSV file and
then import it into a spreadsheet for further analysis. Overall, it makes working with the data
programmatically very easy. Any language that supports text files or string manipulation, like Python, can
work with CSV files directly.
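The idea of records, fields, and delimiters can be sketched with Python's standard csv module. The data below is made up, and an in-memory string stands in for a real file so that the example runs anywhere.

```python
import csv
import io

# In-memory text standing in for a small CSV file (made-up data)
text = "ID,NAME,BRANCH\n101,Jagroop,MCA\n102,Praveen,EEE\n"

# Each record becomes a dict keyed by the header line
records = list(csv.DictReader(io.StringIO(text)))
for record in records:
    print(record["ID"], record["NAME"], record["BRANCH"])

# Writing with a different delimiter, here the pipe character
out = io.StringIO()
writer = csv.writer(out, delimiter="|", lineterminator="\n")
writer.writerow(["ID", "NAME"])
writer.writerow([101, "Jagroop"])
print(out.getvalue())
```

With a real file, open("data.csv", newline="") would take the place of io.StringIO(text); the file name here is only an assumption.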
import pandas as pd
# Creating Dataframe
details = pd.DataFrame({'ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
                        'NAME': ['Jagroop', 'Praveen', 'Harjot', 'Pooja', 'Rahul', 'Nikita', 'Saurabh', 'Ayush',
                                 'Dolly', 'Mohit'],
                        'BRANCH': ['MCA', 'EEE', 'ISE', 'MECH', 'CSE', 'ECE', 'MBA', 'AERO', 'MCA',
                                   'CIVIL']})
# Creating Dataframe
fees_status = pd.DataFrame(
    {'ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
     'PENDING': ['5000', '250', 'NIL', '9000', '15000', 'NIL', '4500', '1800', '250', 'NIL']})
# Merging Dataframe
print(pd.merge(details, fees_status, on='ID'))
# Concatenation
print(pd.concat([details, fees_status]))
# Display non-duplicate values
df = pd.DataFrame(fees_status)
# Here df.duplicated() flags duplicate entries in the PENDING column,
# so ~ (NOT) is applied in order to keep the non-duplicate values.
non_duplicate = df[~df.duplicated('PENDING')]
# printing non-duplicate values
print(non_duplicate)
# Filtering
df_filtered = df.query('ID>105')
print(df_filtered)
02. Elaborate the steps involved in accessing SQL database in python(10m)
Ans: Python is a high-level, general-purpose, and very popular programming language. Basically, it was designed
with an emphasis on code readability, and programmers can express their concepts in fewer lines of code. We
can also use Python with SQL. In this article, we will learn how to connect SQL with Python using the ‘MySQL
Connector Python‘ module. The diagram given below illustrates how a connection request is sent to MySQL
connector Python, how it gets accepted from the database and how the cursor is executed with result data.
To create a connection between the MySQL database and Python, the connect() method of mysql.connector
module is used. We pass the database details like HostName, username, and the password in the method call, and
then the method returns the connection object.
Step 1: Download and install the free MySQL database.
Step 2: After installing the MySQL database, open your Command prompt.
Step 3: Navigate your Command prompt to the location of PIP.
Step 4: Now run the pip install command for “MySQL Connector” (the package is published on PyPI as
mysql-connector-python). Here, the mysql.connector module will help you to communicate with the MySQL database.
Step 5: To check if the installation was successful, or if you already installed “MySQL Connector”, go to your
IDE and run the given below code :
import mysql.connector
If the above code gets executed with no errors, “MySQL Connector” is ready to be used.
Step 6: Now to connect SQL with Python, run the code given below in your IDE.
• mysql.connector allows Python programs to access MySQL databases.
• connect() method of the MySQL Connector class with the arguments will connect to MySQL and would
return a MySQLConnection object if the connection is established successfully.
• user = “yourusername”, here “yourusername” should be the same username as you set during MySQL
installation.
• password = “your_password”, here “your_password” should be the same password as you set during
MySQL installation.
• cursor() creates a cursor object, which is used to execute the SQL statements in Python.
• execute() method of the cursor is used to run a SQL statement.
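The connect, cursor, execute, and fetch sequence described above can be sketched as follows. This is not MySQL code: it uses the standard-library sqlite3 module as a stand-in so that it runs without a database server. With mysql.connector, the connect() call would instead take the host, user, and password arguments listed in the bullets. The table name and data are made up.

```python
import sqlite3

# connect() returns a connection object; ":memory:" is a throwaway database
conn = sqlite3.connect(":memory:")

# cursor() gives an object used to run SQL statements
cur = conn.cursor()

# execute() sends one SQL statement to the database
cur.execute("CREATE TABLE student (id INTEGER, name TEXT)")
cur.execute("INSERT INTO student VALUES (?, ?)", (101, "Jagroop"))
conn.commit()

# fetchall() returns the result rows as a list of tuples
cur.execute("SELECT id, name FROM student")
rows = cur.fetchall()
print(rows)

conn.close()
```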
Pandas: Pandas is an open-source library built on top of the NumPy library. It is a Python package that provides
various data structures and operations for manipulating numerical data and statistics. It is mainly popular for
making importing and analyzing data much easier. Pandas is fast, high-performance, and productive for users.
Data Normalization: Data normalization is a common practice in machine learning which consists of
transforming numeric columns to a standard scale. In machine learning, some feature values are many times
larger than others, and the features with higher values will dominate the learning process.
Steps Needed
Here, we will apply some techniques to normalize the data and discuss these with the help of examples. For this,
let’s understand the steps needed for data normalization with Pandas.
Examples
Here, we create data by some random values and apply some normalization techniques to it.
import pandas as pd # importing packages
df = pd.DataFrame([
Output :
The maximum absolute scaling rescales each feature between -1 and 1 by dividing every observation by its
maximum absolute value. We can apply the maximum absolute scaling in Pandas using the .max() and .abs()
methods, as shown below.
The min-max approach (often called normalization) rescales the feature to a fixed range of [0,1] by
subtracting the minimum value of the feature and then dividing by the range. We can apply the min-max scaling in
Pandas using the .min() and .max() methods.
df_min_max_scaled[column]=(df_min_max_scaled[column] - df_min_max_scaled[column].min()) /
(df_min_max_scaled[column].max() - df_min_max_scaled[column].min())
Output :
The z-score method (often called standardization) transforms the data into a distribution with a mean of 0 and a
standard deviation of 1. Each standardized value is computed by subtracting the mean of the corresponding feature
and then dividing by the standard deviation.
Output :
Data normalization consists of transforming numeric columns to a standard scale. In Python, we can implement
data normalization in a very simple way. The Pandas library contains multiple built-in methods for calculating
the most common descriptive statistics, which make data normalization techniques very easy to
implement.
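The three techniques described above can be sketched on a small DataFrame. The values and the column names a and b are made up for illustration; this is one possible way to write the formulas in Pandas, not the only one.

```python
import pandas as pd

# Made-up values; the column names are illustrative only
df = pd.DataFrame({"a": [10.0, 20.0, 30.0, 40.0],
                   "b": [-2.0, 0.0, 1.0, 4.0]})

# Maximum absolute scaling: divide by the largest absolute value
df_max_abs = df / df.abs().max()

# Min-max scaling: subtract the minimum, divide by the range
df_min_max = (df - df.min()) / (df.max() - df.min())

# Z-score: subtract the mean, divide by the standard deviation
# (pandas' std() uses the sample standard deviation, ddof=1, by default)
df_z = (df - df.mean()) / df.std()

print(df_max_abs)
print(df_min_max)
print(df_z)
```

Because the operations are applied to the whole DataFrame, every column is scaled with its own statistics and no explicit loop over columns is needed.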
Ans : strip() is a built-in string method in Python that returns a copy of the string with both
leading and trailing characters removed (based on the string argument passed).
By default, the strip() method removes leading whitespace (at the beginning) and trailing whitespace (at the end);
if a characters argument is given, those characters are removed instead.
Syntax
string.strip(characters)
Parameter Description
characters Optional. A set of characters to remove from both ends of the string.
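A few small calls illustrate the behavior; the sample strings are made up.

```python
text = "   hello world   "
print(text.strip())            # whitespace removed from both ends

tagged = "xxhello worldxx"
print(tagged.strip("x"))       # every leading/trailing "x" removed

mixed = ",,.hello.,"
print(mixed.strip(",."))       # the argument is treated as a set of characters
```

Note that characters in the middle of the string are never removed; strip() only works inward from both ends until it meets a character that is not in the set.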
Every time we make a new model, we will require to import Numpy and Pandas. Numpy is a Library which
contains Mathematical functions and is used for scientific computing while Pandas is used to import and manage
the data sets.
import pandas as pd
import numpy as np
Here we are importing the pandas and Numpy library and assigning a shortcut “pd” and “np” respectively.
dataset = pd.read_csv('Data.csv')
After carefully inspecting our dataset, we are going to create a matrix of features in our dataset (X) and create a
dependent vector (Y) with their respective observations. To read the columns, we will use iloc of pandas (used to
fix the indexes for selection) which takes two parameters — [row selection, column selection].
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values
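Our object name is imputer. The Imputer class (found in sklearn.preprocessing in older scikit-learn releases; newer
releases provide SimpleImputer in sklearn.impute instead, which has no axis parameter) can take parameters like :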
1. missing_values : It is the placeholder for the missing values. All occurrences of missing_values will
be imputed. We can give it an integer or “NaN” for it to find missing values.
2. strategy : It is the imputation strategy — If “mean”, then replace missing values using the mean along
the axis (Column). Other strategies include “median” and “most_frequent”.
3. axis : It can be assigned 0 or 1, 0 to impute along columns and 1 to impute along rows.
Now replacing the missing values with the mean of the column by using transform method.
After encoding, it is necessary to distinguish between the variables in the same column; for this we will
use the OneHotEncoder class from the sklearn.preprocessing library.
One-Hot Encoding
One hot encoding transforms categorical features to a format that works better with classification and regression
algorithms.
Step 5: Splitting the Data set into Training set and Test Set
Now we divide our data into two sets, one for training our model called the training set and the other for testing
the performance of our model called the test set. The split is generally 80/20. To do this we import the
“train_test_split” method of “sklearn.model_selection” library.
Now to build our training and test sets, we will create 4 sets —
1. X_train (training part of the matrix of features),
2. X_test (test part of the matrix of features),
3. Y_train (training part of the dependent variables associated with the X train sets, and therefore also
the same indices) ,
4. Y_test (test part of the dependent variables associated with the X test sets, and therefore also the same
indices).
We will assign to them the result of train_test_split, which takes the parameters: arrays (X and Y) and test_size (specifies
the ratio in which to split the data set).
Many machine learning algorithms use the Euclidean distance between two data points in their
computations. Because of this, features with high magnitudes will weigh more in the distance calculations than
features with low magnitudes. To avoid this, feature standardization or Z-score normalization is used. This is
done by using the “StandardScaler” class of “sklearn.preprocessing”.
We need to fit as well as transform our X_train set, while we only transform our X_test set.
The transform function will transform all the data to the same standardized scale.
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
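The reason for calling fit_transform on the training set but only transform on the test set can be sketched by hand with NumPy. This is an illustration of the idea under made-up data, not the scikit-learn implementation: the mean and standard deviation are learned from the training rows only and then reused for the test rows.

```python
import numpy as np

# Made-up feature matrix: 5 observations, 2 features
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0],
              [5.0, 600.0]])

# Simple 80/20 split without shuffling (illustration only)
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]

# "fit": learn mean and standard deviation from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# "transform": apply the same statistics to both sets
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0))
print(X_test_scaled)
```

Reusing the training statistics keeps information about the test set from leaking into the preprocessing step.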
When you have created labels, you can access an item by referring to the label. Ex Return the value of "y":
print(myvar["y"])
Key/Value Objects as Series : You can also use a key/value object, like a dictionary, when creating a Series. Ex
Create a simple Pandas Series from a dictionary:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
To select only some of the items in the dictionary, use the index argument and specify only the items you want
to include in the Series. Ex Create a Series using only data from "day1" and "day2":
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)
result:
calories duration
0 420 50
1 380 40
2 390 45
Locate Row : As you can see from the result above, the DataFrame is like a table with rows and columns.
Pandas use the loc attribute to return one or more specified row(s):
#refer to the row index:
print(df.loc[0])
Result
calories 420
duration 50
Name: 0, dtype: int64
Result :
calories duration
0 420 50
1 380 40
Named Indexes: With the index argument, you can name your own indexes.
Result :
calories duration
day1 420 50
day2 380 40
day3 390 45
Locate Named Indexes : Use the named index in the loc attribute to return the specified row(s).
Ex Return "day2":
#refer to the named index:
print(df.loc["day2"])
Result :
calories 380
duration 40
Name: day2, dtype: int64
Load Files Into a DataFrame : If your data sets are stored in a file, Pandas can load them into a DataFrame.
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Result :
calories duration
0 420 50
1 380 40
2 390 45
08. List and explain the steps involved in Data wrangling process in python(10m)
Data Wrangling in Python : Data Wrangling is the process of gathering, collecting, and transforming Raw data
into another format for better understanding, decision-making, accessing, and analysis in less time. Data
Wrangling is also known as Data Munging.
Data Wrangling is a very important step. The example below explains its importance:
A book-selling website wants to show the top-selling books of different domains, according to user preference. For
example, when a new user searches for motivational books, the website wants to show the motivational books that sell
the most or have a high rating.
But the website holds plenty of raw data from different users. This is where Data Munging or Data
Wrangling is used.
Data is not wrangled by the system; this process is done by data scientists. The data scientist will
wrangle the data so that the motivational books that sell more, have high ratings, or
are often bought together with other books are sorted to the top.
On the basis of that, the new user will make a choice. This shows the importance of Data Wrangling.
Data Wrangling is a crucial topic for Data Science and Data Analysis. Pandas Framework of Python is used for
Data Wrangling. Pandas is an open-source library specifically developed for Data Analysis and Data Science.
1. Data exploration: In this process, the data is studied, analyzed and understood by visualizing representations
of data.
2. Dealing with missing values: Most datasets with a vast amount of data contain missing values
(NaN). They need to be taken care of by replacing them with the mean or the mode (the most frequent value) of
the column, or simply by dropping the rows that have a NaN value.
3. Reshaping data: In this process, data is manipulated according to the requirements, where new data can be
added or pre-existing data can be modified.
4. Filtering data: Sometimes datasets contain unwanted rows or columns which need to be
removed or filtered out.
5. Other: After dealing with the raw dataset with the above functionalities we get an efficient dataset as per our
requirements and then it can be used for a required purpose like data analyzing, machine learning, data
visualization, model training etc.
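The steps listed above can be sketched on a tiny DataFrame. The names, ages, and the age threshold are made up for illustration, and each step shows only one of several possible Pandas calls.

```python
import numpy as np
import pandas as pd

# Made-up raw data with a missing value and an unwanted row
raw = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                    "Age": [17.0, np.nan, 18.0, 45.0],
                    "Gender": ["M", "F", "M", "M"]})

# 1. Data exploration: look at size and missing-value counts
print(raw.shape)
print(raw.isna().sum())

# 2. Dealing with missing values: fill NaN with the column mean
clean = raw.copy()
clean["Age"] = clean["Age"].fillna(clean["Age"].mean())

# 3. Reshaping data: recode a column into numeric form
clean["Gender"] = clean["Gender"].map({"M": 0, "F": 1})

# 4. Filtering data: keep only the rows of interest
clean = clean[clean["Age"] < 30]

print(clean)
```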
09. Illustrate the reshaping or pivoting operations of python(06m)
A pivot table is an operation that is commonly seen in spreadsheets and other programs that operate on
tabular data. The pivot table takes simple column-wise data as input, and groups the entries into a two-dimensional
table that provides a multidimensional summarization of the data.
Python has operations for rearranging tabular data, known as reshaping or pivoting operations.
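A minimal sketch of pivoting with Pandas follows. The column names day, city, and temp and the values are made up; pivot() and pivot_table() are the Pandas methods used.

```python
import pandas as pd

# Made-up long-format data: one row per (day, city) observation
long_df = pd.DataFrame({"day": ["d1", "d1", "d2", "d2"],
                        "city": ["A", "B", "A", "B"],
                        "temp": [30, 25, 32, 27]})

# pivot(): rows come from "day", columns from "city", cells from "temp"
wide = long_df.pivot(index="day", columns="city", values="temp")
print(wide)

# pivot_table() also aggregates when several rows share a cell
summary = long_df.pivot_table(index="city", values="temp", aggfunc="mean")
print(summary)
```

pivot() only rearranges the data and fails if two rows map to the same cell, while pivot_table() summarizes such rows with the chosen aggregation function.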
What is reshaping in Python?
The numpy.reshape() function allows us to reshape an array in Python. Reshaping basically means, changing the
shape of an array. And the shape of an array is determined by the number of elements in each dimension.
Reshaping allows us to add or remove dimensions in an array. We can also change the number of elements in
each dimension.
Syntax and parameters
Here is the syntax of the function:
numpy.reshape(array, shape, order = 'C')
Return value: An array which is reshaped without any change to its data.
How to use the numpy.reshape() method in Python?
Let’s take the following one-dimensional NumPy array as an example.
Input:
import numpy as np
x = np.arange(20)
print(x) #prints out the array with its elements
print()
print(x.shape) #prints out the shape of the array
print()
print(x.ndim) #prints out the dimension value of the array
Output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
(20,)
1
In this case, the numbers [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] are the elements of the array x. The
output (20,)is the shape of the array. Finally the value 1 printed out signifies the dimension of the array.
Now to try out the numpy reshape function, we need to specify the original array as the first argument and the
new shape as the second argument, given as a list or tuple. However, do keep in mind that if the shape does not match the
number of elements in the original array, a ValueError will occur.
In the example below, we are converting the defined 1-D array with 20 elements into a 3-D array. The outermost
dimension will have 2 arrays that contain 2 arrays, each with 5 elements.
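A sketch of the conversion just described, together with the ValueError case; the mismatching shape (3, 7) is chosen only for illustration.

```python
import numpy as np

x = np.arange(20)

# 2 * 2 * 5 = 20, so the shape matches the number of elements
y = np.reshape(x, (2, 2, 5))
print(y.shape)
print(y.ndim)

# A shape that does not match 20 elements raises ValueError
try:
    np.reshape(x, (3, 7))
except ValueError as err:
    print("ValueError:", err)
```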
Python Min()
The min() function returns the lowest value in an iterable.
Python next()
The next() function returns the next item in an iterator.
Unit 4
01: Elaborate the process of web scraping in python. (10m)
To extract data using web scraping with python, you need to follow these basic steps:
1. Find the URL that you want to scrape.
2. Inspect the page.
3. Find the data you want to extract.
4. Write the code.
5. Run the code and extract the data.
6. Store the data in the required format.
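The steps above can be sketched with Python's standard html.parser module. To keep the example runnable without a network connection, a made-up HTML string stands in for the downloaded page; the class names phone, name, and price and the PhoneParser class are hypothetical.

```python
from html.parser import HTMLParser

# Stand-in for a downloaded page (step 1); the markup is made up
page = """
<html><body>
  <div class="phone"><span class="name">Phone A</span><span class="price">199</span></div>
  <div class="phone"><span class="name">Phone B</span><span class="price">249</span></div>
</body></html>
"""


class PhoneParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.current = None   # class of the span being read, if any
        self.name = None      # most recent phone name seen
        self.rows = []        # extracted (name, price) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current = dict(attrs).get("class")

    def handle_data(self, data):
        if self.current == "name":
            self.name = data.strip()
        elif self.current == "price":
            self.rows.append((self.name, int(data)))

    def handle_endtag(self, tag):
        if tag == "span":
            self.current = None


parser = PhoneParser()
parser.feed(page)              # steps 4-5: run the extraction code
print(parser.rows)             # step 6 would save these rows, e.g. to CSV
```

In practice the page text would be fetched from the chosen URL first, and a dedicated parsing library is often used instead of a hand-written parser.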
Web Scraping is a technique to extract a large amount of data from several websites. The term "scraping" refers
to obtaining the information from another source (webpages) and saving it into a local file. For example: Suppose
you are working on a project called "Phone comparing website," where you require the price of mobile phones,
ratings, and model names to make comparisons between the different mobile phones. If you collect these details
by checking various sites, it will take much time. In that case, web scraping plays an important role: by
writing a few lines of code you can get the desired results.
These are the following steps to perform web scraping. Let's understand the working of web scraping.
First, you should understand the requirement of data according to your project. A webpage or website contains a
large amount of information, so scrape only the relevant information. In simple words, the developer should
be familiar with the data requirement.
The data is extracted in raw HTML format, which must be carefully parsed to reduce the noise in the raw
data. In some cases, data can be as simple as a name and address or as complex as high dimensional weather and stock
market data.
Write the code to extract the relevant information, and run the code.
As we have discussed above, web scraping is used to extract the data from websites. But we should know how
to use that raw data. That raw data can be used in various fields. Let's have a look at the usage of web scraping:
It is widely used to collect data from several online shopping sites, compare the prices of products, and make
profitable pricing decisions. Price monitoring using web scraped data gives companies the ability to know
the market condition and facilitates dynamic pricing. It helps companies stay ahead of their competitors.
o Market Research
Web scraping is well suited to market trend analysis, which means gaining insights into a particular
market. Large organizations require a great deal of data, and web scraping provides the data with a
guaranteed level of reliability and accuracy.
o Email Gathering
Many companies use personal e-mail data for email marketing. They can target a specific audience for their
marketing.
A single news cycle can create an outstanding effect or a genuine threat to your business. If your company
depends on news analysis, or frequently appears in the news, web scraping provides a
way to monitor and parse the most critical stories. News articles and social media platforms can
directly influence the stock market.
Web scraping plays an essential role in extracting data from social media websites such as Twitter,
Facebook, and Instagram, to find the trending topics.
Large sets of data such as general information, statistics, and temperature are scraped from websites, then
analyzed and used to carry out surveys or research and development.
02: Elaborate the creation and working of ndarrays in python with suitable examples.(10m)
Ans : Create a NumPy ndarray Object
NumPy is used to work with arrays. The array object in NumPy is called ndarray. We can create
NumPy ndarray object by using the array() function.
Ex :
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
type(): This built-in Python function tells us the type of the object passed to it. Like in above code it shows that
arr is numpy.ndarray type.
To create an ndarray, we can pass a list, tuple or any array-like object into the array() method, and it will be
converted into an ndarray:
import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays).
nested array: are arrays that have arrays as their elements.
0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array
Ex :Create a 0-D array with value 42
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays
An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
These are the most common and basic arrays.
Ex : Create a 1-D array containing the values 1,2,3,4,5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.
These are often used to represent matrix or 2nd order tensors.
NumPy has a whole sub module dedicated towards matrix operations called numpy.mat
Ex : Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called 3-D array.
These are often used to represent a 3rd order tensor.
Ex : Create a 3-D array with two 2-D arrays, both containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
In this array the innermost dimension (5th dim) has 4 elements, the 4th dim has 1 element that is the vector, the
3rd dim has 1 element that is the matrix with the vector, the 2nd dim has 1 element that is 3D array and 1st dim
has 1 element that is a 4D array.
Note: The result includes the start index, but excludes the end index.
Ex : Slice elements from index 4 to the end of the array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])
Negative Slicing
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[-3:-1])
STEP
Use the step value to determine the step of the slicing:
Ex Return every other element from index 1 to index 5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
ex :Return every other element from the entire array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[::2])
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 2])
ex : From both elements, slice index 1 to index 4 (not included), this will return a 2-D array:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 1:4])
To create your own ufunc, you have to define a function, like you do with normal functions in Python, then you
add it to your NumPy ufunc library with the frompyfunc() method.
import numpy as np
def myadd(x, y):
    return x+y
myadd = np.frompyfunc(myadd, 2, 1)
print(myadd([1, 2, 3, 4], [5, 6, 7, 8]))
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric
Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
Dataframe.aggregate() function is used to apply some aggregation across one or more columns. It can aggregate using a
callable, string, dict, or list of strings/callables. The most frequently used aggregations are:
sum: Return the sum of the values for the requested axis
min: Return the minimum of the values for the requested axis
max: Return the maximum of the values for the requested axis
Example #1: Aggregate ‘sum’ and ‘min’ function across all the columns in data frame.
import pandas as pd # importing pandas package
df = pd.read_csv("nba.csv") # making data frame from csv file
df[:10] # printing the first 10 rows of the dataframe
Aggregation works with only numeric type columns.
# Applying aggregation across all the columns
# sum and min will be found for each
# numeric type column in df dataframe
df.aggregate(['sum', 'min'])
Output:
For each column with numeric values, the minimum and the sum of all values have been found. For dataframe
df, we have four such columns: Number, Age, Weight, Salary.
Example #2:
In Pandas, we can also apply different aggregation functions across different columns. For that, we need to pass
a dictionary with key containing the column names and values containing the list of aggregation functions for
any specific column.
import pandas as pd # importing pandas package
df = pd.read_csv("nba.csv") # making data frame from csv file
df.aggregate({"Number":['sum', 'min'],
"Age":['max', 'min'],
"Weight":['min', 'sum'],
"Salary":['sum']}) # We are going to find aggregation for these columns
Output:
Separate aggregation has been applied to each column, if any specific aggregation is not applied on a column
then it has NaN value corresponding to it.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr.dtype)
The data type can be specified using a string, like 'f' for float, 'i' for integer etc. or you can use the data type directly
like float for float and int for integer.
Ex : Change data type from float to integer by using 'i' as parameter value:
import numpy as np
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype('i')
print(newarr)
print(newarr.dtype)
ex : Change data type from float to integer by using int as parameter value:
import numpy as np
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype(int)
print(newarr)
print(newarr.dtype)
For example,
>>> tup = (22, 3, 45, 4, 2.4, 2, 56, 890, 1)
>>> min(tup)
1
max(): gives the largest element in the tuple as an output. Hence, the name is max().
For example,
>>> tup = (22, 3, 45, 4, 2.4, 2, 56, 890, 1)
>>> max(tup)
890
sum(): gives the sum of the elements present in the tuple as an output.
For example,
>>> tup = (22, 3, 45, 4, 2, 56, 890, 1)
>>> sum(tup)
1023
NumPy Array: A NumPy array is a powerful N-dimensional array object which is in the form of rows and columns.
We can initialize NumPy arrays from nested Python lists and access their elements.
import numpy as np
arr1 = np.array([[0, 1], [2, 3]]) # Initializing the array
print('First array:')
print(arr1)
print('\nSecond array:')
arr2 = np.array([12, 12])
print(arr2)
Output:
First array:
[[ 0 1]
[ 2 3]]
Second array:
[12 12]
Adding the two arrays:
[[ 12 13]
[ 14 15]]
Subtracting the two arrays:
[[-12 -11]
[-10 -9]]
Multiplying the two arrays:
[[ 0 12]
[ 24 36]]
Dividing the two arrays:
[[ 0. 0.08333333]
[ 0.16666667 0.25 ]]
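The arithmetic results listed in the output above can be reproduced with NumPy's element-wise functions. The sketch below assumes the first array is the 2 x 2 array shown in the output; the second array [12 12] is broadcast across each row.

```python
import numpy as np

arr1 = np.array([[0, 1], [2, 3]])
arr2 = np.array([12, 12])

print('Adding the two arrays:')
print(np.add(arr1, arr2))
print('\nSubtracting the two arrays:')
print(np.subtract(arr1, arr2))
print('\nMultiplying the two arrays:')
print(np.multiply(arr1, arr2))
print('\nDividing the two arrays:')
print(np.divide(arr1, arr2))
```

The operators +, -, * and / give the same element-wise results as these functions.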
numpy.reciprocal()
This function returns the reciprocal of the argument, element-wise. It is meant for floating-point values: for integer
elements with absolute values larger than 1, the result is always 0, and for integer 0 an overflow warning is issued.
Example:
Output
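A small sketch of numpy.reciprocal() on made-up values, first with floats and then with integers to show the truncation to 0 mentioned above.

```python
import numpy as np

arr = np.array([0.25, 0.5, 2.0, 4.0])   # float values, made up
result = np.reciprocal(arr)
print(result)

# With an integer array the result is also an integer,
# so values larger than 1 give 0
int_result = np.reciprocal(np.array([1, 2, 100]))
print(int_result)
```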
numpy.power()
This function treats elements in the first input array as the base and returns it raised to the power of the
corresponding element in the second input array.
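A sketch of code consistent with the output listed next; the variable names arr and arr1 are assumptions, since the original listing is not shown.

```python
import numpy as np

arr = np.array([5, 10, 15])
print('First array is:')
print(arr)
print('\nApplying power function:')
print(np.power(arr, 2))       # every element raised to the power 2

arr1 = np.array([1, 2, 3])
print('\nSecond array is:')
print(arr1)
print('\nApplying power function again:')
print(np.power(arr, arr1))    # element-wise: 5**1, 10**2, 15**3
```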
Output:
First array is:
[ 5 10 15]
Applying power function:
[ 25 100 225]
Second array is:
[1 2 3]
Applying power function again:
[ 5 100 3375]
numpy.mod()
This function returns the remainder of division of the corresponding elements in the input array. The
function numpy.remainder() also produces the same result.
# Python code to perform mod function on NumPy array
import numpy as np
arr = np.array([5, 15, 20])
arr1 = np.array([2, 5, 9])
print('First array:')
print(arr)
print('\nSecond array:')
print(arr1)
print('\nApplying mod() function:')
print(np.mod(arr, arr1))
print('\nApplying remainder() function:')
print(np.remainder(arr, arr1))
Output:
First array:
[ 5 15 20]
Second array:
[2 5 9]
Applying mod() function:
[1 0 2]
Applying remainder() function:
[1 0 2]
For example, if we consider an element-wise operation such as multiplication, the operation is easily performed
when the two arrays have the same shape. However, we may also need to operate on arrays whose shapes are not the same.
Consider the following example to multiply two arrays.
Example
import numpy as np
a = np.array([1,2,3,4,5,6,7])
b = np.array([2,4,6,8,10,12,14])
c = a*b
print(c)
Output:
[ 2 8 18 32 50 72 98]
However, in the above example, if we consider arrays of different shapes, we will get the errors as shown below.
Example
import numpy as np
a = np.array([1,2,3,4,5,6,7])
b = np.array([2,4,6,8,10,12,14,19])
c = a*b
print(c)
Output:
ValueError: operands could not be broadcast together with shapes (7,) (8,)
In the above example, we can see that the shapes of the two arrays are not compatible and therefore they cannot be
multiplied together. NumPy can operate on arrays of different shapes only when those shapes are compatible under the
rules of broadcasting.
In broadcasting, the smaller array is broadcast across the larger array to make their shapes compatible with each other.
Broadcasting Rules
Broadcasting is possible if the following rules are satisfied.
1. The array with fewer dimensions is prepended with '1' in its shape.
2. Size of each output dimension is the maximum of the input sizes in that dimension.
3. An input can be used in the calculation if its size in a particular dimension matches the output size or its
value is exactly 1.
4. If the input size is 1, then the first data entry is used for the calculation along the dimension.
A set of arrays can be broadcast if the rules above give a valid result and one of the following is true.
1. All the input arrays have the same shape.
2. Arrays have the same number of dimensions, and the length of each dimension is either a common length
or 1.
3. The array with fewer dimensions can be prepended with '1' in its shape.
Let's see an example of broadcasting.
Example
import numpy as np
a = np.array([[1,2,3,4],[2,4,5,6],[10,20,39,3]])
b = np.array([2,4,6,8])
print("\nprinting array a..")
print(a)
print("\nprinting array b..")
print(b)
print("\nAdding arrays a and b ..")
c = a + b
print(c)
Output:
printing array a..
[[ 1 2 3 4]
[ 2 4 5 6]
[10 20 39 3]]
printing array b..
[2 4 6 8]
Adding arrays a and b ..
[[ 3 6 9 12]
[ 4 8 11 14]
[12 24 45 11]]
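A further sketch, with made-up values, of the case where both arrays are stretched: a column of shape (3, 1) and a row of shape (4,) broadcast to a result of shape (3, 4).

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,), treated as (1, 4)

# Both size-1 dimensions are stretched, giving shape (3, 4)
total = col + row
print(total.shape)
print(total)
```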
numpy.array() in Python
The homogeneous multidimensional array is the main object of NumPy. It is basically a table of elements which
are all of the same type and indexed by a tuple of non-negative integers. The dimensions are called axes in NumPy.
NumPy's array class is known as ndarray, also known by the alias array.
The numpy.array is not the same as the standard Python library class array.array. The array.array handles only
one-dimensional arrays and provides less functionality.
Syntax
numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
Parameters
There are the following parameters in numpy.array() function.
1) object: array_like
Any object which exposes an array interface, an object whose __array__ method returns an array, or any
nested sequence.
3) copy: bool(optional)
If copy is set to True, the object is copied; otherwise a copy will be made only when the object is a nested sequence,
or when a copy is needed to satisfy any of the other requirements such as dtype, order, etc.
'A' Unchanged When the input is F and not C then F order otherwise C order
When copy=False or the copy is made for the other reason, the result will be the same as copy= True with some
exceptions for A. The default order is 'K'.
5) subok : bool(optional)
When subok=True, then sub-classes will pass-through; otherwise, the returned array will force to be a base-class
array (default).
6) ndmin : int(optional)
This parameter specifies the minimum number of dimensions which the resulting array should have. Ones will
be prepended to the shape as needed to meet this requirement.
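A short sketch of the dtype, ndmin, and copy parameters described above, on made-up values.

```python
import numpy as np

# dtype: request a specific element type
a = np.array([1, 2, 3], dtype=float)
print(a.dtype)

# ndmin: ones are prepended to the shape until it has 3 dimensions
b = np.array([1, 2, 3], ndmin=3)
print(b.shape)

# copy=True (the default) makes an independent copy of an existing array
c = np.array(a, copy=True)
c[0] = 99.0
print(a[0])   # the original array is unchanged
```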
Unit 5
01 : Demonstrate the concept of data visualization in python(10m)
Ans : Python provides various libraries for visualizing data. These libraries
come with different features and support various types of graphs. In this tutorial, we will be discussing four
such libraries.
• Matplotlib
• Seaborn
• Bokeh
• Plotly
• Matplotlib is a low level graph plotting library in python that serves as a visualization utility.
• Matplotlib is mostly written in python, a few segments are written in C, Objective-C and Javascript for
Platform compatibility.
Plotting x and y points
If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and [3, 10] to the plot function.
To plot only the markers, you can use shortcut string notation parameter 'o', which means 'rings'.
Ex :Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints, 'o')
plt.show()
Multiple Points
You can plot as many points as you like, just make sure you have the same number of points on both axes.
Ex :Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()
If we do not specify the points on the x-axis, they will get the default values 0, 1, 2, 3, etc., depending on the
length of the y-points. So, if we take the same example as above and leave out the x-points, the four y-values
are plotted at x = 0, 1, 2 and 3.
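The default x-values can be checked with a short sketch that reuses the y-points from the example above. The variable names line and default_x are introduced only for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.figure()
# Only y-values are passed, so Matplotlib supplies 0, 1, 2, 3 for the x-axis.
(line,) = plt.plot(ypoints)

# Read back the x-values that Matplotlib filled in.
default_x = [float(v) for v in line.get_xdata()]
print(default_x)
plt.show()
```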
02 : List and explain the different Python Libraries used for data visualization(05m)
Ans : Data Visualization is an extremely important part of Data Analysis. After all, there is no better way to
understand the hidden patterns and layers in the data than seeing them in a visual format! Don’t trust me? Well,
assume that you analyzed your company data and found out that a particular product was consistently losing
money for the company. Your boss may not pay that much attention to a written report but if you present a line
chart with the profits as a red line that is consistently going down, then your boss may pay much more attention!
This shows the power of Data Visualization!
Humans are visual creatures and hence, data visualization charts like bar charts, scatterplots, line charts,
geographical maps, etc. are extremely important. They tell you information just by looking at them whereas
normally you would have to read spreadsheets or text reports to understand the data. Python is one of the
most popular programming languages for data analytics as well as data visualization. Several libraries have
become available in recent years that create beautiful and complex data visualizations. These libraries are
popular because they allow analysts and statisticians to create visual data models easily according to their
specifications by conveniently providing an interface and data visualization tools all in one place!
1. Matplotlib
Matplotlib is a data visualization and 2-D plotting library for Python. It was initially released in 2003 and
it is the most popular and widely-used plotting library in the Python community. It comes with an interactive
environment across multiple platforms. Matplotlib can be used in Python scripts, the Python and IPython shells,
the Jupyter notebook, web application servers, etc. It can be used to embed plots into applications using
various GUI toolkits like Tkinter, GTK+, wxPython, Qt, etc. So you can use Matplotlib to create plots, bar
charts, pie charts, histograms, scatterplots, error charts, power spectra, stemplots, and whatever other
visualization charts you want! The Pyplot module also provides a MATLAB-like interface that is just as
versatile and useful as MATLAB while being free and open source.
2. Plotly
Plotly is a free open-source graphing library that can be used to form data visualizations. Plotly (plotly.py) is
built on top of the Plotly JavaScript library (plotly.js) and can be used to create web-based data visualizations
that can be displayed in Jupyter notebooks or web applications using Dash or saved as individual HTML files.
Plotly provides more than 40 unique chart types like scatter plots, histograms, line charts, bar charts, pie charts,
error bars, box plots, multiple axes, sparklines, dendrograms, 3-D charts, etc. Plotly also provides contour plots,
which are not that common in other data visualization libraries. In addition to all this, Plotly can be used offline
with no internet connection.
3. Seaborn
Seaborn is a Python data visualization library that is based on Matplotlib and closely integrated with the NumPy
and pandas data structures. Seaborn has various dataset-oriented plotting functions that operate on data frames
and arrays that have whole datasets within them. Then it internally performs the necessary statistical
aggregation and mapping functions to create informative plots that the user desires. It is a high-level interface
for creating beautiful and informative statistical graphics that are integral to exploring and understanding data.
Seaborn plots can include bar charts, histograms, scatterplots, box plots, etc. Seaborn
also has various tools for choosing color palettes that can reveal patterns in the data.
4. GGplot
Ggplot is a Python data visualization library that is based on the implementation of ggplot2 which is created
for the programming language R. Ggplot can create data visualizations such as bar charts, pie charts,
histograms, scatterplots, error charts, etc. using high-level API. It also allows you to add different types of data
visualization components or layers in a single visualization. Once ggplot has been told which variables to map
to which aesthetics in the plot, it does the rest of the work so that the user can focus on interpreting the
visualizations and spend less time creating them. But this also means that it is not possible to create highly
customized graphics in ggplot. Ggplot is also deeply connected with pandas, so it is best to keep the data in
DataFrames.
5. Altair
Altair is a statistical data visualization library in Python. It is based on Vega and Vega-Lite which are a sort of
declarative language for creating, saving, and sharing data visualization designs that are also interactive. Altair
can be used to create beautiful data visualizations of plots such as bar charts, pie charts, histograms, scatterplots,
error charts, power spectra, stemplots, etc. using a minimal amount of coding. Altair has dependencies which
include python 3.6, entrypoints, jsonschema, NumPy, Pandas, and Toolz which are automatically installed with
the Altair installation commands. You can open Jupyter Notebook or JupyterLab and execute any of the code
to obtain the data visualizations in Altair. Currently, the source for Altair is available on GitHub.
6. Bokeh
Bokeh is a data visualization library that provides detailed graphics with a high level of interactivity across
various datasets, whether they are large or small. Bokeh is based on The Grammar of Graphics like ggplot but
it is native to Python while ggplot is based on ggplot2 from R. Data visualization experts can create various
interactive plots for modern web browsers using bokeh which can be used in interactive web applications,
HTML documents, or JSON objects. Bokeh has 3 levels that can be used for creating visualizations. The first
level focuses only on creating the data plots quickly, the second level controls the basic building blocks of the
plot while the third level provides full autonomy for creating the charts with no pre-set defaults. This level is
suited to the data analysts and IT professionals that are well versed in the technical side of creating data
visualizations.
7. Pygal
Pygal is a Python data visualization library that is made for creating sexy charts! (According to their website!)
While Pygal is similar to Plotly or Bokeh in that it creates data visualization charts that can be embedded into
web pages and accessed using a web browser, a primary difference is that it can output charts in the form of
SVGs or Scalable Vector Graphics. These SVGs ensure that you can observe your charts clearly without
losing any of the quality even if you scale them. However, SVGs are only useful with smaller datasets as too
many data points are difficult to render and the charts can become sluggish.
8. Geoplotlib
Most of the data visualization libraries don’t provide much support for creating maps or using geographical
data and that is why geoplotlib is such an important Python library. It supports the creation of geographical
maps in particular with many different types of maps available such as dot-density maps, choropleths, symbol
maps, etc. One thing to keep in mind is that it requires NumPy and pyglet as prerequisites before installation,
but that is not a big disadvantage, especially if you want to create geographical maps, for which geoplotlib is
a strong option.
In conclusion, all these Python Libraries for Data Visualization are great options for creating beautiful and
informative data visualizations. Each of these has its strong points and advantages so you can select the one
that is perfect for your data visualization or project. For example, Matplotlib is extremely popular and well
suited to general 2-D plots while Geoplotlib is uniquely suited to geographical visualizations. So go on and
choose your library to create a stunning visualization in Python!
Is Matplotlib Included in Python?
Matplotlib is not a part of the standard library that is installed by default with Python, so it has to be
installed separately. There are also several toolkits available that extend python matplotlib functionality.
Some of them are separate downloads, others are shipped with the matplotlib source code but have external
dependencies.
Python Matplotlib : Types of Plots
There are various plots which can be created using python matplotlib, such as bar graphs, histograms, scatter
plots, area plots and pie charts. A basic line plot looks like this:
from matplotlib import pyplot as plt
plt.plot([1,2,3],[4,5,1]) #Plotting to our canvas
plt.show() #Showing what we plotted
So, with three lines of code, you can generate a basic graph using python matplotlib. Simple, isn’t it?
Let us see how we can add a title and labels to our graph created by the python matplotlib library to bring in
more meaning to it. Consider the below example:
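The original figure and code for this step are not reproduced in these notes, so the following is only a minimal sketch. The distances for the two cars are made-up values, and the title and axis texts are likewise assumptions chosen for illustration.

```python
from matplotlib import pyplot as plt

# Hypothetical data: distance (km) covered by two cars over 5 days.
days = [1, 2, 3, 4, 5]
bmw_km = [50, 40, 70, 80, 20]
audi_km = [80, 20, 20, 50, 60]

plt.figure()
plt.plot(days, bmw_km, label="BMW")
plt.plot(days, audi_km, label="Audi")
plt.title("Distance covered per day")   # title shown above the plot
plt.xlabel("Day")                        # label for the x-axis
plt.ylabel("Distance (km)")              # label for the y-axis
plt.legend()                             # built from the label= values above
ax = plt.gca()                           # keep a handle on the current axes
plt.show()
```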
In the above plot, I have displayed the comparison between the distance covered by two cars BMW and Audi
over a period of 5 days. Next, let us move on to another kind of plot using python matplotlib – Histogram.
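The histogram code is missing from these notes. A minimal sketch, assuming a made-up list of ages chosen so that the 40 to 50 group is the largest, could look like this:

```python
import matplotlib.pyplot as plt

# Hypothetical ages, chosen so that the 40-50 group is the largest.
ages = [5, 12, 18, 23, 27, 31, 36, 38, 41, 42, 44, 45,
        47, 48, 49, 52, 55, 58, 63, 67, 71, 78, 84, 91]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

plt.figure()
# hist() returns the count per bin, the bin edges and the drawn bars.
counts, edges, bars = plt.hist(ages, bins=bins, histtype="bar", rwidth=0.8)
plt.xlabel("Age")
plt.ylabel("Number of people")
plt.title("Histogram")
plt.show()
```

Each bin covers a 10-year age group, so the tallest bar marks the most common age group in the data.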
As you can see in the above plot, we got age groups with respect to the bins. Our biggest age group is between
40 and 50.
Output :
As you can see in the above graph, I have plotted two scatter plots based on the inputs specified in the above
code. The data is displayed as a collection of points having ‘high income low salary’ and ‘low income high
salary’.
Area plots are pretty much similar to the line plot. They are also known as stack plots. These plots can be used to
track changes over time for two or more related groups that make up one whole category. For example, let’s
compile the work done during a day into categories, say sleeping, eating, working and playing. Consider the
below code:
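The original code is not included in these notes. The sketch below uses made-up hours for five days (each day adds up to 24) and Matplotlib's stackplot() function; all data values and label texts are assumptions.

```python
import matplotlib.pyplot as plt

# Hypothetical hours per activity for 5 days; each day adds up to 24.
days = [1, 2, 3, 4, 5]
sleeping = [8, 7, 8, 6, 9]
eating = [2, 3, 2, 3, 2]
working = [8, 9, 7, 10, 6]
playing = [6, 5, 7, 5, 7]

plt.figure()
# stackplot() draws one filled area per series, stacked on the previous one.
areas = plt.stackplot(days, sleeping, eating, working, playing,
                      labels=["Sleeping", "Eating", "Working", "Playing"])
plt.xlabel("Day")
plt.ylabel("Hours")
plt.title("Stack plot")
plt.legend(loc="upper left")
plt.show()
```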
Output –
As we can see in the above image, we have time spent based on the categories. Therefore, area plot or stack plot
is used to show trends over time, among different attributes. Next, let us move to our last yet most frequently
used plot – Pie chart.
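The pie chart code is also missing here. A minimal sketch with made-up hours for one day (the numbers and names are assumptions) shows how pie() computes the percentages itself:

```python
import matplotlib.pyplot as plt

# Hypothetical hours in one day; the values add up to 24.
activities = ["Playing", "Sleeping", "Eating", "Working"]
hours = [6, 8, 2, 8]

plt.figure()
# autopct formats the percentage that pie() computes for each slice.
wedges, texts, autotexts = plt.pie(hours, labels=activities,
                                   autopct="%1.1f%%", startangle=90)
plt.title("Pie chart")
plt.show()
```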
In the above pie chart, I have divided the circle into 4 sectors or slices which represent the respective categories
(playing, sleeping, eating and working) along with the percentage they hold. Notice that these slices add up to
24 hrs, but the calculation of the pie slices is done automatically for you. In this way, pie charts are really
useful as you don’t have to calculate the percentage or the slice of the pie yourself.
Next in python matplotlib, let’s understand how to work with multiple plots.
I have discussed multiple types of plots in python matplotlib such as bar plot, scatter plot, pie plot, area
plot etc. Now, let me show you how to handle multiple plots. For this, I have to import the numpy module.
Consider the below example.
import numpy as np
import matplotlib.pyplot as plt
def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)
t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)
plt.subplot(221)
plt.plot(t1, f(t1), 'bo', t2, f(t2))
plt.subplot(222)
plt.plot(t2, np.cos(2*np.pi*t2))
plt.show()
The code is similar to the previous examples that you have seen, but there is one new concept here, i.e.
subplot. The subplot() command specifies numrows, numcols and fignum, where fignum ranges from 1 to
numrows*numcols. The commas in this command are optional if numrows*numcols<10, so subplot(221) is
identical to subplot(2,2,1). Subplots therefore help us to draw multiple graphs in one figure, arranged
vertically or horizontally. In the above example, I have aligned them horizontally.
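The equivalence of the two subplot notations can be checked with a small sketch. The variable names are illustrative, and get_subplotspec().get_geometry() is used here only to read back the grid position of each axes.

```python
import matplotlib.pyplot as plt

# Three-digit shorthand: 2 rows, 2 columns, plot number 1.
fig_a = plt.figure()
ax_short = plt.subplot(221)

# The same position written with separate arguments.
fig_b = plt.figure()
ax_long = plt.subplot(2, 2, 1)

# get_geometry() reports the grid size and the cells the axes occupies.
geom_short = ax_short.get_subplotspec().get_geometry()
geom_long = ax_long.get_subplotspec().get_geometry()
print(geom_short == geom_long)   # True
```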
Apart from these, python matplotlib has some disadvantages. Some of them are listed below:
Ans : Seaborn is a Python data visualization library based on the Matplotlib library. It provides a high-level
interface for drawing attractive and informative statistical graphs. This answer shows how to create basic plots
using the Seaborn library, such as:
▪ Scatter Plot
▪ Histogram
▪ Bar Plot
▪ Box and Whiskers Plot
▪ Pairwise Plots
Scatter Plot:
Scatter plots can be used to show a linear relationship between two variables using the seaborn library.
A scatter plot of price vs age with default arguments will be like this:
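import matplotlib.pyplot as plt
import seaborn as sns
# cars_data is assumed to be a pandas DataFrame with "Age" and "Price" columns
plt.style.use("ggplot")
plt.figure(figsize=(8,6))
sns.regplot(x = cars_data["Age"], y = cars_data["Price"])
plt.show()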
Here, regplot means Regression Plot. By default fit_reg = True, so it estimates and plots a regression model
relating the x and y variables.
Histogram:
In order to draw a histogram in Seaborn, we have a function called distplot and inside that, we have to pass the
variable which we want to include. (distplot is deprecated since seaborn 0.11; newer code uses histplot or
displot instead.) Histogram with default kernel density estimate:
plt.figure(figsize=(8,6))
sns.distplot(cars_data['Age'])
plt.show()
Age is on the x-axis, and by default the histogram includes a kernel density estimate (kde). The kernel
density estimate is the smooth curve drawn over the bins that approximates the distribution of the ages. If you
want to remove the kernel density estimate, use kde = False.
plt.figure(figsize=(8,6))
sns.distplot(cars_data['Age'],kde=False)
plt.show()
After that, you get frequency on the y-axis and the age of the car on the x-axis. If you want to control the
number of intervals or bins, you can use the bins parameter of the distplot function. Let’s use bins = 5 on
the distplot function. It will organize the data into five bins or intervals.
plt.figure(figsize=(8,6))
sns.distplot(cars_data['Age'],kde=False,bins=5)
plt.show()
Now you can say that from age 65 to 80 we have more than 500 cars.
Bar Plot:
Bar plot is for categorical variables. It is commonly used because it is simple and makes the data easy to
understand. You can plot a barplot in seaborn using the countplot function.
Let’s plot a barplot of FuelType.
plt.figure(figsize=(8,6))
sns.countplot(x="FuelType", data=cars_data)
plt.show()
In the y-axis, we have got the frequency distribution of FuelType of the cars.
plt.figure(figsize=(8,6))
sns.countplot(x="FuelType", data=cars_data,
hue="Automatic")
plt.show()
05 : With a snippet of code demonstrate the univariate (box plot) nature of visualization.
Since the box plot is for continuous variables, first create a data frame without the column ‘variety’: drop
the column from the DataFrame using the drop( ) function and specify axis=1 to indicate that a column (not a
row) is dropped.
In matplotlib, mention the labels separately to display them in the output.
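The snippet for this answer is missing from the notes. A minimal sketch of the steps, assuming a tiny made-up stand-in for the iris data with a ‘variety’ column (all values and column names are hypothetical), could be:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris data set (a few made-up rows).
df = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal.width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal.length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

# Drop the categorical column; axis=1 means a column is dropped, not a row.
df_num = df.drop("variety", axis=1)

plt.figure(figsize=(8, 6))
# One box is drawn per column of the 2-D array.
result = plt.boxplot(df_num.values)
# Labels are set separately: the boxes sit at x positions 1, 2, 3, ...
plt.xticks(range(1, len(df_num.columns) + 1), list(df_num.columns))
plt.show()
```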
06 : With a snippet of code demonstrate the bivariate (scatter plot) nature of visualization
This plots different observations/values of the same variable corresponding to the index/observation number.
Consider plotting of the variable ‘sepal length(cm)’ :
Use the plt.scatter() function of matplotlib to plot a univariate scatter diagram. The scatter() function requires
two parameters to plot. So, in this example, we plot the variable ‘sepal.width’ against the corresponding
observation number that is stored as the index of the data frame (df.index).
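A minimal sketch of this univariate scatter plot, assuming a small made-up ‘sepal.width’ column in place of the real data set:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical 'sepal.width' values standing in for the iris data.
df = pd.DataFrame({"sepal.width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7]})

plt.figure()
# x is the observation number (the index), y is the variable itself.
points = plt.scatter(df.index, df["sepal.width"])
plt.xlabel("Observation number")
plt.ylabel("sepal.width")
plt.show()
```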
Then visualize the same plot by considering its variety using the sns.scatterplot() function of the seaborn library.
One of the interesting features in seaborn is the ‘hue’ parameter. In seaborn, the hue parameter determines which
column in the data frame should be used for color encoding. This helps to differentiate between the data values
according to the categories they belong to. The hue parameter takes the grouping variable as its input using
which it will produce points with different colors. The variable passed onto ‘hue’ can be either categorical or
numeric, although color mapping will behave differently in the latter case.
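The effect of hue can be sketched as follows. The data frame is a made-up stand-in for the iris data, and the column names are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the iris data set (a few made-up rows).
df = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal.width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "variety": ["Setosa", "Setosa", "Versicolor",
                "Versicolor", "Virginica", "Virginica"],
})

plt.figure()
# hue="variety" gives each category its own color and a legend entry.
ax = sns.scatterplot(data=df, x="sepal.length", y="sepal.width", hue="variety")

# Collect the legend texts to see which categories were color-encoded.
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
plt.show()
```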
Note: Every function has got a wide variety of parameters to play with to produce better results. If one is using
Jupyter notebook, the various parameters of the function used can be explored by using the ‘Shift+Tab’ shortcut.
Time series is a sequence of observations recorded at regular time intervals. Depending on the frequency of
observations, a time series may typically be hourly, daily, weekly, monthly, quarterly and annual.
Sometimes, you might have seconds and minute-wise time series as well, like, number of clicks and user
visits every minute etc. Why even analyze a time series? Because it is the preparatory step before you
develop a forecast of the series. Besides, time series forecasting has enormous commercial significance
because stuff that is important to a business like demand and sales, number of visitors to a website, stock
price etc are essentially time series data. So what does analyzing a time series involve? Time series analysis
involves understanding various aspects about the inherent nature of the series so that you are better informed
to create meaningful and accurate forecasts.
How to import time series in python?
So how to import time series data? The data for a time series is typically stored in .csv files or other
spreadsheet formats and contains two columns: the date and the measured value. Let’s use read_csv() in the
pandas package to read the time series dataset (a csv file on Australian Drug Sales) as a pandas dataframe.
Adding the parse_dates=['date'] argument makes the date column be parsed as a date field.
import pandas as pd
# Read the dataset with 'date' parsed as dates and used as the index
ser = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                  parse_dates=['date'], index_col='date')
ser.head()
Note, in the series, the ‘value’ column is placed higher than date to imply that it is a series.
Visualizing Time Series
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'],
                 index_col='date')
# Draw Plot (helper signature and the dpi default are assumed from the call below)
def plot_df(df, x, y, title="", dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.title(title)
    plt.show()
plot_df(df, x=df.index, y=df.value, title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
Patterns in a time series
Any time series may be split into the following components: Base Level + Trend + Seasonality + Error. A
trend is observed when there is an increasing or decreasing slope in the time series, whereas
seasonality is observed when there is a distinct pattern repeated at regular intervals due to
seasonal factors. It could be because of the month of the year, the day of the month, weekdays or even time
of the day. However, it is not mandatory that all time series have a trend and/or seasonality. A time
series may not have a distinct trend but have a seasonality. The opposite can also be true. So, a time series
may be imagined as a combination of the trend, seasonality and the error terms.
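The additive view of a time series can be illustrated with a small synthetic example. All numbers below (base level, slope, seasonal amplitude, noise level) are made up for this sketch and are not taken from the drug sales data.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable
months = np.arange(48)               # 4 years of monthly observations

base_level = 10.0
trend = 0.5 * months                                 # steady upward slope
seasonality = 3.0 * np.sin(2 * np.pi * months / 12)  # repeats every 12 months
error = rng.normal(0.0, 0.5, size=months.size)       # random noise

# Additive combination: Base Level + Trend + Seasonality + Error
series = base_level + trend + seasonality + error

# The seasonal term repeats after one year, while the trend keeps growing.
print(np.allclose(seasonality[:12], seasonality[12:24]))   # True
```

Setting the slope to zero gives a series with seasonality but no trend, and setting the amplitude to zero gives a trend without seasonality.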
Another aspect to consider is cyclic behaviour. It happens when the rise and fall pattern in the series
does not happen in fixed calendar-based intervals. Care should be taken not to confuse the ‘cyclic’ effect with
the ‘seasonal’ effect. So, how to differentiate between a ‘cyclic’ and a ‘seasonal’ pattern? If the patterns are
not of fixed calendar-based frequencies, then they are cyclic, because, unlike seasonality, cyclic effects are
typically influenced by business and other socio-economic factors.