13. File Handling

Course Id: INT 213

FILE HANDLING
I INTRODUCTION
FILES
• Data is very important. Every organization depends on its data to
continue its business operations, and losing that data can force the
organization to shut down.
• This is why computers were designed primarily for handling data,
especially for storing and retrieving it. Later, programs were
developed to process the data stored in the computer.
• To store data in a computer, we need files. For example, we can
store employee data like employee number, name and salary in a file
in the computer and use it later whenever we want.
• Similarly, we can store student data like roll number, name
and marks in the computer. From the computer’s point of view, a file is
simply a collection of data that is available to a program. Once we store
data in a file, we can retrieve it and use it according to our
requirements.


ADVANTAGES OF STORING DATA IN A FILE
• When data is stored in a file, it is stored permanently. Even when the
computer is switched off, the data is not lost, because the file resides
on persistent storage such as a hard disk or CD rather than in main
memory. The file data can be used later, whenever required.
• It is possible to update the file data. For example, we can add new
data to the existing file, delete unnecessary data from the file and
modify the available data of the file. This makes the file more useful.
• Once the data is stored in a file, the same data can be shared by
various programs. For example, once employee data is stored in a
file, it can be used in a program to calculate employees’ net salaries
or in another program to calculate income tax payable by the
employees.
• Files are well suited to storing huge amounts of data, for example
a voters’ list or census data.
II TYPES OF FILES
TYPES OF FILES
• In Python, there are two types of files:
    • Text files
    • Binary files
• Text files store the data in the form of characters. For example, if we
store employee name “Ganesh”, it will be stored as 6 characters and the
employee salary 8900.75 is stored as 7 characters.
• Text files are used to store characters or strings.
• Binary files store all data as raw bytes, i.e. groups of 8 bits
each. For example, an ASCII character is stored as one byte, and an
integer can be stored in a fixed number of bytes (8 bytes for a 64-bit
integer). When data is retrieved from a binary file, the program
receives it as bytes.
• Binary files can be used to store text, images, audio and video.
Image files are generally available in .jpg, .gif or .png formats.
• We cannot use text files to store images as the images do not
contain characters.
• Images, on the other hand, are made up of pixels, the minute dots of
which the picture is composed.
• Each pixel is represented by one or more bits (a single bit, 1 or 0, is
enough only for a pure black-and-white image). Since bits and bytes are
handled directly by binary files, binary files are highly
suitable for storing images. It is very important to know how to create
files, store data in the files and retrieve the data from the files in
Python.
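The difference between the two representations can be sketched as follows. The struct format code "q" (a signed 64-bit integer) and the value 25 are assumptions chosen only to illustrate the 8-byte case; they do not come from the slides.

```python
import struct

# Text representation: every value is stored as characters.
name = "Ganesh"
salary_text = str(8900.75)
print(len(name))         # number of characters in the name
print(len(salary_text))  # number of characters in "8900.75"

# Binary representation: values are stored as raw bytes.
name_bytes = name.encode("ascii")    # one byte per ASCII character
number_bytes = struct.pack("q", 25)  # "q" packs a signed 64-bit integer
print(len(name_bytes), len(number_bytes))
```

Characters outside ASCII may need more than one byte each, depending on the encoding used.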
III OPEN A FILE
OPENING A FILE

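• The built-in open() function takes a file name and a mode, and returns a
file object:
file = open("file1.txt", "rb")
print(file)
file.close()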
MODES OF FILES
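The table for this slide is not available in this text. As a rough sketch, the most commonly used modes of open() behave as shown below; the file name modes_demo.txt is hypothetical and a temporary folder is used so that no real file is touched.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")  # hypothetical file name

with open(path, "w") as f:   # 'w': write; creates the file or empties an existing one
    f.write("first\n")
with open(path, "a") as f:   # 'a': append; new data goes to the end of the file
    f.write("second\n")
with open(path, "r") as f:   # 'r': read text (the default mode)
    text = f.read()
with open(path, "rb") as f:  # 'rb': read raw bytes instead of text
    raw = f.read()
with open(path, "r+") as f:  # 'r+': read and write without emptying the file
    f.write("FIRST")         # overwrites the first five characters
    f.seek(0)
    updated = f.read()

print(text)
print(raw)
print(updated)
```

Adding "b" to any of these modes ('wb', 'ab', 'r+b') gives the binary version of the same mode.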
IV WRITING DATA INTO A FILE
(a) write() method
• The write() method is used to write a string to an already opened file.
• The string may include numbers, special characters and other symbols.
f = open('file1.txt', 'w')
# enter characters from keyboard
text = input('Enter text: ')
# write the string into the file
f.write(text)
# closing the file
f.close()
(b) writelines() method
• The writelines() method is used to write a list of strings. It does not
add line separators between the strings.
f = open('file1.txt', 'w')
lines = ["hello world,", "welcome to the world of python"]
f.writelines(lines)
# closing the file
f.close()
print("data written to file")
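A small sketch of how writelines() treats line endings. The file name lines_demo.txt is hypothetical, and the file is created in a temporary folder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")  # hypothetical file name
lines = ["hello world,", "welcome to the world of python"]

# writelines() writes the strings back to back, with nothing in between.
f = open(path, "w")
f.writelines(lines)
f.close()

f = open(path, "r")
joined = f.read()
f.close()

# Adding "\n" to each string puts every item on its own line.
f = open(path, "w")
f.writelines(line + "\n" for line in lines)
f.close()

f = open(path, "r")
separate = f.readlines()
f.close()

print(joined)
print(separate)
```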
V APPEND DATA TO FILE
Append mode
• To append to a file, you must open it using ‘a’ or ‘ab’ mode, depending
on whether it is a text or a binary file. Data is then added with the
usual write() method; file objects have no append() method.
f = open('file1.txt', 'a')
f.write("\n my name is dev")
# closing the file
f.close()
print("data written to file")
V READ, READLINE, READLINES METHODS
(a) read() method
• This method is used to read a string from an already opened file. With
an argument n it reads at most n characters; without an argument it reads
the rest of the file.
f = open('file1.txt', 'r')
print(f.read(10))
f.close()
(b) readline() method
• This method is used to read a single line from the file. Each call
returns the next line, including its trailing newline.
f = open('file1.txt', 'r')
print(f.readline())
print(f.readline())
print(f.readline())
f.close()
(c) readlines() method
• This method is used to read all the lines in a file and return them as
a list of strings.
f = open('file1.txt', 'r')
print(f.readlines())
f.close()
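The three methods share one file pointer, so each call continues where the previous one stopped. A minimal sketch, assuming a small three-line file (read_demo.txt is a hypothetical name, created in a temporary folder):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")  # hypothetical file name
f = open(path, "w")
f.write("line one\nline two\nline three\n")
f.close()

f = open(path, "r")
first_four = f.read(4)        # at most 4 characters
rest_of_line = f.readline()   # from the current position to the end of the line
remaining = f.readlines()     # list of all the lines that are left
at_end = f.read()             # empty string: nothing is left to read
f.close()

print(repr(first_four))
print(repr(rest_of_line))
print(remaining)
print(repr(at_end))
```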
VI DISPLAY THE CONTENTS OF A FILE USING LOOP
Display the contents of a file using a for loop
f = open('file1.txt', 'r')
for line in f:
    print(line)
f.close()
VII OPENING A FILE USING with KEYWORD
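with keyword
• A file opened in a with block is closed automatically when the block
ends, so an explicit close() call is not needed.
• Binary mode ("rb"), where each line is a bytes object:
with open("file1.txt", "rb") as file:
    for line in file:
        print(line)
• Text mode ("r"), where each line is a string:
with open("file1.txt", "r") as file:
    for line in file:
        print(line)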
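The automatic closing can be checked with the closed attribute of the file object. A minimal sketch, with a hypothetical file with_demo.txt created in a temporary folder:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical file name
with open(path, "w") as file:
    file.write("hello\nworld\n")

with open(path, "r") as file:
    for line in file:
        # Each line still ends with "\n", so strip it before printing
        # to avoid blank lines in the output.
        print(line.rstrip("\n"))
    inside = file.closed   # False: the file is still open inside the block

outside = file.closed      # True: the file was closed on leaving the block
print(inside, outside)
```

The file is closed even if an exception is raised inside the block, which is the main reason to prefer with over a manual close().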
VIII SPLITTING WORDS
split() method
• This method is used to split a string into words.
with open("file1.txt", "r") as file:
    line = file.readline()
    words = line.split()
    print(words)
IX EXERCISE
1. What is the output of the Code?
Program 1
• Write a program that accepts filename as an input from the user.
Open the file and count the number of times a character appears in
the file
Program 2
• Write a program that reads data from a file and calculates the
percentage of vowels and consonants
Program 3
• Write a program to count the number of lines, words and characters in a
text file
X RENAMING AND DELETING
rename() method
• The os.rename() function takes two arguments, the current filename and
the new filename.
import os
os.rename("fileo.txt", "filen.txt")
print("file renamed")
remove() method
• The os.remove() function is used to delete a file.
import os
os.remove("file1.txt")
print("file deleted")
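Both calls raise FileNotFoundError when the file does not exist. A cautious sketch that checks with os.path.exists() first; the files are created in a temporary folder so that nothing real is renamed or deleted.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
old_name = os.path.join(folder, "fileo.txt")
new_name = os.path.join(folder, "filen.txt")

with open(old_name, "w") as f:
    f.write("sample data")

os.rename(old_name, new_name)
renamed = os.path.exists(new_name) and not os.path.exists(old_name)

# Checking first avoids a FileNotFoundError when the file is missing.
if os.path.exists(new_name):
    os.remove(new_name)
deleted = not os.path.exists(new_name)

print(renamed, deleted)
```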
XI PICKLE
PICKLE
• Text files are useful when we do not need to perform any
calculations on the data. What happens if we want to store some
structured data in files? For example, we may want to store
employee details such as identification number (int type),
name (string type) and salary (float type) in a file. This data is well
structured and has fields of different types. To store such data, we
create a class Employee with the instance variables id, name and sal
as shown in the next slide.
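The slide with the class is not available in this text. A minimal sketch of what such a class might look like is given here; the display() method and the sample values are assumptions added for illustration.

```python
class Employee:
    """One employee record (a sketch; the original slide is not available)."""

    def __init__(self, id, name, sal):
        self.id = id      # employee identification number (int)
        self.name = name  # employee name (str)
        self.sal = sal    # salary (float)

    def display(self):
        # Hypothetical helper that prints the record on one line.
        print(f"{self.id:5d}  {self.name:15s}  {self.sal:10.2f}")


e = Employee(100, "Ganesh", 8900.75)
e.display()
```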
Pickle
• In the previous program, we create an object of the class and store
the actual data in that object. Later, this object is stored in a
binary file in the form of bytes. This is called pickling or serialization.
• In other words, pickling is the process of converting a class
object into a byte stream so that it can be stored in a file. This is
also called object serialization.
• Pickling is done using the dump() function of the ‘pickle’ module as:
pickle.dump(object, file)
• The preceding statement stores the ‘object’ in the binary ‘file’,
which must be opened for writing in binary mode ('wb'). Once objects are
stored in a file, we can read them back from the file at any time.
Pickle Implementation
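The program from this slide is not available in this text. The dump() call described above could be used roughly as follows; the file name emp.dat is hypothetical, and a plain dictionary stands in for the employee record to keep the sketch short.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "emp.dat")  # hypothetical file name

# A plain dictionary stands in for the employee record here.
record = {"id": 100, "name": "Ganesh", "sal": 8900.75}

# The file must be opened in binary mode ('wb') for pickle.dump().
with open(path, "wb") as f:
    pickle.dump(record, f)

size = os.path.getsize(path)
print("bytes written:", size)
```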
Unpickle
• Unpickling is the process whereby a byte stream is converted back into a
class object. In other words, unpickling means reading class objects
back from the file.
• Unpickling is also called deserialization.
• Unpickling is done using the load() function of the ‘pickle’ module as:
object = pickle.load(file)
• Here, load() reads an object from a binary ‘file’ and returns
it as ‘object’. Remember that pickling and unpickling must be
done using binary files, since they work with byte streams. The word
stream represents a flow of data, so a byte stream is a flow of bytes.
Unpickle Implementation
Pickling and Unpickling a class object
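The original program is not available in this text. A minimal sketch of a full round trip, assuming an Employee class with the instance variables id, name and sal; the file name emp.dat and the sample values are made up for illustration.

```python
import os
import pickle
import tempfile


class Employee:
    def __init__(self, id, name, sal):
        self.id = id
        self.name = name
        self.sal = sal


path = os.path.join(tempfile.mkdtemp(), "emp.dat")  # hypothetical file name
staff = [Employee(100, "Ganesh", 8900.75), Employee(101, "Dev", 7500.0)]

# Pickling: write each object to the binary file.
with open(path, "wb") as f:
    for emp in staff:
        pickle.dump(emp, f)

# Unpickling: load objects one by one until the end of the file.
loaded = []
with open(path, "rb") as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            break

for emp in loaded:
    print(emp.id, emp.name, emp.sal)
```

The class definition must be available to the program that unpickles the file, because pickle stores the instance data and a reference to the class, not the class code itself. Files from untrusted sources should not be unpickled.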
XII seek() AND tell() METHODS
tell() method
• We know that data in the binary files is stored in the form of bytes.
When we conduct reading or writing operations on a binary file, a file
pointer moves inside the file depending on how many bytes are
written or read from the file.
• For example, if we read 10 bytes of data from the start of a file, the
file pointer moves past the 10th byte, so the next read continues
from the 11th byte onwards.
• To know the position of the file pointer, we can use the tell()
method.

• It returns the current position of the file pointer, counted from the
beginning of the file. It is used in the form: n = f.tell()
• Here, ‘f’ is the file object (file handle) and ‘n’ is an integer that
gives the byte position of the file pointer. If we want to move the file
pointer to another position, we can use the seek() method.
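A minimal sketch of how tell() follows the file pointer, assuming a 20-byte binary file (tell_demo.dat is a hypothetical name, created in a temporary folder):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tell_demo.dat")  # hypothetical file name
with open(path, "wb") as f:
    f.write(b"0123456789ABCDEFGHIJ")   # 20 bytes

with open(path, "rb") as f:
    start = f.tell()        # position before anything is read
    chunk = f.read(10)      # read the first 10 bytes
    after_read = f.tell()   # position after reading 10 bytes
    next_byte = f.read(1)   # the 11th byte of the file

print(start, after_read, chunk, next_byte)
```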
seek() method
• This method takes two arguments: f.seek(offset, fromwhere)
• Here, ‘offset’ represents how many bytes to move. ‘fromwhere’
represents from which position to move.
• ‘fromwhere’ can be 0, 1 or 2. Here, 0 means from
the beginning of the file, 1 means from the current position and
2 means from the end of the file. The default value of
‘fromwhere’ is 0, i.e. the beginning of the file.
• A non-zero offset with ‘fromwhere’ 1 or 2 requires the file to be
opened in binary mode.
• f.seek(10) #same as f.seek(10, 0)
• This moves the file pointer 10 bytes from the
beginning of the file (0 represents the beginning), i.e. to the 11th
byte. Any subsequent read starts from the 11th byte.
• f.seek(-10, 2)
• This moves the file pointer to 10 bytes before the end
of the file (2 represents the end of the file), so a subsequent read
returns the last 10 bytes. The negative sign before 10 means moving
backwards in the file.
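The three values of ‘fromwhere’ can be tried on a small binary file. A minimal sketch, assuming a 26-byte file holding the lowercase alphabet (seek_demo.dat is a hypothetical name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek_demo.dat")  # hypothetical file name
with open(path, "wb") as f:
    f.write(b"abcdefghijklmnopqrstuvwxyz")   # 26 bytes

with open(path, "rb") as f:
    f.seek(10)                 # same as f.seek(10, 0): skip the first 10 bytes
    from_start = f.read(3)     # bytes 11 to 13 of the file

    f.seek(2, 1)               # move 2 bytes forward from the current position
    from_current = f.read(3)

    f.seek(-10, 2)             # move to 10 bytes before the end of the file
    from_end = f.read()        # the last 10 bytes

print(from_start, from_current, from_end)
```

The os module also provides the names os.SEEK_SET, os.SEEK_CUR and os.SEEK_END for the values 0, 1 and 2.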
XIII RANDOM ACCESSING OF BINARY FILES USING mmap
mmap
• mmap (‘memory mapped file’) is a module in Python that is useful to
map or link to a binary file and manipulate the data of the file much as
we do with strings.
• This means that once a binary file has been created with some data, that
data can be viewed as a sequence of bytes and manipulated using the mmap
module. The first step is to map the file using the mmap() constructor as:
mm = mmap.mmap(f.fileno(), 0)
• This maps the currently opened file (i.e. ‘f’) to the object ‘mm’.
• Please observe the arguments of mmap().
• The first argument is ‘f.fileno()’, which returns the file descriptor
(an integer handle) of the file object ‘f’.
• This ‘f’ represents the actual binary file that is being mapped. To
modify the data through ‘mm’, ‘f’ should be opened in 'r+b' mode. The
second argument, zero (0), means that the total size of the file should be
considered for mapping. So, the entire file represented by the file object
‘f’ is mapped in memory to the object ‘mm’, and from now on ‘mm’ behaves
like the file ‘f’.
• Now, we can read the data from the file using the read() or readline()
methods as:
print(mm.read()) #displays entire file data
mm.seek(0) #move back to the beginning before reading again
print(mm.readline()) #displays the first line of the file
• We can retrieve data from the file using the slicing operator as:
print(mm[5:]) #display from index 5 (the 6th byte) till the end
print(mm[5:10]) #display the bytes at indexes 5 to 9
• It is also possible to modify or replace the data of the file using slicing
as:
mm[5:10] = data #replace the bytes at indexes 5 to 9; 'data' must be a bytes object of exactly 5 bytes
• We can also use the find() method, which returns the position of the first
occurrence of a byte string in the file (or -1 if it is not found) as:
n = mm.find(name)
#return the position of name (a bytes object) in the file
• We can also use the seek() method to move the file pointer to any
position we want as:
mm.seek(10, 0)
#position the file pointer 10 bytes from the beginning of the file
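The mmap operations above can be put together in one short sketch. The file name mmap_demo.dat and its contents are assumptions made for illustration; the file is created in a temporary folder.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mmap_demo.dat")  # hypothetical file name
with open(path, "wb") as f:
    f.write(b"Hello World\nsecond line\n")

# 'r+b' allows both reading and writing through the mapping.
with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)    # length 0 maps the whole file
    first_line = mm.readline()       # read up to and including the first newline
    piece = mm[6:11]                 # slice: the bytes at indexes 6 to 10
    mm[6:11] = b"There"              # the replacement must have the same length
    position = mm.find(b"second")    # index of the first occurrence
    mm.seek(0)                       # back to the beginning
    whole = mm.read()                # entire (modified) contents
    mm.close()

print(first_line, piece, position)
print(whole)
```

An empty file cannot be mapped this way, so the file needs to contain some data before mmap() is called.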
XIV ZIPPING AND UNZIPPING OF FILES
ZIPPING AND UNZIPPING OF FILES
• We know that some software, like ‘winzip’, provides zipping and
unzipping of file data.
• When file contents are zipped, two things happen:
    • The file contents are compressed and hence the size is reduced.
    • The format of the data is changed, making it unreadable.
• While zipping file contents, a zipping algorithm (logic) is used.
In simplified terms, the algorithm first finds out which bit pattern is
most often repeated in the original file and replaces that bit pattern
with a 0. Then the algorithm searches for the next most often repeated
bit pattern in the input file.
• In its place, a 1 is substituted. The third most repeated bit pattern is
replaced by 10, the fourth by 11, the fifth by 100, and so on. In this
way, the original bit patterns are replaced by fewer bits.
The file with fewer bits is called a ‘zipped file’ or
‘compressed file’.
• In Python, the module zipfile contains the ZipFile class, which helps us to
zip or unzip file contents. For example, to zip files, we first
pass the zip file name, the write mode 'w' and the constant ZIP_DEFLATED
to the ZipFile class as:
from zipfile import ZipFile, ZIP_DEFLATED
f = ZipFile('test.zip', 'w', ZIP_DEFLATED)
• Here, ‘f’ is the ZipFile class object to which the test.zip file name is
passed. This is the zip file that is finally created. The next step is to
add the files that are to be zipped, using the write() method, and then
close the archive:
f.write('file1.txt')
f.write('file2.txt')
f.close()
Python program to compress the contents of files
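The original program is not available in this text. A minimal sketch of such a program, assuming three small text files that are created in a temporary folder first; the file contents are made up for illustration.

```python
import os
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED

folder = tempfile.mkdtemp()
names = ["file1.txt", "file2.txt", "file3.txt"]

# Create the sample files to be zipped (contents are made up).
for name in names:
    with open(os.path.join(folder, name), "w") as f:
        f.write("some repeated sample text\n" * 100)

zip_path = os.path.join(folder, "test.zip")
with ZipFile(zip_path, "w", ZIP_DEFLATED) as z:
    for name in names:
        # arcname stores only the file name, not the temporary folder path.
        z.write(os.path.join(folder, name), arcname=name)

# Reopen the archive to compare original and compressed sizes.
with ZipFile(zip_path, "r") as z:
    stored = z.namelist()
    original = sum(info.file_size for info in z.infolist())
    compressed = sum(info.compress_size for info in z.infolist())

print(stored)
print("original:", original, "compressed:", compressed)
```

Using ZipFile in a with block closes the archive automatically, which is what actually finishes writing the zip file.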
extractall() method
• In the previous program, we assumed that the three files file1.txt, file2.txt
and file3.txt are already available in the current directory where the
program is run.
• To unzip the contents of the compressed files and get back their original
contents, we can use a ZipFile class object in read mode as:
z = ZipFile('test.zip', 'r')
• Here, test.zip is the name of the file that contains the compressed files.
• To extract all the files from the zip file object ‘z’, we can use the
extractall() method as:
z.extractall()
Python program to unzip the contents of the files that are available in a zip file
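The original program is not available in this text. A minimal sketch, which first builds a small archive so that the example is self-contained; the file contents and the output folder name "unzipped" are assumptions.

```python
import os
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED

folder = tempfile.mkdtemp()
zip_path = os.path.join(folder, "test.zip")

# Build a small archive first so that there is something to unzip.
with ZipFile(zip_path, "w", ZIP_DEFLATED) as z:
    z.writestr("file1.txt", "contents of file one\n")
    z.writestr("file2.txt", "contents of file two\n")

out_dir = os.path.join(folder, "unzipped")   # hypothetical output folder
with ZipFile(zip_path, "r") as z:
    # With no argument, extractall() writes into the current directory.
    z.extractall(out_dir)

extracted = sorted(os.listdir(out_dir))
with open(os.path.join(out_dir, "file1.txt")) as f:
    text = f.read()

print(extracted)
print(text)
```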
