Org It (TCS-iQMS-166) Python Programming Standards and Guidelines PDF
Version 1.0
Notice
© 2018 Tata Consultancy Services
This is a controlled document. Unauthorised access, copying, replication or
usage for a purpose other than for which it is intended, are prohibited.
TCS Internal ii
Python Programming Standards and Guidelines v1.0
Intended Audience
This document is intended for Python programmers and reviewers who develop and review
Python code and deploy it in a production environment.
Chapter Description
Chapter 1 Introduction
Chapter 2 Python Identifier and Naming Standards
Chapter 3 Syntax
Chapter 4 Comments
Chapter 5 Imports
Chapter 6 Packages
Chapter 11 Generators
Chapter 14 Exceptions
Chapter 17 Idioms
Chapter 18 Conventions
Chapter 22 Threading
Chapter 24 Packages
iQMS Document
This document is a part of the Integrated Quality Management System (iQMS) of TCS.
Abbreviation/Acronym Description
Contents
INTRODUCTION
SYNTAX
3.1 Spacing
3.2 Colon
3.3 Indentation
3.4 Line Length
5. IMPORTS
6. PACKAGES:
18. CONVENTIONS
REFERENCE
GLOSSARY
INTRODUCTION
This document describes the coding standards, naming conventions and guidelines to be
followed when designing and implementing frameworks and scripts in Python.
This document helps to ensure consistency across the code, resulting in increased usability
and maintainability of the code.
A Python identifier is a name used to identify a variable, function, class, module or other
object.
The file name should be meaningful and should end with “.py”.
# Correct
Help_script.py
Hello_world.py
# Incorrect
Abcd.py
No123.py
Examples:
Python's name-mangling rule prepends “_ClassName” to attribute names that are declared with
a leading double underscore.
That is, if you write a method named “__method” in a class, its name is mangled to the
“_ClassName__method” form.
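The mangling rule can be seen in a minimal sketch such as the one below. The class name Account and its attributes are hypothetical and chosen only for illustration.

```python
class Account:
    def __init__(self):
        # Stored on the instance as _Account__balance
        self.__balance = 100

    def __audit(self):
        # Stored on the class as _Account__audit
        return "audited"

    def run_audit(self):
        # Inside the class body the unmangled spelling still works
        return self.__audit()


acct = Account()
mangled_balance = acct._Account__balance
audit_result = acct.run_audit()
# The plain double-underscore name does not exist on the instance
has_plain_name = hasattr(acct, "__balance")
```

Mangling makes accidental name clashes in subclasses less likely; it is not an access-control mechanism, since the mangled name remains reachable.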
The following is the list of reserved words in Python. These words may not be used as
identifier names.
The keywords may differ between versions of Python: some may be added and others
removed. The list of keywords for the current version can always be obtained by typing the
following at the prompt.
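One way to obtain the list, sketched here with the standard library keyword module, is shown below. The identifier "subtraction" is only an illustrative non-keyword.

```python
import keyword

# Print every reserved word known to the running interpreter
print(keyword.kwlist)

# Check individual names before using them as identifiers
is_reserved = keyword.iskeyword("class")
is_free = keyword.iskeyword("subtraction")
```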
SYNTAX
3.1 Spacing
Use 4 spaces per indentation level. Tabs should be used solely to remain consistent with
code that is already indented with tabs.
Python 3 disallows mixing the use of tabs and spaces for indentation.
Python 2 code indented with a mixture of tabs and spaces should be converted to using
spaces exclusively.
When the Python 2 command-line interpreter is invoked with the -t option, it issues warnings
about code that illegally mixes tabs and spaces. With the -tt option, these warnings
become errors. These options are highly recommended.
3.2 Colon
A colon (“:”) ends the header line of a function, loop, condition or class definition and
introduces its indented block, much as curly braces delimit a block in other programming
languages.
def subtraction():
    a…
    b…
3.3 Indentation
Use four (4) spaces per indentation level.
Never use tabs or a mix of tabs and spaces.
A fixed line length (80 columns) discourages deep nesting and very long functions.
It is suggested to have Indents of 4 spaces at minimum; though 8 Spaces are ideal
COMMENT STANDARDS
Comments should be added to increase the readability of code. Comments that contradict
the code are worse than no comments. Always make a priority of keeping the comments up-
to-date when the code changes.
If a comment is short, the period at the end can be omitted. Block comments generally
consist of one or more paragraphs built out of complete sentences and each sentence
should end with a period.
An inline comment is a comment on the same line as a statement. Inline comments should
be separated by at least two spaces from the statement. They should start with a # and a
single space.
Inline comments are unnecessary and in fact distracting if they state the obvious.
x = x + 1  # Increment x
Write docstrings for all public modules, functions, classes, and methods. Docstrings are not
necessary for non-public methods, but you should have a comment that describes what the
method does. This comment should appear after the def line.
Most importantly, the """ that ends a multiline docstring should be on a line by itself, e.g.:
"""Return a foobang
Optional plotz says to frobnicate the bizbaz first.
"""
For one-line docstrings, keep the closing """ on the same line.
"""Return a foobang."""
5. IMPORTS
Imports are used only for packages and modules. They provide the mechanism for reusing
and sharing code across modules.
Example: to import the urllib2 package (Python 2), which is used for reading data from URLs,
write:
import urllib2
6. PACKAGES:
The code base can be divided into clean and efficient modules using python packages. The
packages can be reused for sharing the code between different python programs. Like a
directory contains sub directories and files, a python package can contain sub packages and
modules.
To import a module, use the full path name of the module. Specifying the full path name
avoids conflicts in module names and makes the modules easier to find. The disadvantage is
that it makes code harder to deploy, because you have to replicate the package hierarchy.
from sound.filters import equalizer  # reference in code with just the module name (preferred)
The advantage of nested or local classes and functions is that they allow utility classes and
functions to be defined for use within a very limited scope. The disadvantage is that nested or
local classes cannot be pickled.
class Human:
    def __init__(self):
        self.name = 'Guido'
        self.head = self.Head()
    class Head:
        def talk(self):
            return 'talking...'
if __name__ == '__main__':
    guido = Human()
    print(guido.name)
    print(guido.head.talk())
8. Conditional Expressions
Conditional expressions are acceptable for one-liners. They provide a shorter syntax for if
statements.
The advantage is that they are shorter and more convenient than if statements. The
disadvantage is that they may be harder to read, and the condition may be difficult to locate
if the expression is long.
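The trade-off can be sketched as follows. The functions parity and parity_long are hypothetical and return the same result; only the form differs.

```python
def parity(n):
    # Conditional expression: <value if true> if <condition> else <value if false>
    return "even" if n % 2 == 0 else "odd"


def parity_long(n):
    # Equivalent if statement, preferable once the expression grows long
    if n % 2 == 0:
        return "even"
    else:
        return "odd"


short_result = parity(4)
long_result = parity_long(7)
```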
9. GLOBAL VARIABLES
Variables that are declared at module level are called global variables. Avoid the use of
global variables. The advantage of global variables is that they are occasionally useful. The
disadvantage is that they have the potential to change module behaviour during import,
because assignments to module-level variables are done when the module is imported.
Avoid global variables in favour of class variables.
For example:
The advantage is that simple list comprehensions can be clearer and simpler than other
list creation techniques. Generator expressions can be very efficient, since they avoid the
creation of a list entirely. The disadvantage is that complicated list comprehensions or
generator expressions can be hard to read.
Each portion must fit on one line: mapping expression, for clause, filter expression.
Multiple for clauses or filter expressions are not permitted; use loops in such cases.
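These rules can be illustrated with a small sketch. The data and variable names are hypothetical.

```python
numbers = [1, 2, 3, 4, 5, 6]

# List comprehension: mapping expression, one for clause, one filter expression
even_squares = [n * n for n in numbers if n % 2 == 0]

# Generator expression: no intermediate list is built before summing
total = sum(n * n for n in numbers)

# Two for clauses would be needed here, so a plain loop is used instead
pairs = []
for x in range(3):
    for y in range(x):
        pairs.append((x, y))
```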
11. GENERATORS
A generator function returns an iterator that yields a value each time it executes a yield
statement. After it yields a value, the runtime state of the generator function is suspended
until the next value is needed.
The advantage is that it results in simpler code, because the state of the local variables and
control flow is preserved between calls. A generator uses less memory than a function that
creates an entire list of values at once.
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b
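A short usage sketch shows how the generator's state is suspended between values. The function is repeated so the block is self-contained, and the parameter is renamed to limit here only to avoid shadowing the built-in max.

```python
def fib(limit):
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b


gen = fib(10)
first = next(gen)    # runs until the first yield
second = next(gen)   # resumes where the previous call stopped
rest = list(gen)     # consumes the remaining values
```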
The disadvantage is that lambda functions are more difficult to read and debug than local
functions. The lack of names means stack traces are more difficult to understand.
Expressiveness is limited because the function may only contain an expression.
If the code inside the lambda function is any longer than 60–80 characters, it's probably
better to define it as a regular (nested) function. For common operations like multiplication,
use the functions from the operator module instead of lambda functions.
double = lambda x: x * 2
print(double(5))
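For the operator-module recommendation above, a minimal sketch is given below; the list of values is hypothetical.

```python
import operator
from functools import reduce

values = [1, 2, 3, 4]

# Multiplication written as a lambda
product_lambda = reduce(lambda x, y: x * y, values)

# The same operation using the operator module
product_operator = reduce(operator.mul, values)
```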
The advantage is that the default iterators and operators are simple and efficient. They
express the operation directly, without extra method calls. A function that uses default
operators is generic. It can be used with any type that supports the operation.
The disadvantage is that you cannot tell the type of an object by reading the method names
(e.g. has_key() implies a dictionary). This is also an advantage.
Use default iterators and operators for types that support them, like lists, dictionaries, and
files. The built-in types define iterator methods, too. Prefer these methods to methods that
return lists, except that you should not mutate a container while iterating over it.
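A minimal sketch of default iterators and operators follows. The dictionary contents are hypothetical, and an in-memory io.StringIO object stands in for an open file.

```python
import io

stock = {"apples": 3, "pears": 0}

# Membership operator instead of a method call such as has_key()
in_stock = "apples" in stock

# Iterate over the dictionary directly instead of building a key list first
names = sorted(key for key in stock)

# Iterate over a file-like object line by line instead of calling readlines()
log = io.StringIO(u"first\nsecond\n")
lines = [line.rstrip("\n") for line in log]
```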
14. EXCEPTIONS
Exceptions are allowed but must be used carefully. Exceptions are events that can modify
the flow of control through a program.
Exceptions are a means of breaking out of the normal flow of control of a code block to
handle errors or other exceptional conditions.
IOError: raised if the file cannot be opened.
ImportError: raised if Python cannot find the module.
ValueError: raised when a built-in operation or function receives an argument that has the
right type but an inappropriate value.
KeyboardInterrupt: raised when the user hits the interrupt key (normally Control-C or Delete).
EOFError: raised when one of the built-in functions (input() or raw_input()) hits an
end-of-file condition (EOF) without reading any data.
try:
    fh = open("testfile", "w")
    fh.write("This is my test file for exception handling!!")
except IOError:
    print("Error: Can't find file or read data")
else:
    print("Written content in the file successfully")
    fh.close()
try:
    print(1/0)
except ZeroDivisionError:
    print("You can't divide by zero.")
The advantage is that the control flow of normal operation code is not cluttered by error-
handling code. It also allows the control flow to skip multiple frames when a certain condition
occurs, e.g., returning from N nested functions in one step instead of having to carry-through
error codes.
The disadvantage is that it may cause the control flow to be confusing. It is easy to miss
error cases when making library calls.
Do not use the two argument form (raise MyException, 'Error Message') or deprecated
string based exception (raise 'Error message')
2. Modules or packages should define their own domain specific base exception class,
which should inherit from the built-in Exception class. The base Exception for a
module should be called Error
class Error(Exception):
    pass
4. Minimize the amount of code in a try/ except block. The larger the body of the try, the
more likely that an exception will be raised by a line of code that you didn't expect to
raise an exception. In those cases, the try/ except block hides a real error.
5. Use the finally clause to execute code whether or not an exception is raised in the try
block. This is often useful for clean-up like closing a file.
6. When capturing an exception, use “as” rather than a comma. For example:
try:
    raise Error
except Error as error:
    pass
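The guidelines above on a module-level Error base class, a small try body, the finally clause and the “as” form can be combined in one sketch. ParseError, parse_port and the events list are hypothetical names used only for illustration.

```python
class Error(Exception):
    """Base exception for this hypothetical module."""


class ParseError(Error):
    """Raised when the input text is not a valid port number."""


events = []


def parse_port(text):
    try:
        # Keep the try body small: only the call that can fail
        port = int(text)
    except ValueError as error:
        raise ParseError("not a port number: %s" % error)
    finally:
        # Runs whether or not an exception was raised
        events.append("checked")
    return port


good_port = parse_port("8080")
try:
    parse_port("http")
    failed = False
except Error:
    # Catching the base class also catches ParseError
    failed = True
```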
You can specify values for variables at the end of a function's parameter list,
e.g. def foo(a, b=0):
If foo is called with only one argument, then b is set to zero. If foo is called with two
arguments, then b has the value of the second argument.
The advantage is that often you have a function that uses lots of default values, but you
rarely want to override the defaults. Default argument values provide an easy way to do this,
without having to define lots of functions for the rare exceptions. Also, Python does not
support overloaded methods/functions. Default arguments are an easy way of "faking" the
overloading behaviour.
The disadvantage is that default arguments are evaluated only once, when the function is
defined (typically at module load time). This may cause problems if the argument is a mutable
object such as a list or a dictionary. If the function modifies the object (for example, by
appending an item to a list), the default value is modified for all later calls.
Do not use mutable objects as default values in the function or method definition.
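The pitfall, and one common way around it, can be sketched as follows. The function names append_bad and append_good are hypothetical; using None as a sentinel is a widespread convention rather than something this document prescribes.

```python
def append_bad(item, bucket=[]):
    # The same list object is reused on every call that omits bucket
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # None is immutable, so a fresh list is created on each call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_bad = list(append_bad(1))    # copy, to record the state after call 1
second_bad = list(append_bad(2))   # the default list still holds the 1
first_good = append_good(1)
second_good = append_good(2)
```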
17. IDIOMS
A programming idiom is a way to write code.
Unpacking: If you know the length of a list or tuple, you can assign names to its elements
with unpacking. For example, since enumerate() will provide a tuple of two elements for each
item in list:
for index, item in enumerate(some_list):
    # do something with index and item
filename = 'foobar.txt'
basename, __, ext = filename.rpartition('.')
Many Python style guides recommend the use of a single underscore “_” for throwaway
variables rather than the double underscore “__” . The issue is that “_” is commonly used as
an alias for the gettext() function and is also used at the interactive prompt to hold the value
of the last operation. Using a double underscore instead is just as clear and almost as
convenient and eliminates the risk of accidentally interfering with either of these other use
cases.
def lookup_set(s):
    return 's' in s
def lookup_list(l):
    return 's' in l
18. CONVENTIONS
Check if a variable equals a constant: You don’t need to explicitly compare a value to True,
None or 0. You can just use it in the if statement. See Truth Value Testing for a list of what
is considered false.
Bad:
if attr == True:
    print('True!')
if attr == None:
    print('attr is None!')
Good:
if attr:
    print('attr is truthy!')
if attr is None:
    print('attr is None!')
Don’t use the dict.has_key() method. Instead, use x in d syntax or pass a default argument
to dict.get().
Bad:
d = {'hello': 'world'}
if d.has_key('hello'):
    print(d['hello'])  # prints 'world'
else:
    print('default_value')
Good:
d = {'hello': 'world'}
print(d.get('hello', 'default_value'))  # prints 'world'
# Or:
if 'hello' in d:
    print(d['hello'])
List comprehensions provide a powerful and concise way to work with lists. Also, the map()
and filter() functions can perform operations on lists using a different and more concise
syntax.
Bad:
Good:
a = [3, 4, 5]
b = [i for i in a if i > 4]
# Or:
b = filter(lambda x: x > 4, a)
Bad:
Good:
a = [3, 4, 5]
a = [i + 3 for i in a]
# Or:
a = map(lambda i: i + 3, a)
a = [3, 4, 5]
for i, item in enumerate(a):
    print(i, item)
# prints
# 0 3
# 1 4
# 2 5
The enumerate() function has better readability than handling a counter manually. Moreover,
it is better optimized for iterators.
Use the “with open” syntax to read from files. This will automatically close files for you.
Bad:
f = open('file.txt')
a = f.read()
print(a)
f.close()
Good:
with open('file.txt') as f:
    for line in f:
        print(line)
The with statement is better because it will ensure you always close the file, even if an
exception is raised inside the “with” block.
Line Continuations:
When a logical line of code is longer than the accepted limit, you need to split it over multiple
physical lines. The Python interpreter will join consecutive lines if the last character of the
line is a backslash. This is helpful in some cases, but should usually be avoided because of
its fragility. A white space added to the end of the line, after the backslash, will break the
code and may have unexpected results.
A better solution is to use parentheses around your elements. Left with an unclosed
parenthesis at the end of a line, the Python interpreter joins the following lines until the
parenthesis is closed. The same behaviour holds for curly braces and square brackets.
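Both continuation styles can be compared in a minimal sketch. The values are hypothetical; the two expressions are equivalent and differ only in how the logical line is split.

```python
# Backslash continuation: breaks if anything follows the backslash
total_backslash = 1 + 2 + \
    3 + 4

# Parenthesised continuation: lines are joined until the bracket closes
total_parens = (
    1 + 2
    + 3 + 4
)

# Adjacent string literals inside parentheses are concatenated
greeting = (
    "Hello, "
    "world"
)
```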
Bad:
Good:
my_very_big_string = (
    "For a long time I used to go to bed early. Sometimes, "
    "when I had put out my candle, my eyes would close so quickly "
    "that I had not even time to say “I’m going to sleep.”"
)
from some.deep.module.inside.a.module import (
    a_nice_function, another_nice_function, yet_another_nice_function)
However, more often than not, having to split a long logical line is a sign that you are trying
to do too many things at the same time, which may hinder readability.
A nested Python function can refer to variables defined in enclosing functions, but cannot
assign to them. Variable bindings are resolved using lexical scoping, that is, based on the
static program text.
Any assignment to a name in a block will cause Python to treat all references to that name
as a local variable, even if the use precedes the assignment. If a global declaration occurs,
the name is treated as a global variable.
def get_adder(summand1):
    """Returns a function that adds numbers to a given number."""
    def adder(summand2):
        return summand1 + summand2
    return adder
The advantage is that it often results in clearer and more elegant code.
i = 4
def foo(x):
    def bar():
        print(i, end=' ')
    # ...
    # A bunch of code here
    # ...
    for i in x:  # Ah, i *is* local to foo, so this is what bar sees
        print(i, end=' ')
    bar()
Python is an extremely flexible language and gives you many fancy features such as
metaclasses, access to bytecode, on-the-fly compilation, dynamic inheritance, object
reparenting, import hacks, reflection, modification of system internals, etc.
The advantage is that these are powerful language features which can make code very
compact.
The disadvantage is that it is tempting to use these features when they're not absolutely
necessary. It's harder to read, understand and debug code that's using these features. It
doesn't seem that way at first (to the original author), but when revisiting the code, it tends to
be more difficult than code that is longer but straight forward.
Current versions of Python provide alternative constructs that people find generally
preferable.
Good:
words = foo.split(':')
fn(*args, **kwargs)
Bad:
words = string.split(foo, ':')
22. THREADING
Do not rely on the atomicity of built-in types.
While Python's built-in data types such as dictionaries appear to have atomic operations,
there are corner cases where they aren't atomic (e.g. if __hash__ or __eq__ are implemented
as Python methods) and their atomicity should not be relied upon. Also, you should not rely
on atomic variable assignment (since this in turn depends on dictionaries).
Use the Queue module's Queue data type as the preferred way to communicate data
between threads. Otherwise, use the threading module and its locking primitives. Learn
about the proper use of condition variables so you can use threading.Condition instead of
using lower-level locks.
import time
from threading import Thread
def myfunc(i):
    print("sleeping 5 sec from thread %d" % i)
    time.sleep(5)
    print("finished sleeping from thread %d" % i)
for i in range(10):
    t = Thread(target=myfunc, args=(i,))
    t.start()
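The queue-based recommendation above can be sketched as follows. The worker function, the None sentinel and the squaring task are hypothetical; the module is named Queue in Python 2 and queue in Python 3, so the import is guarded.

```python
import threading

try:
    import queue            # Python 3 name
except ImportError:
    import Queue as queue   # Python 2 name


def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:     # sentinel: no more work
            break
        results.put(item * item)


tasks = queue.Queue()
results = queue.Queue()
thread = threading.Thread(target=worker, args=(tasks, results))
thread.start()

for n in range(5):
    tasks.put(n)
tasks.put(None)              # tell the worker to stop
thread.join()

squares = sorted(results.get() for _ in range(5))
```

All data passes through the two queues, so no shared variable is read or written by both threads directly.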
The advantage is that conditions using Python Booleans are easier to read and less
error-prone. In most cases, they're also faster.
Never compare a Boolean variable to False using ==. Use if not x: instead. If you need to
distinguish False from None then chain the expressions, such as if not x and x is not None:.
For sequences (strings, lists, tuples), use the fact that empty sequences are false, so if not
seq: or if seq: is preferable to if len(seq): or if not len(seq):
When handling integers, implicit false may involve more risk than benefit (like accidentally
handling None as 0). You may compare a value which is known to be an integer (and is not
the result of len()) against the integer 0.
Good:
if not users:
    print('no users')
if foo == 0:
    self.handle_zero()
if i % 10 == 0:
    self.handle_multiple_of_ten()
Bad:
if len(users) == 0:
    print('no users')
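The distinction between False and None described above can be sketched as follows. The function names describe and is_false_not_none are hypothetical.

```python
def describe(x):
    # Check for None explicitly before relying on implicit falseness
    if x is None:
        return "missing"
    if not x:
        return "false"
    return "true"


def is_false_not_none(x):
    # Chained form: false-like, but not None
    return not x and x is not None


empty_list_result = describe([])
none_result = describe(None)
```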
24. PACKAGES
Below is a list of widely used Python packages; the list is not exhaustive.
1. NumPy:
The most fundamental package, around which the scientific computation stack is built, is
NumPy (short for Numerical Python). It provides an abundance of useful features for
operations on n-dimensional arrays and matrices in Python.
Example:
import numpy as np
cvalues = [20.1, 20.8, 21.9, 22.5, 22.7, 22.3, 21.8, 21.2, 20.9, 20.1]
C = np.array(cvalues)
print(C)
print(C * 9 / 5 + 32)
2. SciPy:
SciPy is a software library for engineering and science. It provides efficient numerical
routines, such as numerical integration and optimization, via its specific sub-modules.
Example:
import numpy as np
from scipy import linalg
arr = np.array([[1, 2], [3, 4]])
linalg.det(arr)
3. Pandas:
Pandas is a Python package designed to make work with “labeled” and “relational” data simple
and intuitive. Pandas is well suited to data wrangling.
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 4))
pieces = [df[:3], df[3:7], df[7:]]
pd.concat(pieces)
4. Matplotlib:
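Matplotlib is another core package of the SciPy Stack, a Python library tailored for
generating simple and powerful visualizations with ease.
Example:
# The next line is an IPython/Jupyter magic command; omit it in a plain script.
%matplotlib inline
import matplotlib.pyplot as plt
# C is the NumPy array defined in the NumPy example above.
plt.plot(C)
plt.show()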
5. Seaborn:
Seaborn is mostly focused on the visualization of statistical models; such visualizations
include heat maps, which summarize the data but still depict the overall distributions.
Example:
import seaborn as sns
sns.set(style="ticks")
6. Bokeh:
Another visualization library is Bokeh, which is aimed at interactive visualizations. In
contrast to Seaborn, it is independent of Matplotlib. The main focus of Bokeh is interactivity,
and it renders its output in modern browsers in the style of Data-Driven Documents (d3.js).
Example:
# Import library (use output_notebook to visualize in a notebook).
# Note: bokeh.charts is a legacy interface that recent Bokeh releases no longer include.
from bokeh.charts import Bar, output_file, show
# Prepare data (dummy data)
data = {"y": [1, 2, 3, 4, 5]}
# Output to lines.html (call output_notebook() instead for a notebook)
output_file("lines.html", title="line plot example")
# Create a new chart with a title and axis labels
p = Bar(data, title="Line Chart Example", xlabel='x', ylabel='values', width=400, height=400)
# Show the results
show(p)
7. Plotly
Plotly is a web-based toolbox for building visualizations that exposes APIs to several
programming languages (Python among them).
Example:
import plotly.plotly as py
from plotly.graph_objs import *
trace0 = Scatter(
    x=[1, 2, 3, 4],
    y=[10, 15, 13, 17]
)
trace1 = Scatter(
    x=[1, 2, 3, 4],
    y=[16, 5, 11, 9]
)
data = Data([trace0, trace1])
8. sciKit-Learn
Scikits are additional packages of the SciPy Stack designed for specific functionality such as
image processing and machine learning.
Example: The scikit-learn page below provides various examples.
http://scikit-learn.org/
9. Theano:
Theano is a Python package that defines multi-dimensional arrays similar to NumPy, along
with math operations and expressions. The library is compiled, making it run efficiently on all
architectures. Originally developed by the Machine Learning group of Université de
Montréal, it is primarily used for machine learning.
Example: The Theano tutorial page below provides various examples.
http://deeplearning.net/software/theano/tutorial/index.html
10. TensorFlow:
Developed at Google, TensorFlow is an open-source library for data flow graph
computations, tuned for machine learning. It was designed to meet the demanding
requirements of the Google environment for training neural networks and is a successor of
DistBelief, a machine learning system based on neural networks. However, TensorFlow is
not restricted to scientific use within Google; it is general enough to be used in a variety of
real-world applications.
Example: The TensorFlow page below provides various examples.
https://www.tensorflow.org/
11. Keras:
Keras is an open-source library, written in Python, for building neural networks through a
high-level interface. It is minimalistic and straightforward, with a high level of extensibility. It
uses Theano or TensorFlow as its back ends, and Microsoft is working to integrate
CNTK (Microsoft Cognitive Toolkit) as a new back end.
Example: The Keras page below provides various examples.
https://keras.io/
12. NLTK:
NLTK stands for Natural Language Toolkit and, as the name implies, this suite of libraries is
used for common tasks in symbolic and statistical natural language processing (NLP).
NLTK was intended to facilitate teaching and research in NLP and related fields
(linguistics, cognitive science, artificial intelligence, etc.), and it is still used with that focus
today.
Example: The NLTK page below provides various examples.
http://www.nltk.org/
13. Gensim:
Gensim is an open-source Python library that implements tools for vector space modeling
and topic modeling. The library is designed to be efficient with large texts and is not limited to
in-memory processing. The efficiency is achieved through extensive use of NumPy data
structures and SciPy operations. It is both efficient and easy to use.
Example: The Gensim tutorial page below gives an example.
https://radimrehurek.com/gensim/tutorial.html
14. Scrapy:
Scrapy is a library for writing crawling programs, also known as spider bots, that retrieve
structured data, such as contact information or URLs, from the web.
Example: The web page below gives an introduction to Scrapy with an example.
https://www.analyticsvidhya.com/blog/2017/07/web-scraping-in-python-using-scrapy/
15. Statsmodels:
statsmodels is a Python library that enables its users to explore data by estimating
statistical models and performing statistical tests and analysis.
Example:
import statsmodels.api as sm
import statsmodels.formula.api as smf
# df is assumed to be a pandas DataFrame with Lottery, Literacy, Wealth and Region columns
linreg = smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df).fit()
Reference
For more information, please refer to the following sources:
Glossary
For the glossary of common iQMS terms, refer to the document Glossary of Common Terms
(TCS-iQMS-999).
FEEDBACK FORM
Date:
Location:
From:
________________________________________________________________________
Feedback details (Comments / Suggestions / Corrections / Queries)
(For associates to share their feedback, a feedback link is also available in iQMS Wiki)
END OF DOCUMENT