
PowerQuery Guide to Pandas

A Comparative Approach to Learn Pandas

Kenneth Infante
This book is for sale at http://leanpub.com/powerqueryguidetopandas

This version was published on 2020-02-21

This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.

© 2019 - 2020 Kenneth Infante


Contents

1. Introduction
2. About the Book
    2.1 Why did I write this book
    2.2 Who is the Target Audience
    2.3 Prerequisites
    2.4 Conventions in the Book
    2.5 Feedback
3. The Basics
    3.1 Version
    3.2 Case Sensitivity
    3.3 Indentation
    3.4 Comments
    3.5 Variables
    3.6 The Help System
    3.7 Coding Tools
4. Input and Output
    4.1 Input
    4.2 Output
5. Data Types
    5.1 Built-in Data Types
    5.2 Field Data Types
6. Control Structures
    6.1 Conditional Statements
    6.2 Loops
7. Error Handling
    7.1 If Statement
    7.2 Try Statement
8. Functions
    8.1 Function Basics
    8.2 Optional Parameters & Default Values
    8.3 Return Multiple Values
    8.4 Type Hinting
9. Classes
    9.1 Basics
10. Strings
    10.1 Basics
    10.2 Transforming a String Field
    10.3 String Slicing
11. Date and Time
    11.1 Basics
    11.2 Transforming a Datetime Field
    11.3 Date Formatting
12. Structured Data Records
    12.2 List
    12.3 Records and Dictionaries
    12.4 Creating a Table/Dataframe from List, Records or Dictionaries
13. Introduction to Data Wrangling
    13.2 Rename Columns
    13.3 Reorder Columns
    13.4 Sorting
    13.5 Unique Values
    13.6 Dealing with Duplicates
    13.7 Getting Info
14. Adding Columns
    14.1 Duplicating a Column
    14.2 Extracting Values from a Column
    14.3 Conditional Column
    14.4 Index Column
    14.5 Thru a Formula
15. Subsetting Data
    15.1 Selecting Columns and Rows
    15.2 loc and iloc (Pandas only)
    15.3 Removing Columns and Rows
16. Dealing with Missing Data
    16.1 Fill Forward and Fill Backward
    16.2 Recode/Replace Values
    16.3 Dropping Missing Data
17. Grouping Data
    17.1 Aggregate a Single Column
    17.2 Aggregate Multiple Columns
18. Pivoting and Unpivoting Data
    18.1 Pivot
    18.2 Unpivot/Melt
19. Combining Data
    19.1 Merge
    19.2 Append
20. Conclusion
1. Introduction
This PowerQuery Guide to Pandas is designed to familiarize readers with the syntax and core
concepts of the Python language and the Pandas library by relating them to familiar concepts in
PowerQuery for Excel.
This book is inspired by the book PowerShell Guide to Python¹. It is also based on the science of
associative learning: use what you know to learn what you don’t.
What is more popular than Excel in the corporate world? With this premise, I’ll provide a
comparative approach to learning Pandas that will hopefully also serve as a good introduction to the
Python language.
PowerQuery is a great tool and should be part of every accountant’s arsenal. However, Python is
more mature and has a wider reach in areas such as Data Science and Artificial Intelligence. I believe
that Data Science/Data Analysis is the future for accountants like me. Learning Python will be useful
to anyone who wants to bring more value to their work.
I hope that this book gives you a jump-start in learning Python.
¹https://leanpub.com/PowerShell-to-Python
2. About the Book
Excel is my go-to tool for doing my work. It is often described as the Swiss Army knife of data
analysis² and every business user loves it. Excel has also evolved, and new functionality like
PowerQuery has been added, making it more versatile than ever.
However, PowerQuery is not available on other operating systems. For this reason, it is helpful to
learn Python and Pandas. Realizing that there are similarities between PowerQuery and Pandas, I could
take the concepts I learned in PowerQuery and apply them to Pandas. This is a book about my journey
of learning data analysis using PowerQuery, Python, and Pandas.

2.1 Why did I write this book


When I first encountered PowerQuery, I was amazed at how quickly I could do data cleansing and
transformations without having to write a lot of code. I could just use Excel as the GUI to
build my solutions and, within minutes, come up with the desired result for my data.
However, PowerQuery is only available in Excel 2010 and up (as an add-in for Excel 2010 and 2013),
and in PowerBI, on the Windows platform. If I need to do any data transformations on another OS,
say on a Mac, then I have to use other software. The best I could think of is the Pandas library in Python.
Pandas is a purely code-based solution. The downside is that it takes a bit of a learning curve
to use it well. In addition, you can’t see your data right away as you apply your transformations.
The usual practice is to print the dataframe at each step of the transformation to see if it matches
your expected result.
Programmers and analysts tend to practice a lot with open source data in order to learn Pandas. They
practice until it becomes part of their “muscle memory”. However, this approach is not
effective for me as we don’t “own” the data. Especially for professionals coming from the corporate
world, like accountants, practicing with data containing the sepal width of flowers³ just doesn’t feel
right.
To learn Pandas quickly, I realized that I could map the common operations I do in PowerQuery and
come up with a library of Pandas snippets that I could use. In addition, thinking in terms of the
PowerQuery interface somehow points me to the right code when using Pandas. For example, the
following is the Groupby window in PowerQuery
²https://exceleratorbi.com.au/excel-is-the-swiss-army-knife-of-business-intelligence/
³http://archive.ics.uci.edu/ml/datasets/Iris

See how the upper portion of the Groupby window corresponds to the column that you want to
base the grouping on. The lower portion corresponds to how you summarize the other columns
based on the grouping.
The resulting aggregation is as follows

On the other hand, the equivalent Pandas code is as follows

    (df.groupby(['Date']).agg({'Sales_Dollars': np.sum})
       .rename(columns={'Sales_Dollars': 'Total Sales'}).reset_index())

df.groupby(['Date']) can be mapped to the upper portion of the Groupby window. The rest of the
code can be mapped to the lower portion of the Groupby window.
The key takeaways here are:

• The Date column is passed first as the basis of grouping. This can be mapped to the upper
portion of the Groupby window.
• np.sum is applied to the Sales_Dollars column to do the aggregation. np is the usual
alias for NumPy when it is imported (import numpy as np).
• Then the Sales_Dollars column is renamed to Total Sales. This is optional. If omitted, the
aggregated column will still be named Sales_Dollars.
• Finally, the index is reset using the reset_index() method to return to using integer
indices. This is also optional. By default, groupby() will change the index to the column used
as the basis for aggregation, in this case Date.

The resulting aggregation is as follows
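
As a minimal, self-contained sketch of the steps above, the following uses a small made-up dataset (the dates and amounts are hypothetical, not the data behind the screenshots). It passes the string 'sum' to agg(), which performs the same aggregation as np.sum here without needing the NumPy import.

```python
import pandas as pd

# Hypothetical sales data, for illustration only
df = pd.DataFrame({
    'Date': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'Sales_Dollars': [100.0, 50.0, 75.0],
})

total_sales = (
    df.groupby(['Date'])                                 # upper portion: column to group by
      .agg({'Sales_Dollars': 'sum'})                     # lower portion: how to summarize
      .rename(columns={'Sales_Dollars': 'Total Sales'})  # optional rename
      .reset_index()                                     # optional: back to integer index
)

print(total_sales)
```

Wrapping the chain in parentheses lets each method sit on its own line, which makes it easier to match each step to the Groupby window.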

By using PowerQuery concepts and associating them with Pandas, I hope that you will learn Pandas
more quickly. I hope that this book will also lower the barrier to learning Pandas as well as other
Python libraries. Corporate professionals who wish to learn more about Python and want to dip their
toes into other areas of programming will find this book useful.

2.2 Who is the Target Audience


Next to Excel, Python is the best software to use in doing data analysis work. It is also one of the
top programming languages used in Data Science, Machine Learning, and Artificial Intelligence. I
believe that accountants like me are well positioned to take advantage of this language to automate
their work and provide more value in it.
Specifically, I believe that the following would find this book helpful

• Advanced Excel users ready to take advantage of the Python programming language and add it
to their toolbox in addition to VBA.
• Corporate professionals wishing to automate their work and make their solutions work across
different platforms.
• Python developers who want to learn PowerQuery and add it to their toolbox as well.
• Finance professionals wishing to dip their toes into programming by applying Python to projects
relevant to their work.

2.3 Prerequisites
A good grasp of PowerQuery and its programming capabilities will surely help in learning the
material presented in this book. To use PowerQuery, Excel 2010 or higher, or Power BI, is required.
PowerQuery only works on the Windows platform, hence the Windows operating system is required.
Python should also be installed. I recommend the Anaconda⁴ distribution as it already
contains Pandas and the other required libraries, so you can jump into code right away. For
beginners, I recommend using Jupyter Notebook⁵.
To run Jupyter, just issue the following command in the command line

    $ jupyter notebook

It will open the following in your browser

⁴https://www.anaconda.com/distribution/
⁵https://jupyter.org/

Then, under New, go to Notebook: Python.

Now, you can change the name of the notebook to your liking and run Python code in the cells.
An example would be as follows

The Python code examples used in this book were tested using the Hydrogen⁶ plugin in the Atom⁷ text
editor. Hydrogen brings the power of Jupyter Notebook inside the text editor, letting me test code
samples faster.
An example would be like this

The #%% marks the start of each code cell, and each cell can be run independently from the other cells.
Hydrogen allows me to transition to command line scripts quickly after testing them in Atom.
⁶https://atom.io/packages/hydrogen
⁷https://atom.io/

2.4 Conventions in the Book


To follow along with the book, please be aware of the following conventions.

• M and PowerQuery are used interchangeably to refer to the M code used in this book.
• I use Python/Pandas if the description applies to both Python and Pandas. I’ll use Python if
the text applies to Python only and Pandas if it applies to Pandas only.
• code refers to actual code snippets in M or Python/Pandas.
• bold is used for section headings but not main headings. It also applies to the first mention of
software names used in the book.
• italics applies to first mentions of GUI sections of PowerQuery, filenames, etc.

2.5 Feedback
If you have any questions, comments, or feedback about this book, please send them
to my LinkedIn⁸ or Twitter⁹ accounts. I’ll answer as many as I can.
⁸https://www.linkedin.com/in/kennethinfante/
⁹https://twitter.com/iamkennethcpa
3. The Basics
3.1 Version

PowerQuery

PowerQuery is a fairly new product from Microsoft. It first became available as a download for Excel
2010 and only became a built-in part of Excel starting with Excel 2016. As such, updates are
constantly being made to the platform as well as its underlying language. It is important that you’re
at least aware of the PowerQuery version you’re using, as some functions have changed their API
between versions. The easiest way to make your code work in the latest version is to record it again
after updating PowerQuery.
For the 2010 and 2013 versions, you have to install and activate PowerQuery before you can use it.
Instructions can be found here¹⁰.
To determine the PowerQuery version

• In Excel 2016, go to Data > Get Data (or New Query, if you are running an Excel build prior to
8067.2115) > Query Options > Diagnostics and observe the PowerQuery version there.
• In Excel 2010 or 2013, go to the PowerQuery tab > Options > Diagnostics and observe the
PowerQuery version there.

¹⁰https://www.excelcampus.com/install-power-query/

Python/Pandas

There are two major versions of Python, Python 2 and Python 3. Python 3 is similar to
Python 2 in terms of syntax, so some Python 2 code may still work when run using the Python 3
interpreter, but not all of it (for example, print became a function in Python 3). It is important to
use Python 3, as Python 2 support has already been dropped.
Python 3 also has some speed and security improvements over Python 2.
To determine your Python version, you can use the terminal (for example, by running python --version)

or a script

    import sys
    print(sys.version_info)

On the other hand, to determine the Pandas version, you can use the following code (in the Python
interpreter or a script)

    import pandas as pd
    print(pd.__version__)
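
If a script depends on Python 3, one option is to check the version when the script starts. The sketch below is only an illustration; the helper name is_python3 is hypothetical and not part of any library.

```python
import sys

def is_python3(version_info=sys.version_info):
    """Return True when the given version is Python 3 or later."""
    return version_info[0] >= 3

# sys.version_info behaves like a tuple: (major, minor, micro, releaselevel, serial)
print("Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
print("Running Python 3:", is_python3())
```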

3.2 Case Sensitivity

PowerQuery

M, the language behind PowerQuery, is case-sensitive. If you want to ignore case in a
comparison, pass Comparer.OrdinalIgnoreCase as the comparer argument of functions that accept
one (such as Text.Contains).

As you can see, PowerQuery recognizes that "True" is not equal to "true". However, PowerQuery
returns 0 for false and 1 for true.

Python/Pandas

Python is case-sensitive, and hence so is Pandas. If you’re comparing strings in a column, you may
want to apply the pandas.Series.str.lower method first

    >>> 'True' == 'true'
    False

    >>> True == "True"
    False

Take note that True is a boolean type and hence is not equal to "True", which is a string type. More
on types later in this book.
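
As a small sketch of the str.lower approach mentioned above, the following compares a column of text values with and without lower-casing first. The column values are made up for illustration.

```python
import pandas as pd

# Hypothetical column with inconsistent casing
flags = pd.Series(['True', 'true', 'TRUE', 'False'])

# A direct comparison is case-sensitive
exact_match = flags == 'true'

# Lower-casing first makes the comparison case-insensitive
ignore_case_match = flags.str.lower() == 'true'

print(exact_match.tolist())
print(ignore_case_match.tolist())
```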

3.3 Indentation

PowerQuery

PowerQuery is not strict when it comes to indentation, but it is strict about the commas that
separate the steps of a query. For example, the following M code shows an error because of a missing
comma before the #"Sorted Rows" step.

Python/Pandas

Pandas, being written in Python, is strict with indentation. In Python, indentation defines which
lines belong to a block of code.
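
A minimal sketch of how indentation groups lines into blocks is shown below. The function name describe_sale and the threshold of 100 are made up for illustration.

```python
def describe_sale(amount):
    # Lines indented under "if" run only when the condition is true
    if amount >= 100:
        label = "large"
    else:
        label = "small"
    # This line is back at the function's indentation level,
    # so it runs regardless of which branch was taken
    return label

print(describe_sale(150))
print(describe_sale(20))
```

Removing the indentation in front of label = "large" would make Python raise an IndentationError instead of running the code.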



3.4 Comments
Comments are parts of the code that are not executed as code. They provide documentation on the
input and output of the code. They are meant to explain what the code does.

PowerQuery

In PowerQuery, comments are written as follows

    // This is a single line comment

    /*
    This is a
    multiline comment
    */

Python/Pandas

In Python/Pandas, comments are written as follows

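    # This is a single line comment

    '''
    This is a
    multiline comment
    '''

    """
    This one
    too is a multiline comment
    """

Strictly speaking, the triple-quoted forms are string literals that are not assigned to anything.
They are commonly used as multiline comments.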

3.5 Variables
Variables are used to store values that you would use later in the code. The values are assigned using
the = (hence, the assignment operator). You could also change the value of a variable (hence, the
name “variable”) by reassigning it to another value or another variable.

PowerQuery

In PowerQuery, a variable is used for each intermediate step as well as for functions you might want
to define.
Examples are Commission and Result in the following code:

    let
        SalesTotal = 100 + 15 + 275 + 25,
        CommissionRate = 0.2,
        CalculateCommission = (sales, rate) => sales * rate,
        Commission = CalculateCommission(SalesTotal, CommissionRate),
        Result = Commission
    in
        Result

PowerQuery has two ways of writing variable names. Variable names like VariableName are called
regular identifiers, while the hash-quoted names used by GUI tools, like #"VariableName", are
called quoted identifiers.
Variable name rules for PowerQuery’s regular identifiers

• Must begin with a letter (a - z, A - Z) or underscore (_)
• Must not contain spaces
• Are case-sensitive
• Can be any (reasonable) length
• Cannot use reserved keywords (like let, in)

Variable name rules for PowerQuery’s quoted identifiers

• Since these are quoted, they can begin with numbers (e.g. #"20PercentRate")
• They can contain spaces too (e.g. #"Sales Total")
• Are case-sensitive
• Can be any (reasonable) length
• Can even reuse reserved keywords when quoted (e.g. #"let"), although this is best avoided

So only regular identifiers are barred from using reserved keywords.

Python/Pandas

In Python/Pandas, variable names

• Must begin with a letter (a - z, A - Z) or underscore (_)
• Must not contain spaces
• Are case-sensitive
• Can be any (reasonable) length
• Cannot use reserved keywords (like for, while)

You cannot use reserved keywords in variable names.
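
For comparison, here is a sketch of the commission calculation from the M example written in Python, followed by a quick way to test a name against the rules above. The helper is_valid_variable_name is hypothetical; it only combines two standard-library checks, str.isidentifier() and keyword.iskeyword().

```python
import keyword

# Python version of the commission calculation from the M example
sales_total = 100 + 15 + 275 + 25
commission_rate = 0.2

def calculate_commission(sales, rate):
    return sales * rate

commission = calculate_commission(sales_total, commission_rate)
print(commission)

def is_valid_variable_name(name):
    """Check the naming rules: a valid identifier that is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_valid_variable_name("sales_total"))    # letters and an underscore
print(is_valid_variable_name("20PercentRate"))  # starts with a digit
print(is_valid_variable_name("Sales Total"))    # contains a space
print(is_valid_variable_name("for"))            # reserved keyword
```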



3.6 The Help System

PowerQuery

PowerQuery has a hidden gem in the form of the #shared query, which returns the list of available
functions. Just create a blank query and paste #shared as follows

You can also access the online documentation for PowerQuery at this link¹¹.
I find the function reference at this link¹² useful too.
If you’re using Excel for Office 365, your PowerQuery version comes with an IntelliSense feature
which guides you on how to use certain functions in PowerQuery.
¹¹https://support.office.com/en-us/article/Microsoft-Power-Query-for-Excel-Help-2b433a85-ddfb-420b-9cda-fe0e60b82a94?CorrelationId=5cc8d039-a622-404f-a9fe-1a04d52f249b&ui=en-US&rs=en-US&ad=US
¹²https://docs.microsoft.com/en-us/power-query-m/power-query-m-function-reference

Python/Pandas

Python comes with the built-in help() function, which you can use to read the documentation
built into the Pandas library.
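
As a small sketch, the following looks up the documentation of the DataFrame head() method. Calling help() prints the full text to the screen; the same text is also available as a string through the __doc__ attribute, which is what the example prints a part of.

```python
import pandas as pd

# help() prints the documentation to the screen (uncomment to try it):
# help(pd.DataFrame.head)

# The same text is stored in the object's __doc__ attribute
doc = pd.DataFrame.head.__doc__
print(doc[:200])
```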

You can access the Python documentation at this link¹³ and the Pandas documentation at this
link¹⁴.
¹³docs.python.org
¹⁴pandas.pydata.org/pandas-docs/stable

3.7 Coding Tools

PowerQuery

The PowerQuery editor is a wonderful tool for writing PowerQuery code. The system works like a
giant macro recorder. For each step, the editor records the corresponding M code, which can be viewed
in the Advanced Editor. For example

Notice how the code lines correspond to the steps on the right-hand side. The step name becomes
the identifier name in the M code.

Python/Pandas

There are a lot of tools for writing Python code. The most popular one for writing Pandas is
Jupyter Notebook. Personally, I’m using Atom’s Hydrogen plugin to run Python scripts like Jupyter
notebooks.
Writing Pandas code may take a while to master, as most operations return a new DataFrame instead of
changing the underlying one, so changes do not persist by default. A common workflow is to peek at
the result using head() after each change.
To persist a change to the dataframe, you have to assign the result back to a variable or use
inplace=True where the method supports it.
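
The sketch below illustrates both options using sort_values() on a small made-up dataframe (the column names and values are hypothetical).

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({'Region': ['West', 'East'], 'Sales': [200, 100]})

# sort_values() returns a new DataFrame; df itself is unchanged
df.sort_values('Sales')
unchanged = df['Sales'].tolist()
print(unchanged)

# Option 1: assign the result back to a variable
df_sorted = df.sort_values('Sales')
print(df_sorted['Sales'].tolist())

# Option 2: modify the DataFrame in place
df.sort_values('Sales', inplace=True)
print(df['Sales'].tolist())

# head() gives a quick peek after each step
print(df.head())
```

Assigning the result back is usually the easier option to follow, since each variable then corresponds to one step, much like the named steps in a PowerQuery query.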
4. Input and Output
4.1 Input

PowerQuery

PowerQuery provides a GUI to get data from sources. In the Excel window, they are in the Data >
Get & Transform group

Another way is to right-click in the Queries pane > New Query in the PowerQuery editor.

Python/Pandas

Reading data in plain Python is done with the open() function, passing in the 'r' argument, which
stands for read. This returns a file object, and we can then read the contents of the file using its
read() or readlines() methods

    with open('my_file.csv', 'r') as f:
        print(f.read())
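
To see the difference between read() and readlines(), here is a self-contained sketch. It first writes a small throwaway file to a temporary folder so that it can run anywhere; the file contents are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'my_file.csv')
with open(path, 'w') as f:
    f.write('Date,Sales_Dollars\n2020-01-01,100\n2020-01-02,75\n')

# read() returns the whole file as one string
with open(path, 'r') as f:
    content = f.read()

# readlines() returns a list with one string per line
with open(path, 'r') as f:
    lines = f.readlines()

print(content)
print(lines)
```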

In Pandas, we have different read functions for different file types. These functions are optimized
for reading large files and also create a dataframe out of the contents.

    import pandas as pd

    df = pd.read_csv('my_file.csv')
    print(df.head())

    # reads the first sheet by default
    df = pd.read_excel('my_file.xlsx')
    print(df.head())

    df = pd.read_json('my_file.json')
    print(df.head())

There are other “read” methods in the documentation¹⁵ if you want to learn more.
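
If you want to try read_csv() without a file on disk, one option is to read from an in-memory string, as in the sketch below. The CSV content is made up, and parse_dates is an optional argument used here to convert the Date column to a datetime type.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for my_file.csv
csv_text = """Date,Sales_Dollars
2020-01-01,100
2020-01-02,75
"""

# read_csv() accepts a path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(df.dtypes)
print(df.head())
```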

4.2 Output

PowerQuery

In PowerQuery, outputting is basically returning a value from a query. For example, the following
query

results in the following

¹⁵https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

Most of the time, you will return a table from a query. We’re going to revisit this later on.

Python/Pandas

We had a glimpse of outputting data in Python in the section above.

Basically, Python has the print() function to output a value (such as a string or a number). For
example

    >>> my_var = "Learning Python is fun!"
    >>> print(my_var)
    Learning Python is fun!

Pandas provides methods that you can use to output data. These are the head() and tail() methods.
By default, head() returns the first 5 rows and tail() returns the last 5 rows of data. If you’re
using a Jupyter notebook, there’s no need to use print(), as Jupyter automatically displays the value
of the last expression in a cell, such as the data returned by head().
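
A short sketch of head() and tail() on a made-up ten-row dataframe follows. Both methods also accept the number of rows to return.

```python
import pandas as pd

# Hypothetical data: ten rows numbered 1 to 10
df = pd.DataFrame({'Row': list(range(1, 11))})

first_rows = df.head()     # first 5 rows by default
last_rows = df.tail()      # last 5 rows by default
first_three = df.head(3)   # pass a number to change how many rows

print(first_rows)
print(last_rows)
print(first_three)
```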

If you want to persist the data in a file, perhaps after doing some transformations on the data, you
can use the to_csv() or to_excel() methods of the dataframe.

    df.to_csv('my_transformed_data.csv')
    df.to_excel('my_transformed_data.xlsx')

There are other “to” methods in the documentation¹⁶ if you want to learn more.
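
One detail worth knowing is that to_csv() writes the dataframe index as an extra first column unless index=False is passed. The sketch below writes a made-up dataframe to a temporary folder and reads it back to check the result; the data and the temporary location are for illustration only.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02'],
                   'Total Sales': [150.0, 75.0]})

path = os.path.join(tempfile.mkdtemp(), 'my_transformed_data.csv')

# index=False leaves out the integer index; by default to_csv() writes it
# as an extra first column
df.to_csv(path, index=False)

# Read the file back to confirm what was written
round_trip = pd.read_csv(path)
print(round_trip)
```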
¹⁶https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
5. Data Types
This content is not available in the sample book. The book can be purchased on Leanpub at
http://leanpub.com/powerqueryguidetopandas.

5.1 Built-in Data Types


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

5.2 Field Data Types


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
6. Control Structures
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

6.1 Conditional Statements


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python/Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

6.2 Loops
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
7. Error Handling
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

7.1 If Statement
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

7.2 Try Statement


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python/Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
8. Functions
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

8.1 Function Basics


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python/Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

8.2 Optional Parameters & Default Values


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python/Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

8.3 Return Multiple Values


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python/Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

8.4 Type Hinting


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python/Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
9. Classes
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

9.1 Basics
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python/Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
10. Strings
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

10.1 Basics
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

10.2 Transforming a String Field


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

10.3 String Slicing


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
11. Date and Time
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

11.1 Basics
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Datetime and Date Objects

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Time Objects

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Timedelta Objects

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

11.2 Transforming a Datetime Field


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

11.3 Date Formatting


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
12. Structured Data Records
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

12.2 List
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python/Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

12.3 Records and Dictionaries


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Python/Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

12.4 Creating a Table/Dataframe from List, Records or Dictionaries
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
13. Introduction to Data Wrangling
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

13.2 Rename Columns


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

13.3 Reorder Columns


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

13.4 Sorting
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

13.5 Unique Values


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

13.6 Dealing with Duplicates


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

13.7 Getting Info


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
14. Adding Columns
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

14.1 Duplicating a Column


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

14.2 Extracting Values from a Column


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

14.3 Conditional Column


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

14.4 Index Column


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

14.5 Thru a Formula


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
15. Subsetting Data
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

15.1 Selecting Columns and Rows


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

15.2 loc and iloc (Pandas only)


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

15.3 Removing Columns and Rows


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
16. Dealing with Missing Data
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

16.1 Fill Forward and Fill Backward


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

16.2 Recode/Replace Values


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

16.3 Dropping Missing Data


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
17. Grouping Data
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

17.1 Aggregate a Single Column


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

17.2 Aggregate Multiple Columns


This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
18. Pivoting and Unpivoting Data
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

18.1 Pivot
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

18.2 Unpivot/Melt
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
19. Combining Data
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

19.1 Merge
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

19.2 Append
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

PowerQuery

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.

Pandas

This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
20. Conclusion
This content is not available in the sample book. The book can be purchased on Leanpub at http:
//leanpub.com/powerqueryguidetopandas.
