Top 50 Pandas Interview Questions and Answers (2024)


Pandas Interview Questions

6/24/24, 11:21 PM Top 50 Pandas Interview Questions and Answers (2024)

Last Updated : 16 Jun, 2024


Pandas is a FOSS (Free and Open Source Software) Python library that
provides high-performance data manipulation and analysis tools. It is used in
areas such as data science and machine learning.

Pandas is not just a library, it’s an essential skill for professionals in various
domains, including finance, healthcare, and marketing. This library
streamlines data manipulation tasks, offering robust features for data
loading, cleaning, transforming, and much more. As a result, understanding
Pandas is a key requirement in many data-centric job roles.

This set of Pandas interview questions for data science covers basic and
advanced topics to help you succeed with confidence in your upcoming
interviews. It covers not only theoretical questions but also practical coding
questions that test your hands-on skills. This is particularly beneficial for
aspiring Data Scientists and ML professionals who wish to demonstrate
their proficiency in real-world problem-solving.

So, whether you are starting your journey in Python programming or looking
to brush up on your skills, these Pandas interview questions are an
essential resource for acing those technical interviews.

https://www.geeksforgeeks.org/pandas-interview-questions/ 1/34

Let’s dive in and unlock the potential of Pandas together!

Pandas Basic Interview Questions & Answers


This article contains the top 50 hand-picked Pandas questions with solutions
for Python interviews. It is a one-stop resource to prepare for your
upcoming interviews and stay updated with the latest trends in the industry.
In this article, we will explore some of the most commonly asked Pandas
interview questions and answers, which are divided into the following sections:

Pandas Interview Questions for Freshers


Pandas Interview Questions for Experienced
Pandas Interview Questions for Data Scientists

Pandas Interview Questions for Freshers

Q1. What are Pandas?

Pandas is an open-source Python library that is built on top of the NumPy
library. It is made for working with relational or labelled data. It provides
various data structures for manipulating, cleaning and analyzing numerical
data, and it handles missing data easily. Pandas is fast and offers
high performance and productivity.

Q2. What are the Different Types of Data Structures in Pandas?

The two data structures that are supported by Pandas are Series and
DataFrames.

Pandas Series is a one-dimensional labelled array that can hold data of
any type. It is mostly used to represent a single column or row of data.
Pandas DataFrame is a two-dimensional heterogeneous data structure. It
stores data in a tabular form. Its three main components are data, rows,
and columns.

Q3. What are the Key Features of Pandas?

Pandas is used for efficient data analysis. The key features of Pandas are
as follows:

Fast and efficient data manipulation and analysis


Provides time-series functionality
Easy missing data handling
Faster data merging and joining
Flexible reshaping and pivoting of data sets
Powerful group by functionality
Data from different file objects can be loaded
Integrates with NumPy

Q4. What is Series in Pandas?

Ans: A Series in Pandas is a one-dimensional labelled array, similar to a
single column in an Excel sheet. It can hold any type of data, such as integers,
strings, or Python objects. Its axis labels are known as the index. A Series
has a single data type (dtype) for all of its values; its values can be changed,
but the size of the series is immutable. A series can be created from a Python
tuple, list, or dictionary. The syntax for creating a series is as follows:

import pandas as pd
series = pd.Series(data)

Q5. What are the Different Ways to Create a Series?

Ans: In Pandas, a series can be created in many ways. They are as follows:

Creating an Empty Series

An empty series can be created by calling the pandas.Series() constructor
without any data.

Python

# import pandas
import pandas as pd

# Calling the Series constructor without data. The dtype is passed
# explicitly because the default dtype of an empty Series depends on
# the pandas version.
ser = pd.Series(dtype='float64')
print(ser)

Output:

Series([], dtype: float64)

Creating a Series from an Array

To create a series from a NumPy array, we import the NumPy module, build
the array with the array() function, and pass it to the Series() constructor.

Python

# import pandas and numpy
import pandas as pd
import numpy as np

# simple array
data = np.array(['g', 'e', 'e', 'k', 's'])

# convert array to Series
print(pd.Series(data))

Output:

0 g
1 e
2 e
3 k
4 s
dtype: object

Creating a Series from an Array with a custom Index

To create a series with an explicitly provided index instead of the
default one, we pass a list to the index parameter with the same number
of elements as the array.

Python

# import pandas and numpy
import pandas as pd
import numpy as np

# simple array
data = np.array(['g', 'e', 'e', 'k', 's'])

# providing an index
ser = pd.Series(data, index=[10, 11, 12, 13, 14])
print(ser)

Output:

10 g
11 e
12 e
13 k
14 s
dtype: object

Creating a Series from a List

We can create a series using a Python list and pass it to the Series()
constructor.

Python

# import pandas
import pandas as pd

# a simple list (named lst to avoid shadowing the built-in list)
lst = ['g', 'e', 'e', 'k', 's']

# create series from a list
print(pd.Series(lst))

Output:

0 g
1 e
2 e
3 k
4 s
dtype: object

Creating a Series from a Dictionary

A Series can also be created from a Python dictionary. The keys of the
dictionary are used to construct the index of the series.

Python

# import pandas
import pandas as pd

# a simple dictionary (named data to avoid shadowing the built-in dict)
data = {'Geeks': 10,
        'for': 20,
        'geeks': 30}

# create series from dictionary
print(pd.Series(data))

Output:

Geeks 10
for 20
geeks 30
dtype: int64

Creating a Series from Scalar Value

To create a series from a Scalar value, we must provide an index. The Series
constructor will take two arguments, one will be the scalar value and the
other will be a list of indexes. The value will repeat until all the index values
are filled.

Python

# import pandas and numpy


import pandas as pd
import numpy as np

# giving a scalar value with index


ser = pd.Series(10, index=[0, 1, 2, 3, 4, 5])

print(ser)

Output:

0 10
1 10
2 10
3 10
4 10
5 10
dtype: int64

Creating a Series using NumPy Functions

NumPy functions such as numpy.linspace() and numpy.random.randn() can
also be used to create a Pandas series.

Python

# import pandas and numpy
import pandas as pd
import numpy as np

# series with numpy linspace()
ser1 = pd.Series(np.linspace(3, 33, 3))
print(ser1)

# series with numpy random.randn(); the values differ on every run
ser2 = pd.Series(np.random.randn(3))
print("\n", ser2)

Output:

0 3.0
1 18.0
2 33.0
dtype: float64
0 0.694519
1 0.782243
2 0.082820
dtype: float64

Creating a Series using the Range Function

We can also create a series in Python by using the range function.

Python

import pandas as pd
print(pd.Series(range(5)))

Output:

0 0
1 1
2 2
3 3
4 4
dtype: int64

Creating a Series using List Comprehension

Here, we use a Python list comprehension to build the index of a series
in Pandas. The range function defines the values, and the list comprehension
over a string produces the index labels.

Python

# import pandas
import pandas as pd
ser = pd.Series(range(1, 20, 3),
index=[x for x in 'abcdefg'])
print(ser)

Output:

a 1
b 4
c 7
d 10
e 13
f 16
g 19
dtype: int64

Q6. How can we Create a Copy of a Series?

Ans: A copy of a series can be either shallow or deep, as follows:

Shallow Copy is a copy of the series object where the indices and the data
of the original object are not copied. It only copies the references to the
indices and data. This means changes made to the data of one series are
reflected in the other (unless Copy-on-Write is enabled). A shallow copy of
the series can be created by writing the following syntax:

ser.copy(deep=False)

Deep Copy is a copy of the series object where it has its own indices and
data. This means any changes made to a copy of the object will not be
reflected in the original series object. A deep copy of the series can be
created by writing the following syntax:

ser.copy(deep=True)

The default value of the deep parameter of the copy() function is set to True.
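
The difference between the two kinds of copy can be seen in a small sketch. The series values and index labels below are made up for illustration, and the behaviour of the shallow copy depends on whether Copy-on-Write is enabled in the installed pandas version.

```python
import pandas as pd

original = pd.Series([1, 2, 3], index=["a", "b", "c"])

deep = original.copy()               # deep=True is the default
shallow = original.copy(deep=False)  # shares its data with the original

# Modify the original after both copies were taken
original.iloc[0] = 100

# The deep copy is unaffected by the change
print(deep["a"])

# The shallow copy shows 100 under classic semantics,
# and the old value when Copy-on-Write is enabled
print(shallow["a"])
```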

Q7. What is a DataFrame in Pandas?

Ans: A DataFrame in Pandas is a data structure used to store data in
tabular form, that is, in rows and columns. It is two-dimensional,
size-mutable, and heterogeneous in nature. The main components of a
dataframe are data, rows, and columns. A dataframe can be created from
in-memory Python objects or by loading a dataset from existing storage,
such as an SQL database, CSV file, or Excel file. The syntax for creating
a dataframe is as follows:

import pandas as pd
dataframe = pd.DataFrame(data)

Q8. What are the Different ways to Create a DataFrame in Pandas?

Ans: In Pandas, a dataframe can be created in many ways. They are as


follows:

Creating an Empty DataFrame


# import pandas as pd
import pandas as pd

# Calling DataFrame constructor


print(pd.DataFrame())

Output:

Empty DataFrame
Columns: []
Index: []

Creating a DataFrame using a List

In order to create a DataFrame from a Python list, just pass the list to the
DataFrame() constructor.

Python

# import pandas as pd
import pandas as pd

# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
'portal', 'for', 'Geeks']

# Calling DataFrame constructor on list


print(pd.DataFrame(lst))

Output:

 0
0 Geeks
1 For
2 Geeks
3 is
4 portal
5 for
6 Geeks

Creating a DataFrame using a List of Lists

A DataFrame can be created from a Python list of lists by passing the main
list to the DataFrame() constructor along with the column names.

Python

# import pandas as pd
import pandas as pd

# list of lists
lst = [[1, 'Geeks'], [2, 'For'], [3, 'Geeks']]

# Calling DataFrame constructor
# on list with column names
print(pd.DataFrame(lst, columns=['Id', 'Data']))

Output:

Id Data
0 1 Geeks
1 2 For
2 3 Geeks

Creating a DataFrame using a Dictionary

A DataFrame can be created from a Python dictionary and passed to the


DataFrame() constructor. The Keys of the dictionary will be the column
names and the values of the dictionary are the data of the DataFrame.

Python

import pandas as pd

# initialise data as a dictionary of lists
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

# Print the dataframe created
print(pd.DataFrame(data))

Output:

 Name Age
0 Tom 20
1 nick 21
2 krish 19
3 jack 18

Creating a DataFrame using a List of Dictionaries

Another way to create a DataFrame is by using a Python list of dictionaries.
The list is passed to the DataFrame() constructor. The keys of each
dictionary element will be the column names.

Python

# import pandas as pd
import pandas as pd

# list of dictionaries
lst = [{1: 'Geeks', 2: 'For', 3: 'Geeks'},
       {1: 'Portal', 2: 'for', 3: 'Geeks'}]

# Calling DataFrame constructor on list
print(pd.DataFrame(lst))

Output:

1 2 3
0 Geeks For Geeks
1 Portal for Geeks

Creating a DataFrame from Pandas Series

A DataFrame in Pandas can also be created by using the Pandas series.

Python

# import pandas as pd
import pandas as pd

# a Pandas Series of strings
ser = pd.Series(['Geeks', 'For', 'Geeks'])

# Calling DataFrame constructor on the Series
print(pd.DataFrame(ser))

Output:

 0
0 Geeks
1 For
2 Geeks

Q9. How to Read Data into a DataFrame from a CSV file?

Ans: We can create a data frame from a CSV (“Comma Separated Values”)
file. This can be done by using the read_csv() function, which takes the
path of the CSV file as its parameter.

pandas.read_csv(file_name)

Another way to do this is by using the read_table() function, which takes the
CSV file and a delimiter value as parameters. Its default delimiter is a tab,
so a comma has to be specified for CSV files.

pandas.read_table(file_name, delimiter=',')
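
As a minimal sketch of both calls, the example below reads the same made-up CSV text from memory using io.StringIO; a file path can be passed in exactly the same way. The column names and values are assumptions chosen for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory instead of a file on disk
csv_text = "Name,Age\nTom,20\nnick,21\n"

df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table defaults to a tab separator, so the comma is passed explicitly
df_table = pd.read_table(io.StringIO(csv_text), sep=",")

print(df_csv)
print(df_csv.equals(df_table))
```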

Q10. How to access the first few rows of a dataframe?

Ans: The first few records of a dataframe can be accessed by using the
pandas head() method. It takes one optional argument n, which is the
number of rows. By default, it returns the first 5 rows of the dataframe. The
head() method has the following syntax:

df.head(n)

Another way to do it is by using the iloc indexer, which is used with square
brackets rather than called like a method. It is similar to the Python
list-slicing technique. It has the following syntax:

df.iloc[:n]

Q11. What is Reindexing in Pandas?

Ans: Reindexing in Pandas, as the name suggests, means changing the index
of the rows and columns of a dataframe so that it conforms to a new set of
labels. It can be done by using the Pandas reindex() method.
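
A short sketch of reindexing with DataFrame.reindex() follows. The frame and its labels are invented for illustration; the point is that labels missing from the original index produce rows filled with NaN.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "b", "c"])

# Reorder the rows and ask for a label ("d") that does not exist yet;
# the new row is filled with NaN
reindexed = df.reindex(["c", "a", "d"])
print(reindexed)
```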

Q12. How to Select a Single Column of a DataFrame?

Ans: There are several ways to select a single column of a dataframe. They
are as follows:

By using the dot operator, we can access any column whose name is a valid
Python identifier and does not clash with a DataFrame attribute or method.

DataFrame.column_name

Another way to select a column is by using the square brackets [] with the
column name as a string.

DataFrame['column_name']

Q13. How to Rename a Column in a DataFrame?

Ans: A column of the dataframe can be renamed by using the rename()
function. We can rename a single column as well as multiple columns at the
same time using this method.

DataFrame.rename(columns={'column1': 'COLUMN_1',
'column2':'COLUMN_2'}, inplace=True)

Another way is by using the set_axis() function, which takes the complete
list of new column names and the axis to relabel. It returns a new
dataframe; its inplace parameter was removed in pandas 2.0.

DataFrame = DataFrame.set_axis(['COLUMN_1','COLUMN_2'], axis=1)

In case we want to add a prefix or suffix to the column names, we can use
the add_prefix() or add_suffix() methods.

DataFrame.add_prefix(prefix='PREFIX_')
DataFrame.add_suffix(suffix='_suffix')
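
The renaming options above can be tried on a small frame. This is only a sketch with invented column names; every call here returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({"column1": [1, 2], "column2": [3, 4]})

# Rename selected columns through a mapping
renamed = df.rename(columns={"column1": "COLUMN_1"})

# Replace all column labels at once
relabelled = df.set_axis(["A", "B"], axis=1)

# Add a prefix or suffix to every column name
prefixed = df.add_prefix("PREFIX_")
suffixed = df.add_suffix("_suffix")

print(list(renamed.columns), list(relabelled.columns))
print(list(prefixed.columns), list(suffixed.columns))
```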

Q14. How to add an Index, Row, or Column to an Existing Dataframe?

Ans: Adding an Index

We can add an index to an existing dataframe by using the set_index()
method, which sets one or more existing columns (or arrays of the same
length) as the index of a dataframe. The set_index() method has the
following syntax:

df.set_index(keys, drop=True, append=False, inplace=False,
verify_integrity=False)

Adding Rows

The df.loc[] is used to access a group of rows or columns and can be used to
add a row to a dataframe.

DataFrame.loc[Row_Index]=new_row

We can also add multiple rows in a dataframe by using pandas.concat()


function which takes a list of dataframes to be added together.

pandas.concat([Dataframe1,Dataframe2])

Adding Columns

We can add a column to an existing dataframe by assigning a list, Series,
or scalar value to a new column name.

DataFrame['column_name'] = list_of_values

Another way to add a column is by using the df.insert() method, which takes
the position where the column should be added, the column name, and the
values of the column as parameters.

DataFrame.insert(col_index, col_name, value)

We can also add a column to a dataframe by using df.assign() function

DataFrame.assign(**kwargs)
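
The following sketch strings the row and column techniques together on one small frame. The names, ages, and cities are invented for illustration, and the derived column AgeNextYear is a hypothetical example of assign().

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20, 21]})

# Add a single row by assigning to a new index label with loc
df.loc[2] = ["krish", 19]

# Add several rows at once with concat
extra = pd.DataFrame({"Name": ["jack"], "Age": [18]})
df = pd.concat([df, extra], ignore_index=True)

# Add columns: direct assignment, insert at a position, and assign
df["City"] = ["Delhi", "Pune", "Agra", "Goa"]
df.insert(0, "Id", [1, 2, 3, 4])
df = df.assign(AgeNextYear=df["Age"] + 1)

print(df)
```

Note that assign() returns a new dataframe, whereas direct assignment and insert() modify the existing one.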

Q15. How to Delete an Index, Row, or Column from an Existing


DataFrame?

Ans: We can delete a row or a column from a dataframe by using the df.drop()
method and providing the row or column label as the parameter. The axis
parameter selects rows (axis=0) or columns (axis=1).

To delete a column

DataFrame.drop(['Column_Name'], axis=1)

To delete a row

DataFrame.drop([Row_Index_Number], axis=0)

Q16. How to set the Index in a Pandas DataFrame?

Ans: We can set the index of a Pandas dataframe by using the set_index()
method, which sets an existing column (or a list of columns) as the index
of the dataframe.

DataFrame.set_index('Column_Name')

Q17. How to Reset the Index of a DataFrame?

Ans: The index of Pandas dataframes can be reset by using the reset_index()
method. It can be used to simply reset the index to the default integer index
beginning at 0.

DataFrame.reset_index(inplace = True)

Q18. How to Find the Correlation Using Pandas?

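Ans: The Pandas dataframe.corr() method is used to find the pairwise
correlation of the columns of a dataframe. Missing values are excluded
automatically. Since pandas 2.0, non-numeric columns are no longer dropped
silently, so numeric_only=True has to be passed when the dataframe
contains them.

DataFrame.corr(numeric_only=True)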
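
As a sketch, the example below computes a correlation matrix for a made-up frame that also contains a text column. Selecting the numeric columns first with select_dtypes() is one way to keep the call working across pandas versions; it is an illustrative choice, not the only option.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "label": ["a", "b", "c", "d"],  # non-numeric column
})

# Keep only numeric columns, then compute pairwise Pearson correlation
corr = df.select_dtypes(include="number").corr()
print(corr)
```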

Q19. How to Iterate over Dataframe in Pandas?

Ans: There are various ways to iterate the rows and columns of a dataframe.

Iteration over Rows

In order to iterate over rows, we can use the iterrows() function. It
returns each index value along with a series containing the data in each row.
Another way to iterate over rows is by using the itertuples() method, which
returns a named tuple for each row: the first element is the index and the
remaining values are the row values.

Iteration over Columns

To iterate over the columns of a dataframe, we just need to create a list of
dataframe columns by using the list constructor and passing the dataframe to it.
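
The three iteration styles can be sketched as follows on an invented two-row frame. The lists collected here exist only so the results can be inspected.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20, 21]})

# Rows as (index, Series) pairs
row_labels = []
for index, row in df.iterrows():
    row_labels.append(index)
    print(index, row["Name"], row["Age"])

# Rows as named tuples; the index is available as row.Index
ages = []
for row in df.itertuples():
    ages.append(row.Age)
    print(row.Index, row.Name, row.Age)

# Columns: list(df) yields the column labels
columns = list(df)
for column in columns:
    print(column, df[column].tolist())
```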

Q20. What are the Important Conditions to keep in mind before


Iterating?

Ans: Iterating is not the best option when it comes to Pandas Dataframes.
Pandas provides many functions that perform operations without iterating
through the dataframe. Before iterating over a dataframe, we need to keep
the following things in mind:

To print the data frame, instead of iterating, we can use the
DataFrame.to_string() method, which displays the data in tabular form.
If we are concerned about time performance, iteration is not a good
option. Instead, we should choose vectorization, as pandas has a number
of highly optimized and efficient built-in methods.
We should use the apply() method instead of iteration when there is an
operation to be applied to a few rows and not the whole dataframe.
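
To make the alternatives concrete, the sketch below computes the same column three ways on an invented price and quantity frame: an explicit loop, a vectorized expression, and apply(). All three give the same numbers; on large frames the vectorized form is usually the fastest.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Explicit row-by-row loop (works, but slow on large frames)
totals_loop = [row["price"] * row["qty"] for _, row in df.iterrows()]

# Vectorized arithmetic on whole columns
df["total"] = df["price"] * df["qty"]

# apply() along the rows
df["total_apply"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

print(df.to_string())
```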

Pandas Interview Questions for Experienced

Q21. What is Categorical Data and How it is represented in Pandas?

Ans: Categorical data is a set of predefined data values under some
categories. It usually has a limited and fixed range of possible values and
can be either numerical or textual in nature. A few examples of categorical
data are gender, educational qualifications, blood type, country affiliation,
observation time, etc. In Pandas, such data is often loaded with the object
datatype and can be converted to the dedicated category datatype, which
stores each distinct value once together with integer codes.
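
A brief sketch of the conversion to the category datatype, using a made-up series of blood types:

```python
import pandas as pd

blood = pd.Series(["A", "B", "O", "A", "AB", "O"])

# Convert to the dedicated categorical dtype
blood_cat = blood.astype("category")

# The distinct values (sorted) and the integer code stored for each row
categories = blood_cat.cat.categories.tolist()
codes = blood_cat.cat.codes.tolist()
print(categories)
print(codes)
```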

Q22. How can a DataFrame be Converted to an Excel File?

Ans: A DataFrame can be written to an Excel file by using the to_excel()
method, which takes the name of the output file as its parameter. Writing
.xlsx files requires an Excel engine such as openpyxl to be installed.

DataFrame.to_excel(file_name)

Q23. What is Multi-Indexing in Pandas?

Ans: Multi-indexing refers to using two or more levels of labels in the
index of the rows or columns. A MultiIndex is a multi-level, or hierarchical,
index object for pandas objects that makes it possible to work with
higher-dimensional data in a one-dimensional Series or two-dimensional
DataFrame. A MultiIndex can be created by using a number of functions, such as
MultiIndex.from_arrays, MultiIndex.from_tuples, MultiIndex.from_product,
and MultiIndex.from_frame, which build the index from arrays, tuples, the
product of iterables, and dataframes, respectively.
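
As an illustration, the sketch below builds a two-level row index with MultiIndex.from_tuples and selects data by the outer level and by a full key. The years, quarters, and sales figures are invented.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1")],
    names=["year", "quarter"],
)
sales = pd.DataFrame({"sales": [100, 150, 120]}, index=index)

# Select every row under the outer label "2023"
sales_2023 = sales.loc["2023"]
print(sales_2023)

# Select a single value with a full (year, quarter) key
q1_2024 = sales.loc[("2024", "Q1"), "sales"]
print(q1_2024)
```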

Q24. How to select Specific Data-types to Include or Exclude in the


DataFrame?

Ans: The Pandas select_dtypes() method is used to include or exclude
columns of specific data types in the dataframe. The data types to include
or exclude are passed to it as lists through the include and exclude
parameters. It has the following syntax:

DataFrame.select_dtypes(include=['object','float'], exclude =['int'])
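
A small sketch of select_dtypes() on an invented frame with one text, one float, and one integer column. The integer dtype is set explicitly so the result does not depend on the platform default.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b"],
    "height": [1.5, 1.7],
    "age": pd.Series([20, 30], dtype="int64"),
})

# Keep only numeric columns
numeric = df.select_dtypes(include=["number"])

# Keep everything except 64-bit integer columns
non_int = df.select_dtypes(exclude=["int64"])

print(list(numeric.columns), list(non_int.columns))
```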

Q25. How to Convert a DataFrame into a Numpy Array?

Ans: NumPy is a Python package that is used to perform large numerical
computations. It is used for processing multidimensional array elements to
perform complicated mathematical operations.

The pandas dataframe can be converted to a NumPy array by using the


to_numpy() method. We can also provide the datatype as an optional
argument.

Dataframe.to_numpy()

We can also use the .values attribute to convert dataframe values to a NumPy
array.

Q26. How to Split a DataFrame according to a Boolean Criterion?

Ans: Boolean masking is a technique that can be used in Pandas to split a
DataFrame depending on a boolean criterion. Using boolean masking, you can
filter the rows that satisfy a certain criterion and separate them from the
rows that do not.

# Define the condition
condition = DataFrame['col_name'] < VALUE
# DataFrame with rows where the condition is True
DataFrame1 = DataFrame[condition]
# DataFrame with rows where the condition is False
DataFrame2 = DataFrame[~condition]
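
A runnable version of the same idea, on an invented frame and with a threshold of 20 chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "nick", "krish", "jack"],
    "Age": [20, 21, 19, 18],
})

# Boolean Series: True where the row satisfies the criterion
condition = df["Age"] < 20

younger = df[condition]   # rows where the condition is True
others = df[~condition]   # rows where the condition is False

print(younger)
print(others)
```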

Q27. What is Time Series in Pandas?

Ans: A time series is a collection of data points with timestamps. It depicts
the evolution of a quantity over time. Pandas provides various functions to
handle time series data efficiently. They are used for working with
timestamps, resampling time series to different time periods, working with
missing data, slicing the data using timestamps, etc.

Pandas Built-in Function | Operation
pandas.to_datetime(DataFrame['Date']) | Convert ‘Date’ column of DataFrame to datetime dtype
DataFrame.set_index('Date', inplace=True) | Set ‘Date’ as the index
DataFrame.resample('H').sum() | Resample time series to a different frequency (e.g., hourly, daily, weekly, monthly etc)
DataFrame.interpolate() | Fill missing values using linear interpolation
DataFrame.loc[start_date:end_date] | Slice the data based on timestamps
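
The operations listed above can be chained on a tiny made-up series. This sketch resamples to a daily frequency ('D') rather than hourly so that the result stays small; the dates and values are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-01", "2024-01-03"],
    "Value": [1.0, 2.0, 4.0],
})

# Convert the column to datetime and use it as the index
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")

# Daily totals; a day without observations sums to 0
daily_sum = df.resample("D").sum()

# Daily means leave a gap (NaN) on the empty day, filled here linearly
filled = df.resample("D").mean().interpolate()

# Slice by timestamp labels (both ends are included)
window = filled.loc["2024-01-02":"2024-01-03"]

print(daily_sum)
print(filled)
```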

Q28. What is Time Delta in Pandas?

Ans: A time delta is the difference between two dates or times. Similar to
the timedelta object in the datetime module, a Timedelta in Pandas indicates
a duration or difference in time. Pandas has this dedicated data type for
handling time durations or time variations in a DataFrame or Series.

A time delta object can be created by using the pandas.Timedelta()
constructor and providing the number of weeks, days, seconds, milliseconds,
etc. as parameters.

Duration = pandas.Timedelta(days=7, hours=4, minutes=30, seconds=23)

With the help of the Timedelta data type, you can easily perform arithmetic
operations, comparisons, and other time-related manipulations. It can express
durations in different units, such as days, hours, minutes, seconds,
milliseconds, and microseconds.

Duration + pandas.Timedelta('2 days 6 hours')
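
Putting the two snippets together gives a short runnable sketch of Timedelta arithmetic and comparison:

```python
import pandas as pd

duration = pd.Timedelta(days=7, hours=4, minutes=30, seconds=23)

# Arithmetic between Timedelta objects
total = duration + pd.Timedelta("2 days 6 hours")
print(total)

# Components and comparisons
print(total.days)
print(total > duration)
```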

Q29. What is Data Aggregation in Pandas?

Ans: In Pandas, data aggregation refers to the act of summarizing or
reducing data in order to produce a consolidated view or summary
statistics of one or more columns in a dataset. To calculate statistical
measures like sum, mean, minimum, maximum, count, etc., aggregation
functions are applied to groups or subsets of data.

The agg() function in Pandas is frequently used to aggregate data. It applies
one or more aggregation functions to one or more columns in a DataFrame
or Series. Pandas’ built-in functions or custom user-defined functions can
be used as the aggregation functions.

'count'})
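
As a hedged illustration of agg(), the sketch below applies several built-in aggregations to one column and then combines a built-in function with a user-defined one per group. The frame, the column names, and the "spread" aggregation are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b"],
    "points": [10, 20, 30],
})

# Several built-in aggregations on a single column
stats = df["points"].agg(["sum", "min", "max"])
print(stats)

# Named aggregation per group, mixing a built-in name
# with a custom function (max minus min)
by_team = df.groupby("team").agg(
    total=("points", "sum"),
    spread=("points", lambda s: s.max() - s.min()),
)
print(by_team)
```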

Q30. Difference between merge() and concat()

Ans: The following table shows the difference between merge() and
concat():

merge() | concat()
It is used to join exactly 2 dataframes based on a common column or index | It is used to join 2 or more dataframes along a particular axis, i.e. rows or columns
Performs different types of joins such as inner join, outer join, left join, and right join. | Performs concatenation by appending the dataframes one below the other (along the rows) or side by side (along the columns).
Join types and column names have to be specified. | By default, performs row-wise concatenation (i.e. axis=0). Column-wise concatenation has to be requested explicitly (i.e. axis=1).
Multiple columns can be merged if needed | Does not perform any sort of matching or joining based on column values
Used when we want to combine data based on a shared column or index. | Commonly used when you want to combine dataframes vertically or horizontally without any matching criteria.
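
The contrast can be seen on two small invented frames that share an id column:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Tom", "nick", "krish"]})
right = pd.DataFrame({"id": [2, 3, 4], "age": [21, 19, 18]})

# merge: match rows on the shared "id" column (inner join)
merged = pd.merge(left, right, on="id", how="inner")

# concat along rows: frames are stacked, missing cells become NaN
stacked = pd.concat([left, right], axis=0, ignore_index=True)

# concat along columns: frames are placed side by side by index position
side_by_side = pd.concat([left, right], axis=1)

print(merged)
print(stacked)
print(side_by_side)
```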

Q31. Difference between map(), applymap(), and apply()

Ans: The map(), applymap(), and apply() methods are used in pandas for
applying functions or transformations to elements in a DataFrame or Series.
The following table shows the difference between map(), applymap() and
apply():

map() | applymap() | apply()
Defined only in Series | Defined only in DataFrame | Defined in both Series and DataFrame
Used to apply a function or a dictionary to each element of the Series. | Used to apply a function to each element of the DataFrame. | Used to apply a function along a specific axis of the DataFrame or Series.
Series.map() works element-wise and can be used to perform element-wise transformations or mappings. | DataFrame.applymap() works element-wise, applying the provided function to each element in the DataFrame. | DataFrame.apply() works on entire rows or columns of a DataFrame, or element-wise on a Series.
Used when we want to apply a simple transformation or mapping operation to each element of a series | Used when we want to apply a function to each individual element of a DataFrame | Used when we want to apply a function that aggregates or transforms data across rows or columns.

Note: since pandas 2.1, DataFrame.applymap() is deprecated in favour of the
equivalent DataFrame.map() method.
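
A compact sketch of the three methods on an invented frame follows. Because the element-wise DataFrame method is named map() from pandas 2.1 onwards and applymap() before that, the example picks whichever one the installed version provides; that lookup is an illustrative workaround, not part of the original article.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Series.map with a dictionary: element-wise mapping on one column
mapped = df["a"].map({1: "one", 2: "two"})

# Element-wise function on every cell of the DataFrame
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)

# DataFrame.apply works on whole columns (axis=0) or rows (axis=1)
col_sums = df.apply(lambda col: col.sum(), axis=0)
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(mapped.tolist(), col_sums.tolist(), row_sums.tolist())
print(squared)
```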

Q32. Difference between pivot_table() and groupby()

Ans: Both pivot_table() and groupby() are powerful methods in pandas used
for aggregating and summarizing data. The following table shows the
difference between pivot_table() and groupby():

pivot_table() | groupby()
It summarizes and aggregates data in a tabular format | It performs aggregation on grouped data of one or more columns
Used to transform data by reshaping it based on column values | Used to group data based on categorical variables, after which we can apply various aggregation functions
It can handle multiple levels of grouping and aggregation, providing flexibility in summarizing data. | It performs grouping based on column values and creates a GroupBy object; then aggregation functions, such as sum, mean, count, etc., can be applied to the grouped data.
It is used when we want to compare the data across multiple dimensions | It is used to summarize data within groups
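
Both methods can produce the same totals in different shapes, as this sketch on invented sales data shows: groupby() returns a long result with a two-level index, while pivot_table() spreads one grouping key across the columns.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "product": ["x", "y", "x", "y"],
    "sales": [10, 20, 30, 40],
})

# Long result: one row per (region, product) pair
grouped = df.groupby(["region", "product"])["sales"].sum()

# Wide result: regions as rows, products as columns
pivoted = df.pivot_table(index="region", columns="product",
                         values="sales", aggfunc="sum")

print(grouped)
print(pivoted)
```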

Q33. How can we use Pivot and Melt Data in Pandas?

Ans: We can pivot a dataframe in Pandas by using the pivot_table()
method, which reshapes long-format data into a wide table. To unpivot the
dataframe back to a long format, we can melt it by using the melt() method.
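
A round trip between the two shapes might look like the sketch below. The student names, subjects, and scores are invented, and the column names passed to var_name and value_name are arbitrary choices.

```python
import pandas as pd

wide = pd.DataFrame({
    "name": ["Tom", "nick"],
    "math": [90, 80],
    "art": [70, 60],
})

# Wide to long: one row per (name, subject) pair
long = wide.melt(id_vars="name", var_name="subject", value_name="score")

# Long back to wide: subjects become columns again
back = long.pivot_table(index="name", columns="subject", values="score")

print(long)
print(back)
```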

Q34. How to convert a String to Datetime in Pandas?

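Ans: A Python string can be converted to a datetime object by using the
Pandas to_datetime() function or the strptime() method of the datetime
class. to_datetime() returns a Pandas Timestamp and infers common formats
by default (a format string can also be passed), while strptime() returns a
datetime object corresponding to date_string, parsed according to the
format string given by the user.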

Using Pandas.to_datetime()

Python

import pandas as pd

# Convert a string to a datetime object


date_string = '2023-07-17'
dateTime = pd.to_datetime(date_string)
print(dateTime)

Output:

2023-07-17 00:00:00

Using datetime.strptime()

Python

from datetime import datetime

# Convert a string to a datetime object


date_string = '2023-07-17'
dateTime = datetime.strptime(date_string, '%Y-%m-%d')
print(dateTime)

Output:

2023-07-17 00:00:00

Q35. What is the Significance of Pandas describe() Command?

Ans: Pandas describe() is used to view some basic statistical details of a
data frame or a series of numeric values, such as count, mean, standard
deviation, minimum, percentiles, and maximum. It gives a different output
when it is applied to a series of strings: count, number of unique values,
most frequent value, and its frequency.

DataFrame.describe()

Q36. How to Compute Mean, Median, Mode, Variance, Standard


Deviation, and Various Quantile Ranges in Pandas?

Ans: The mean, median, mode, variance, standard deviation, and quantiles
can be computed using the following Pandas methods.

DataFrame.mean(): To calculate the mean
DataFrame.median(): To calculate the median
DataFrame.mode(): To calculate the mode
DataFrame.var(): To calculate the variance
DataFrame.std(): To calculate the standard deviation
DataFrame.quantile(): To calculate quantiles, with the quantile value
(between 0 and 1) as a parameter
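
These methods can be tried on a single made-up column. Note that var() and std() compute the sample statistics (dividing by n - 1) by default.

```python
import pandas as pd

x = pd.Series([1, 2, 2, 3, 7])

mean = x.mean()
median = x.median()
mode = x.mode().iloc[0]   # mode() returns a Series; take the first mode
variance = x.var()        # sample variance (ddof=1)
std_dev = x.std()         # sample standard deviation
q25, q75 = x.quantile(0.25), x.quantile(0.75)

print(mean, median, mode, variance, std_dev, q25, q75)
```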

Q37. How to make Label Encoding using Pandas?

Ans: Label encoding is used to convert categorical data into numerical data
so that a machine learning model can fit it. To apply label encoding using
pandas we can use pandas.Categorical().codes or the pandas.factorize()
function to replace the categorical values with numerical values.
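
Both approaches are sketched below on an invented colour column. They assign codes differently: factorize() numbers values in order of first appearance, while Categorical codes follow the sorted categories.

```python
import pandas as pd

colors = pd.Series(["red", "green", "blue", "green"])

# factorize: codes follow the order of first appearance
codes, uniques = pd.factorize(colors)

# Categorical: codes follow the sorted category order
cat_codes = pd.Categorical(colors).codes

print(list(codes), list(uniques))
print(list(cat_codes))
```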

Q38. How to make Onehot Encoding using Pandas?

Ans: One-hot encoding is a technique for representing categorical data as
numerical values in a machine-learning model. It works by creating a
separate binary variable for each category in the data. The value of the
binary variable is 1 if the observation belongs to that category and 0
otherwise. It can improve the performance of models that would otherwise
treat integer category codes as ordered. To apply one-hot encoding, we
create dummy columns for our dataframe by using the get_dummies() function.
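
A minimal sketch of one-hot encoding with get_dummies() on an invented colour column. The dtype is set to int explicitly so the dummy columns hold 0 and 1 regardless of the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red"]})

# One binary column per category, named <column>_<category>
dummies = pd.get_dummies(df, columns=["color"], dtype=int)
print(dummies)
```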

Q39. How to make a Boxplot using Pandas?

Ans: A boxplot is a visual representation of the distribution of data
through its quartiles. It is often used for detecting outliers in the data
set. We can create a boxplot from a Pandas dataframe by using the boxplot()
method and providing the column for which we want the boxplot to be created.

DataFrame.boxplot(column='Col_Name', grid=False)

Q40. How to make a Distribution Plot using Pandas?

Ans: A distribution plot is a graphical representation of the distribution of
data. A histogram is a common choice: it shows how many values of a dataset
fall into each bin. To create one using Pandas, you can use the plot.hist()
method. Called on a single column (a Series), it draws the histogram of that
column; called on a whole DataFrame, it draws the histograms of all numeric
columns on the same axes.

DataFrame['Numerical_Col_Name'].plot.hist()

Pandas Interview Questions for Data Scientists

Q41. How to Sort a Dataframe?

Ans: A dataframe can be sorted by using the sort_values() method, which
takes the column to sort by through the by parameter and returns the sorted
dataframe. We can also sort it by multiple columns. To sort it in
descending order, we pass an additional parameter ‘ascending’ and set it to
False.

DataFrame.sort_values(by='Age',ascending=True)
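
A brief sketch of descending and multi-column sorting; the names, ages, and scores are made up for illustration:

```python
import pandas as pd

# Hypothetical data with a tie in the "Age" column.
df = pd.DataFrame({"Name": ["Asha", "Ravi", "Meena", "Kabir"],
                   "Age": [30, 25, 30, 22],
                   "Score": [88, 92, 95, 70]})

# Descending order on a single column.
by_age_desc = df.sort_values(by="Age", ascending=False)

# Several columns: Age ascending, and Score descending within equal ages.
by_age_then_score = df.sort_values(by=["Age", "Score"],
                                   ascending=[True, False])

print(by_age_desc)
print(by_age_then_score)
```

sort_values() returns a new dataframe and leaves the original unchanged unless inplace=True is passed.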

Q42. How to Check and Remove Duplicate Values in Pandas?

Ans: In pandas, duplicate values can be checked by using the duplicated()
method. It returns a boolean Series that is True for every row that repeats
an earlier row.

DataFrame.duplicated()

To remove the duplicated values we can use the drop_duplicates() method.

DataFrame.drop_duplicates()
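
The two methods can be tried on a small made-up dataframe. The subset and keep arguments shown at the end are optional and restrict the comparison to chosen columns:

```python
import pandas as pd

# Hypothetical data: row 2 is an exact copy of row 0.
df = pd.DataFrame({"Name": ["Asha", "Ravi", "Asha", "Ravi"],
                   "City": ["Pune", "Delhi", "Pune", "Mumbai"]})

# True for rows that repeat an earlier row in every column.
mask = df.duplicated()

# Keep only the first occurrence of each fully identical row.
deduped = df.drop_duplicates()

# Compare on "Name" only and keep the last occurrence of each name.
by_name = df.drop_duplicates(subset=["Name"], keep="last")

print(mask.tolist())
print(deduped)
print(by_name)
```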

Q43. How to Create a New Column Based on Existing Columns?

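Ans: We can create a new column from existing columns in a DataFrame by
assigning the result of an expression to a new column name. For
element-wise transformations of a single column we can use Series.map(),
and for row-wise logic across several columns we can use DataFrame.apply()
with axis=1.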
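
A minimal sketch of deriving columns, assuming a made-up dataframe with "price" and "qty" columns. Plain column arithmetic is included alongside map() and apply() because it is usually the simplest option when it is enough:

```python
import pandas as pd

# Hypothetical order data.
df = pd.DataFrame({"price": [100, 250, 40], "qty": [2, 1, 5]})

# Column arithmetic: computed for all rows at once.
df["total"] = df["price"] * df["qty"]

# Series.map(): transform one column value by value.
df["size"] = df["total"].map(lambda t: "large" if t >= 225 else "small")

# DataFrame.apply(axis=1): the function receives one row at a time.
df["label"] = df.apply(lambda row: f"{row['qty']} x {row['price']}", axis=1)

print(df)
```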

Q44. How to Handle Missing Data in Pandas?

Ans: Datasets generally have some missing values, which can happen for a
variety of reasons, such as data collection issues, data entry errors, or
data not being available for certain observations. Left untreated, missing
values can distort statistics and break model training. To handle them,
Pandas provides functions for detecting, removing, and replacing null
values in a Pandas DataFrame:

isnull(): It returns True for NaN or null values and False for present
values
notnull(): It returns False for NaN values and True for present values
dropna(): It drops rows or columns that contain null values
fillna(): It lets the user replace NaN values with a value of their own
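
The four functions listed above can be sketched on a small made-up dataframe (column names "A" and "B" and the replacement value "unknown" are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                   "B": ["x", None, "z"]})

# Detect: boolean masks, and a per-column count of missing values.
null_mask = df.isnull()
present_mask = df.notnull()
null_counts = df.isnull().sum()

# Remove: drop every row that contains at least one missing value.
dropped = df.dropna()

# Replace: a different fill value per column.
filled = df.fillna({"A": df["A"].mean(), "B": "unknown"})

print(null_counts)
print(dropped)
print(filled)
```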

Q45. What is groupby() Function in Pandas?

Ans: The groupby() function is used to group or aggregate the data
according to a category. It makes the task of splitting the Dataframe over
some criteria really easy and efficient. It has the following syntax:

DataFrame.groupby(by=['Col_name'])
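
groupby() on its own only defines the groups; an aggregation is then applied to each group. A small sketch with made-up department and salary data:

```python
import pandas as pd

# Hypothetical salary data.
df = pd.DataFrame({"dept": ["HR", "IT", "IT", "HR", "IT"],
                   "salary": [40, 60, 80, 50, 70]})

# Split by department, then take the mean salary of each group.
mean_salary = df.groupby(by=["dept"])["salary"].mean()

# Several aggregations at once.
summary = df.groupby("dept")["salary"].agg(["count", "max"])

print(mean_salary)
print(summary)
```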

Q46. What are loc and iloc methods in Pandas?

Ans: Pandas subset selection is also known as Pandas indexing. It means
selecting particular rows or columns from a dataframe; we can select a
single row or column, or several at once. Pandas supports the following
types of indexing:

DataFrame[ ]: Also known as the indexing operator; it is most commonly used
to select columns by name.
DataFrame.loc[ ]: Used for label-based indexing.
DataFrame.iloc[ ]: Used for position-based (integer) indexing.
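
The three kinds of indexing can be compared on one made-up dataframe whose row labels are the strings "a", "b", and "c":

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena"],
                   "age": [30, 25, 41]},
                  index=["a", "b", "c"])

col = df["age"]               # indexing operator: one column as a Series
row_by_label = df.loc["b"]    # row whose label is "b"
row_by_pos = df.iloc[1]       # second row, whatever its label
cell = df.loc["c", "name"]    # one cell by row label and column label
block = df.iloc[0:2, 0:1]     # first two rows, first column, by position

print(col.tolist())
print(cell)
print(block)
```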

Q47. How to Merge Two DataFrames?

Ans: In pandas, we can combine two dataframes using the pandas.merge()
function, which takes the two dataframes as its first two parameters.

Python

import pandas as pd

# Create two DataFrames with partially overlapping indexes
df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6]},
                   index=[10, 20, 30])

df2 = pd.DataFrame({'C': [7, 8, 9],
                    'D': [10, 11, 12]},
                   index=[20, 30, 40])

# Merge on the index; the default inner join keeps only
# the index labels present in both DataFrames (20 and 30)
result = pd.merge(df1, df2, left_index=True, right_index=True)
print(result)

Output:

    A  B  C   D
20  2  5  7  10
30  3  6  8  11

Q48. Difference between iloc() and loc()

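Ans: The iloc[] and loc[] indexers of pandas are used for accessing data
from a DataFrame. The following table shows the difference between iloc[]
and loc[]:

iloc[]                                 | loc[]
---------------------------------------+---------------------------------------
It is an integer position-based        | It is a label-based selection method.
selection method.                      |
---------------------------------------+---------------------------------------
It allows you to access rows and       | It allows you to access rows and
columns of a DataFrame by their        | columns of a DataFrame using their
integer positions.                     | labels or names.
---------------------------------------+---------------------------------------
Positions start from 0 for both rows   | The indexing can be based on row
and columns.                           | labels, column labels, or a
                                       | combination of both.
---------------------------------------+---------------------------------------
Used for integer-based slicing;        | Used for label-based slicing;
accepts single integers, lists or      | accepts single labels, lists or
arrays of integers for specific rows   | arrays of labels for specific rows
or columns. The end of a slice is      | or columns. The end of a slice is
excluded.                              | included.
---------------------------------------+---------------------------------------
Syntax:                                | Syntax:
DataFrame.iloc[row_index,              | DataFrame.loc[row_label,
column_index]                          | column_label]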
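
The difference is easiest to see when the row labels are themselves integers that do not match the positions. A short sketch with made-up labels 10, 20, 30, 40:

```python
import pandas as pd

# Hypothetical data: integer row labels that differ from the positions.
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=[10, 20, 30, 40])

# loc slices by label and includes the end label.
by_label = df.loc[10:30]

# iloc slices by position and excludes the end position.
by_pos = df.iloc[0:2]

# Single cells: label 10 versus position 1.
cell_label = df.loc[10, "A"]
cell_pos = df.iloc[1, 0]

print(by_label)
print(by_pos)
```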

Q49. Difference between join() and merge()

join()                                 | merge()
---------------------------------------+---------------------------------------
Combines dataframes on their indexes   | Combines dataframes by specifying the
by default.                            | columns as a merge key.
---------------------------------------+---------------------------------------
Joining is performed on the index of   | Joining is performed based on the
the other DataFrame, not on its        | values in the specified columns or
columns.                               | indexes.
---------------------------------------+---------------------------------------
The on parameter can name column(s)    | Supports merging based on one or more
of the calling DataFrame, but they are | columns or indexes, allowing for more
still matched against the index of the | flexibility in combining DataFrames.
other DataFrame.                       |
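
A sketch of the same combination done both ways, on made-up dataframes that share a "key" column. Note the different defaults: merge() performs an inner join, while join() performs a left join on the index.

```python
import pandas as pd

# Hypothetical dataframes sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# merge(): match on a column; default how="inner" keeps keys in both.
merged = left.merge(right, on="key")

# join(): match on the index; default how="left" keeps all left rows.
joined = left.set_index("key").join(right.set_index("key"))

print(merged)
print(joined)
```

Passing how="left" to merge(), or how="inner" to join(), makes the two results contain the same rows.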

Q50. Difference between the interpolate() and fillna()

Ans: The interpolate() and fillna() methods in pandas are used to handle
missing or NaN (Not a Number) values in a DataFrame or Series. The
following table shows the difference between interpolate() and fillna():

interpolate()                          | fillna()
---------------------------------------+---------------------------------------
Fills in missing values with values    | Fills missing values with specified
estimated (interpolated) from the      | values, which can be based on some
existing data.                         | strategy.
---------------------------------------+---------------------------------------
Performs interpolation based on        | Replaces NaN values with a constant
various methods such as linear,        | such as zero, or with the mean, median,
polynomial, and time-based             | mode, or any other custom value
interpolation.                         | computed from the existing data.
---------------------------------------+---------------------------------------
Applied to both numerical and DateTime | Can be applied to both numerical and
data when dealing with time series     | categorical data.
data or when there is a logical        |
relationship in the existing data.     |
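
The two methods can be compared on the same made-up Series with a gap of two missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical series with two consecutive missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# interpolate(): default linear method estimates values between neighbours.
interpolated = s.interpolate()

# fillna(): every gap gets the same specified value.
filled_zero = s.fillna(0)
filled_mean = s.fillna(s.mean())  # mean of the present values

print(interpolated.tolist())
print(filled_zero.tolist())
print(filled_mean.tolist())
```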

Conclusion
In conclusion, our Pandas Interview Questions and answers article serves as
a comprehensive guide for anyone aspiring to make a mark in the Data
Science and ML profession. With a wide range of questions from basic to
advanced, including practical coding questions, we’ve covered all the bases
to ensure you’re well-prepared for your interviews.

Remember, the key to acing an interview is not just knowing the answers,
but understanding the concepts behind them. We hope this article has been
helpful in your preparation and wish you all the best in your journey.

Stay tuned for more such resources and keep learning!

Also, Check:

Python Interview Questions


ML Interview Questions

Pandas Interview Questions – FAQs

1. Which three main objects does pandas have?

The three fundamental objects around which the whole pandas library
revolves are Series, DataFrame, and Index.

2. Why does everyone use pandas?

Pandas allows a wide range of data manipulation operations such as
merging, reshaping, and selecting, as well as data cleaning and data
wrangling. Apart from that, Pandas works well with file-handling
operations such as importing data from various file formats.
3. What is all() in pandas?

The DataFrame.all() method checks whether all elements are True,
potentially over an axis. It returns True if all elements within a Series or
along a DataFrame axis are non-zero, non-empty, and not False.
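
A short sketch of all() along each axis, on a made-up dataframe in which column "b" contains a zero:

```python
import pandas as pd

# Hypothetical data: the 0 in column "b" counts as False.
df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [1, 0, 3],
                   "c": [True, True, True]})

# Default axis=0: one result per column.
col_all = df.all()

# axis=1: one result per row.
row_all = df.all(axis=1)

# Reduce twice to get a single answer for the whole dataframe.
overall = df.all().all()

print(col_all)
print(row_all)
print(overall)
```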

@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved
