Top 50 Pandas Interview Questions and Answers (2024)
Pandas is not just a library, it's an essential skill for professionals in
various domains, including finance, healthcare, and marketing. This library
streamlines data manipulation tasks, offering robust features for data
loading, cleaning, transforming, and much more. As a result, understanding
Pandas is a key requirement in many data-centric job roles.
This set of Pandas interview questions for data science covers basic and
advanced topics to help you succeed with confidence in your upcoming
interviews. We do not just cover theoretical questions; we also provide
practical coding questions to test your hands-on skills. This is particularly
beneficial for aspiring Data Scientists and ML professionals who wish to
demonstrate their proficiency in real-world problem-solving.
So, whether you are starting your journey in Python programming or looking
to brush up on your skills, these Pandas interview questions are an
essential resource for acing those technical interviews.
https://www.geeksforgeeks.org/pandas-interview-questions/ 1/34
6/24/24, 11:21 PM Top 50 Pandas Interview Questions and Answers (2024)
The two data structures that are supported by Pandas are Series and
DataFrames.
Pandas is used for efficient data analysis. The key features of Pandas are
as follows:
import pandas as pd
series = pd.Series(data)
Ans: In Pandas, a series can be created in many ways. They are as follows:
Python
# import pandas as pd
import pandas as pd
Output:
In order to create a series from the NumPy array, we have to import the
NumPy module and have to use the array() function.
Python
import numpy as np
# simple array
data = np.array(['g', 'e', 'e', 'k', 's'])
print(pd.Series(data))
Output:
0 g
1 e
2 e
3 k
4 s
dtype: object
Python
# simple array
data = np.array(['g', 'e', 'e', 'k', 's'])
# providing an index
ser = pd.Series(data, index=[10, 11, 12, 13, 14])
print(ser)
Output:
10 g
11 e
12 e
13 k
14 s
dtype: object
We can create a series using a Python list and pass it to the Series()
constructor.
Python
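# import pandas
import pandas as pd
# a simple list (named lst to avoid shadowing the built-in list)
lst = ['g', 'e', 'e', 'k', 's']
# create a series from the list
ser = pd.Series(lst)
print(ser)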
Output:
0 g
1 e
2 e
3 k
4 s
dtype: object
A Series can also be created from a Python dictionary. The keys of the
dictionary are used as the index of the Series.
Python
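# import pandas
import pandas as pd
# a simple dictionary (named data to avoid shadowing the built-in dict)
data = {'Geeks': 10,
        'for': 20,
        'geeks': 30}
# the dictionary keys become the index labels
ser = pd.Series(data)
print(ser)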
Output:
Geeks 10
for 20
geeks 30
dtype: int64
To create a series from a Scalar value, we must provide an index. The Series
constructor will take two arguments, one will be the scalar value and the
other will be a list of indexes. The value will repeat until all the index values
are filled.
Python
ser = pd.Series(10, index=range(6))
print(ser)
Output:
0 10
1 10
2 10
3 10
4 10
5 10
dtype: int64
Python
Output:
0 3.0
1 18.0
2 33.0
dtype: float64
0 0.694519
1 0.782243
2 0.082820
dtype: float64
import pandas as pd
print(pd.Series(range(5)))
Output:
0 0
1 1
2 2
3 3
4 4
dtype: int64
Here, we will use the Python list comprehension technique to create a series
in Pandas. We will use the range function to define the values and a for loop
for indexes.
Python
# import pandas
import pandas as pd
ser = pd.Series(range(1, 20, 3),
index=[x for x in 'abcdefg'])
print(ser)
Output:
a 1
b 4
c 7
d 10
e 13
f 16
g 19
dtype: int64
as follows:
Shallow Copy is a copy of the series object where the indices and the data
of the original object are not copied. It only copies the references to the
indices and data. This means any changes made to the data of one series will
be reflected in the other (unless Copy-on-Write is enabled, which is the
default from pandas 3.0). A shallow copy of the series can be created by
writing the following syntax:
ser.copy(deep=False)
Deep Copy is a copy of the series object where it has its own indices and
data. This means any changes made to a copy of the object will not be
reflected in the original series object. A deep copy of the series can be
created by writing the following syntax:
ser.copy(deep=True)
The default value of the deep parameter of the copy() function is True.
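As a minimal sketch of the deep-copy behaviour described above (the values and the names `original` and `deep` are illustrative assumptions, not taken from the article), modifying a deep copy leaves the original series untouched:

```python
import pandas as pd

original = pd.Series([1, 2, 3])

# deep=True (the default) gives the copy its own data and index
deep = original.copy(deep=True)
deep.iloc[0] = 100

print(original.tolist())  # the original values are unchanged
print(deep.tolist())      # only the copy holds the new value
```

Whether a write through a shallow copy is visible in the original depends on the pandas version and the Copy-on-Write setting, so this sketch only demonstrates the deep-copy case.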
import pandas as pd
dataframe = pd.DataFrame(data)
import pandas as pd
# calling the DataFrame constructor with no data
df = pd.DataFrame()
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
In order to create a DataFrame from a Python list, just pass the list to the
DataFrame() constructor.
Python
# import pandas as pd
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']
# pass the list to the DataFrame constructor
df = pd.DataFrame(lst)
print(df)
Output:
0
0 Geeks
1 For
2 Geeks
3 is
4 portal
5 for
6 Geeks
A DataFrame can be created from a Python list of lists by passing the main
list to the DataFrame() constructor along with the column names.
Python
# import pandas as pd
import pandas as pd
# list of lists
lst = [[1, 'Geeks'], [2, 'For'], [3, 'Geeks']]
# pass the list along with the column names
df = pd.DataFrame(lst, columns=['Id', 'Data'])
print(df)
Output:
Id Data
0 1 Geeks
1 2 For
2 3 Geeks
Python
import pandas as pd
1 nick 21
2 krish 19
3 jack 18
Python
# import pandas as pd
import pandas as pd
# list of dictionaries; the keys become the column labels
lst = [{1: 'Geeks', 2: 'For', 3: 'Geeks'},
       {1: 'Portal', 2: 'for', 3: 'Geeks'}]
df = pd.DataFrame(lst)
print(df)
Output:
1 2 3
0 Geeks For Geeks
1 Portal for Geeks
Python
# import pandas as pd
import pandas as pd
# a series of strings
ser = pd.Series(['Geeks', 'For', 'Geeks'])
# pass the series to the DataFrame constructor
df = pd.DataFrame(ser)
print(df)
Output:
0
0 Geeks
1 For
2 Geeks
Ans: We can create a DataFrame from a CSV ("Comma Separated Values") file.
This can be done by using the read_csv() method, which takes the path of the
CSV file as a parameter.
pandas.read_csv(file_name)
Another way to do this is by using the read_table() method, which takes the
CSV file and a delimiter value as parameters.
pandas.read_table(file_name, delimiter=',')
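A small sketch of both readers follows. To keep it self-contained, an in-memory `io.StringIO` buffer stands in for a file on disk; the column names and rows are illustrative assumptions.

```python
import io
import pandas as pd

# In-memory text standing in for a CSV file on disk (illustrative data)
csv_text = "name,age\nnick,21\nkrish,19\njack,18\n"

# read_csv splits on commas by default
df = pd.read_csv(io.StringIO(csv_text))

# read_table splits on tabs by default, so the comma is passed explicitly
df_table = pd.read_table(io.StringIO(csv_text), sep=",")

print(df)
print(df.equals(df_table))
```

With a real file, the buffer would simply be replaced by the file path.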
Ans: The first few records of a dataframe can be accessed by using the
pandas head() method. It takes one optional argument n, which is the
number of rows. By default, it returns the first 5 rows of the dataframe. The
head() method has the following syntax:
df.head(n)
df.iloc[:n]
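As a quick illustration of the two forms above, the sketch below (using a made-up ten-row DataFrame) checks that `head(n)` and positional slicing with `iloc[:n]` return the same rows:

```python
import pandas as pd

# Illustrative DataFrame with ten rows
df = pd.DataFrame({"value": range(10)})

first_five = df.head()     # n defaults to 5
first_three = df.head(3)   # explicit n
same_rows = df.iloc[:3]    # positional slice of the first three rows

print(len(first_five))
print(first_three.equals(same_rows))
```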
Ans: Reindexing in Pandas as the name suggests means changing the index
of the rows and columns of a dataframe. It can be done by using the Pandas
Ans: There are many ways to select a single column of a dataframe. They
are as follows:
DataFrame.column_name
DataFrame['column_name']
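A minimal sketch of both access styles, assuming a small illustrative DataFrame with `name` and `age` columns:

```python
import pandas as pd

df = pd.DataFrame({"name": ["nick", "krish"], "age": [21, 19]})

# Attribute access works only when the column name is a valid Python
# identifier that does not clash with a DataFrame attribute or method
by_attribute = df.age

# Bracket access works for any column label
by_label = df["age"]

print(type(by_label).__name__)
print(by_label.equals(by_attribute))
```

Both forms return the column as a Series; bracket access is the safer default because it works for any label.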
DataFrame.rename(columns={'column1': 'COLUMN_1',
'column2':'COLUMN_2'}, inplace=True)
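Another way is by using the set_axis() function, which takes the full list of
new column labels and the axis they apply to. It returns a new DataFrame (the
inplace parameter was removed in pandas 2.0), so assign the result back.
DataFrame = DataFrame.set_axis(labels=['COLUMN_1', 'COLUMN_2'], axis=1)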
In case we want to add a prefix or suffix to the column names, we can use
the add_prefix() or add_suffix() methods.
DataFrame.add_prefix(prefix='PREFIX_')
DataFrame.add_suffix(suffix='_suffix')
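The renaming options above can be sketched side by side. The DataFrame and the new labels are illustrative assumptions; each call returns a new DataFrame and leaves `df` unchanged.

```python
import pandas as pd

df = pd.DataFrame({"column1": [1, 2], "column2": [3, 4]})

# rename maps old labels to new ones; unlisted columns keep their names
renamed = df.rename(columns={"column1": "COLUMN_1", "column2": "COLUMN_2"})

# set_axis replaces every column label at once
relabelled = df.set_axis(["A", "B"], axis=1)

# add_prefix / add_suffix decorate every column label
prefixed = df.add_prefix("PREFIX_")
suffixed = df.add_suffix("_suffix")

print(list(renamed.columns))
print(list(relabelled.columns))
print(list(prefixed.columns))
print(list(suffixed.columns))
```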
index of a dataframe. The set_index() method has the following syntax:
Adding Rows
The df.loc[] is used to access a group of rows or columns and can be used to
add a row to a dataframe.
DataFrame.loc[Row_Index]=new_row
pandas.concat([Dataframe1,Dataframe2])
Adding Columns
DataFrame[data] = list_of_values
DataFrame.assign(**kwargs)
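The four calls above can be combined into one short sketch. The data, the new column names, and the use of `ignore_index=True` are illustrative assumptions rather than part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"name": ["nick", "krish"], "age": [21, 19]})

# Adding a row: assigning to a label that does not exist yet appends it
df.loc[2] = ["jack", 18]

# Adding rows from another DataFrame; ignore_index renumbers the result
extra = pd.DataFrame({"name": ["tom"], "age": [20]})
combined = pd.concat([df, extra], ignore_index=True)

# Adding a column by direct assignment (one value per row)
combined["passed"] = [True, True, False, True]

# assign returns a new DataFrame with the extra column
with_next = combined.assign(age_next_year=combined["age"] + 1)

print(with_next)
```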
To delete a row
DataFrame.drop([Row_Index_Number], axis=0)
Ans: We can set the index of a Pandas dataframe by using the set_index()
method, which sets one or more existing columns (or arrays of matching
length) as the index of the dataframe.
DataFrame.set_index('Column_Name')
Ans: The index of Pandas dataframes can be reset by using the reset_index()
method. It can be used to simply reset the index to the default integer index
beginning at 0.
DataFrame.reset_index(inplace = True)
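A short round trip through `set_index()` and `reset_index()`, using an illustrative DataFrame, may make the two answers above more concrete:

```python
import pandas as pd

df = pd.DataFrame({"name": ["nick", "krish", "jack"], "age": [21, 19, 18]})

# Use the 'name' column as the row index
indexed = df.set_index("name")
print(indexed.loc["krish", "age"])

# Move the index back into a column and restore the default 0..n-1 index
restored = indexed.reset_index()
print(restored)
```

Both methods return a new DataFrame unless `inplace=True` is passed.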
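Ans: The Pandas DataFrame.corr() method is used to find the pairwise
correlation of the columns of a dataframe. It automatically excludes missing
values. Non-numerical columns are skipped only when numeric_only=True is
passed; since pandas 2.0 the default is numeric_only=False, so non-numeric
columns must be dropped first or they raise an error.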
DataFrame.corr()
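As a hedged sketch with made-up data, the call below passes `numeric_only=True` explicitly so that a string column is skipped on any pandas version that supports the parameter (1.5 or later):

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],             # perfectly linear in x
    "label": ["a", "b", "c", "d"],  # non-numeric column
})

# Restrict the computation to numeric columns explicitly
corr = df.corr(numeric_only=True)
print(corr)
```

The result is a square DataFrame indexed by the numeric column names on both axes.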
Ans: There are various ways to iterate the rows and columns of a dataframe.
remaining values are the row values.
Iteration over Columns
Ans: Iterating is not the best option when it comes to Pandas Dataframe.
Pandas provides a lot of functions using which we can perform certain
operations instead of iterating through the dataframe. While iterating a
dataframe, we need to keep in mind the following things:
DataFrame.to_excel(file_name)
Dataframe.to_numpy()
Ans: Boolean masking is a technique that can be used in Pandas to split a
DataFrame depending on a boolean criterion. You may divide different
regions of the DataFrame and filter rows depending on a certain criterion
using boolean masking.
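Boolean masking as described above can be sketched as follows; the DataFrame and the age threshold are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"name": ["nick", "krish", "jack"], "age": [21, 19, 18]})

# A comparison on a column yields a boolean Series (the mask)
mask = df["age"] > 18

# Indexing with the mask keeps only the rows where it is True
over_18 = df[mask]

# The ~ operator inverts the mask to select the remaining rows
others = df[~mask]

print(over_18)
print(others)
```

Several conditions can be combined with `&` and `|`, with each condition wrapped in parentheses.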
Ans: A time series is a collection of data points with timestamps. It depicts
the evolution of a quantity over time. Pandas provides various functions to
handle time series data efficiently, such as working with timestamps,
resampling a time series to different time periods, handling missing data,
and slicing the data using timestamps.
DataFrame.set_index('Date', inplace=True): Set 'Date' as the index
DataFrame.interpolate(): Linear interpolation
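The operations mentioned in this answer can be sketched together. The dates, values, and the two-day resampling window below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]
    ),
    "Value": [1.0, np.nan, 3.0, 4.0],
})

# Set 'Date' as the index so the rows are addressed by timestamp
df.set_index("Date", inplace=True)

# Fill the missing value by linear interpolation
filled = df.interpolate()

# Resample to two-day periods and average within each period
two_day = filled.resample("2D").mean()

# Slice by timestamp label (inclusive of the start date)
recent = filled.loc["2024-01-03":]

print(filled)
print(two_day)
print(recent)
```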
Ans: A time delta is the difference between two dates or times. Similar to
the timedelta object in the datetime module, a Timedelta in Pandas indicates
a duration or difference in time. Pandas has a dedicated data type for
representing time durations in a DataFrame or Series.
A Timedelta object can be created by calling pandas.Timedelta() and
providing the number of weeks, days, seconds, milliseconds, etc. as
parameters.
With the help of the Timedelta data type, you can easily perform arithmetic
operations, comparisons, and other time-related manipulations. Durations can
be expressed in different units, such as days, hours, minutes, seconds,
milliseconds, and microseconds.
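A minimal sketch of creating and using a Timedelta, with illustrative dates and durations:

```python
import pandas as pd

# Build a duration from several units
delta = pd.Timedelta(weeks=1, days=2, hours=3)

# Arithmetic with timestamps
start = pd.Timestamp("2023-07-17")
end = start + delta

# Subtracting two timestamps gives back a Timedelta
diff = end - start

print(delta.days)             # whole days in the duration
print(delta.total_seconds())  # the full duration in seconds
print(diff == delta)
print(delta > pd.Timedelta(days=9))
```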
'count'})
Ans: The following table shows the difference between merge() and
concat():
.merge() concat()
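As a rough illustration of the distinction (the two DataFrames are made up for this sketch), `merge()` matches rows on a key column while `concat()` stacks objects along an axis:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [20, 30, 40]})

# merge: SQL-style join on the shared 'id' column (inner join by default)
merged = pd.merge(left, right, on="id")

# concat: stack the rows; columns missing from one side are filled with NaN
stacked = pd.concat([left, right], ignore_index=True)

print(merged)
print(stacked)
```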
Ans: The map(), applymap(), and apply() methods are used in pandas for
applying functions or transformations to elements in a DataFrame or Series.
The following table shows the difference between map(), applymap() and
apply():
map(): Used to apply a function or a dictionary to each element of the
Series. Series.map() works element-wise and can be used to perform
element-wise transformations or mappings.
applymap(): Defined only in DataFrame. Used to apply a function to each
element of the DataFrame. DataFrame.applymap() works element-wise, applying
the provided function to each element in the DataFrame.
apply(): Defined in both Series and DataFrame. Used to apply a function along
a specific axis of the DataFrame or Series. DataFrame.apply() works on
either entire rows or entire columns of a DataFrame, or element-wise on a
Series.
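The three methods can be compared in a short sketch with illustrative data. One assumption to flag: pandas 2.1 renamed DataFrame.applymap() to DataFrame.map() and deprecated the old name, so the code picks whichever method the installed version provides.

```python
import pandas as pd

ser = pd.Series(["a", "b", "a"])
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Series.map: element-wise, accepts a function or a dictionary
mapped = ser.map({"a": 1, "b": 2})

# Element-wise on a DataFrame: DataFrame.map in pandas >= 2.1,
# DataFrame.applymap in older versions
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda v: v ** 2)

# DataFrame.apply: the function receives a whole column (axis=0)
# or a whole row (axis=1) as a Series
col_sums = df.apply(lambda col: col.sum(), axis=0)
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(mapped.tolist())
print(squared)
print(col_sums.tolist(), row_sums.tolist())
```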
Ans: Both pivot_table() and groupby() are powerful methods in pandas used
for aggregating and summarizing data. The following table shows the
difference between pivot_table() and groupby():
pivot_table(): It can handle multiple levels of grouping and aggregation,
providing flexibility in summarizing data.
groupby(): It performs grouping based on column values and creates a GroupBy
object; aggregation functions, such as sum, mean, count, etc., can then be
applied to the grouped data.
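A small sketch contrasting the two on the same made-up sales data: `groupby()` returns a long-form aggregate, while `pivot_table()` spreads a second grouping key across the columns.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [10, 20, 30, 40],
})

# groupby: group on a column, then aggregate the grouped data
grouped = df.groupby("city")["sales"].sum()

# pivot_table: rows grouped by city, columns by year, cells aggregated
pivot = df.pivot_table(values="sales", index="city",
                       columns="year", aggfunc="sum")

print(grouped)
print(pivot)
```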
Using Pandas.to_datetime()
Python
import pandas as pd
Output:
Python
Output:
2023-07-17 00:00:00
DataFrame.describe()
Ans: The mean, median, mode, Variance, Standard Deviation, and Quantile
range can be computed using the following commands in Python.
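Since the command list itself is not reproduced here, the following is a sketch of the corresponding pandas calls on an illustrative Series; the interquartile range is used as one reading of "quantile range".

```python
import pandas as pd

ser = pd.Series([1, 2, 2, 3, 4])

mean = ser.mean()
median = ser.median()
mode = ser.mode().iloc[0]   # mode() returns a Series; take the first mode
variance = ser.var()        # sample variance (ddof=1 by default)
std_dev = ser.std()         # sample standard deviation

# Interquartile range from the 25th and 75th percentiles
q1, q3 = ser.quantile([0.25, 0.75])
iqr = q3 - q1

print(mean, median, mode, variance, std_dev, iqr)
```

The same methods exist on a DataFrame, where they are computed per column.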
so that a machine learning model can fit it. To apply label encoding using
pandas, we can use pandas.Categorical().codes or pandas.factorize().
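Both label-encoding approaches can be sketched on an illustrative column of colour names. Note that they number the labels differently: `Categorical` sorts the categories first, while `factorize` numbers them in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Categorical codes: categories are sorted (blue=0, green=1, red=2)
codes_cat = pd.Categorical(df["color"]).codes

# factorize: codes follow order of first appearance (red=0, green=1, blue=2)
codes_fact, uniques = pd.factorize(df["color"])

print(list(codes_cat))
print(list(codes_fact))
print(list(uniques))
```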
DataFrame.boxplot(column='Col_Name', grid=False)
DataFrame['Numerical_Col_Name'].plot.hist()
the dataframe. We can also sort it by multiple columns. To sort it in
descending order, we pass an additional parameter 'ascending' and set it to
False.
DataFrame.sort_values(by='Age',ascending=True)
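A brief sketch of ascending, descending, and multi-column sorting, assuming an illustrative DataFrame with `Name` and `Age` columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["nick", "krish", "jack", "tom"],
    "Age": [21, 19, 18, 19],
})

# Ascending order is the default
ascending = df.sort_values(by="Age")

# Descending order
descending = df.sort_values(by="Age", ascending=False)

# Multiple columns: Age descending, then Name ascending to break ties
multi = df.sort_values(by=["Age", "Name"], ascending=[False, True])

print(ascending)
print(descending)
print(multi)
```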
DataFrame.duplicated()
DataFrame.drop_duplicates()
Ans: Datasets generally have some missing values, which can occur for a
variety of reasons, such as data collection issues, data entry errors, or
data not being available for certain observations. Left unhandled, missing
values can break computations or bias results. To handle them, Pandas
provides various functions for detecting, removing, and replacing null
values in a DataFrame:
isnull(): It returns True for NaN or null values and False for present
values
notnull(): It returns False for NaN values and True for present values
dropna(): It analyzes and drops rows/columns with null values
fillna(): It lets the user replace NaN values with a value of their own
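The four functions listed above can be exercised on a small DataFrame; the data and the replacement values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, "z"],
})

missing = df.isnull()    # True where a value is missing
present = df.notnull()   # the inverse of isnull()

# Drop every row that contains at least one missing value
dropped = df.dropna()

# Replace missing values, using a different fill value per column
filled = df.fillna({"a": 0.0, "b": "missing"})

print(missing)
print(dropped)
print(filled)
```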
Q45. What is groupby() Function in Pandas?
DataFrame.groupby(by=['Col_name'])
Python
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6]},
index=[10, 20, 30])
A B C D
20 2 5 7 10
30 3 6 8 11
Ans: The iloc[] and loc[] indexers of pandas are used for accessing data
from a DataFrame. The differences between iloc[] and loc[] are as follows:
iloc[]: It is an integer-position-based selection method. It allows you to
access rows and columns of a DataFrame by their integer positions.
Syntax: DataFrame.iloc[row_index, column_index]
loc[]: It is a label-based selection method. It allows you to access rows
and columns of a DataFrame using their labels or names.
Syntax: DataFrame.loc[row_label, column_label]
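A minimal sketch of the difference, using an illustrative DataFrame whose index labels (10, 20, 30) deliberately differ from the integer positions (0, 1, 2):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=[10, 20, 30])

# loc uses labels: row labelled 20, column labelled 'B'
by_label = df.loc[20, "B"]

# iloc uses positions: second row, second column
by_position = df.iloc[1, 1]

# Label slices include the end label; position slices exclude the end
label_slice = df.loc[10:20]
position_slice = df.iloc[0:2]

print(by_label, by_position)
print(label_slice.equals(position_slice))
```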
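join(): Combines DataFrames on their indexes by default; it can match a
column of the calling DataFrame only against the index of the other, and
does not support merging on arbitrary columns of both DataFrames.
merge(): Supports merging based on one or more columns or indexes, allowing
for more flexibility in combining DataFrames.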
Ans: The interpolate() and fillna() methods in pandas are used to handle
missing or NaN (Not a Number) values in a DataFrame or Series. The
following table shows the difference between interpolate() and fillna():
interpolate(): Fill in the missing values based on the interpolation or
estimate values based on the existing data.
fillna(): Fill missing values with specified values that can be based on
some strategies.
existing data.
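The contrast between the two methods can be sketched on one illustrative Series with a gap of two missing values:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, np.nan, np.nan, 4.0])

# interpolate: estimate the gap from neighbouring values (linear by default)
interpolated = ser.interpolate()

# fillna with a constant
constant = ser.fillna(0)

# fillna with a simple strategy: the mean of the existing values
mean_filled = ser.fillna(ser.mean())

print(interpolated.tolist())
print(constant.tolist())
print(mean_filled.tolist())
```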
Conclusion
In conclusion, our Pandas Interview Questions and answers article serves as
a comprehensive guide for anyone aspiring to make a mark in the Data
Science and ML profession. With a wide range of questions from basic to
advanced, including practical coding questions, we’ve covered all the bases
to ensure you’re well-prepared for your interviews.
Remember, the key to acing an interview is not just knowing the answers,
but understanding the concepts behind them. We hope this article has been
helpful in your preparation and wish you all the best in your journey.
3. What is all () in pandas?