Code360 powered by Coding Ninjas X Naukri.com
Table of contents

1. Introduction
2. Pandas Basic Interview Questions
     1. Explain Python Pandas?
     2. What is the use of Python Pandas?
     3. Define series in Pandas?
     4. What are the types of data structures available in Pandas?
     5. What are the critical features of Pandas?
     6. How can the standard deviation be calculated from the Series?
     7. Define DataFrames in Pandas?
     8. What is the time series in Pandas?
     9. Explain reindexing in Pandas?
     10. Explain MultiIndexing in Pandas.
     11. What is TimeDelta?
     12. How to create a series from a dictionary in Pandas?
     13. Which library tool is used to create a scatter plot matrix?
3. Pandas Interview Questions for Intermediate
     14. What is Time Offset?
     15. Explain Categorical data in Pandas?
     16. How to create a copy of the series in Pandas?
     17. How to add an index in DataFrames?
     18. How to delete an index from DataFrame?
     19. How can I remove indices, rows, or columns from a Pandas DataFrame?
     20. How to rename an index or column in a DataFrame?
     21. How to add a column to DataFrame?
     22. How to add rows to a DataFrame?
     23. How to iterate over a Pandas DataFrame?
     24. What is Pandas NumPy array?
     25. How to convert DataFrame to NumPy array?
     26. List some statistical functions in Python Pandas?
     27. How can one identify the items in series A that are absent from series B?
     28. How can we convert DataFrame to an excel file?
4. Pandas Interview Questions for Experienced
     29. What is data aggregation?
     30. What is Multiple Indexing?
     31. What is concat() in Pandas?
     32. What is GroupBy() in Pandas?
     33. How to sort the DataFrame?
     34. How can one create an empty DataFrame in Pandas?
     35. How to split a DataFrame based on boolean criteria?
     36. What do describe() percentiles values represent?
     37. How can you merge data on common columns or indices?
     38. How to write DataFrame to PostgreSQL table?
     39. How to convert continuous values to discrete values in Pandas?
     40. How are iloc() and loc() different?
     41. Explain the difference between join() and merge() in Pandas?
     42. Explain the difference(s) between merge() and concat() in Pandas?
     43. Explain the difference between interpolate() and fillna() in Pandas?
     44. What are the types of conversion methods in Pandas?
5. Pandas Coding Interview Questions
     45. How do you select specific columns from a DataFrame in Pandas?
     46. How can you handle missing data in a Pandas DataFrame?
     47. How do you filter rows based on a condition in Pandas?
     48. What is the difference between loc and iloc in Pandas?
     49. How do you group data in a DataFrame by a specific column?
     50. How can you merge two DataFrames in Pandas?
     51. How do you sort a DataFrame by multiple columns in Pandas?
     52. How do you reset the index of a DataFrame?
     53. How can you create a DataFrame from a dictionary in Pandas?
     54. How do you apply a function to each element of a DataFrame?
6. Pandas Interview Questions for Data Scientists
     55. How would you handle large datasets in Pandas?
     56. Explain the use of the pivot_table() function in Pandas.
     57. How do you handle categorical data in Pandas?
     58. What is the purpose of the agg() function in Pandas?
     59. How would you optimize Pandas operations for performance?
     60. How do you merge datasets with different keys in Pandas?
     61. What is the significance of the apply() function in Pandas?
     62. How would you detect and remove duplicate rows in a Pandas DataFrame?
     63. How do you concatenate DataFrames vertically and horizontally in Pandas?
     64. How would you check for and handle outliers in a Pandas DataFrame?
7. Pandas MCQ Questions
     65. Which function is used to read a CSV file into a DataFrame in Pandas?
     66. What does the head() function do in Pandas?
     67. How do you change the index of a DataFrame?
     68. Which method is used to fill missing values in a DataFrame?
     69. What type of join is performed by default when using the merge() function?
     70. How do you check the data type of each column in a DataFrame?
     71. Which of the following is used to remove rows with missing data?
     72. What does the describe() function in Pandas provide?
     73. How do you concatenate two DataFrames vertically?
     74. What does the isin() function do in Pandas?
8. Conclusion

Last Updated: Sep 2, 2024

Pandas Interview Questions

Author Manish Kumar

Introduction

Pandas is an open-source Python library for analyzing and manipulating data. It provides data structures and tools that help data scientists and engineers work with tabular and time series data. This article covers common questions you may be asked about Pandas in a job interview.

Top Pandas Interview Questions

In this article, we explore the most commonly asked Pandas interview questions and answers, divided into the following sections:

  • Pandas Basic Interview Questions
  • Pandas Interview Questions for Intermediate
  • Pandas Interview Questions for Experienced
  • Pandas Coding Interview Questions
  • Pandas Interview Questions for Data Scientists
  • Pandas MCQ Questions

Pandas Basic Interview Questions

This section covers basic Pandas interview questions, which establish the foundation for the later sections.

1.  Explain Python Pandas?

Ans: Python Pandas is a data analysis and manipulation library originally built by Wes McKinney. It is an open-source, cross-platform library. It provides data structures and operations for manipulating numerical tables and time series data. It is also widely used to prepare data before applying machine learning algorithms.

2. What is the use of Python Pandas?

Ans: It is used for data analysis, time series manipulation, and table management. It is specially designed for the Python programming language.

3. Define series in Pandas?

Ans: It is a one-dimensional labelled array that can hold objects of any data type. Using the pd.Series() constructor, you can convert a list, tuple, or dictionary into a series. A series holds a single column of values, so it cannot have multiple columns. The row labels of the series are called the index.

4. What are the types of data structures available in Pandas?

Ans: Pandas provides two main data structures built on top of NumPy:

  • Series: a one-dimensional labelled array.
  • DataFrame: a two-dimensional labelled table.

5. What are the critical features of Pandas?

Ans: The features of Pandas library are:

  • Time Series
  • Data Alignment
  • Merge and Join
  • Reshaping
  • Memory efficient

6. How can the standard deviation be calculated from the Series?

Ans: In pandas, you can calculate the standard deviation of a Series using the .std() method. For example:

import pandas as pd
data = pd.Series([1, 2, 3, 4, 5])
std_deviation = data.std()


Here, std_deviation will contain the sample standard deviation of the data in the Series (Pandas uses ddof=1 by default), which is about 1.58 for this data.

7. Define DataFrames in Pandas?

Ans: A DataFrame is a widely used data structure in Pandas that represents 2-D data with labelled axes. It is a standard way of storing data with row and column indices. Different columns can store different types of data, such as int and bool. It can be viewed as a dictionary of series data structures that share one index.
 

8. What is the time series in Pandas?

Ans: A time series is an ordered collection of data points showing a quantity's evolution over time. Pandas is extremely capable here and has the tools to work with time series data from various fields.
Functions provided by Pandas:

  • Create date and time sequences using preset frequencies
  • Date and time manipulation supported by timezone feature
  • Conversion of time series to a given frequency or to resample 
  • Analysing time series data from several sources
  • Calculating date and time in absolute or relative terms 
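A few of the features listed above can be sketched as follows. The dates and readings are made-up sample values, and the time zone names are only illustrative.

```python
import pandas as pd

# Six daily timestamps starting on 1 January 2024 (arbitrary sample dates).
days = pd.date_range(start="2024-01-01", periods=6, freq="D")
readings = pd.Series([10, 12, 11, 15, 14, 13], index=days)

# Resample to two-day bins and average the values in each bin.
two_day_mean = readings.resample("2D").mean()

# Attach a time zone, then convert to another one.
localized = readings.tz_localize("UTC").tz_convert("Asia/Kolkata")
```

Here two_day_mean has three rows, one per two-day bin, while localized keeps all six readings with time-zone-aware timestamps.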

9. Explain reindexing in Pandas?

Ans: Reindexing conforms a Series or DataFrame to a new index, with configurable filling logic. Labels that were not present in the previous index get NA/NaN values unless a fill value or fill method is specified. A new object is returned unless the new index is equivalent to the current one and copy=False. It is used to alter the row index and the columns of a DataFrame.
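As a minimal sketch of reindexing, using a made-up three-element Series, the example below requests a label that is not in the original index, first with the default filling and then with an explicit fill_value.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# "d" is not in the original index, so it becomes NaN by default.
with_nan = s.reindex(["a", "b", "d"])

# fill_value replaces the default NaN for labels that were missing.
with_zero = s.reindex(["a", "b", "d"], fill_value=0)
```

Note that the label "c" is dropped in both results because it is not part of the requested index.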

10.  Explain MultiIndexing in Pandas.

Ans: MultiIndexing in Pandas allows us to have multiple levels of row and column labels, which provides a way to represent and analyze higher-dimensional data. With the help of MultiIndexing, one can organize the data in a tabular format keyed by several features at once.

11.  What is TimeDelta?

Ans: Timedelta (pd.Timedelta) is the Pandas data type for a duration, that is, the difference between two points in time. It is the Pandas counterpart of Python's datetime.timedelta. Timedelta is mainly used to perform arithmetic operations involving dates and times. It can be positive or negative and can be built from units such as weeks, days, hours, minutes, and seconds.
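The short sketch below shows the typical arithmetic, assuming two arbitrary sample timestamps.

```python
import pandas as pd

start = pd.Timestamp("2024-01-01 08:00")
end = pd.Timestamp("2024-01-03 20:30")

elapsed = end - start                           # subtracting Timestamps yields a Timedelta
later = start + pd.Timedelta(days=1, hours=2)   # shift a timestamp by a duration
negative = start - end                          # durations can be negative
```

Here elapsed is 2 days, 12 hours and 30 minutes, and later is 10:00 on 2 January 2024.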

12. How to create a series from a dictionary in Pandas?

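Ans: Pass the dictionary to the pd.Series() constructor. When the index parameter is omitted, the dictionary keys become the index labels and the dictionary values become the data.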
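A minimal sketch with a made-up price dictionary follows. It also shows what happens if an index is passed anyway: values are looked up by label, and labels that are not keys of the dictionary get NaN.

```python
import pandas as pd

prices = {"apple": 30, "banana": 10, "cherry": 50}

# Without index: the dictionary keys become the index labels.
s = pd.Series(prices)

# With index: values are looked up by label; unknown labels become NaN.
picked = pd.Series(prices, index=["cherry", "apple", "mango"])
```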

13.  Which library tool is used to create a scatter plot matrix?

Ans: The scatter_matrix() function from the pandas.plotting module is used for this purpose.

Pandas Interview Questions for Intermediate

We discussed some of the easy-level Pandas Interview Questions. Let us now go through some of the medium-level Pandas Interview Questions. 

14. What is Time Offset?

Ans: A pandas Series or DataFrame can be shifted or offset by using a time offset, which is a relative period of time. The representation of time spans like days, weeks, months, and years can be done using time offsets.

The pandas.tseries.offsets module (also reachable as pd.offsets) can be used to produce time offsets. A range of pre-defined time offsets, including Day(), Week(), MonthEnd(), and YearEnd(), is available there, and pd.DateOffset() handles calendar-aware shifts such as "one month later". By combining the pre-defined time offsets, you can also produce custom time offsets.
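The sketch below applies a few offset classes to an arbitrary sample date. It uses Day, Week, MonthEnd and DateOffset, which are offset classes available under pd.offsets or directly on pd.

```python
import pandas as pd

ts = pd.Timestamp("2024-01-15")

two_days_later = ts + pd.offsets.Day(2)
next_week = ts + pd.offsets.Week(1)
month_end = ts + pd.offsets.MonthEnd(1)      # rolls forward to the month's last day
next_month = ts + pd.DateOffset(months=1)    # calendar-aware shift

# Offsets can be combined by applying them one after another.
combined = ts + pd.DateOffset(months=1) + pd.offsets.Day(3)
```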

15. Explain Categorical data in Pandas?

Ans: Categorical is a Pandas data type corresponding to a categorical variable in statistics. A categorical variable usually takes a limited and fixed number of possible values. Every value of categorical data is either one of the categories or np.nan.

16. How to create a copy of the series in Pandas?

Ans: To create a copy of a series, call its copy() method:

Series.copy(deep=True)
With deep=True (the default), the copy gets its own data and index, so changes made to the copy do not affect the original. If deep is set to False, a new object is created without copying the data or index; it only holds references to the original's data and index.

17. How to add an index in DataFrames?

Ans: While creating a DataFrame, you can pass labels to the index argument. This ensures that the DataFrame has the required index. If you don't specify an index, the DataFrame gets a default numerical index that starts at zero and increases by one for each row.

18. How to delete an index from DataFrame?

Ans: To discard the current index and return to the default integer index, call df.reset_index(drop=True). To remove only the name of the index, set it to None:
df.index.name = None
If the index contains duplicate values, you can also drop the repeated entries so that each index value appears only once.

19. How can I remove indices, rows, or columns from a Pandas DataFrame?

Ans: Use the drop() method to eliminate indices, rows, or columns from a Pandas DataFrame. The DataFrame's related indices, rows, or columns are eliminated by the drop() method, which accepts a list of labels as an input.
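As a small illustration with a made-up three-by-three DataFrame, drop() can target rows through index=, columns through columns=, or either one through the axis argument.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["x", "y", "z"],
)

without_row = df.drop(index=["y"])            # remove a row by its label
without_cols = df.drop(columns=["B", "C"])    # remove columns by name
same_as_cols = df.drop(["B", "C"], axis=1)    # equivalent, using axis
```

drop() returns a new DataFrame, so df itself still has all three rows and columns.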

20. How to rename an index or column in a DataFrame?

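Ans: We can use the .rename() method to change index and column names, for example df.rename(columns={'old': 'new'}) for a column or df.rename(index={0: 'first'}) for an index label.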

21. How to add a column to DataFrame?

Ans: You can add new columns to the existing DataFrame. Follow the code snippet below to add a column :

# Import the pandas library as pd
import pandas as pd

# Two Series with overlapping but unequal indexes; missing labels become NaN
info = {
    'one': pd.Series([21, 12, 33, 14, 51], index=['a', 'b', 'c', 'd', 'e']),
    'two': pd.Series([18, 32, 39, 48, 45, 56], index=['a', 'b', 'c', 'd', 'e', 'f']),
}
info = pd.DataFrame(info)

print("Passing a series to add new column")
info['three'] = pd.Series([89, 65, 67], index=['a', 'b', 'c'])
print(info)

print("Add new column using previous columns")
info['four'] = info['one'] + info['three']
print(info)

22. How to add rows to a DataFrame?

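Ans: You can use .loc to add a new row by assigning to an index label that does not exist yet, or pd.concat() to append one or more rows held in another DataFrame.
.loc works with index labels, whereas .iloc works with integer positions and can only modify rows that already exist. The older .ix indexer and DataFrame.append() have been removed from current Pandas versions.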
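The following sketch adds rows in two ways that work in current Pandas releases: assignment through .loc with a new label, and pd.concat() with a second DataFrame. The names and scores are made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "score": [82, 91]})

# Assigning to a label that does not exist yet appends a row.
df.loc[2] = ["Meera", 77]

# For several rows at once, build a second DataFrame and concatenate.
extra = pd.DataFrame({"name": ["Kiran", "Zoya"], "score": [68, 95]})
df = pd.concat([df, extra], ignore_index=True)
```

Passing ignore_index=True renumbers the result from 0, which avoids duplicate index labels after concatenation.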

23. How to iterate over a Pandas DataFrame?

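Ans: To iterate over the rows of the DataFrame, use a for loop with the iterrows() method, which yields (index, row) pairs. The itertuples() method does the same job with named tuples and is usually faster.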

24. What is Pandas NumPy array?

Ans: NumPy stands for Numerical Python. Calculations on NumPy arrays are faster than on regular Python lists. It is a Python package for processing single-dimensional and multidimensional arrays, and Pandas data structures are built on top of it.

25. How to convert DataFrame to NumPy array?

Ans: DataFrames can be converted to NumPy arrays to perform high-level mathematical computations. You can use DataFrame.to_numpy() method for conversion. This function will return a NumPy array.

26. List some statistical functions in Python Pandas?

Ans: Below are some statistical functions in Python Pandas.

  • mean(): This function computes the arithmetic mean along a specified axis.
  • median(): This function calculates the median along a specified axis.
  • mode(): This function calculates the mode (the most frequent value or values) along a specified axis.
  • sum(): This function finds the sum of values along a specified axis.
  • var(): It calculates the variance along a specified axis.

27. How can one identify the items in series A that are absent from series B?

Ans: You can use the steps below to find the series A items that are missing from series B:

  • Make a set of the series B items. The set() function can be used to accomplish this.

  • Subtract the series B set from the set of series A items. To accomplish this, use the - operator.

  • The items in series A that are missing from series B will make up the resulting set.

Alternatively, A[~A.isin(B)] returns the same items as a Series and keeps their original order.
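The steps above can be sketched as follows, next to an isin()-based mask that gives the same items as a Series. Both series are made-up sample data.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5])
b = pd.Series([4, 5, 6, 7])

# Set difference: unordered, duplicates collapsed.
missing_set = set(a) - set(b)

# Boolean mask with isin(): keeps the original order and index of A.
missing_series = a[~a.isin(b)]
```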

28. How can we convert DataFrame to an excel file?

Ans: We can use the to_excel() method to write a DataFrame to an Excel file. Call it on the DataFrame and pass the desired file name as an argument, for example df.to_excel('output.xlsx'). An Excel writer package such as openpyxl must be installed.

Pandas Interview Questions for Experienced

This section will discuss some of the more challenging Pandas Interview Questions. While knowing easy and medium-level questions is necessary, the more complicated questions will set you above other candidates in the interview. Let us go through some of the more difficult Pandas Interview Questions.

29. What is data aggregation?

Ans: Data aggregation means applying a function that reduces one or more columns to a summary value. For example, sum returns the sum of the values, min returns the minimum, and max returns the maximum value for the requested axis.

30. What is Multiple Indexing?

Ans: A method for indexing a Pandas DataFrame with numerous layers is multiple indexing, commonly referred to as hierarchical indexing. As a result, you can design indexes with numerous dimensions, including those for data with time series, locations, or categories.
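As a minimal sketch of hierarchical indexing, the example below turns two columns of a made-up sales table into a two-level row index and then selects by one level or by both.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "year": [2023, 2024, 2023, 2024],
    "revenue": [100, 120, 90, 110],
})

# Two columns become a two-level (hierarchical) row index.
indexed = sales.set_index(["region", "year"])

north = indexed.loc["North"]                            # all rows of the outer level
north_2024 = indexed.loc[("North", 2024), "revenue"]    # one specific cell
```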

31. What is concat() in Pandas?

Ans: The .concat() method stacks multiple DataFrames vertically or connects them horizontally after aligning them on an index.

32. What is GroupBy() in Pandas?

Ans: The main task of the groupby() method is to split the data into groups based on the values of one or more keys. A function can then be applied to each group and the results combined, which makes it easy to summarise real-world data sets by category.

33. How to sort the DataFrame?

Ans: To sort the DataFrame use the DataFrame.sort_values() function. It sorts the DataFrame row or column-wise. The important parameters of the sort function are:

  • axis: specifies whether to sort for rows (0) or columns (1)
  • by: specifies which column or rows determine sorting
  • ascending: specifies whether to sort the DataFrame in ascending or descending order

34. How can one create an empty DataFrame in Pandas?

Ans: You can use the pd.DataFrame() function in Pandas without giving any arguments to build a blank DataFrame. A DataFrame without any rows or columns will result from this.

Here is an example of how to make a blank Python DataFrame:

import pandas as pd
# Create an empty DataFrame
df = pd.DataFrame()
# Print the DataFrame
print(df)

35. How to split a DataFrame based on boolean criteria?

Ans: To split the DataFrame, first create a boolean mask that marks the rows meeting the criterion, and then use the inverse operator (~) to take the complement of the mask. The two parts are df[mask] and df[~mask].

36. What do describe() percentiles values represent?

Ans: The percentiles describe the distribution of the data we are working with. The 50% row is the median, whereas the 25% and 75% rows are the lower and upper quartiles, respectively. Using these, we can get a clearer idea of how skewed our data is.

37. How can you merge data on common columns or indices?

Ans: To merge, use the .merge() method, which is similar to database-style joins. We have the inner, outer, left and right merge operations. An inner merge combines the left and right data frames, keeping only the rows whose keys appear in both. Left and right merge operations keep all the rows from their side and add NaN values where the opposite side has no match. An outer merge returns all the rows from both the left and right sides.

38. How to write DataFrame to PostgreSQL table?

Ans: Create an SQLAlchemy engine that points at the PostgreSQL database, then call the DataFrame's to_sql() method with the table name and the engine to write the DataFrame to the SQL table.
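The sketch below shows the to_sql() call itself. Since a running PostgreSQL server cannot be assumed here, it uses an in-memory SQLite connection as a stand-in; with PostgreSQL you would pass an SQLAlchemy engine in its place. The table name cities and the sample rows are hypothetical.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "city": ["Pune", "Delhi"]})

# Stand-in connection; for PostgreSQL, pass an SQLAlchemy engine here instead.
conn = sqlite3.connect(":memory:")

# Write the DataFrame to a table named "cities" (hypothetical name).
df.to_sql("cities", conn, index=False, if_exists="replace")

# Read the table back to confirm the round trip.
round_trip = pd.read_sql("SELECT * FROM cities", conn)
conn.close()
```

The if_exists argument controls what happens when the table already exists; "replace" drops and recreates it, while "append" adds rows.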

39. How to convert continuous values to discrete values in Pandas?

Ans: You will have to use either the cut() or the qcut() function:

  • cut() bins the data based on values. We use it when we need evenly spaced bin edges. This function uses the values themselves rather than their frequencies to define the bins.
  • qcut() bins the data based on sample quantiles. We use it to study data by quantiles. It puts (roughly) an equal number of data points in each bin.
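The difference shows up clearly on skewed data. In this sketch, with made-up values and made-up bin labels, cut() puts five of the six values in the lower bin, while qcut() splits them three and three.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 50, 100])

# cut(): two bins of equal width across the value range.
by_width = pd.cut(values, bins=2, labels=["low", "high"])

# qcut(): two bins holding (roughly) the same number of observations.
by_count = pd.qcut(values, q=2, labels=["low", "high"])
```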

40. How are iloc() and loc() different?

Ans: The major difference is that iloc selects data by integer position, while loc selects data by label. Both are indexers used with square brackets, for example df.iloc[0] and df.loc['a'], rather than functions called with parentheses.

41. Explain the difference between join() and merge() in Pandas?

Ans: The major difference between join() and merge() in Pandas is below.

join(): It is a method for combining DataFrames based on their indexes. Left join is the default join and it is a convenient way to merge DataFrames.

merge(): This allows merging DataFrames based on specified column values. It supports inner, outer, left, and right joins. It can merge DataFrames on one or more columns based on common values to combine the data.

42.  Explain the difference(s) between merge() and concat() in Pandas?

Ans: The major difference between merge() and concat() in Pandas is below.

merge(): It combines DataFrames based on common columns and performs various joins such as inner, outer, right, and left.

concat(): This function concatenates DataFrames along a particular axis. It does not match rows on key columns; it only aligns the DataFrames on the other axis.

43.  Explain the difference between interpolate() and fillna() in Pandas?

Ans: The major difference between interpolate() and fillna() in Pandas is below.

interpolate(): It is the method that is used to fill missing values in DataFrame by estimating values based on existing data.

fillna(): Mainly, fillna() is used to replace missing values with a value you specify, such as a constant or a column mean.
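A minimal side-by-side sketch on a made-up Series with two gaps: interpolate() estimates each gap from its neighbours (linearly by default), while fillna() writes the same given value into every gap.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 7.0])

estimated = s.interpolate()   # linear by default: gaps become 2.0 and 5.0
constant = s.fillna(0)        # every gap becomes the same fixed value
```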

44. What are the types of conversion methods in Pandas?

Ans: The conversion methods are:

  • to_numeric() - converts non-numeric values, such as strings of digits, to a numeric type
  • astype() - converts a column to a specified type; it can also convert to categorical types
  • convert_dtypes() - converts DataFrame columns to the best possible dtypes
  • infer_objects() - a utility method to convert object columns holding Python objects to a pandas type if possible

Pandas Coding Interview Questions

45. How do you select specific columns from a DataFrame in Pandas?

To select specific columns from a DataFrame, you can use the column names inside double square brackets. For instance, if you have a DataFrame named df and you want to select columns A and B, you would write df[['A', 'B']]. This method allows you to retrieve multiple columns simultaneously. This approach is highly efficient, especially when working with large datasets where selecting only the necessary columns can improve performance. Additionally, you can use the loc method to achieve similar results with more flexibility, allowing for both label-based row and column selection.

46. How can you handle missing data in a Pandas DataFrame?

Handling missing data is crucial for maintaining data integrity in any analysis. In Pandas, you can manage missing data by using methods like dropna() and fillna(). The dropna() method allows you to remove rows or columns with missing values, which is useful when you prefer not to impute any data. On the other hand, fillna() enables you to fill missing values with a specific value, such as the mean, median, or a constant. This method is beneficial when you want to retain the data structure while addressing the gaps. You can also use forward-fill (ffill) or backward-fill (bfill) techniques to propagate the last valid observation forward or backward.
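The methods described above can be sketched on a small made-up DataFrame with one gap per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})

dropped = df.dropna()                 # keep only rows with no missing values
filled_mean = df.fillna(df.mean())    # fill each column with its own mean
forward = df.ffill()                  # carry the last valid value forward
```

Forward fill leaves the first value of column B missing, because there is no earlier observation to propagate.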

47. How do you filter rows based on a condition in Pandas?

Filtering rows based on a condition in Pandas involves using Boolean indexing. For example, to filter rows where the values in column A are greater than 10, you would write df[df['A'] > 10]. This operation returns a DataFrame containing only the rows that meet the specified condition. Boolean indexing is powerful because it allows you to combine multiple conditions using logical operators like & (and), | (or), and ~ (not). For instance, to filter rows where A > 10 and B < 5, you would write df[(df['A'] > 10) & (df['B'] < 5)]. This flexibility is essential for complex data analysis tasks.
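A short sketch of the filters described above, using made-up values for columns A and B. Each condition needs its own parentheses because & and | bind more tightly than comparison operators in Python.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 12, 20, 8], "B": [1, 9, 3, 2]})

over_10 = df[df["A"] > 10]
both = df[(df["A"] > 10) & (df["B"] < 5)]   # parentheses are required
not_over_10 = df[~(df["A"] > 10)]           # complement of the first mask
```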

48. What is the difference between loc and iloc in Pandas?

The primary difference between loc and iloc lies in how they access DataFrame elements. loc is label-based, meaning it uses the row and column labels (or names) for selection. For example, df.loc[1, 'A'] retrieves the value in column A for the row whose index label is 1, which is the second row only when the DataFrame has the default 0-based integer index.

On the other hand, iloc is index-based and uses integer positions to access data. So, df.iloc[1, 0] would select the element at the second row and the first column, regardless of their labels. Understanding this distinction is vital when working with DataFrames, especially when indices are not sequential integers or when you need precise control over data selection.
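The distinction is easiest to see when labels and positions disagree. In this sketch the index labels are deliberately reversed (a made-up setup), so label 0 refers to the last row.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [1, 2, 3]},
    index=[2, 1, 0],   # labels deliberately differ from positions
)

by_label = df.loc[1, "A"]      # row whose label is 1 (the second row here)
by_position = df.iloc[1, 0]    # second row, first column
label_zero = df.loc[0, "A"]    # label 0 is the LAST row in this frame
position_zero = df.iloc[0, 0]  # position 0 is the first row
```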

49. How do you group data in a DataFrame by a specific column?

Grouping data in Pandas is done using the groupby() function, which allows you to aggregate data based on one or more columns. For example, if you have a sales DataFrame and want to group the data by the Region column, you would use df.groupby('Region'). 

After grouping, you can apply aggregation functions like sum(), mean(), or count() to summarize the data within each group. This is particularly useful for analyzing trends or patterns across different categories. The groupby() function is highly flexible, supporting complex operations like grouping by multiple columns or performing custom aggregations using the agg() method.
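A minimal sketch with a made-up sales table: one aggregation per group first, then several at once through agg().

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["East", "West", "East", "West", "East"],
    "Amount": [100, 200, 150, 50, 250],
})

total_by_region = sales.groupby("Region")["Amount"].sum()

# agg() applies several aggregations to each group at once.
summary = sales.groupby("Region")["Amount"].agg(["sum", "mean", "count"])
```

The result of agg() is a DataFrame with one row per region and one column per aggregation.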

50. How can you merge two DataFrames in Pandas?

Merging two DataFrames in Pandas can be done using the merge() function, which combines DataFrames based on common columns or indices. You can specify the type of join to perform, such as inner, left, right, or outer, depending on how you want to align the data. 

For instance, an inner join will return only the rows with matching values in both DataFrames, while a left join will include all rows from the left DataFrame and matching rows from the right. The merge() function is versatile, allowing for complex merges with multiple keys, suffixes for overlapping column names, and even handling of duplicate keys.
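The join types can be compared on two small made-up tables that share an id column but only partly overlap.

```python
import pandas as pd

customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
orders = pd.DataFrame({"id": [2, 3, 4], "total": [250, 400, 150]})

inner = customers.merge(orders, on="id", how="inner")  # ids 2 and 3
left = customers.merge(orders, on="id", how="left")    # ids 1, 2, 3
outer = customers.merge(orders, on="id", how="outer")  # ids 1, 2, 3, 4
```

In the left join, customer 1 has no matching order, so its total is NaN.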

51. How do you sort a DataFrame by multiple columns in Pandas?

Sorting a DataFrame by multiple columns can be achieved using the sort_values() method, where you pass a list of column names to the by parameter. For example, df.sort_values(by=['A', 'B']) sorts the DataFrame first by column A and then by column B. 

You can control the sort order for each column by specifying the ascending parameter, which accepts a list of Boolean values. This method allows for complex sorting criteria, enabling you to prioritize certain columns over others. To sort by the index rather than by column values, use sort_index().
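As a small sketch with made-up values, the example below sorts by A ascending and breaks ties by B descending.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 2, 1], "B": [10, 40, 30, 20]})

# A ascending, then B descending within each value of A.
ordered = df.sort_values(by=["A", "B"], ascending=[True, False])
```

The original index labels travel with their rows, so the result's index reads 1, 3, 2, 0.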

52. How do you reset the index of a DataFrame?

Resetting the index of a DataFrame is done using the reset_index() method, which reassigns a default integer index to the DataFrame. This is useful when you’ve performed operations that altered the original index, such as filtering or grouping, and you want to restore a simple integer index. By default, reset_index() adds the old index as a column, but you can remove it by passing drop=True. This method is especially handy when you need to reformat your DataFrame after a series of transformations and prepare it for further analysis or exporting.
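A short sketch on made-up data: filtering leaves gaps in the index, and reset_index() restores a clean 0-based index with or without keeping the old labels.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 12, 20]})
filtered = df[df["A"] > 10]                 # index is now [1, 2]

kept = filtered.reset_index()               # old index saved in an "index" column
dropped = filtered.reset_index(drop=True)   # old index discarded
```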

53. How can you create a DataFrame from a dictionary in Pandas?

Creating a DataFrame from a dictionary is straightforward with the pd.DataFrame() constructor. The keys of the dictionary represent the column names, and the values are the data for each column. For example, pd.DataFrame({'A': [1, 2], 'B': [3, 4]}) creates a DataFrame with two columns, A and B. This method is highly flexible, allowing for dictionaries with lists, arrays, Series, or even other dictionaries as values. If the values are Series or dictionaries with different labels, Pandas aligns them on the index and fills missing values with NaN; plain lists or arrays must all have the same length, otherwise a ValueError is raised. This feature makes it easy to convert structured data into a DataFrame for analysis.
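The sketch below builds a DataFrame from equal-length lists and from Series of different lengths, and also checks what happens with plain lists of unequal length. All values are made-up sample data.

```python
import pandas as pd

# Equal-length lists: keys become column names.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Series values are aligned on their index; gaps are filled with NaN.
aligned = pd.DataFrame({
    "A": pd.Series([1, 2, 3], index=["x", "y", "z"]),
    "B": pd.Series([10, 20], index=["x", "y"]),
})

# Plain lists of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"A": [1, 2, 3], "B": [10, 20]})
    unequal_lists_ok = True
except ValueError:
    unequal_lists_ok = False
```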

54. How do you apply a function to each element of a DataFrame?

To apply a function to each element of a DataFrame, you can use the applymap() method. This method applies a function element-wise, meaning it processes each individual value in the DataFrame. For example, if you want to multiply every element in the DataFrame by 2, you would use df.applymap(lambda x: x * 2). In Pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which behaves the same way. Element-wise application is particularly useful for operations that need to be applied uniformly across all elements, such as data cleaning, transformations, or calculations. It offers a simple way to manipulate data at the most granular level in Pandas.
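A version-tolerant sketch is shown below. It picks DataFrame.map when it exists (Pandas 2.1 and later) and falls back to applymap otherwise; the helper name element_wise is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# DataFrame.map is the newer name; fall back to applymap on older versions.
element_wise = df.map if hasattr(df, "map") else df.applymap
doubled = element_wise(lambda x: x * 2)

# For simple arithmetic, the vectorized form gives the same result.
doubled_vectorized = df * 2
```

For plain arithmetic like this, the vectorized form is usually the better choice; element-wise application is more useful for logic that has no vectorized equivalent.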

Pandas Interview Questions for Data Scientists

55. How would you handle large datasets in Pandas?

Handling large datasets in Pandas can be challenging due to memory constraints and processing speed. To manage this, you can read data in chunks using the chunksize parameter in functions like read_csv(). This allows you to process the data in smaller, manageable portions rather than loading the entire dataset into memory. Additionally, optimizing data types using the dtype parameter can significantly reduce memory usage. For extremely large datasets, integrating Pandas with Dask or using a database like SQLite can help, as these tools are designed to handle large-scale data processing more efficiently.
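Chunked reading can be sketched as follows. A small in-memory string stands in for a large CSV file, and the column names are made up; with a real file you would pass its path to read_csv() instead.

```python
import io
import pandas as pd

# In-memory stand-in for a large CSV file on disk.
csv_text = "id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n"

total = 0
chunk_sizes = []
# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2, dtype={"amount": "int32"}):
    chunk_sizes.append(len(chunk))
    total += int(chunk["amount"].sum())
```

Only one chunk is held in memory at a time, so the running total is computed without ever loading the full file.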

56. Explain the use of the pivot_table() function in Pandas.

The pivot_table() function in Pandas is a powerful tool for summarizing and analyzing data. It allows you to reshape and aggregate data in a DataFrame, similar to Excel’s pivot tables. You can group data by one or more keys and apply aggregation functions such as mean, sum, count, etc., to derive insights. For example, you can create a pivot table to calculate the average sales by region and product category. The pivot_table() function offers extensive customization, including handling missing values, defining margins, and setting multi-level indexes, making it an essential feature for data analysis in Pandas.
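The average-sales example mentioned above might look like this, assuming a made-up table with region, category and sales columns.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "category": ["Toys", "Books", "Toys", "Books", "Toys"],
    "sales": [100, 80, 60, 40, 200],
})

# Average sales for every region/category combination.
avg_sales = sales.pivot_table(
    index="region", columns="category", values="sales", aggfunc="mean"
)
```

The result has one row per region and one column per category.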

57. How do you handle categorical data in Pandas?

Categorical data in Pandas can be efficiently managed using the Categorical data type. Converting a column to a categorical type reduces memory usage and speeds up operations, as categorical data is stored more compactly than string data. You can also define the categories and their order, which is useful for sorting or comparisons. Additionally, categorical data can be used for group-by operations and in plotting to ensure the correct ordering of categories. Handling categorical data properly is essential for accurate statistical analysis and visualization in data science tasks.
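A minimal sketch of an ordered categorical column, using made-up size labels: once the order is declared, sorting and comparisons follow it instead of alphabetical order.

```python
import pandas as pd

sizes = pd.Series(["medium", "small", "large", "small"])

# Declare the allowed categories and their order explicitly.
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
sizes_cat = sizes.astype(size_type)

sorted_sizes = sizes_cat.sort_values()               # follows category order, not alphabet
at_least_medium = sizes_cat[sizes_cat >= "medium"]   # comparison uses the declared order
```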

58. What is the purpose of the agg() function in Pandas?

The agg() function in Pandas is used to apply multiple aggregate functions to a DataFrame or a Series. This function allows you to specify a list of functions to be applied to each column. For example, df.agg(['mean', 'sum']) will calculate the mean and sum for each numerical column in the DataFrame. You can also apply different functions to different columns by passing a dictionary: df.agg({'column1': 'mean', 'column2': 'sum'}).

The agg() function is particularly useful when you need to perform multiple aggregations at once, making it easier to summarize and explore your data.

59. How would you optimize Pandas operations for performance?

Optimizing Pandas operations for performance involves several techniques. First, you can use vectorized operations instead of applying functions row by row, as vectorization is faster and more efficient. Second, reducing memory usage by specifying appropriate data types for your columns can improve performance, especially with large datasets. Third, prefer built-in Pandas and NumPy functions over apply() with lambda functions, since built-ins avoid the per-element overhead of calling custom Python code. Additionally, for complex operations, consider using Dask, which allows Pandas-style operations to be parallelized, or switching to NumPy, which can be more efficient for certain tasks. Profiling your code with tools like cProfile or line_profiler can also help identify bottlenecks and optimize accordingly.

60. How do you merge datasets with different keys in Pandas?

Merging datasets with different keys in Pandas is done using the merge() function with the left_on and right_on parameters. These parameters allow you to specify different columns for joining the datasets. For example, df1.merge(df2, left_on='key1', right_on='key2') merges the two DataFrames based on key1 from df1 and key2 from df2. This method is useful when the datasets have related information but use different identifiers. You can also choose the type of join, such as inner, left, right, or outer, depending on how you want to handle non-matching keys. Properly merging datasets with different keys ensures that your combined dataset is accurate and complete.

61. What is the significance of the apply() function in Pandas?

The apply() function in Pandas is significant because it allows you to apply a custom function along an axis of the DataFrame (either rows or columns). This provides flexibility for performing operations that are not built into Pandas. For instance, df['new_column'] = df['existing_column'].apply(lambda x: x * 2) creates a new column by doubling the values in an existing column. You can use apply() to apply functions row-wise (axis=1) or column-wise (axis=0). While apply() is versatile, it is important to note that it can be slower than vectorized operations, so it should be used when necessary, and optimized if performance is a concern.

62. How would you detect and remove duplicate rows in a Pandas DataFrame?

Detecting and removing duplicate rows in a Pandas DataFrame is done using the duplicated() and drop_duplicates() methods. The duplicated() function identifies duplicate rows by returning a boolean Series, where True indicates a duplicate row. You can specify which columns to check for duplicates by using the subset parameter. For example, df.duplicated(subset=['column1', 'column2']) checks for duplicates based on column1 and column2. To remove duplicates, use the drop_duplicates() method, which removes duplicate rows by default or based on specified columns. Handling duplicates is essential for ensuring data integrity and accuracy in analysis.
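Both methods can be sketched on a small made-up DataFrame in which the second row repeats the first.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["a", "a", "b", "a"],
    "column2": [1, 1, 2, 3],
})

flags = df.duplicated()                               # True for repeats of an earlier row
deduped = df.drop_duplicates()                        # first occurrence is kept
by_column1 = df.drop_duplicates(subset=["column1"])   # compare column1 only
```

With subset, rows count as duplicates when they match on the listed columns only, so just the first "a" row and the "b" row survive in by_column1.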

63. How do you concatenate DataFrames vertically and horizontally in Pandas?

In Pandas, you can concatenate DataFrames vertically or horizontally using the concat() function. To concatenate DataFrames vertically, pass them as a list to concat(), like pd.concat([df1, df2]). This stacks the DataFrames on top of each other and aligns columns by name. Columns that exist in only one DataFrame are kept and filled with NaN where data is missing, unless you pass join='inner' to keep only the shared columns. The original index labels are preserved, so pass ignore_index=True if you want a fresh 0..n-1 index. To concatenate horizontally, set the axis parameter to 1: pd.concat([df1, df2], axis=1). This places the DataFrames side by side and aligns rows by index label, again filling with NaN where a label is missing from one side. Concatenating DataFrames is useful when you need to combine data from different sources or when you want to add new data to an existing dataset.
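The following sketch shows both directions on small made-up frames. The frames deliberately do not line up perfectly, so you can see how concat() handles mismatched columns and index labels.

```python
import pandas as pd

# Hypothetical sample data: df1 and df2 share only column "a".
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

# Vertical (axis=0, the default): columns are aligned by name and
# gaps are filled with NaN. ignore_index=True renumbers the rows.
vertical = pd.concat([df1, df2], ignore_index=True)

# join="inner" keeps only the columns present in every frame.
vertical_inner = pd.concat([df1, df2], join="inner", ignore_index=True)

# Horizontal (axis=1): rows are aligned by index label.
# df3 has labels 1 and 2, df1 has labels 0 and 1.
df3 = pd.DataFrame({"d": [9, 10]}, index=[1, 2])
horizontal = pd.concat([df1, df3], axis=1)

print(vertical)
print(vertical_inner)
print(horizontal)
```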

64. How would you check for and handle outliers in a Pandas DataFrame?

Checking for and handling outliers in a Pandas DataFrame can be done using statistical methods and visualization tools such as box plots. A quick first step is the describe() function, which summarizes each numeric column; a minimum or maximum that sits far from the quartiles hints at outliers. A more systematic method uses the interquartile range (IQR): compute Q1 (the 25th percentile) and Q3 (the 75th percentile), take IQR = Q3 - Q1, and flag values that fall below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.

Once detected, you can handle outliers by removing them, capping them to a maximum value, or transforming the data using methods like logarithmic scaling. Handling outliers is critical for ensuring that they don’t skew your analysis or model results.
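As a rough sketch, the IQR rule and two of the handling options (removal and capping) could look like this. The Series values are invented, and 1.5 is the conventional multiplier rather than a fixed requirement.

```python
import pandas as pd

# Hypothetical sample data with one obvious outlier (95).
s = pd.Series([10, 12, 11, 13, 12, 11, 95], name="value")

# Quartiles and interquartile range.
q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
iqr = q3 - q1

# Fences: anything outside [lower, upper] is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

is_outlier = (s < lower) | (s > upper)
outliers = s[is_outlier]

# Option 1: remove the outliers.
removed = s[~is_outlier]

# Option 2: cap values at the fences instead of dropping them.
capped = s.clip(lower=lower, upper=upper)

print(outliers)
print(capped)
```

Whether to remove or cap depends on whether the extreme values are errors or genuine observations, so it is worth inspecting them before choosing.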

Pandas MCQ Questions

65. Which function is used to read a CSV file into a DataFrame in Pandas?

  • a) read_file()
  • b) read_csv()
  • c) read_data()
  • d) read_df()
    Answer: b) read_csv()

66. What does the head() function do in Pandas?

  • a) Returns the first 5 rows of a DataFrame
  • b) Returns the last 5 rows of a DataFrame
  • c) Deletes the first row of a DataFrame
  • d) Sorts the DataFrame by a specific column
    Answer: a) Returns the first 5 rows of a DataFrame (5 is the default; head(n) returns the first n rows)

67. How do you change the index of a DataFrame?

  • a) set_index()
  • b) change_index()
  • c) modify_index()
  • d) alter_index()
    Answer: a) set_index()

68. Which method is used to fill missing values in a DataFrame?

  • a) fillna()
  • b) dropna()
  • c) replace_na()
  • d) fill_missing()
    Answer: a) fillna()

69. What type of join is performed by default when using the merge() function?

  • a) Inner join
  • b) Left join
  • c) Right join
  • d) Outer join
    Answer: a) Inner join

70. How do you check the data type of each column in a DataFrame?

  • a) types()
  • b) dtype()
  • c) dtypes
  • d) columns()
    Answer: c) dtypes (it is an attribute, so write df.dtypes without parentheses)

71. Which of the following is used to remove rows with missing data?

  • a) drop()
  • b) dropna()
  • c) remove_na()
  • d) delete_na()
    Answer: b) dropna()

72. What does the describe() function in Pandas provide?

  • a) Statistical summary of the DataFrame
  • b) Plots a histogram
  • c) Returns data types of columns
  • d) Deletes duplicate rows
    Answer: a) Statistical summary of the DataFrame

73. How do you concatenate two DataFrames vertically?

  • a) append()
  • b) join()
  • c) merge()
  • d) concat()
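    Answer: d) concat() (DataFrame.append() was deprecated and then removed in pandas 2.0, so pd.concat() is the supported way)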

74. What does the isin() function do in Pandas?

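  • a) Checks whether each element is contained in a given set of values
  • b) Removes specific values from a DataFrame
  • c) Sorts a DataFrame
  • d) Fills missing values
    Answer: a) Checks whether each element is contained in a given set of values (it returns a boolean mask of the same shape, which is commonly used to filter rows)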
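To make the isin() answer concrete, here is a short sketch on made-up data. The columns city and code are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"city": ["Delhi", "Pune", "Mumbai"], "code": [1, 2, 3]})

# Series.isin returns one boolean per element.
mask = df["city"].isin(["Delhi", "Mumbai"])

# The mask is typically used to filter rows.
selected = df[mask]

# DataFrame.isin with a dict checks each column against its own list.
frame_mask = df.isin({"city": ["Pune"], "code": [1]})

print(mask.tolist())
print(selected)
print(frame_mask)
```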

Conclusion

In this article, we have discussed Pandas interview questions in detail. We started with a basic introduction to Pandas and then worked through basic, intermediate, and experienced-level questions, followed by a set of MCQs.

After reading about the Pandas interview questions, would you like to explore more interview-related articles? Coding Ninjas has you covered: Mainframe Interview Questions, Flutter Interview Questions, React Native Interview Questions, Operating System Interview Questions, and JPA Interview Questions.

Refer to our Guided Path on Coding Ninjas Studio to upskill yourself in Data Structures and Algorithms, Competitive Programming, JavaScript, System Design, and many more! If you want to test your competency in coding, you may check out the mock test series and participate in the contests hosted on Coding Ninjas Studio!

But suppose you have just started your learning process and are looking for questions asked by tech giants like Amazon, Microsoft, Uber, etc. In that case, you must look at the problems, interview experiences, and interview bundle for placement preparations.

Nevertheless, you may consider our paid courses to give your career an edge over others!

Do upvote our blogs if you find them helpful and engaging!

Happy Learning!
