Tip 1 : Practise previously asked interview questions as well as online test questions.
Tip 2 : Go through all the previous interview experiences on Codestudio and LeetCode.
Tip 1 : Have at least 2 good projects that you can explain briefly with all the important points covered.
Tip 2 : Mention every relevant skill.
Tip 3 : Focus more on skills, projects and experience.
This round mainly revolved around concepts from Data Warehousing and Big Data.
What is a Slowly Changing Dimension (SCD)? What are the different types of SCDs?
A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over
time in a data warehouse. Tracking the history of dimension records is one of the most critical ETL tasks.
There are three types of SCDs, and you can use Warehouse Builder to define, deploy, and load all three types of
SCDs; a small illustrative sketch follows the three descriptions below.
Type 1 SCDs - Overwriting :
In a Type 1 SCD the new data overwrites the existing data. Thus the existing data is lost as it is not stored anywhere
else. This is the default type of dimension you create. You do not need to specify any additional information to create
a Type 1 SCD.
Type 2 SCDs - Creating another dimension record :
A Type 2 SCD retains the full history of values. When the value of a chosen attribute changes, the current record is
closed. A new record is created with the changed data values, and this new record becomes the current record. Each
record contains an effective time and an expiration time to identify the time period during which the record was active.
Type 3 SCDs - Creating a current value field :
A Type 3 SCD stores two versions of values for certain selected level attributes. Each record stores the previous
value and the current value of the selected attribute. When the value of any of the selected attributes changes, the
current value is stored as the old value and the new value becomes the current value.
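The three behaviours can be illustrated with a minimal pandas sketch; the table, column names, and dates below are hypothetical and purely for illustration:

```python
import pandas as pd

# Hypothetical customer dimension; the tracked attribute is "city"
dim = pd.DataFrame([{"cust_id": 1, "city": "Pune",
                     "eff_date": "2020-01-01", "exp_date": None, "current": True}])

# Type 1 - overwrite in place: the old value is lost
dim.loc[dim["cust_id"] == 1, "city"] = "Mumbai"

# Type 2 - close the current record and insert a new current record
dim.loc[(dim["cust_id"] == 1) & dim["current"], ["exp_date", "current"]] = ["2023-06-30", False]
new_version = pd.DataFrame([{"cust_id": 1, "city": "Delhi",
                             "eff_date": "2023-07-01", "exp_date": None, "current": True}])
dim = pd.concat([dim, new_version], ignore_index=True)

# Type 3 - keep only the previous and the current value in separate columns
dim3 = pd.DataFrame([{"cust_id": 1, "prev_city": None, "curr_city": "Pune"}])
dim3.loc[dim3["cust_id"] == 1, ["prev_city", "curr_city"]] = ["Pune", "Delhi"]

print(dim)
print(dim3)
```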
Explain ETL Process in Data Warehouse
ETL stands for Extract, Transform and Load. It is the process in Data Warehousing by which an ETL
tool extracts data from various source systems, transforms it in the staging area, and finally loads it into
the Data Warehouse system.
The different steps in the ETL process are :
1) Extraction:
The first step of the ETL process is extraction. In this step, data is extracted from various source systems, which can
be in formats such as relational databases, NoSQL, XML, and flat files, into the staging area. It is important to
extract the data from the various source systems and store it in the staging area first, rather than directly in the data
warehouse, because the extracted data is in various formats and may also be corrupted. Loading it directly into
the data warehouse could therefore damage it, and rollback would be much more difficult. This makes extraction one of the most
important steps of the ETL process.
2) Transformation:
The second step of the ETL process is transformation. In this step, a set of rules or functions is applied to the
extracted data to convert it into a single standard format. It may involve the following processes/tasks:
i) Filtering – loading only certain attributes into the data warehouse.
ii) Cleaning – filling up the NULL values with some default values, mapping U.S.A, United States, and America into
USA, etc.
iii) Joining – joining multiple attributes into one.
iv) Splitting – splitting a single attribute into multiple attributes.
v) Sorting – sorting tuples on the basis of some attribute (generally key-attribute).
3) Loading:
The third and final step of the ETL process is loading. In this step, the transformed data is finally loaded into the data
warehouse. Sometimes the warehouse is refreshed very frequently, and sometimes loading is done at longer but
regular intervals. The rate and period of loading depend solely on the requirements and vary
from system to system. A toy sketch of the full flow is shown below.
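The following is a small, illustrative end-to-end sketch in Python; the file name, column names, and table name are hypothetical:

```python
import sqlite3
import pandas as pd

# 1) Extract: pull raw data from a (hypothetical) flat-file source into a staging frame
staging = pd.read_csv("sales_raw.csv")  # e.g. columns: order_id, country, amount

# 2) Transform: clean, standardize and filter in the staging area
country_map = {"U.S.A": "USA", "United States": "USA", "America": "USA"}
staging["country"] = staging["country"].replace(country_map)   # cleaning / mapping
staging["amount"] = staging["amount"].fillna(0)                # fill NULLs with a default
staging = staging[["order_id", "country", "amount"]]           # load only required attributes
staging = staging.sort_values("order_id")                      # sort on the key attribute

# 3) Load: write the conformed data into the warehouse table
warehouse = sqlite3.connect("warehouse.db")
staging.to_sql("fact_sales", warehouse, if_exists="append", index=False)
warehouse.close()
```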
What are the different types of data marts in the context of data warehousing?
Following are the different types of data marts in data warehousing :
1) Dependent Data Mart : A dependent data mart can be developed using data from operational, external, or both
sources. It enables the organization's data to be accessed from a single data warehouse. All data is
centralized, which can aid in the development of further data marts.
2) Independent Data Mart : There is no need for a central data warehouse with this data mart. It is typically
established for smaller groups that exist within a company. It has no connection to the Enterprise Data Warehouse or any
other data warehouse. Each piece of information is self-contained and can be used independently, and the analysis can
also be carried out independently. The challenge, however, is to maintain a consistent and centralized data repository that numerous
users can access.
3) Hybrid Data Mart : A hybrid data mart is utilized when a data warehouse contains inputs from multiple sources, as
the name implies. When a user requires an ad hoc integration, this feature comes in handy. This solution can be
utilized if an organization requires various database environments and quick implementation. It necessitates the least
amount of data purification, and the data mart may accommodate huge storage structures. When smaller data-centric
applications are employed, a data mart is most effective.
Explain sessions. Explain how batches are used to combine executions?
A set of instructions that needs to be implemented to convert data from a source to a target is called a session. A session can be carried out using the Session Manager or the pmcmd command. Batch execution can be used to combine session executions either serially or in parallel; a batch can contain different sessions running in a serial or parallel manner.
Difference between Fact Table and Dimension Table
1) A fact table contains the measures calculated over the attributes of a dimension table,
while a dimension table contains the attributes along which the fact table's measures are calculated.
2) A fact table has fewer attributes than a dimension table,
while a dimension table has more attributes than a fact table.
3) A fact table has more records than a dimension table,
while a dimension table has fewer records than a fact table.
4) A fact table forms a vertical table (deep and narrow),
while a dimension table forms a horizontal table (shallow and wide).
5) The attributes of a fact table are in numerical and text format,
while the attributes of a dimension table are mainly in text format.
6) The number of fact tables in a schema is smaller than the number of dimension tables,
while the number of dimension tables in a schema is larger than the number of fact tables.
A tiny star-schema sketch of the two table types is shown below.
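The following minimal pandas sketch uses made-up product and sales data: the dimension table is wide and descriptive, the fact table is narrow, numeric and much longer, and the two join on the dimension key.

```python
import pandas as pd

# Dimension table: few rows, mostly textual descriptive attributes
dim_product = pd.DataFrame({
    "product_id": [1, 2],
    "product_name": ["Laptop", "Phone"],
    "category": ["Electronics", "Electronics"],
})

# Fact table: many rows, numeric measures plus foreign keys to dimensions
fact_sales = pd.DataFrame({
    "product_id": [1, 1, 2, 2, 2],
    "date_id":    [101, 102, 101, 102, 103],
    "units_sold": [3, 5, 10, 7, 4],
    "revenue":    [3000.0, 5000.0, 4000.0, 2800.0, 1600.0],
})

# Measures from the fact table are analysed along dimension attributes
report = (fact_sales.merge(dim_product, on="product_id")
                    .groupby("product_name")["revenue"].sum())
print(report)
```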
Explain overfitting in big data? How to avoid the same.
Overfitting is a modeling error in which a model is fitted too closely to a limited data set, i.e. the modeling
function captures the noise of the training sample rather than the underlying pattern. Due to overfitting, the predictive power of such models is reduced. This
effect leads to a decrease in generalization ability, so the model fails to generalize when applied outside the sample data.
There are several methods to avoid overfitting; some of them are :
1) Cross-validation : Cross-validation refers to dividing the data into multiple small test data sets, which can
be used to tune the model.
2) Early stopping : After a certain number of iterations, the generalizing capacity of the model weakens; early
stopping halts training before the model crosses that point, thereby avoiding overfitting.
3) Regularization : This method penalizes all the parameters except the intercept so that the model generalizes
the data instead of overfitting it.
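As a small scikit-learn sketch of the first and third remedies (cross-validation for tuning, and L2 regularization via Ridge), using synthetic data that is purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic, noisy data: a small sample that a flexible model could easily overfit
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=50)

# Cross-validation: score the model on held-out folds instead of the training data
# Regularization: Ridge penalizes large coefficients (all parameters except the intercept)
for alpha in [0.01, 1.0, 10.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(f"alpha={alpha}: mean CV R^2 = {scores.mean():.3f}")
```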
Explain Outliers.
Outliers are data points that lie very far from the rest of the data and are not part of any group or cluster. They may affect the behavior of the model, cause it to predict wrong results, or lower its accuracy. Outliers must therefore be handled carefully, as they may also contain some helpful information. The presence of outliers may mislead a Big Data model or a Machine Learning model, and the results of this may be:
1) Poor Results
2) Lower accuracy
3) Longer Training Time
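One common, simple way to flag outliers is the IQR rule; a short pandas sketch with made-up values:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 11, 95])   # 95 lies far from the rest

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers)   # flags 95; decide whether to drop, cap, or investigate it
```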
How to deploy a Big Data Model? Mention the key steps involved
Deploying a model on a Big Data platform mainly involves three key steps:
1) Data Ingestion: This process involves collecting data from different sources like social media platforms, business
applications, log files, etc.
2) Data Storage: When data extraction is completed, the challenge is to store this large volume of data in the
database in which the Hadoop Distributed File system (HDFS) plays a vital role.
3) Data Processing: After storing the data in HDFS or HBase, the next task is to analyze and visualize these large
amounts of data using specific algorithms for better data processing. And yet again, this task is more straightforward
if we use Hadoop, Apache Spark, Pig, etc.
After performing these essential steps, one can deploy a big data model successfully.
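As a rough illustration of the processing step, here is a minimal PySpark job that reads ingested data from HDFS and aggregates it before it feeds the model or dashboards; the paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("deploy-demo").getOrCreate()

# Data already ingested and stored on HDFS (hypothetical path and schema)
events = spark.read.json("hdfs:///data/raw/events/")

# Processing: aggregate the stored data for downstream analysis and visualization
daily = (events.groupBy("event_date", "event_type")
               .agg(F.count("*").alias("event_count")))

daily.write.mode("overwrite").parquet("hdfs:///data/processed/daily_events/")
spark.stop()
```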
In this round, the interviewer asked me questions mainly from Tableau and Power BI.
What are the different types of joins in Tableau?
Tableau is pretty similar to SQL. Therefore, the types of joins in Tableau are similar:
1) Left Outer Join: Extracts all the records from the left table and the matching rows from the right table.
2) Right Outer Join: Extracts all the records from the right table and the matching rows from the left table.
3) Full Outer Join: Extracts all the records from both the left and right tables; unmatched fields are filled with NULL values.
4) Inner Join: Extracts only the matching records from both tables.
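The semantics are the same as SQL joins. For illustration only, the four join types expressed with pandas merge on two hypothetical tables:

```python
import pandas as pd

orders    = pd.DataFrame({"cust_id": [1, 2, 3], "amount": [100, 200, 300]})
customers = pd.DataFrame({"cust_id": [2, 3, 4], "name": ["Asha", "Ravi", "Meera"]})

left_join  = orders.merge(customers, on="cust_id", how="left")   # all left rows + matches
right_join = orders.merge(customers, on="cust_id", how="right")  # all right rows + matches
full_join  = orders.merge(customers, on="cust_id", how="outer")  # all rows, NULLs where unmatched
inner_join = orders.merge(customers, on="cust_id", how="inner")  # only matching rows
print(full_join)
```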
How do you generally perform load testing in Tableau?
Load testing in Tableau is done to understand the server’s capacity with respect to its environment, data, workload, and use. It is preferable to conduct load testing at least 3-4 times in a year because with every new user, upgrade, or content authoring, the usage, data, and workload change.
Tabjolt was created by Tableau to conduct point-and-run load and performance testing specifically for Tableau servers. Tabjolt:
1) Automates the process of user-specified loads
2) Eliminates dependency on script development or script maintenance
3) Scales linearly with an increase in the load by adding more nodes to the cluster
What are the various types of filters in Tableau?
Tableau has 6 different types of filters:
1) Extract Filter: This filter retrieves a subset of data from the data source.
2) Dimension Filter: This filter is for non-aggregated data (discrete).
3) Data Source Filter: This filter refrains users from viewing sensitive information and thus reduces data feeds.
4) Context Filter: This filter creates datasets by applying presets in Tableau.
5) Measure Filter: This filter applies various operations like sum, median, avg, etc.
6) Table Calculation Filter: This filter is applied after the view has been created.
Explain when would you use Joins vs. Blending in Tableau?
While the two terms may sound similar, there is a difference in their meaning and use in Tableau:
A Join is used to combine two or more tables within the same data source, whereas
Blending is used to combine data from multiple data sources such as Oracle, Excel, SQL Server, etc.
What are the building blocks of Power BI?
The major building blocks of Power BI are:
1) Datasets: Dataset is a collection of data gathered from various sources like SQL Server, Azure, Text, Oracle, XML, JSON, and many more. With the GetData feature in Power BI, we can easily fetch data from any data source.
2) Visualizations: Visualization is the visual aesthetic representation of data in the form of maps, charts, or tables.
3) Reports: Reports are structured representations of datasets that consist of multiple pages. Reports help to extract important information and insights from datasets to support major business decisions.
4) Dashboards: A dashboard is a single-page representation of reports made of various datasets. Each element is termed a tile.
5) Tiles: A tile is a single block containing a visualization from a report. Tiles help to differentiate each report.
What is the comprehensive working system of Power BI?
Power BI’s working system mainly comprises three steps:
1) Data Integration: The first step is to extract and integrate the data from heterogeneous data sources. After integration, the data is converted into a standard format and stored in a common area called the staging area.
2) Data Processing: Once the data is assembled and integrated, it requires some cleaning up. Raw data is not very useful; therefore, a few transformation and cleaning operations are performed on the data to remove redundant values, etc. After the data is transformed, it is stored in data warehouses.
3) Data Presentation: Now that the data is transformed and cleaned, it is visually presented on the Power BI desktop as reports, dashboards, or scorecards. These reports can be shared via mobile apps or web to various business users.
Difference b/w Power BI and Tableau.
Major points of difference between Power BI and Tableau are as follows :
1) Tableau uses MDX for measures and dimensions, while Power BI uses DAX for calculated columns and measures.
2) The Tableau platform is known for its data visualization functionality, whereas Power BI offers numerous data points for data visualization.
3) Tableau can handle a huge volume of data with better performance, while Power BI can handle only a limited volume of data.
4) Tableau works best when there is vast data in the cloud, but Power BI does not work as well with massive amounts of data.
5) Tableau is mostly used by analysts and experienced users for their analytics purposes, while Power BI is used by both naive and experienced users.
Difference b/w OLAP and OLTP
OLAP stands for Online Analytical Processing and OLTP stands for Online Transaction Processing. The major
differences between OLTP and OLAP are :
1) OLAP : Consists of historical data from various databases.
OLTP : Consists of only current operational data.
2) OLAP : It is subject oriented. Used for Data Mining, Analytics, Decision making, etc.
OLTP : It is application oriented. Used for business tasks.
3) OLAP : It provides a multi-dimensional view of different business tasks.
OLTP : It reveals a snapshot of present business tasks.
4) OLAP : Relatively slow as the amount of data involved is large. Queries may take hours.
OLTP : Very fast as the queries operate on only about 5% of the data.
5) OLAP : It only needs backups from time to time as compared to OLTP.
OLTP : The backup and recovery process is maintained religiously.
6) OLAP : A large amount of data is stored, typically in TB or PB.
OLTP : The size of the data is relatively small as the historical data is archived, e.g. in MB or GB.
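A toy sqlite3 sketch of the difference in workload: the OLTP-style statement touches a single current row, while the OLAP-style query scans and aggregates history. The schema and data are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL, year INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
               [(1, "North", 100.0, 2022), (2, "South", 250.0, 2022),
                (3, "North", 175.0, 2023), (4, "South", 300.0, 2023)])

# OLTP-style: short transaction on a single current record
db.execute("UPDATE sales SET amount = 120.0 WHERE order_id = 1")

# OLAP-style: analytical query aggregating historical data
for row in db.execute("SELECT region, year, SUM(amount) FROM sales GROUP BY region, year"):
    print(row)
db.close()
```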
This was a typical HR round with some standard Behavioral questions.
Tell me something about yourself?
Tip 1 : Prepare the points you will cover in your introduction before the interview.
Tip 2 : Mention your current CGPA, achievements, and authenticated certifications.
Tip 3 : I talked about my role in my current internship and everything I do there.
