Tip 1 : Practise previously asked interview questions as well as online test questions.
Tip 2 : Go through all the previous interview experiences on Codestudio and LeetCode.
Tip 1 : Have at least 2 good projects that you can explain briefly with all the important points covered.
Tip 2 : Mention every relevant skill.
Tip 3 : Focus more on skills, projects and experience.
This round mainly revolved around concepts from Data Warehousing and Big Data.
What is a Slowly Changing Dimension (SCD)? What are the different types of SCDs?
A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over
time in a data warehouse. Tracking the history of dimension records is one of the most critical ETL tasks.
There are three types of SCDs, and you can use Warehouse Builder to define, deploy, and load all three types of
SCDs; a small illustrative sketch follows the three descriptions below.
Type 1 SCDs - Overwriting :
In a Type 1 SCD the new data overwrites the existing data. Thus the existing data is lost as it is not stored anywhere
else. This is the default type of dimension you create. You do not need to specify any additional information to create
a Type 1 SCD.
Type 2 SCDs - Creating another dimension record :
A Type 2 SCD retains the full history of values. When the value of a chosen attribute changes, the current record is
closed. A new record is created with the changed data values, and this new record becomes the current record. Each
record contains an effective time and an expiration time to identify the time period during which the record was active.
Type 3 SCDs - Creating a current value field :
A Type 3 SCD stores two versions of values for certain selected level attributes. Each record stores the previous
value and the current value of the selected attribute. When the value of any of the selected attributes changes, the
current value is stored as the old value and the new value becomes the current value.
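The three behaviours can be illustrated with a minimal pandas sketch; the table, column names, and dates below are hypothetical and purely for illustration:

```python
import pandas as pd

# Hypothetical customer dimension; the tracked attribute is "city"
dim = pd.DataFrame([{"cust_id": 1, "city": "Pune",
                     "eff_date": "2020-01-01", "exp_date": None, "current": True}])

# Type 1 - overwrite in place: the old value is lost
dim.loc[dim["cust_id"] == 1, "city"] = "Mumbai"

# Type 2 - close the current record and insert a new current record
dim.loc[(dim["cust_id"] == 1) & dim["current"], ["exp_date", "current"]] = ["2023-06-30", False]
new_version = pd.DataFrame([{"cust_id": 1, "city": "Delhi",
                             "eff_date": "2023-07-01", "exp_date": None, "current": True}])
dim = pd.concat([dim, new_version], ignore_index=True)

# Type 3 - keep only the previous and the current value in separate columns
dim3 = pd.DataFrame([{"cust_id": 1, "prev_city": None, "curr_city": "Pune"}])
dim3.loc[dim3["cust_id"] == 1, ["prev_city", "curr_city"]] = ["Pune", "Delhi"]

print(dim)
print(dim3)
```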
Explain ETL Process in Data Warehouse
ETL stands for Extract, Transform and Load. It is the process in Data Warehousing by which an ETL
tool extracts data from various source systems, transforms it in the staging area, and finally loads it into
the Data Warehouse system.
The different steps in the ETL process are :
1) Extraction:
The first step of the ETL process is extraction. In this step, data is extracted from various source systems, which can
be in formats such as relational databases, NoSQL, XML, and flat files, into the staging area. It is important to
extract the data from the various source systems and store it in the staging area first, rather than directly in the data
warehouse, because the extracted data is in various formats and may also be corrupted. Loading it directly into
the data warehouse could therefore damage it, and rollback would be much more difficult. This makes extraction one of the most
important steps of the ETL process.
2) Transformation:
The second step of the ETL process is transformation. In this step, a set of rules or functions is applied to the
extracted data to convert it into a single standard format. It may involve the following processes/tasks:
i) Filtering – loading only certain attributes into the data warehouse.
ii) Cleaning – filling up the NULL values with some default values, mapping U.S.A, United States, and America into
USA, etc.
iii) Joining – joining multiple attributes into one.
iv) Splitting – splitting a single attribute into multiple attributes.
v) Sorting – sorting tuples on the basis of some attribute (generally key-attribute).
3) Loading:
The third and final step of the ETL process is loading. In this step, the transformed data is finally loaded into the data
warehouse. Sometimes the warehouse is refreshed very frequently, and sometimes loading is done at longer but
regular intervals. The rate and period of loading depend solely on the requirements and vary
from system to system. A toy sketch of the full flow is shown below.
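The following is a small, illustrative end-to-end sketch in Python; the file name, column names, and table name are hypothetical:

```python
import sqlite3
import pandas as pd

# 1) Extract: pull raw data from a (hypothetical) flat-file source into a staging frame
staging = pd.read_csv("sales_raw.csv")  # e.g. columns: order_id, country, amount

# 2) Transform: clean, standardize and filter in the staging area
country_map = {"U.S.A": "USA", "United States": "USA", "America": "USA"}
staging["country"] = staging["country"].replace(country_map)   # cleaning / mapping
staging["amount"] = staging["amount"].fillna(0)                # fill NULLs with a default
staging = staging[["order_id", "country", "amount"]]           # load only required attributes
staging = staging.sort_values("order_id")                      # sort on the key attribute

# 3) Load: write the conformed data into the warehouse table
warehouse = sqlite3.connect("warehouse.db")
staging.to_sql("fact_sales", warehouse, if_exists="append", index=False)
warehouse.close()
```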
What are the different types of data marts in the context of data warehousing?
Following are the different types of data marts in data warehousing :
1) Dependent Data Mart : A dependent data mart can be developed using data from operational, external, or both
sources. It enables the organization's data to be accessed from a single data warehouse. All data is
centralized, which can aid in the development of further data marts.
2) Independent Data Mart : There is no need for a central data warehouse with this data mart. It is typically
established for smaller groups that exist within a company. It has no connection to the Enterprise Data Warehouse or any
other data warehouse. Each piece of information is self-contained and can be used independently, and the analysis can
also be carried out independently. The challenge, however, is to maintain a consistent and centralized data repository that numerous
users can access.
3) Hybrid Data Mart : A hybrid data mart is utilized when a data warehouse contains inputs from multiple sources, as
the name implies. When a user requires an ad hoc integration, this feature comes in handy. This solution can be
utilized if an organization requires various database environments and quick implementation. It necessitates the least
amount of data purification, and the data mart may accommodate huge storage structures. When smaller data-centric
applications are employed, a data mart is most effective.
Explain sessions. Explain how batches are used to combine executions?
A set of instructions that needs to be implemented to convert data from a source to a target is called a session. A session can be carried out using the Session Manager or the pmcmd command. Batch execution can be used to combine session executions either serially or in parallel; a batch can contain different sessions running in a serial or parallel manner.
Difference between Fact Table and Dimension Table
1) A fact table contains the measures calculated over the attributes of a dimension table,
while a dimension table contains the attributes along which the fact table's measures are calculated.
2) A fact table has fewer attributes than a dimension table,
while a dimension table has more attributes than a fact table.
3) A fact table has more records than a dimension table,
while a dimension table has fewer records than a fact table.
4) A fact table forms a vertical table (deep and narrow),
while a dimension table forms a horizontal table (shallow and wide).
5) The attributes of a fact table are in numerical and text format,
while the attributes of a dimension table are mainly in text format.
6) The number of fact tables in a schema is smaller than the number of dimension tables,
while the number of dimension tables in a schema is larger than the number of fact tables.
A tiny star-schema sketch of the two table types is shown below.
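The following minimal pandas sketch uses made-up product and sales data: the dimension table is wide and descriptive, the fact table is narrow, numeric and much longer, and the two join on the dimension key.

```python
import pandas as pd

# Dimension table: few rows, mostly textual descriptive attributes
dim_product = pd.DataFrame({
    "product_id": [1, 2],
    "product_name": ["Laptop", "Phone"],
    "category": ["Electronics", "Electronics"],
})

# Fact table: many rows, numeric measures plus foreign keys to dimensions
fact_sales = pd.DataFrame({
    "product_id": [1, 1, 2, 2, 2],
    "date_id":    [101, 102, 101, 102, 103],
    "units_sold": [3, 5, 10, 7, 4],
    "revenue":    [3000.0, 5000.0, 4000.0, 2800.0, 1600.0],
})

# Measures from the fact table are analysed along dimension attributes
report = (fact_sales.merge(dim_product, on="product_id")
                    .groupby("product_name")["revenue"].sum())
print(report)
```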
Explain overfitting in big data? How to avoid the same.
Overfitting is a modeling error in which a model is fitted too closely to a limited data set, i.e. the modeling
function captures the noise of the training sample rather than the underlying pattern. Due to overfitting, the predictive power of such models is reduced. This
effect leads to a decrease in generalization ability, so the model fails to generalize when applied outside the sample data.
There are several methods to avoid overfitting; some of them are :
1) Cross-validation : Cross-validation refers to dividing the data into multiple small test data sets, which can
be used to tune the model.
2) Early stopping : After a certain number of iterations, the generalizing capacity of the model weakens; early
stopping halts training before the model crosses that point, thereby avoiding overfitting.
3) Regularization : This method penalizes all the parameters except the intercept so that the model generalizes
the data instead of overfitting it.
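As a small scikit-learn sketch of the first and third remedies (cross-validation for tuning, and L2 regularization via Ridge), using synthetic data that is purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic, noisy data: a small sample that a flexible model could easily overfit
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=50)

# Cross-validation: score the model on held-out folds instead of the training data
# Regularization: Ridge penalizes large coefficients (all parameters except the intercept)
for alpha in [0.01, 1.0, 10.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(f"alpha={alpha}: mean CV R^2 = {scores.mean():.3f}")
```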
Explain Outliers.
Outliers are data points that lie very far from the rest of the data and are not part of any group or cluster. They may affect the behavior of the model, cause it to predict wrong results, or lower its accuracy. Outliers must therefore be handled carefully, as they may also contain some helpful information. The presence of outliers may mislead a Big Data model or a Machine Learning model, and the results of this may be:
1) Poor Results
2) Lower accuracy
3) Longer Training Time
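One common, simple way to flag outliers is the IQR rule; a short pandas sketch with made-up values:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 11, 95])   # 95 lies far from the rest

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers)   # flags 95; decide whether to drop, cap, or investigate it
```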
How to deploy a Big Data Model? Mention the key steps involved
Deploying a model on a Big Data platform mainly involves three key steps:
1) Data Ingestion: This process involves collecting data from different sources like social media platforms, business
applications, log files, etc.
2) Data Storage: When data extraction is completed, the challenge is to store this large volume of data in the
database in which the Hadoop Distributed File system (HDFS) plays a vital role.
3) Data Processing: After storing the data in HDFS or HBase, the next task is to analyze and visualize these large
amounts of data using specific algorithms for better data processing. And yet again, this task is more straightforward
if we use Hadoop, Apache Spark, Pig, etc.
After performing these essential steps, one can deploy a big data model successfully.
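As a rough illustration of the processing step, here is a minimal PySpark job that reads ingested data from HDFS and aggregates it before it feeds the model or dashboards; the paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("deploy-demo").getOrCreate()

# Data already ingested and stored on HDFS (hypothetical path and schema)
events = spark.read.json("hdfs:///data/raw/events/")

# Processing: aggregate the stored data for downstream analysis and visualization
daily = (events.groupBy("event_date", "event_type")
               .agg(F.count("*").alias("event_count")))

daily.write.mode("overwrite").parquet("hdfs:///data/processed/daily_events/")
spark.stop()
```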
In this round, the interviewer asked me questions mainly from Tableau and Power BI.
What are the different types of joins in Tableau?
Tableau is pretty similar to SQL. Therefore, the types of joins in Tableau are similar:
1) Left Outer Join: Extracts all the records from the left table and the matching rows from the right table.
2) Right Outer Join: Extracts all the records from the right table and the matching rows from the left table.
3) Full Outer Join: Extracts all the records from both the left and right tables; unmatched fields are filled with NULL values.
4) Inner Join: Extracts only the matching records from both tables.
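The semantics are the same as SQL joins. For illustration only, the four join types expressed with pandas merge on two hypothetical tables:

```python
import pandas as pd

orders    = pd.DataFrame({"cust_id": [1, 2, 3], "amount": [100, 200, 300]})
customers = pd.DataFrame({"cust_id": [2, 3, 4], "name": ["Asha", "Ravi", "Meera"]})

left_join  = orders.merge(customers, on="cust_id", how="left")   # all left rows + matches
right_join = orders.merge(customers, on="cust_id", how="right")  # all right rows + matches
full_join  = orders.merge(customers, on="cust_id", how="outer")  # all rows, NULLs where unmatched
inner_join = orders.merge(customers, on="cust_id", how="inner")  # only matching rows
print(full_join)
```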
How do you generally perform load testing in Tableau?
Load testing in Tableau is done to understand the server’s capacity with respect to its environment, data, workload, and use. It is preferable to conduct load testing at least 3-4 times in a year because with every new user, upgrade, or content authoring, the usage, data, and workload change.
Tabjolt was created by Tableau to conduct point-and-run load and performance testing specifically for Tableau servers. Tabjolt:
1) Automates the process of user-specified loads
2) Eliminates dependency on script development or script maintenance
3) Scales linearly with an increase in the load by adding more nodes to the cluster
What are the various types of filters in Tableau?
Tableau has 6 different types of filters:
1) Extract Filter: This filter retrieves a subset of data from the data source.
2) Dimension Filter: This filter is for non-aggregated data (discrete).
3) Data Source Filter: This filter refrains users from viewing sensitive information and thus reduces data feeds.
4) Context Filter: This filter creates datasets by applying presets in Tableau.
5) Measure Filter: This filter applies various operations like sum, median, avg, etc.
6) Table Calculation Filter: This filter is applied after the view has been created.
Explain when would you use Joins vs. Blending in Tableau?
While the two terms may sound similar, there is a difference in their meaning and use in Tableau:
A Join is used to combine two or more tables within the same data source, whereas
Blending is used to combine data from multiple data sources such as Oracle, Excel, SQL Server, etc.
What are the building blocks of Power BI?
The major building blocks of Power BI are:
1) Datasets: Dataset is a collection of data gathered from various sources like SQL Server, Azure, Text, Oracle, XML, JSON, and many more. With the GetData feature in Power BI, we can easily fetch data from any data source.
2) Visualizations: Visualization is the visual aesthetic representation of data in the form of maps, charts, or tables.
3) Reports: Reports are structured representations of datasets that consist of multiple pages. Reports help to extract important information and insights from datasets to support major business decisions.
4) Dashboards: A dashboard is a single-page representation of reports made of various datasets. Each element is termed a tile.
5) Tiles: A tile is a single block containing a visualization from a report. Tiles help to differentiate each report.
What is the comprehensive working system of Power BI?
Power BI’s working system mainly comprises three steps:
1) Data Integration: The first step is to extract and integrate the data from heterogeneous data sources. After integration, the data is converted into a standard format and stored in a common area called the staging area.
2) Data Processing: Once the data is assembled and integrated, it requires some cleaning up. Raw data is not very useful; therefore, a few transformation and cleaning operations are performed on the data to remove redundant values, etc. After the data is transformed, it is stored in data warehouses.
3) Data Presentation: Now that the data is transformed and cleaned, it is visually presented on the Power BI desktop as reports, dashboards, or scorecards. These reports can be shared via mobile apps or web to various business users.
Difference b/w Power BI and Tableau.
Major points of difference between Power BI and Tableau are as follows :
1) Tableau uses MDX for measures and dimensions, while Power BI uses DAX for calculated columns and measures.
2) The Tableau platform is known for its data visualization functionality, whereas Power BI offers numerous data points for data visualization.
3) Tableau can handle a huge volume of data with better performance, while Power BI can handle only a limited volume of data.
4) Tableau works best when there is vast data in the cloud, but Power BI does not work as well with massive amounts of data.
5) Tableau is mostly used by analysts and experienced users for their analytics purposes, while Power BI is used by both naive and experienced users.
Difference b/w OLAP and OLTP
OLAP stands for Online Analytical Processing and OLTP stands for Online Transaction Processing. The major
differences between OLTP and OLAP are :
1) OLAP : Consists of historical data from various databases.
OLTP : Consists of only current operational data.
2) OLAP : It is subject oriented. Used for Data Mining, Analytics, Decision making, etc.
OLTP : It is application oriented. Used for business tasks.
3) OLAP : It provides a multi-dimensional view of different business tasks.
OLTP : It reveals a snapshot of present business tasks.
4) OLAP : Relatively slow as the amount of data involved is large. Queries may take hours.
OLTP : Very fast as the queries operate on only about 5% of the data.
5) OLAP : It only needs backups from time to time as compared to OLTP.
OLTP : The backup and recovery process is maintained religiously.
6) OLAP : A large amount of data is stored, typically in TB or PB.
OLTP : The size of the data is relatively small as the historical data is archived, e.g. in MB or GB.
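A toy sqlite3 sketch of the difference in workload: the OLTP-style statement touches a single current row, while the OLAP-style query scans and aggregates history. The schema and data are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL, year INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
               [(1, "North", 100.0, 2022), (2, "South", 250.0, 2022),
                (3, "North", 175.0, 2023), (4, "South", 300.0, 2023)])

# OLTP-style: short transaction on a single current record
db.execute("UPDATE sales SET amount = 120.0 WHERE order_id = 1")

# OLAP-style: analytical query aggregating historical data
for row in db.execute("SELECT region, year, SUM(amount) FROM sales GROUP BY region, year"):
    print(row)
db.close()
```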
This was a typical HR round with some standard Behavioral questions.
Tell me something about yourself?
Tip 1 : Prepare the points you will cover in your introduction before the interview.
Tip 2 : Mention your current CGPA, achievements, and authenticated certifications.
Tip 3 : I talked about my role in my current internship and everything I do there.
