Tip 1 : Search Google for something like "100 data science interview questions", and as you go through each question and its answer, look up each concept on Wikipedia or Google for a little more depth.
Tip 2 : As for Python coding questions, most of them in such interviews are about the Pandas library, so study how to use Pandas to manipulate data (a small sketch follows after these tips).
Tip 3 : A good reference is Python for Data Analysis by Wes McKinney; a free PDF is most probably available online.
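For instance, here is a minimal Pandas sketch, using a made-up toy table purely for illustration, showing the kinds of filtering, derived-column, and group-by operations that commonly come up:

import pandas as pd

# Made-up toy data, purely for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sales": [250, 310, 120, 400],
    "returns": [5, 2, 8, 3],
})

high_sales = df[df["sales"] > 200]           # boolean filtering
df["net"] = df["sales"] - df["returns"]      # derived column
summary = df.groupby("region")["net"].sum()  # group-by aggregation
print(summary)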
Tip 1 : Have at least 2 good projects that you can explain briefly with all the important points covered.
Tip 2 : Mention every relevant skill.
Tip 3 : Put the most emphasis on skills, projects, and experience.
Technical round with questions on ML and data science.
What are the hyperparameters of Random Forest Classifier?
Inbuilt hyperparameters:
1. n_estimators: A random forest is essentially a group of many decision trees, and the n_estimators parameter controls the number of trees inside the classifier. Increasing it does not by itself cause overfitting, but it does increase the training time of the model. The default number of estimators in scikit-learn is 100.
2. max_depth: It governs the maximum depth to which the trees inside the forest can grow. It is one of the most important hyperparameters for accuracy: as the depth increases, accuracy improves up to a point and then starts to decrease because the model overfits. The default value is None, which means the nodes keep expanding until all leaves are pure or contain fewer than min_samples_split samples (another hyperparameter).
3. min_samples_split: It specifies the minimum number of samples an internal node must hold in order to be split further. A very low value lets the tree keep growing and start overfitting; increasing it reduces the total number of splits, which limits model complexity and helps reduce overfitting. Values between 2 and 6 are common; the default is 2.
4. min_samples_leaf: It specifies the minimum number of samples a leaf node must hold after a split. Increasing it also helps reduce overfitting when the trees are allowed to grow large, but keep in mind that setting it to a very large value makes the trees too shallow and can cause the model to underfit. The default value is 1.
5. max_features: A random forest considers a random subset of features at each split, and max_features controls how many features are considered when looking for the best split. It can take the string values "sqrt" and "log2", an integer or float, or None (all features); older scikit-learn versions also accepted "auto", which has since been removed.
6. max_leaf_nodes: It caps the number of leaf nodes, which limits how far the trees can split and thus helps reduce overfitting. If the value is set to None (the default), the number of leaf nodes is unlimited.
7. max_samples: It sets the maximum number (or fraction) of samples drawn from the training dataset to train each individual tree, and applies when bootstrap sampling is enabled (the default).
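A minimal scikit-learn sketch that sets these hyperparameters on RandomForestClassifier; the dataset and the values are illustrative, not tuned:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy dataset, purely for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(
    n_estimators=200,      # number of trees in the forest
    max_depth=8,           # maximum depth of each tree
    min_samples_split=4,   # minimum samples needed to split an internal node
    min_samples_leaf=2,    # minimum samples required at a leaf
    max_features="sqrt",   # features considered at each split
    max_leaf_nodes=50,     # cap on leaf nodes per tree
    max_samples=0.8,       # fraction of the training set drawn for each tree (bootstrap=True)
    random_state=42,
)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))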
What is a ROC Curve?
An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:
True Positive Rate
False Positive Rate
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives.
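A short sketch of how the points on an ROC curve can be computed with scikit-learn's roc_curve, which sweeps the classification thresholds for you; the dataset and model here are illustrative choices:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]   # predicted probability of the positive class

# roc_curve returns one (FPR, TPR) pair per threshold; plotting tpr against fpr gives the ROC curve.
fpr, tpr, thresholds = roc_curve(y_test, probs)
print("AUC:", roc_auc_score(y_test, probs))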
What is XGBoost?
XGBoost is arguably one of the most powerful gradient boosting algorithms and is increasingly being used across industries and problem domains, from customer analytics and sales prediction to fraud detection, credit approval, and more.
The key strengths of XGBoost are:
Flexibility: It can perform machine learning tasks such as regression, classification, and ranking, and also supports user-defined objectives.
Portability: It runs on Windows, Linux and OS X as well as on cloud platforms.
Language support: It supports multiple languages, including C++, Python, R, Java, Scala, and Julia.
Distributed training on cloud systems: XGBoost supports distributed training on multiple machines, including AWS, GCE, Azure, and Yarn clusters.
List some hyperparameters of XGBoost.
Frequently tuned hyperparameters:
n_estimators: specifies the number of decision trees to be boosted. If n_estimators = 1, only one tree is generated, so no boosting is actually at work. The default value is 100, but you can tune this number for better performance.
subsample: the fraction of the training data sampled before growing each tree. subsample = 0.5 means 50% of the training data is used for each tree. The value can be any fraction between 0 and 1; the default is 1.
max_depth: it limits how deep each tree can grow. The default value is 6, but you can try other values if overfitting is an issue in your model.
learning_rate (alias: eta): a regularization parameter that shrinks the feature weights at each boosting step. The default value is 0.3, but values such as 0.01, 0.1, and 0.2 are commonly tried.
gamma (alias: min_split_loss): another regularization parameter used for tree pruning. It specifies the minimum loss reduction required to make a further split on a leaf node. The default value is 0.
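A short illustrative sketch setting these hyperparameters on the scikit-learn-style XGBClassifier, assuming the xgboost package is installed; the values are placeholders, not tuned:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = XGBClassifier(
    n_estimators=200,   # number of boosted trees
    subsample=0.8,      # fraction of training rows sampled per tree
    max_depth=4,        # maximum depth of each tree
    learning_rate=0.1,  # shrinkage (eta) applied at each boosting step
    gamma=1.0,          # minimum loss reduction required to make a split
)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))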
