UHG Interview Experience: Real-time questions & tips from candidates to crack your interview

Data Scientist

UHG
1 round | 4 coding problems

Interview preparation journey

Preparation
Duration: 6 months
Topics: Data Science, Data Analysis, Machine Learning, Python, Pandas, Numpy
Tips

Tip 1: My advice would be to search Google for "100 data science interview questions" or something similar, and as you go through each question and its answer, look up each concept on Wikipedia or Google for a little more depth.
Tip 2: As for Python coding questions, most questions in such interviews are about the Pandas library, so study how to use Pandas to manipulate data.
Tip 3: A good reference is the Python for Data Analysis book by Wes McKinney, whose free PDF is most probably available online.

Application process
Where: Other
Eligibility: Above 7 CGPA
Resume tips

Tip 1: Have at least 2 good projects explained briefly with all the important points covered.
Tip 2: Mention every relevant skill.
Tip 3: Focus more on skills, projects, and experience.

Interview rounds

Round 1
Difficulty: Easy
Mode: Video Call
Duration: 60 minutes
Interview date: 5 Oct 2020
Coding problems: 4

Technical round with questions on ML and data science.

1. ML Question

What are the hyperparameters of Random Forest Classifier?

Problem approach

Built-in hyperparameters:
1. n_estimators: A random forest is a group of many decision trees, and the n_estimators parameter controls the number of trees inside the classifier. Increasing it does not by itself cause overfitting, but it does increase the training time of the model. The default number of estimators in scikit-learn is 100.

2. max_depth: It governs the maximum depth to which the trees inside the forest can grow. It is one of the most important hyperparameters for accuracy: as the depth increases, model accuracy improves up to a certain point, but beyond that it starts to degrade because the model overfits. The default value is None, which means the nodes keep splitting until all leaves are pure or contain fewer than min_samples_split samples (another hyperparameter).

3. min_samples_split: It specifies the minimum number of samples an internal node must hold in order to be split further. With a very low value of min_samples_split the tree keeps growing and starts overfitting. Increasing it reduces the total number of splits, which limits the complexity of the model and helps reduce overfitting. Values between 2 and 6 are common; the default is 2.

4. min_samples_leaf: It specifies the minimum number of samples a node must hold after a split. It also helps reduce overfitting when the trees would otherwise grow very complex, but setting it too high can make the trees too simple and cause the model to underfit. The default value is 1.

5. max_features: A random forest considers a random subset of features at each split and tries to find the best split among them. max_features controls how many features are taken into account when looking for the best split. It can take the values "auto", "sqrt", "log2", and None.

6. max_leaf_nodes: It caps the number of leaf nodes per tree, which restricts splitting, reduces the depth of the tree, and helps reduce overfitting. If set to None, the tree grows without this limit.

7. max_samples: It controls the maximum number (or fraction) of samples drawn from the training dataset to train each individual tree.
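
A minimal sketch of how these hyperparameters are set on scikit-learn's RandomForestClassifier; the dataset and the specific values below are only illustrative, not from the interview.

# Illustrative sketch: the Random Forest hyperparameters discussed above,
# with example values (not recommendations).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(
    n_estimators=200,      # number of trees in the forest (default 100)
    max_depth=10,          # maximum depth of each tree (default None)
    min_samples_split=4,   # minimum samples an internal node needs to split (default 2)
    min_samples_leaf=2,    # minimum samples a leaf must hold (default 1)
    max_features="sqrt",   # features considered when looking for the best split
    max_leaf_nodes=50,     # cap on leaf nodes per tree (default None)
    max_samples=0.8,       # fraction of the training set drawn for each tree
    random_state=42,
)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))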

2. ML Question

What is a ROC Curve?

Problem approach

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:
True Positive Rate
False Positive Rate
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives.
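
A short sketch of how the curve's TPR/FPR pairs can be computed across thresholds with scikit-learn; the model and data here are placeholders, not part of the original answer.

# Illustrative sketch: computing ROC curve points and the AUC with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]         # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)   # one (FPR, TPR) pair per threshold
print("AUC:", roc_auc_score(y_test, scores))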

3. ML Question

What is XGBoost?

Problem approach

XGBoost is arguably one of the most powerful boosting algorithms and is increasingly being used across industries and problem domains, from customer analytics and sales prediction to fraud detection and credit approval and more.
The key strengths of XGBoost are:
Flexibility: It can perform machine learning tasks such as regression, classification, ranking, and other user-defined objectives (see the short sketch after this list).
Portability: It runs on Windows, Linux, and OS X as well as on cloud platforms.
Language support: It supports multiple languages including C++, Python, R, Java, Scala, and Julia.
Distributed training on cloud systems: XGBoost supports distributed training on multiple machines, including AWS, GCE, Azure, and Yarn clusters.
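
To make the flexibility point concrete, here is a minimal sketch using the xgboost Python package's scikit-learn interface on synthetic data; the objectives shown are just examples, not the only options.

# Illustrative sketch: the same library handles different task types via its objective.
import xgboost as xgb
from sklearn.datasets import make_classification, make_regression

Xc, yc = make_classification(n_samples=500, random_state=0)
Xr, yr = make_regression(n_samples=500, random_state=0)

clf = xgb.XGBClassifier(objective="binary:logistic").fit(Xc, yc)    # classification
reg = xgb.XGBRegressor(objective="reg:squarederror").fit(Xr, yr)    # regression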

4. ML Question

List some hyperparameters of XGBoost.

Problem approach

Frequently tuned hyperparameters:
n_estimators: specifies the number of decision trees to be boosted. If n_estimators = 1, only one tree is generated, so no boosting is at work. The default value is 100, but you can tune this number for optimal performance.

subsample: the subsample ratio of the training data. subsample = 0.5 means that 50% of the training data is randomly sampled before growing each tree. The value can be any fraction between 0 and 1; the default is 1.

max_depth: limits how deep each tree can grow. The default value is 6, but you can try lower values if overfitting is an issue in your model.

learning_rate (alias: eta): a regularization parameter that shrinks the feature weights after each boosting step. The default value is 0.3, but people generally tune it with values such as 0.01, 0.1, or 0.2.

gamma (alias: min_split_loss): another regularization parameter used for tree pruning. It specifies the minimum loss reduction required to make a further split. The default value is 0.
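
A minimal sketch of passing the hyperparameters listed above through the xgboost scikit-learn wrapper; the values and the synthetic dataset are only illustrative.

# Illustrative sketch: frequently tuned XGBoost hyperparameters with example values.
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, random_state=1)

model = xgb.XGBClassifier(
    n_estimators=300,     # number of boosted trees (default 100)
    subsample=0.8,        # fraction of training rows sampled for each tree (default 1)
    max_depth=4,          # maximum depth of each tree (default 6)
    learning_rate=0.1,    # eta: shrinks the contribution of each step (default 0.3)
    gamma=1.0,            # min_split_loss: minimum loss reduction to split (default 0)
)
model.fit(X, y)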
