Tip 1 : Search Google for something like "100 data science interview questions", and as you go through each question and its answer, look up each concept on Wikipedia or Google for a little more depth.
Tip 2 : As for Python coding questions, most of them in such interviews are about the Pandas library, so study how to use Pandas to manipulate data (a small sketch follows after these tips).
Tip 3 : A good reference is Python for Data Analysis by Wes McKinney; a free PDF is most probably available online.
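For instance, here is a minimal Pandas sketch, using a made-up toy table purely for illustration, showing the kinds of filtering, derived-column, and group-by operations that commonly come up:

import pandas as pd

# Made-up toy data, purely for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sales": [250, 310, 120, 400],
    "returns": [5, 2, 8, 3],
})

high_sales = df[df["sales"] > 200]           # boolean filtering
df["net"] = df["sales"] - df["returns"]      # derived column
summary = df.groupby("region")["net"].sum()  # group-by aggregation
print(summary)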
Tip 1 : Have at least 2 good projects that you can explain briefly with all the important points covered.
Tip 2 : Mention every relevant skill.
Tip 3 : Put the most emphasis on skills, projects, and experience.
Technical round with questions on ML and data science.
What are the hyperparameters of Random Forest Classifier?
Inbuilt hyperparameters:
1. n_estimators: A random forest is essentially a group of many decision trees, and the n_estimators parameter controls the number of trees inside the classifier. Increasing it does not by itself cause overfitting, but it does increase the training time of the model. The default number of estimators in scikit-learn is 100.
2. max_depth: It governs the maximum depth to which the trees inside the forest can grow. It is one of the most important hyperparameters for accuracy: as the depth increases, accuracy improves up to a point and then starts to decrease because the model overfits. The default value is None, which means the nodes keep expanding until all leaves are pure or contain fewer than min_samples_split samples (another hyperparameter).
3. min_samples_split: It specifies the minimum number of samples an internal node must hold in order to be split further. A very low value lets the tree keep growing and start overfitting; increasing it reduces the total number of splits, which limits model complexity and helps reduce overfitting. Values between 2 and 6 are common; the default is 2.
4. min_samples_leaf: It specifies the minimum number of samples a leaf node must hold after a split. Increasing it also helps reduce overfitting when the trees are allowed to grow large, but keep in mind that setting it to a very large value makes the trees too shallow and can cause the model to underfit. The default value is 1.
5. max_features: A random forest considers a random subset of features at each split, and max_features controls how many features are considered when looking for the best split. It can take the string values "sqrt" and "log2", an integer or float, or None (all features); older scikit-learn versions also accepted "auto", which has since been removed.
6. max_leaf_nodes: It caps the number of leaf nodes, which limits how far the trees can split and thus helps reduce overfitting. If the value is set to None (the default), the number of leaf nodes is unlimited.
7. max_samples: It sets the maximum number (or fraction) of samples drawn from the training dataset to train each individual tree, and applies when bootstrap sampling is enabled (the default).
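A minimal scikit-learn sketch that sets these hyperparameters on RandomForestClassifier; the dataset and the values are illustrative, not tuned:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy dataset, purely for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(
    n_estimators=200,      # number of trees in the forest
    max_depth=8,           # maximum depth of each tree
    min_samples_split=4,   # minimum samples needed to split an internal node
    min_samples_leaf=2,    # minimum samples required at a leaf
    max_features="sqrt",   # features considered at each split
    max_leaf_nodes=50,     # cap on leaf nodes per tree
    max_samples=0.8,       # fraction of the training set drawn for each tree (bootstrap=True)
    random_state=42,
)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))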
What is a ROC Curve?
An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:
True Positive Rate
False Positive Rate
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives.
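A short sketch of how the points on an ROC curve can be computed with scikit-learn's roc_curve, which sweeps the classification thresholds for you; the dataset and model here are illustrative choices:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]   # predicted probability of the positive class

# roc_curve returns one (FPR, TPR) pair per threshold; plotting tpr against fpr gives the ROC curve.
fpr, tpr, thresholds = roc_curve(y_test, probs)
print("AUC:", roc_auc_score(y_test, probs))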
What is XGBoost?
XGBoost is arguably one of the most powerful gradient boosting algorithms and is increasingly being used across industries and problem domains, from customer analytics and sales prediction to fraud detection, credit approval, and more.
The key strengths of XGBoost are:
Flexibility: It can perform machine learning tasks such as regression, classification, and ranking, and also supports user-defined objectives.
Portability: It runs on Windows, Linux and OS X as well as on cloud platforms.
Language support: It supports multiple languages, including C++, Python, R, Java, Scala, and Julia.
Distributed training on cloud systems: XGBoost supports distributed training on multiple machines, including AWS, GCE, Azure, and Yarn clusters.
List some hyperparameters of XGBoost.
Frequently tuned hyperparameters:
n_estimators: specifies the number of decision trees to be boosted. If n_estimators = 1, only one tree is generated, so no boosting is actually at work. The default value is 100, but you can tune this number for better performance.
subsample: the fraction of the training data sampled before growing each tree. subsample = 0.5 means 50% of the training data is used for each tree. The value can be any fraction between 0 and 1; the default is 1.
max_depth: it limits how deep each tree can grow. The default value is 6, but you can try other values if overfitting is an issue in your model.
learning_rate (alias: eta): a regularization parameter that shrinks the feature weights at each boosting step. The default value is 0.3, but values such as 0.01, 0.1, and 0.2 are commonly tried.
gamma (alias: min_split_loss): another regularization parameter used for tree pruning. It specifies the minimum loss reduction required to make a further split on a leaf node. The default value is 0.
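A short illustrative sketch setting these hyperparameters on the scikit-learn-style XGBClassifier, assuming the xgboost package is installed; the values are placeholders, not tuned:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = XGBClassifier(
    n_estimators=200,   # number of boosted trees
    subsample=0.8,      # fraction of training rows sampled per tree
    max_depth=4,        # maximum depth of each tree
    learning_rate=0.1,  # shrinkage (eta) applied at each boosting step
    gamma=1.0,          # minimum loss reduction required to make a split
)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))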
