Tip 1 : Practice previously asked interview questions as well as online test questions.
Tip 2 : Go through all the previous interview experiences on CodeStudio and LeetCode.
Tip 3 : Build at least 2 good projects, and make sure you know every bit of them.
Tip 1 : Have at least 2 good projects, each described briefly with all the important points covered.
Tip 2 : Mention every relevant skill.
Tip 3 : Focus more on skills, projects, and experience.
Technical round with questions on Python, basic coding, and Machine Learning.
‘S’ = racecar
The reverse of ‘S’ is: racecar
Since ‘S’ is equal to its reverse, ‘S’ is a palindrome.
Hence the output will be 1.
One approach could be to first reverse the digits of n and then compare the reverse with n. If both are the same, return true; otherwise return false.
Python implementation :

def reverse_digits(num):
    # Build the reverse of num one digit at a time.
    rev_num = 0
    while num > 0:
        rev_num = rev_num * 10 + num % 10
        num //= 10  # integer division drops the last digit
    return rev_num

# Function to check if n is a palindrome
def is_palindrome(n):
    # n is a palindrome exactly when it equals its own reverse.
    return 1 if reverse_digits(n) == n else 0
F(n) = F(n - 1) + F(n - 2),
where F(1) = 1 and F(2) = 1 (indexing starts from 1).
Input: 6
Output: 8
Explanation: The number is ‘6’ so we have to find the “6th” Fibonacci number.
So by using the given formula of the Fibonacci series, we get the series:
[ 1, 1, 2, 3, 5, 8, 13, 21]
So the 6th element is 8, and hence we get the output 8.
The recursive approach is a direct implementation of the mathematical recurrence:
F(n) = F(n-1)+F(n-2)
Python implementation :

def fibonacci(n):
    # Base cases cover F(1) = 1 (and F(0) = 0, consistent with F(2) = 1).
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
This approach takes exponential time, because each call spawns two further recursive calls.
It can be optimized using dynamic programming: maintain an array that stores all the Fibonacci numbers calculated so far and return the nth Fibonacci number at the end. This approach takes O(n) time and O(n) auxiliary space, as in the sketch below.
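A minimal sketch of that dynamic-programming version (the function name is illustrative; the 1-based indexing follows the problem statement above):

def fibonacci_dp(n):
    # Assumes n >= 1, since indexing starts from 1.
    # fib[i] holds the i-th Fibonacci number.
    fib = [0] * (n + 1)
    fib[1] = 1
    for i in range(2, n + 1):
        fib[i] = fib[i - 1] + fib[i - 2]
    return fib[n]

print(fibonacci_dp(6))  # 8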
Get Month and Year from a Date Column in Pandas
Use the .dt.year attribute to find the year and the .dt.month attribute to find the month present in the date column.
df['year'] = df['Date Attribute'].dt.year
df['month'] = df['Date Attribute'].dt.month
Here ‘df’ is the pandas DataFrame object (pandas imported as ‘pd’), and ‘.dt’ is the datetime accessor on a Series. ‘Date Attribute’ is the date column in your dataset (the name varies from one dataset to another), and ‘year’ and ‘month’ are the new columns holding the year and the month respectively.
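A minimal runnable sketch (the column name ‘Date Attribute’ is illustrative; note that the column must have a datetime dtype, e.g. via pd.to_datetime, before the .dt accessor works):

import pandas as pd

df = pd.DataFrame({'Date Attribute': ['2021-01-15', '2021-06-30', '2022-03-01']})
df['Date Attribute'] = pd.to_datetime(df['Date Attribute'])  # ensure datetime dtype
df['year'] = df['Date Attribute'].dt.year
df['month'] = df['Date Attribute'].dt.month
print(df)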
Relationship between R-squared and p-value in linear regression.
There is no established association/relationship between the p-value and R-squared; it all depends on the data (i.e., it is contextual).
The R-squared value tells you how much of the variation in the data your model explains, so an R-squared of 0.1 means the model explains 10% of the variation; the greater the R-squared, the better the model fits. The p-value, on the other hand, comes from the F-statistic test of the hypothesis that "the fit of the intercept-only model and your model are equal". So if the p-value is less than the significance level (usually 0.05), your model fits the data significantly better than the intercept-only model.
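A minimal sketch of reading both quantities off a fitted model, assuming the statsmodels package is available (rsquared and f_pvalue are its attributes for the R-squared and the F-test p-value; the data is synthetic):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)  # linear signal plus noise

model = sm.OLS(y, sm.add_constant(x)).fit()
print('R-squared:', model.rsquared)       # share of variation explained
print('F-test p-value:', model.f_pvalue)  # model vs. intercept-only model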
What is underfitting and overfitting?
1) Overfitting refers to the scenario where a machine learning model cannot generalize or fit well on an unseen dataset. A clear sign of overfitting is when the model's error on the testing or validation dataset is much greater than its error on the training dataset.
2) Overfitting is a term used in statistics for a modeling error that occurs when a function corresponds too closely to a particular dataset. As a result, an overfitted model may fail to fit additional data, and this may hurt the accuracy of predictions on future observations.
3) Overfitting happens when a model learns the detail and noise in the training dataset to the extent that it negatively impacts the model's performance on a new dataset. The noise and random fluctuations in the training dataset are picked up and learned as concepts by the model, but these concepts do not apply to new datasets and damage the model's ability to generalize.
4) The opposite of overfitting is underfitting. Underfitting refers to a model that can neither model the training dataset nor generalize to a new dataset. An underfit machine learning model is not a suitable model, and it is easy to spot because it performs poorly even on the training dataset. The sketch below shows both failure modes on the same data.
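A minimal sketch (assuming scikit-learn and NumPy; the data and polynomial degrees are illustrative): a degree-1 fit underfits the sine curve and has high error even on the training set, while a degree-15 fit overfits, with a test error far above its training error:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60))[:, None]  # 60 points on [0, 1]
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f'degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}')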
What is a confusion matrix?
It is a performance measurement for machine learning classification problems where the output can be two or more classes. For a binary problem it is a table with 4 different combinations of predicted and actual values. It is extremely useful for measuring recall, precision, specificity, accuracy and, most importantly, AUC-ROC curves.
TP, FP, FN, TN in terms of the classic pregnancy-test analogy:
True Positive:
Interpretation: You predicted positive and it's true (you predicted that a woman is pregnant and she actually is).
True Negative:
Interpretation: You predicted negative and it's true (you predicted that a man is not pregnant and indeed he is not).
False Positive (Type 1 Error):
Interpretation: You predicted positive and it's false (you predicted that a man is pregnant but he is not).
False Negative (Type 2 Error):
Interpretation: You predicted negative and it's false (you predicted that a woman is not pregnant but she actually is).
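A minimal sketch using scikit-learn's confusion_matrix (the labels are illustrative; for binary labels 0/1, sklearn returns the matrix in [[TN, FP], [FN, TP]] order):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f'TP={tp} TN={tn} FP={fp} FN={fn}')  # TP=3 TN=3 FP=1 FN=1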
Difference between Random Forest and XGBoost.
1. Random Forest and XGBoost are both decision-tree ensemble algorithms, but they take in the training data in different ways: XGBoost specifically trains gradient-boosted decision trees, fitting each new tree to the errors of the trees built so far, so the training methods used by the two algorithms are fundamentally different (both are compared side by side in the sketch after this list).
2. XGBoost performs numerical optimization: the loss function is minimized with the help of weak learners, each iteration taking a gradient step on a differentiable loss. Random Forest is mostly a bagging technique: various subsets of the data are considered and the predictions made on each subset are averaged. Either a random subset of features or a bootstrap sample of the data is taken for each tree.
3. For Random Forest, random subsamples of the data are selected and the trees grow in parallel; overfitting is reduced by combining many individually weak (underfitting) learners. In XGBoost, overfitting is reduced with the help of regularization parameters, which also help select features based on how weak or strong they are in the decision tree; the algorithm grows sequentially, each tree building on all the previous iterations.
4. Random Forest has many trees whose leaves carry equal weight, so high accuracy and precision can be obtained easily from the available data; this encourages developers to add more features and observe how the model performs on all the data given to the algorithm. XGBoost does not limit the number of leaves in the same way: if the model's predictions are not good enough, the algorithm can perform better with more leaves in each decision tree, which reduces bias, and the results depend entirely on the data fed to the algorithm.
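A minimal side-by-side sketch (assuming scikit-learn and the xgboost package with its scikit-learn API; the dataset and hyperparameters are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: 200 independent trees grown in parallel on bootstrap samples.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Boosting: shallow trees added sequentially, each correcting its predecessors.
xgb = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1).fit(X_train, y_train)

print('Random Forest accuracy:', rf.score(X_test, y_test))
print('XGBoost accuracy:', xgb.score(X_test, y_test))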