Cracking the interview is essential to getting the role you want in any profession. In machine learning interviews, the questions range from theory and algorithms to data structures and your own project experience.

We’ve compiled a list of 51 interview questions for machine learning.

1. What is the difference between artificial intelligence and machine learning?

Machine learning is the design and development of algorithms that learn from empirical data. Artificial intelligence is the broader field: in addition to machine learning, it also covers areas such as planning, robotics, knowledge representation, natural language processing and more.

2. How do you ensure you’re not overfitting with a model?

There are three main ways to avoid overfitting a model:

  • Keep the model simpler by using fewer variables and parameters, which reduces variance and removes some of the noise in the training data.
  • Use cross-validation techniques such as k-fold cross-validation.
  • Use regularization techniques such as LASSO, which penalize the parameters that are likely to cause overfitting and can shrink some of them to zero.
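
As an illustration of the cross-validation point, here is a minimal sketch of how a dataset could be split into k folds. The function name k_fold_indices and the sample sizes are assumptions made for this example; in practice a library routine would normally be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, roughly equal folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Each sample is used for validation exactly once and for training k - 1 times.
folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

A model that scores well on every validation fold, not just on the data it was trained on, is less likely to be overfitting.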

3. What is ‘Overfitting’ in Machine learning?

Overfitting occurs when a model describes random error or noise in the training data rather than the underlying relationship. It generally happens when the model is overly complex, with too many parameters relative to the amount of training data, so it performs well on the training set but poorly on new data.

4. What are some differences between a linked list and an array?

A linked list is a chain of objects in which each element holds a pointer to the next, so the elements are processed sequentially. An array is an ordered collection of objects stored contiguously, so any element can be reached directly by its index. Also, an array typically has a fixed size and cannot grow organically, while a linked list is not bounded by a preset size and can grow organically.
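
To make the contrast concrete, here is a minimal sketch of a singly linked list. The class and method names are assumptions chosen for this example. Note that Python's built-in list is a dynamic array, so it is used below only as the output format.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node  # pointer to the next element in the chain


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): nothing is shifted, only the head pointer changes.
        self.head = Node(value, self.head)

    def get(self, index):
        # O(n): the chain must be walked from the head, unlike array indexing.
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError("index out of range")
        return node.value

    def to_list(self):
        values, node = [], self.head
        while node is not None:
            values.append(node.value)
            node = node.next
        return values


chain = LinkedList()
for value in (3, 2, 1):
    chain.push_front(value)
```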

5. Describe a hash table

A hash table is a data structure that implements an associative array. Keys are mapped to values by a hash function, which computes the slot where each value is stored. Hash tables are commonly used for tasks such as database indexing.
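
If you are asked to go further, a small sketch like the one below can help. It resolves collisions by chaining, which is only one of several possible strategies, and the class name, method names and bucket count are assumptions for this example.

```python
class HashTable:
    def __init__(self, n_buckets=8):
        # Each bucket is a list of (key, value) pairs (separate chaining).
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash function maps a key to one of the buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = HashTable()
table.put("model", "svm")
table.put("model", "knn")  # same key, so the value is replaced
table.put("folds", 5)
```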

6. Do you have research experience in machine learning?

This is a very subjective question. Recruiters generally look for people with experience in research. So, make sure that you have all the required documents to assure recruiters about your experience in research. It’s always great to have a research summary ready if an interviewer asks. This will create a great impression.

7. What are the five popular algorithms of Machine Learning?

The five most popular algorithms that are used in machine learning are support vector machines, decision trees, neural networks, nearest neighbour and probabilistic networks.

8. What evaluation approaches would you use to evaluate the effectiveness of a machine learning model?

The first step in evaluating the effectiveness of a machine learning model is to split the dataset into training and test sets. The next step is to choose performance metrics that suit the problem, such as accuracy, the F1 score and the confusion matrix.
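
The split-then-measure idea can be sketched as follows. The helper names, the 25% test fraction and the fixed seed are assumptions for this example rather than recommended settings.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


train_rows, test_rows = train_test_split(list(range(20)))
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # three of four predictions match
```

The model is fitted on the training rows only, and the metric is computed on the held-out test rows.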

9. How is KNN different from k-means clustering?

The main difference between the two is that KNN is a supervised classification algorithm, while k-means clustering is an unsupervised clustering algorithm. KNN needs labelled data in order to classify an unlabelled point, while k-means clustering needs only a set of unlabelled points and a number of clusters, k, chosen in advance.
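
The difference shows up clearly in the inputs each algorithm takes. The sketch below is a simplified illustration: the function names, the toy points and the starting centroids are all assumptions for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: labels are required to vote on the class of the query point."""
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def kmeans(points, centroids, n_iter=10):
    """Unsupervised: only points and initial centroids are given, no labels."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
predicted = knn_predict(points, labels, query=(0.5, 0.5))
centres = kmeans(points, centroids=[(0, 0), (6, 6)])
```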

10. What are the three stages to build the hypotheses or model in machine learning?

The three stages of building a hypothesis or model in machine learning are model building, model testing and applying the model.

11. What’s the “kernel trick” and how is it useful?

The kernel trick uses kernel functions to operate in a higher-dimensional feature space without explicitly calculating the coordinates of the data in that space. Instead, a kernel function directly computes the inner products between the images of all pairs of data points in the feature space. This is useful because many algorithms can be expressed purely in terms of inner products, so they can work in a high-dimensional space at the cost of a low-dimensional calculation.
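
A small numeric check can make this less abstract. The sketch below uses the degree-2 polynomial kernel on 2D points as an assumed example: the kernel value computed in two dimensions equals the inner product of the explicitly mapped 3D vectors. The function names are chosen for illustration.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed entirely in the original 2D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def explicit_map(x):
    """The 3D feature map that this kernel corresponds to."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x, y = (1.0, 2.0), (3.0, 4.0)
via_kernel = poly_kernel(x, y)                       # no 3D coordinates needed
via_mapping = dot(explicit_map(x), explicit_map(y))  # explicit 3D computation
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors.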

12. Explain how a ROC curve works.

A ROC curve is a graphical representation of the trade-off between the true positive rate and the false positive rate at various thresholds. Plotting the true positive rate against the false positive rate for each threshold traces out a curve, which is why it’s called the ROC (receiver operating characteristic) curve.
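
To show where the points on a ROC curve come from, here is a minimal sketch that sweeps the threshold over a set of scores. The function name and the toy labels and scores are assumptions for this example, and it expects at least one positive and one negative label.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
```

Lowering the threshold moves along the curve from (0, 0) towards (1, 1). The closer the curve gets to the top-left corner, the better the classifier separates the two classes.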

13. When to use ensemble learning?

Ensemble learning is typically used when you can build component classifiers that are individually accurate and independent of each other, so that their errors tend to cancel out when combined.

14. What are the areas in robotics and information processing where the sequential prediction problem arises?

Model-based reinforcement learning, imitation learning and structured prediction are a few of the areas where the sequential prediction problem arises.

15. Pick an algorithm. Write the pseudocode for a parallel implementation.

This question checks whether you can understand and think in terms of parallelism. It also tests whether you can handle concurrency when dealing with big data. For pseudocode frameworks, you can check out Peril-L, and for visualization tools, Web Sequence Diagrams, to craft your own pseudocode that showcases parallelism.
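
As one possible answer, here is a small split, map and reduce sketch written in Python rather than pseudocode. The chosen task (a sum of squares), the function names and the worker count are assumptions for this example. Threads are used to keep it simple; for CPU-bound work in CPython, swapping in ProcessPoolExecutor is the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(values):
    # Map step: each worker handles its own chunk independently.
    return sum(v * v for v in values)


def parallel_sum_of_squares(data, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunk(data, n_workers))
        # Reduce step: combine the independent partial results.
        return sum(partials)


total = parallel_sum_of_squares(list(range(10)))
```

The point to bring out in the interview is that the chunks share no state, so the workers need no locking and only the final reduction is sequential.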

16. What is Bayes’ Theorem? How is it useful in a machine learning context?

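Bayes’ Theorem is a way to calculate conditional probability. It describes the probability of an event based on prior knowledge of conditions that may affect the event. In machine learning, it underlies the Naive Bayes classifier and Bayesian methods in general, where prior beliefs are updated as new data is observed.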

Bayes’ Theorem is written mathematically as:

          P(B|A) P(A)
P(A|B) = -------------
             P(B)

  • P(A|B) is the probability of event A occurring, given that event B has occurred.
  • P(B|A) is the probability of event B occurring, given event A has occurred.
  • P(B) is the probability of event B
  • P(A) is the probability of event A
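
A worked example helps here. The sketch below computes P(A|B) for a diagnostic test, with P(B) expanded using the law of total probability. All of the numbers (1% prevalence, 95% sensitivity, 5% false positive rate) are hypothetical values chosen for illustration.

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) from Bayes' Theorem, with P(B) from the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


# A = "has the condition", B = "test is positive" (hypothetical numbers).
p_condition_given_positive = posterior(
    p_b_given_a=0.95,       # P(B|A): test is positive when the condition is present
    p_a=0.01,               # P(A): prior probability of the condition
    p_b_given_not_a=0.05,   # P(B|not A): false positive rate
)
```

With these inputs the posterior is roughly 16%, far lower than the 95% sensitivity might suggest, because the prior P(A) is so small.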

17. What are the different methods for Sequential Supervised Learning?

Here are the methods for sequential supervised learning:

  • Graph transformer networks
  • Hidden Markov models
  • Sliding-window methods
  • Maximum entropy Markov models
  • Recurrent sliding windows
  • Conditional random fields

18. Define precision and recall.

Recall, also known as the true positive rate, is the number of positives a model correctly identifies compared to the number of actual positives in the data.

Precision, also known as the positive predictive value, is the number of correct positives a model identifies compared to the total number of positives it claims.
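
In terms of counts, precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN are true positives, false positives and false negatives. A minimal sketch, with a function name and toy labels assumed for this example:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here the model claims four positives, three of which are correct, and it misses one of the four actual positives.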

19. What is the function of ‘Unsupervised Learning’?

The functions of unsupervised learning are:

  • Discover interesting directions in data
  • Detect novel observations / clean databases
  • Find clusters of the data
  • Find interesting coordinates and correlations
  • Find low-dimensional representations of the data

20. Where do you usually source datasets?

Machine learning interview questions like these are generally asked to gauge your interest in the field. People who have a keen interest in machine learning generally have a project of their own and know which sources are the best for datasets.

If you don’t know good sources for datasets, check out Quandl for economic and financial data, and Kaggle’s Datasets for a wide range of other datasets.

21. What are your favourite use cases of machine learning models?

This again is a subjective question and calls for your personal opinion. There are many use cases of machine learning models you may like, so look through discussions such as Quora threads on the topic and have a few prepared for your interview.

22. Which is more important to you: model accuracy, or model performance?

Questions like these are asked to gauge both your interests and your knowledge of machine learning.

Neither is always preferable to the other. Model accuracy and performance are both important, and which one matters more depends on the use case.

For example, if you’re using a model to diagnose medical conditions, accuracy is critical and how fast the model produces a result matters less. In other situations, performance may matter more than accuracy. So, it depends on the model and the use cases that revolve around it.

23. What’s the trade-off between bias and variance?

Bias is the error caused by overly simple assumptions in the learning algorithm, which can lead the model to underfit.

Variance is the error caused by too much complexity in the learning algorithm, which makes the model sensitive to small fluctuations in the training data and can lead it to overfit.

The objective of a learning algorithm is to keep both bias and variance low. However, decreasing bias typically increases variance and vice versa, so the two are traded off against each other to find the model that generalizes best.

24. What is the dimensionality reduction in Machine Learning?

It is the process of reducing the number of random variables under consideration. Dimensionality reduction can be divided into feature selection and feature extraction.

Feature selection tries to find a subset of the original input variables, while feature extraction starts from an initial set of measured data and builds derived values (features) from it.
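
The two approaches can be contrasted with a toy sketch. Selecting by variance and deriving an area from a width and a height are simplified stand-ins chosen for this example, and the function names and data are assumptions. Real projects would more likely reach for techniques such as PCA.

```python
from statistics import pvariance


def select_features(rows, min_variance=0.0):
    """Feature selection: keep only the columns whose variance exceeds min_variance."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    return keep, [[row[i] for i in keep] for row in rows]


def extract_area(rows, width_idx, height_idx):
    """Feature extraction: build one derived value (area) from two measured ones."""
    return [[row[width_idx] * row[height_idx]] for row in rows]


# Hypothetical measurements: (width, constant column, height).
rows = [(1.0, 5.0, 2.0), (2.0, 5.0, 4.0), (3.0, 5.0, 6.0)]
kept, selected = select_features(rows)  # drops the constant middle column
areas = extract_area(rows, 0, 2)        # three columns reduced to one derived column
```

Selection returns a subset of the original columns unchanged, while extraction produces new columns that did not exist in the input.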

25. Name an example where ensemble techniques might be useful.

Ensemble techniques combine several learning algorithms to obtain better predictive performance than any single one of them. Examples range from bagging to boosting to the bucket of models method. You should also be able to demonstrate how they increase predictive power.
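
One simple way to demonstrate the idea is majority voting. In the sketch below, the three threshold rules stand in for trained classifiers and are purely hypothetical, as is the function name.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Combine several classifiers by returning the most common prediction."""
    votes = [classify(x) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical weak classifiers that disagree near their thresholds.
classifiers = [
    lambda x: x > 1.0,
    lambda x: x > 2.0,
    lambda x: x > 3.0,
]

high = majority_vote(classifiers, 2.5)  # votes: True, True, False
low = majority_vote(classifiers, 1.5)   # votes: True, False, False
```

If the individual classifiers make different mistakes, a wrong vote from one of them is outvoted by the others.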

26. How do you handle missing or corrupted data in a dataset?

Missing and corrupted data are common in datasets. You can either drop the rows and/or columns that contain them, or replace the affected values with another value, such as the column mean.
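
Both options can be sketched in a few lines. Representing missing values as None and imputing with the column mean are assumptions made for this example, and impute_mean expects every column to contain at least one known value.

```python
from statistics import mean


def drop_missing(rows):
    """Drop every row that contains a missing value (represented as None)."""
    return [row for row in rows if None not in row]


def impute_mean(rows):
    """Replace each missing value with the mean of the known values in its column."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[i] if value is None else value for i, value in enumerate(row)]
        for row in rows
    ]


rows = [[1.0, 2.0], [None, 4.0], [3.0, None]]
dropped = drop_missing(rows)
imputed = impute_mean(rows)
```

Dropping is simplest but throws away two of the three rows here, while imputing keeps every row at the cost of introducing estimated values.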

27. What are the two methods used for the calibration in Supervised Learning?

Platt Calibration & Isotonic Regression are the two methods used for calibration in supervised learning.

28. What is the difference between heuristic for rule learning and heuristics for decision trees?

The difference between the two is very simple:

Heuristics for decision trees evaluate the average quality of a number of disjoint sets, while heuristics for rule learning evaluate only the quality of the set of instances covered by the candidate rule.

29. How would you handle an imbalanced dataset?

An imbalanced dataset can be handled by collecting more data to even out the imbalance, by resampling the dataset, or by trying a different algorithm altogether.
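
Resampling can take several forms. The sketch below shows random oversampling of the minority classes, which is only one option (undersampling the majority class is another). The function name, seed and toy data are assumptions for this example.

```python
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class is equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, row_label in zip(rows, labels) if row_label == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


rows = list(range(10))
labels = ["a"] * 8 + ["b"] * 2
balanced_rows, balanced_labels = oversample(rows, labels)
```

Oversampling should be applied to the training set only, so that duplicated rows never leak into the test set.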

30. What is Genetic Programming?

Genetic programming is a technique used in machine learning in which a population of candidate programs is evolved over successive generations. Each candidate is tested, and the best performers are selected to produce the next generation.

31. What are support vector machines?

Support vector machines are supervised learning models that are typically used for classification and regression analysis.

32. What is not Machine Learning?

Rule-based inference and other forms of artificial intelligence that do not learn from data cannot be classified as machine learning.

33. What is the difference between supervised and unsupervised machine learning?

The major difference between the two is that supervised learning requires labelled training data, while unsupervised learning does not require explicit labelling.

34. What is the difference between Data Mining and Machine learning?

Data mining is the extraction of knowledge or interesting patterns from unstructured data.

Machine learning is the study, design and application of algorithms that help computers learn from data without being explicitly programmed.

Bhaumik Karia

Content Marketing Strategist & Writer by profession, musician & traveller by choice.