Data Science and ML Interview Questions
Parag Verma
Introduction
This blog is an accumulation of all the interview questions I have faced, as well as the ones faced by my peers and friends. I will try to answer each question with a brief explanation or notes.
Q1: What are some examples of discrete probability distributions?
Ans: Binomial and Poisson
Q2: If X is normally distributed, what is Pr(X = 2)?
Ans: 0. For a continuous random variable, the probability of any single point is zero; only intervals have non-zero probability. (The density at 2 is positive, but a density value is not a probability.)
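To see why the answer is 0, the sketch below assumes, for illustration, that X ~ N(0, 1) and computes the probability of a shrinking interval around 2. `normal_cdf` is a hypothetical helper built on `math.erf`. The probability falls roughly in proportion to the interval width, so it tends to 0 as the interval collapses to the single point 2.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2) expressed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Pr(2 - eps <= X <= 2 + eps) for X ~ N(0, 1) and a shrinking eps
probs = [normal_cdf(2 + eps) - normal_cdf(2 - eps) for eps in (0.1, 0.01, 0.001)]
print(probs)  # each value is roughly 10x smaller than the previous one
```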
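Q3: What percentage of values lie between μ - 2σ and μ + 2σ for a normal distribution?
Ans: About 95%. This is part of the 68-95-99.7 rule:
- About 68% lie between μ - 1σ and μ + 1σ
- About 95% lie between μ - 2σ and μ + 2σ
- About 99.7% lie between μ - 3σ and μ + 3σ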
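These percentages can be checked directly: for a standard normal Z, P(|Z| ≤ k) = erf(k/√2). A minimal sketch using only the standard library (`fraction_within` is a hypothetical helper name):

```python
import math

def fraction_within(k: float) -> float:
    """P(|Z| <= k) for Z ~ N(0, 1), i.e. the share of values within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```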
Q4: Can you think of a situation where a low R² is justified?
Ans: In B2B (business-to-business) transactions, the usual economic variables often explain only a small part of the variance in the data, because many external factors cannot be captured in B2B scenarios. A low R² is then expected and does not by itself make the model useless.
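Q5: What is the probability distribution of the p-value?
Ans: Under the null hypothesis (and for a continuous test statistic), the p-value follows a uniform distribution on [0, 1]. This is why a test at level α rejects a true null hypothesis with probability α.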
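A quick simulation illustrates this. The sketch below assumes a two-sided z-test with known σ = 1 (a hypothetical choice made only to keep the code to the standard library) and draws every sample from a population whose mean really is 0, so the null hypothesis is always true. If the p-values are uniform, each tenth of [0, 1] should hold about 10% of them.

```python
import math
import random

def z_test_pvalue(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value of a z-test of H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))   # P(|Z| >= |z|) for Z ~ N(0, 1)

random.seed(0)
# Every sample really has mean 0, so H0 is true in all 5000 experiments
pvalues = [z_test_pvalue([random.gauss(0, 1) for _ in range(30)]) for _ in range(5000)]

# If p is uniform on [0, 1], each tenth of the interval holds about 10% of the p-values
for lo in (0.0, 0.5, 0.9):
    share = sum(lo <= p < lo + 0.1 for p in pvalues) / len(pvalues)
    print(f"{lo:.1f} <= p < {lo + 0.1:.1f}: {share:.3f}")
```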
Q6: What is the basis of linear regression?
Ans: Ordinary least squares (OLS): the coefficients are chosen to minimize the sum of squared residuals.
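For a single predictor, minimizing the sum of squared residuals has a closed-form solution. A minimal sketch (`ols_fit` is a hypothetical helper, and the data are made up so that y = 1 + 2x exactly):

```python
def ols_fit(x, y):
    """Simple linear regression y = b0 + b1 * x by ordinary least squares.

    Minimising sum((y - b0 - b1 * x) ** 2) gives
    b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]    # exactly y = 1 + 2x
print(ols_fit(x, y))    # (1.0, 2.0)
```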
Q7: What is the basis of logistic regression?
Ans: Maximum likelihood estimation (MLE): the coefficients are chosen to maximize the likelihood of the observed 0/1 outcomes.
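To make MLE concrete, the sketch below fits a one-predictor logistic regression by plain gradient ascent on the Bernoulli log-likelihood. This is a teaching sketch only: production libraries typically use Newton-type or quasi-Newton solvers instead. The data, learning rate and step count are hypothetical; the data are deliberately not perfectly separable, because the MLE does not exist when the classes are perfectly separated.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood of 0/1 outcomes y under P(y=1|x) = sigmoid(b0 + b1*x)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_logistic_mle(x, y, lr=0.1, steps=5000):
    """Maximise the log-likelihood by gradient ascent (illustrative, not production code)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        # The gradient of the log-likelihood is sum(y - p) for b0 and sum((y - p) * x) for b1
        residuals = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * xi for r, xi in zip(residuals, x)) / n
    return b0, b1

# Hypothetical, non-separable data
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_mle(x, y)
print(b0, b1, log_likelihood(b0, b1, x, y))
```

Any small change to the fitted (b0, b1) lowers the log-likelihood, which is what "maximum likelihood" means.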
Q8: What is the distribution of the β coefficients in regression?
Ans: In large samples, the OLS estimates of β are approximately normally distributed (a consequence of the central limit theorem).
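Q9: If X1 and X2 are normally distributed, what is the distribution of X1 + X2?
Ans: If X1 and X2 are independent (or, more generally, jointly normal), X1 + X2 is also normally distributed, with mean μ1 + μ2 and variance σ1² + σ2² + 2Cov(X1, X2); the covariance term is zero when they are independent. This is an exact property of the normal distribution, not a consequence of the central limit theorem. Without joint normality, the sum of two normal variables need not be normal.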
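A simulation can check the mean, the variance and one normal-distribution benchmark. The sketch assumes independent X1 ~ N(2, 1²) and X2 ~ N(-1, 2²) (arbitrary example parameters), so theory predicts X1 + X2 ~ N(1, 5). It only checks a few summary numbers, so it is an illustration rather than a proof.

```python
import random

random.seed(1)
n = 100_000
# Independent X1 ~ N(2, 1^2) and X2 ~ N(-1, 2^2); the parameters are arbitrary examples
sums = [random.gauss(2, 1) + random.gauss(-1, 2) for _ in range(n)]

mean = sum(sums) / n                                  # theory: 2 + (-1) = 1
var = sum((s - mean) ** 2 for s in sums) / (n - 1)    # theory: 1^2 + 2^2 = 5
# A normal variable puts about 95.45% of its mass within 2 standard deviations of the mean
sd = 5 ** 0.5
within_2sd = sum(abs(s - 1) <= 2 * sd for s in sums) / n
print(round(mean, 2), round(var, 2), round(within_2sd, 3))
```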
Q10: What are the assumptions of linear regression?
Ans: There are 4 least squares assumptions:
- The conditional distribution of u_i given X_i has a mean of zero, i.e. E(u_i | X_i) = 0
- (X_i, Y_i), i = 1, …, n, are IID, i.e. independently and identically distributed
- There is no perfect multicollinearity
- Large outliers are unlikely (X_i and Y_i have finite fourth moments)
Q11: What are the evaluation metrics for linear regression?
Ans: Two commonly used metrics are:
- Adjusted R² (R²adj)
- Root mean square error (RMSE)
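Both metrics follow from the residuals. A minimal sketch, assuming hypothetical observed and fitted values from a model with k = 1 predictor; note that this version divides by n inside RMSE, while some texts divide by n - k - 1 instead.

```python
import math

def adjusted_r2_and_rmse(y, y_hat, k):
    """Adjusted R^2 and RMSE for a linear model with k predictors (intercept not counted)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                 # total sum of squares
    r2 = 1 - ss_res / ss_tot
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)                # penalises extra predictors
    rmse = math.sqrt(ss_res / n)
    return r2_adj, rmse

# Hypothetical observed values and fitted values from a one-predictor model
y = [3, 5, 7, 9, 11]
y_hat = [3.2, 4.8, 7.1, 8.9, 11.0]
r2_adj, rmse = adjusted_r2_and_rmse(y, y_hat, k=1)
print(round(r2_adj, 4), round(rmse, 4))   # 0.9967 0.1414
```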
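Q12: How do we check for the presence of multicollinearity in linear regression?
Ans: Through the variance inflation factor (VIF), VIF_j = 1/(1 - R_j²), where R_j² is the R² from regressing X_j on the other predictors. A common rule of thumb is that variables with a VIF greater than 5 (some analysts use 10) are candidates for removal.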
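A minimal sketch of the VIF calculation, assuming only two predictors. In that special case, the R² from regressing one predictor on the other equals their squared correlation, so no regression library is needed. With more predictors, R_j² must come from regressing X_j on all the others. The data are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2), where R^2 = r^2 when there are only two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]   # almost exactly 2 * x1
x3 = [5, 1, 4, 2, 6, 3]                # unrelated to x1
vif_12 = vif_two_predictors(x1, x2)
vif_13 = vif_two_predictors(x1, x3)
print(vif_12, vif_13)   # the first is far above 5, the second is close to 1
```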
Q13: Why do we take the log of a variable in certain analyses?
Ans: If a variable's values span a very wide range (for example, several orders of magnitude), taking the log compresses the range and reduces the variance of that variable.
Q14: If there is a categorical variable with N unique values, why do we only create dummy variables for N-1 values?
Ans: If we create dummy variables for all N values, every observation belongs to exactly one category, so the dummies always sum to 1. That is exactly the intercept regressor X0 (the variable multiplying β0, which always equals 1):
$$D_1 + D_2 + \cdots + D_N = 1 = X_0$$
The intercept column is then a perfect linear combination of the dummy columns. This is perfect multicollinearity (the "dummy variable trap"), and the coefficients cannot be uniquely estimated.
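The identity can be seen on a tiny example. The sketch below uses a hypothetical `colours` variable with N = 3 levels and checks that the full set of dummies adds up to the intercept column, while dropping one level removes that exact dependence.

```python
# Hypothetical categorical variable with N = 3 levels
colours = ["red", "green", "blue", "green", "red"]
levels = sorted(set(colours))            # ['blue', 'green', 'red']

# Full set of N dummies (no level dropped)
full = [[1 if c == lvl else 0 for lvl in levels] for c in colours]
intercept = [1] * len(colours)           # X0, the column multiplying beta0

# Every row's dummies sum to 1, so D1 + ... + DN reproduces the intercept column
print([sum(row) for row in full] == intercept)   # True

# Dropping one level (here the first, 'blue') breaks the exact dependence
reduced = [row[1:] for row in full]
print([sum(row) for row in reduced])     # [1, 1, 0, 1, 1]
```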
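Q15: What metric does a decision tree use to split a node?
Ans: Gini index. A classification tree commonly uses the Gini index (Gini impurity) to choose the split at each node; entropy (information gain) is a common alternative, and regression trees typically use squared error (variance reduction).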
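The Gini impurity of a node is 1 - Σ p_k², where p_k is the share of class k in the node, and a candidate split is scored by the size-weighted impurity of its children (lower is better). A minimal sketch with hypothetical labels:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate split; lower is better."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = ["yes"] * 5 + ["no"] * 5
left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4
print(gini(parent))             # 0.5
print(split_gini(left, right))  # about 0.32, so this split reduces impurity
```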
Q16: How can we select between two competing logistic regression models?
Ans: Two useful metrics are:
- AIC/BIC: the model with the lower Akaike or Bayesian information criterion is preferred
- F1 score: the model with the higher F1 score is preferred
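AIC and BIC both trade off fit (the maximized log-likelihood ln L) against the number of estimated parameters k; BIC also uses the number of observations n. A minimal sketch with hypothetical log-likelihoods for two models fitted on the same 200 rows:

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fitted log-likelihoods for two competing models on n = 200 rows
n = 200
model_a = {"ll": -120.0, "k": 3}
model_b = {"ll": -118.5, "k": 6}   # fits slightly better but uses twice as many parameters
for name, m in (("A", model_a), ("B", model_b)):
    print(name, aic(m["ll"], m["k"]), bic(m["ll"], m["k"], n))
# Model A has the lower AIC and BIC, so it is preferred here.
```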
Q17: What are the disadvantages of a decision tree?
Ans: A single decision tree often overfits the data. Also, an ensemble of decision trees grown on bootstrap samples of the same data (bagging) tends to produce highly correlated trees, so averaging them gives much less improvement in fit than averaging independent models would.
Q18: How does a random forest work?
Ans: A random forest is an ensemble of decision trees. Each tree is trained on a bootstrap sample (a random sample of the rows, drawn with replacement), and at each split only a randomly selected subset of the attributes is considered. This reduces the correlation between the trees in the ensemble. The final prediction is the average of the trees' predictions (regression) or a majority vote (classification).
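The mechanics can be sketched compactly under one simplifying assumption: each "tree" below is a depth-1 stump rather than a fully grown tree. Because a stump has only one split, drawing the attribute subset once per tree is equivalent to drawing it at each split. All names (`fit_stump`, `fit_forest`, `predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """One-split tree: best (feature, threshold) among `features` by weighted Gini."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split (all sampled values identical): predict the majority class
        label = majority(y)
        return features[0], float("inf"), label, label
    return best[1:]   # (feature, threshold, left_label, right_label)

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]    # bootstrap: sample rows with replacement
        feats = rng.sample(range(d), max_features)      # random subset of attributes
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest

def predict(forest, row):
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)  # majority vote across trees

# Hypothetical toy data: two numeric attributes, two classes
X = [[1, 1], [2, 2], [3, 1], [4, 2], [6, 7], [7, 8], [8, 6], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [0, 0]), predict(forest, [10, 10]))  # 0 1
```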
Q19: Give a scenario of overfitting in machine learning.
Ans: If a flexible model is trained on, say, only 5% of the dataset, there are too few records for it to learn general patterns, so it ends up memorizing them. It fits the training data well but performs poorly once it is evaluated on test data.
Overfitting can also happen when we include variables with a large number of unique/distinct values (for example, IDs), which let the model memorize individual records.
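The small-training-set scenario can be reproduced with a very flexible model. The sketch below uses 1-nearest-neighbour (my choice for illustration, not from the original) on hypothetical noisy data: the true rule is "label 1 when x > 0.5", with 20% of labels flipped. The model is perfect on its 20 training points but much worse on unseen data.

```python
import random

random.seed(42)

def make_data(n, noise=0.2):
    """Hypothetical data: label is 1 when x > 0.5, with a fraction `noise` of labels flipped."""
    data = []
    for _ in range(n):
        x = random.random()
        label = int(x > 0.5)
        if random.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """1-nearest neighbour: return the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def error_rate(train, data):
    return sum(predict_1nn(train, x) != label for x, label in data) / len(data)

train = make_data(20)    # a small training set (think of it as a 5% slice)
test = make_data(400)
train_error = error_rate(train, train)   # 0.0: every point is its own nearest neighbour
test_error = error_rate(train, test)     # noticeably higher on unseen data
print(train_error, test_error)
```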
Q20: What is sensitivity in a classification model?
Ans: Let's look at the confusion matrix.
| Actual \ Predicted | No | Yes |
|---|---|---|
| No | TN | FP |
| Yes | FN | TP |
- Accuracy: (TN + TP)/(TN + TP + FN + FP)
- Sensitivity: proportion of actual positives correctly classified, TP/(TP + FN). Also called the true positive rate
- Specificity: proportion of actual negatives correctly classified, TN/(TN + FP). Also called the true negative rate
- Precision: proportion of predicted positives that are actually positive, TP/(TP + FP)
- Recall: same as sensitivity
- F1 score: (2 × Precision × Recall)/(Precision + Recall), the harmonic mean of precision and recall. It is useful for shortlisting the best model among a set of competing models, especially when the classes are imbalanced
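All of these metrics come from the four confusion-matrix counts. A minimal sketch with hypothetical counts (`classification_metrics` is a hypothetical helper name):

```python
def classification_metrics(tn, fp, fn, tp):
    """Metrics derived from the confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # sensitivity, true positive rate
    return {
        "accuracy": (tn + tp) / (tn + tp + fn + fp),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),    # true negative rate
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts for illustration
m = classification_metrics(tn=50, fp=10, fn=5, tp=35)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.85, 'sensitivity': 0.875, 'specificity': 0.833, 'precision': 0.778, 'f1': 0.824}
```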
Q21: Give examples of unsupervised machine learning models.
Ans:
- K-means clustering
- Latent Dirichlet Allocation (LDA)
- Latent Semantic Analysis (LSA)
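Q22: Is KNN a supervised or an unsupervised machine learning model?
Ans: KNN (k-nearest neighbours) is a supervised ML model: it predicts the label of a new point from the labels of its k nearest training points, so it needs labelled training data.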
Q23: How do you identify the cut-off value in logistic regression?
Ans: Refer to the explanation below.
Normally the aim of a model building exercise is to maximize accuracy. However, we also need to be careful about how many false positive (FP) and false negative (FN) cases the model generates, and in such circumstances we want to keep both FN and FP low. This can be done by selecting an appropriate threshold: plot sensitivity and specificity against the cut-off value and take the intersection of the two curves as the optimal point. Raising the cut-off lowers sensitivity and raises specificity, so the intersection is the threshold at which the two are equal. At that point the smaller of the two is as large as possible, which balances the FN and FP rates instead of sacrificing one for the other.
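The intersection can be found by sweeping candidate cut-offs and picking the one where sensitivity and specificity are closest. A minimal sketch, assuming hypothetical predicted probabilities and labels, a grid of cut-offs in steps of 0.05, and the convention that a score at or above the cut-off is predicted positive:

```python
def sensitivity_specificity(scores, labels, threshold):
    """Predict positive when score >= threshold, then return TP/(TP+FN) and TN/(TN+FP)."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)

def balanced_cutoff(scores, labels, grid):
    """Cut-off where the sensitivity and specificity curves are closest (their intersection)."""
    def gap(t):
        sens, spec = sensitivity_specificity(scores, labels, t)
        return abs(sens - spec)
    return min(grid, key=gap)   # min keeps the first cut-off with the smallest gap

# Hypothetical predicted probabilities from a logistic model, with true labels
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
grid = [round(0.05 * i, 2) for i in range(1, 20)]   # 0.05, 0.10, ..., 0.95
cutoff = balanced_cutoff(scores, labels, grid)
print(cutoff, sensitivity_specificity(scores, labels, cutoff))   # 0.45 (0.8, 0.8)
```

On this data every cut-off from 0.45 to 0.55 gives the same balance; the sketch simply reports the first one.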
Q24: In a throw of a fair die, if X is a random variable denoting the face that appears, what is the probability distribution of X?
Ans: It is a discrete uniform distribution: P(X = k) = 1/6 for each k = 1, 2, …, 6, so all outcomes have the same probability.
Q26: How do you interpret a log-log model in regression?
Ans: The β coefficient obtained from a log-log model can be interpreted as the elasticity of Y with respect to X.
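$$\ln Y = \beta \ln X + u$$
$$\ln(Y + \Delta Y) = \beta \ln(X + \Delta X) + u$$
Subtracting the first equation from the second and using ln(1 + z) ≈ z for small z:
$$\frac{\Delta Y}{Y} \approx \beta \frac{\Delta X}{X}$$
$$\beta \approx \frac{\Delta Y / Y}{\Delta X / X} = \frac{\text{percentage change in } Y}{\text{percentage change in } X} = \text{elasticity}$$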
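A numeric check of the approximation, assuming a hypothetical elasticity β = 0.8, no intercept and u held at 0 (so Y = X^β): a 1% increase in X should raise Y by roughly β × 1% = 0.8%.

```python
beta = 0.8                     # hypothetical elasticity
x, x_new = 100.0, 101.0        # a 1% increase in X
# With u held at 0, ln Y = beta * ln X means Y = X ** beta
y, y_new = x ** beta, x_new ** beta
pct_change_y = (y_new - y) / y * 100
print(round(pct_change_y, 3))  # 0.799, close to beta * 1% = 0.8
```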
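Q27: How can we reduce overfitting in regression?
Ans: Through regularization, which adds a penalty on the size of the coefficients to the least squares loss; a tuning parameter λ ≥ 0 controls how strong the penalty is. Three common methods are: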
- L1 regularization (Lasso regression): the penalty is λ times the sum of the absolute values of the coefficients, λΣ|β_j|. It shrinks the coefficients towards 0 and can set some of them exactly to 0, which also performs variable selection
- L2 regularization (Ridge regression): the penalty is λ times the sum of the squared coefficients, λΣβ_j². It shrinks the coefficients towards 0 but, for finite λ, never reduces them exactly to zero
- Elastic Net: the penalty is a convex combination of the L1 and L2 penalties, λ[αΣ|β_j| + (1 - α)Σβ_j²] with 0 ≤ α ≤ 1
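The difference between the penalties shows up clearly in the closed-form solutions for a single predictor with no intercept. The sketch below assumes the loss is written exactly as in the docstrings (sum of squared errors plus the penalty). Libraries such as scikit-learn scale the loss and penalty differently, so their λ values are not directly comparable. The data are hypothetical.

```python
import math

def ridge_1d(sxy, sxx, lam):
    """Minimiser of sum((y - b*x)^2) + lam * b^2: b = Sxy / (Sxx + lam)."""
    return sxy / (sxx + lam)

def lasso_1d(sxy, sxx, lam):
    """Minimiser of sum((y - b*x)^2) + lam * |b|: soft-thresholding of Sxy."""
    return math.copysign(max(abs(sxy) - lam / 2, 0.0), sxy) / sxx

def elastic_net_1d(sxy, sxx, lam, alpha):
    """Minimiser of sum((y - b*x)^2) + lam * (alpha * |b| + (1 - alpha) * b^2)."""
    return math.copysign(max(abs(sxy) - lam * alpha / 2, 0.0), sxy) / (sxx + lam * (1 - alpha))

# Hypothetical data with a single predictor and no intercept
x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0]
sxy = sum(a * b for a, b in zip(x, y))   # 17.0
sxx = sum(a * a for a in x)              # 14.0
for lam in (0.0, 10.0, 40.0):
    # lam = 0 reproduces OLS; at lam = 40 lasso is exactly 0 while ridge is only shrunk
    print(lam, ridge_1d(sxy, sxx, lam), lasso_1d(sxy, sxx, lam), elastic_net_1d(sxy, sxx, lam, 0.5))
```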