Data Science and ML Interview Questions
Parag Verma
Introduction
This blog is a compilation of the interview questions I have faced, as well as the ones faced by my peers and friends. I will try to answer each question with a brief explanation/notes.
Q1:What are some examples of discrete probability distributions?
Ans: Binomial and Poisson. Other examples include the Bernoulli and Geometric distributions.
Q2:If X is normally distributed, what is Pr(X = 2)?
Ans: 0. For a continuous random variable, the probability of taking any single exact value is zero; probabilities are only defined over intervals.
Q3:What percentage of values lie between -2σ and 2σ for a normal distribution?
Ans: As per the empirical rule (verified numerically in the sketch below):
- 68% between -1σ and 1σ
- 95% between -2σ and 2σ
- 99.7% between -3σ and 3σ
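To check the empirical rule numerically, here is a minimal sketch using the normal CDF (assuming SciPy is available):

```python
# A minimal sketch verifying the 68-95-99.7 rule for a standard normal
# distribution via the CDF.
from scipy.stats import norm

for k in [1, 2, 3]:
    # P(-k*sigma < X < k*sigma) = CDF(k) - CDF(-k)
    prob = norm.cdf(k) - norm.cdf(-k)
    print(f"Within +/-{k} sigma: {prob:.3f}")
# Prints approximately 0.683, 0.954, 0.997
```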
Q4:Can you think of a situation where a low R² is justified?
Ans: In many B2B (business to business) settings, the usual economic variables cannot explain much of the variance in the data. There are many external factors that cannot be captured in B2B scenarios, so a low R² is expected.
Q5:What is the probability distribution of the p value?
Ans: Under the null hypothesis, the p value follows a Uniform(0,1) distribution.
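As an illustration (not from the original post), simulating many experiments in which the null hypothesis is actually true shows the p values spreading evenly over [0, 1]:

```python
# Simulate 10,000 one-sample t-tests where H0 (mean = 0) is true;
# the resulting p values should be ~ Uniform(0, 1).
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(42)
pvals = [ttest_1samp(rng.normal(0, 1, size=30), popmean=0).pvalue
         for _ in range(10_000)]

# Roughly 10% of p values should fall in each decile of [0, 1]
counts, _ = np.histogram(pvals, bins=10, range=(0, 1))
print(counts / len(pvals))  # each entry close to 0.1
```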
Q6:What is the basis of linear regression?
Ans: Ordinary Least Squares (OLS)
Q7:What is the basis of logistic regression?
Ans: Maximum Likelihood Estimation (MLE)
Q8:What is the distribution of the β coefficients in regression?
Ans: In large samples, the β estimates are normally distributed.
Q9:If X1 and X2 are normally distributed, what will be the distribution of X1 + X2?
Ans: If X1 and X2 are independent, X1 + X2 is also normally distributed. This is because the normal family is closed under addition; the central limit theorem is not needed here.
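A quick simulation makes this concrete: the sum has mean μ1 + μ2 and variance σ1² + σ2² (illustrative sketch using NumPy):

```python
# The sum of two independent normals is normal with mean mu1 + mu2
# and variance sigma1^2 + sigma2^2.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(loc=2, scale=3, size=100_000)   # N(2, 3^2)
x2 = rng.normal(loc=-1, scale=4, size=100_000)  # N(-1, 4^2)
s = x1 + x2

print(s.mean())  # close to 2 + (-1) = 1
print(s.std())   # close to sqrt(9 + 16) = 5
```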
Q10:What are the assumptions of linear regression?
Ans: There are 4 assumptions:
- The conditional distribution of ui given Xi has a mean of zero
- The (Xi, Yi) pairs are IID, i.e. independent and identically distributed
- There is no perfect multicollinearity
- Large outliers are unlikely
Q11:What are the evaluation metrics for linear regression?
Ans: There are 2 metrics (computed in the sketch below):
- Adjusted R² (R²adj)
- Root Mean Square Error (RMSE)
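As a sketch (the data here is synthetic and the variable names X, y are illustrative, assuming scikit-learn is available), both metrics can be computed as follows; note that adjusted R² penalizes additional regressors:

```python
# Fit a linear regression on synthetic data, then compute adjusted R^2 and RMSE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)

n, k = X.shape
r2 = r2_score(y, y_hat)
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalizes extra regressors
rmse = np.sqrt(mean_squared_error(y, y_hat))
print(r2_adj, rmse)
```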
Q12:How do we check for the presence of multicollinearity in linear regression?
Ans: Through the Variance Inflation Factor (VIF). Variables with VIF values greater than 5 should be removed.
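A minimal sketch of a VIF check using statsmodels (the DataFrame here is synthetic, constructed so that two columns are nearly collinear):

```python
# Compute VIF for each regressor; nearly collinear columns show VIF >> 5.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.normal(size=200)})
df["x2"] = df["x1"] * 0.9 + rng.normal(scale=0.1, size=200)  # nearly collinear
df["x3"] = rng.normal(size=200)

X = sm.add_constant(df)  # VIF should be computed with an intercept present
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns)}
print(vifs)  # x1 and x2 will show VIF >> 5; candidates for removal
```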
Q13:Why do we take the log of a variable in certain analyses?
Ans: If the range of values of a variable is very large, we take its log in order to compress the values and stabilize the variance.
Q14:If there is a categorical variable with N unique values, why do we only create dummy variables for N-1 values?
Ans: If we create dummy variables for all N values, it leads to perfect multicollinearity (the "dummy variable trap"), as shown below:
D1 + D2 + … + DN = 1 = 1 * X0
Here the constant regressor (X0 is the variable for β0 and always equals 1) is a perfect linear combination of the dummies.
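A small pandas sketch of the trap and its fix (the city column is a made-up example):

```python
# drop_first=True creates N-1 dummies for N categories, avoiding the trap.
import pandas as pd

df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Pune", "Delhi"]})

# All N dummies: the columns sum to 1 in every row -> perfect multicollinearity
print(pd.get_dummies(df["city"]).sum(axis=1).unique())  # [1]

# N-1 dummies: the dropped category becomes the baseline absorbed by beta0
print(pd.get_dummies(df["city"], drop_first=True))
```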
Q15:What metric does a decision tree use?
Ans: The Gini index. A decision tree uses the Gini index to split a node; entropy/information gain is another common criterion.
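For illustration, the Gini index of a node with class proportions p_i is 1 − Σ p_i²; a minimal sketch:

```python
# Gini impurity of a node: 0 for a pure node, 0.5 for a 50/50 binary node.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

print(gini([0, 0, 1, 1]))  # 0.5 -> maximally impure 50/50 node
print(gini([0, 0, 0, 0]))  # 0.0 -> pure node, the ideal split result
```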
Q16:How can we select between two competing logistic regression models?
Ans: There are 2 metrics:
- AIC/BIC: The values of the Akaike and Bayesian Information Criteria should be lower
- F1 Score: The value of the F1 score should be higher
Q17:What are the disadvantages of a decision tree?
Ans: Decision trees often overfit the data. Also, naively creating an ensemble of decision trees leads to correlated trees, which offer little improvement in fit.
Q18:How does a random forest work?
Ans: A random forest is an ensemble model in which multiple decision trees are trained independently on bootstrapped samples of the data, with a random subset of attributes considered at each split. This ensures that correlated trees are not generated in the ensemble. The final prediction is made by averaging (regression) or majority voting (classification).
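A minimal scikit-learn sketch of this idea (the dataset and parameter values are illustrative):

```python
# Random forest: bootstrapped samples plus random feature subsets per split,
# combined by majority voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,     # number of trees in the ensemble
    max_features="sqrt",  # random feature subset per split (decorrelates trees)
    bootstrap=True,       # each tree sees a random bootstrap sample
    random_state=0,
).fit(X_tr, y_tr)
print(rf.score(X_te, y_te))  # majority vote over all trees
```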
Q19:Give a scenario of overfitting in Machine Learning
Ans: If a model is trained on, say, only 5% of the dataset, it cannot learn patterns that generalize because there are too few records; it memorizes the training data and then performs poorly on the test data.
Overfitting can also happen when we include variables with a large number of unique/distinct values.
Q20:What is sensitivity in a classification model?
Ans: Let's look at the confusion matrix (the metrics are computed in the sketch below):

| Actual \ Predicted | No | Yes |
|--------------------|----|-----|
| No                 | TN | FP  |
| Yes                | FN | TP  |

- Accuracy: (TN+TP)/(TN+TP+FN+FP)
- Sensitivity: Proportion of positives correctly classified, TP/(TP+FN). Also called the true positive rate
- Specificity: Proportion of negatives correctly classified, TN/(TN+FP). Also called the true negative rate
- Precision: Proportion of predicted positives that are actually positive, TP/(TP+FP)
- Recall: Same as sensitivity
- F1 Score: (2 × Precision × Recall)/(Precision + Recall). Important when we want to shortlist the best model among a set of competing models
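A sketch of computing these metrics with scikit-learn (y_true and y_pred are made-up arrays):

```python
# Derive the metrics above from the confusion matrix cells.
from sklearn.metrics import confusion_matrix, f1_score

y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tn + tp) / (tn + tp + fn + fp)
sensitivity = tp / (tp + fn)   # recall / true positive rate
specificity = tn / (tn + fp)   # true negative rate
precision   = tp / (tp + fp)
print(accuracy, sensitivity, specificity, precision)
print(f1_score(y_true, y_pred))  # harmonic mean of precision and recall
```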
Q21:Give examples of Unsupervised Machine Learning models
Ans:
- K-Means clustering
- Latent Dirichlet Allocation
- Latent Semantic Analysis
Q22:Is KNN an Unsupervised or Supervised Machine Learning model?
Ans: KNN is a supervised ML model
Q23:How do you identify the cut-off value in logistic regression?
Ans: Refer to the below explanation.
Normally the aim of any model building exercise is to maximize accuracy. However, we need to be careful about how many false positive (FP) and false negative (FN) cases the model generates, and we would want to minimize both. This can be done by selecting an appropriate threshold value: plot the sensitivity and specificity curves against the threshold and take the intersection of these graphs as the optimal point. The logic behind this approach is that the point of intersection balances sensitivity and specificity at their joint maximum, which minimizes FN as well as FP.
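A sketch of this intersection approach (assuming scikit-learn; the dataset is synthetic). roc_curve returns sensitivity as the TPR, and specificity equals 1 − FPR:

```python
# Pick the probability threshold where sensitivity and specificity cross.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

X, y = make_classification(n_samples=1000, weights=[0.7], random_state=0)
probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

fpr, tpr, thresholds = roc_curve(y, probs)
sensitivity = tpr
specificity = 1 - fpr

# Optimal cut-off: the threshold where the two curves cross
idx = np.argmin(np.abs(sensitivity - specificity))
print("Cut-off:", thresholds[idx])
```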
Q24:In a throw of a die, if X is a random variable which denotes the face that appears, what is the probability distribution of X?
Ans: It will be a (discrete) uniform distribution. Since P[X = k] = 1/6 for k = 1, 2, …, 6, all outcomes have the same probability.
Q25:How do you interpret a log-log model in regression?
Ans: The β coefficient obtained from a log-log model can be interpreted as the elasticity of Y with respect to X:
ln(Y) = β * ln(X) + u
ln(Y + ΔY) = β * ln(X + ΔX) + u
ΔY/Y ≈ β * (ΔX/X)
β = (ΔY/Y) / (ΔX/X) = percentage change in Y / percentage change in X = elasticity
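An illustrative fit (assuming statsmodels; the data is generated with a true elasticity of 1.5):

```python
# Fit a log-log model: the slope is the elasticity of Y with respect to X.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.uniform(1, 100, size=500)
Y = 2.0 * X ** 1.5 * np.exp(rng.normal(scale=0.1, size=500))

ols = sm.OLS(np.log(Y), sm.add_constant(np.log(X))).fit()
print(ols.params[1])  # ~1.5: a 1% rise in X -> ~1.5% rise in Y
```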
Q26:How can we reduce overfitting in regression?
Ans: There are three methods (see the sketch after this list):
- L1 regularization (Lasso regression): A penalty term lambda (λ) times the sum of the absolute values of the coefficients is added. This constrains the coefficients and can shrink some of them exactly to zero, effectively performing variable selection
- L2 regularization (Ridge regression): A penalty term lambda (λ) times the sum of the squares of the coefficients is added. This shrinks the coefficients, but in this case they can never be reduced exactly to zero
- Elastic Net: The coefficients are constrained using a convex combination of the L1 and L2 penalties
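A minimal scikit-learn sketch of all three (alpha plays the role of λ, l1_ratio mixes L1 and L2, and the data is synthetic):

```python
# Lasso (L1), Ridge (L2), and Elastic Net on the same regression problem.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, noise=10, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)  # some coefficients driven to exactly 0
ridge = Ridge(alpha=1.0).fit(X, y)  # coefficients shrunk, never exactly 0
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # convex mix of L1 and L2

print((lasso.coef_ == 0).sum(), (ridge.coef_ == 0).sum())
```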