Thursday, November 15, 2018

R Vs Python

Why doesn't Cheteshwar Pujara come in one down in a 50-over match? Why doesn't Ashwin bowl flighted deliveries in limited-overs matches? Despite being the ‘Hitman’ of ODIs, Rohit Sharma seldom makes the cut in a five-day contest. Is the skill set required for a Test match different from that for a one-dayer or, for that matter, a T20? All of this can be answered with a simple phrase: horses for courses. You pick the players that fit the bill, and every selection is made according to the requirements. That sums it up, doesn't it? The same idea carries over to the choice of software for ML. This blog explores the scenarios that favour Python over R and vice versa.
Before we start, a little background on R and Python in the context of Machine Learning (ML) is necessary. Python is a general-purpose programming language. In the wake of recent advances in ML, the Python community has contributed several libraries that let one work with data. R, on the other hand, is a statistical programming language, comparable to Matlab in that it was developed primarily for the mathematics and statistics community. There is a lot of debate about which language is better and which to prefer for ML. The exasperation is aptly shown in the image below.

The table below highlights the key differences between Python and R with respect to certain common ML tasks. Each cell names the library and/or function used for the task. A colored cell indicates that language's superiority for the task; in case of a tie, both cells are colored.


| Functionality | Python | R |
|---|---|---|
| Data Slicing and Summary | pandas | dplyr, data.table |
| Visualization | matplotlib | ggplot2 |
| Data Set Repositories | NA | Econometric data: AER library |
| Linear Models (Regression family) | scikit-learn | car, glm |
| Hyperparameter Tuning | GridSearchCV | makeLearner |
| Natural Language Processing (NLP) | NLTK, gensim | tidyverse, topicmodels |
| Web Scraping | Beautiful Soup | rvest |
| Interfacing with Other Systems (like Outlook) | pywin32 | RDCOMClient |
| Read JSON | json | rjson |
| Pickling | pickle | saveRDS, readRDS |
| Web App (Especially for Proof of Concept) | Django | R Shiny |



Below is an explanation of the contents in the table:

  1. Data Slicing and Summary: Filtering, sorting, summarization and similar operations are required in every ML exercise. In R, one can do this using functions from the dplyr and data.table libraries. The pipe operator (%>%), available through dplyr, is especially useful because it makes a cascaded operation easier to read and debug. Python, on the other hand, has pandas, which doesn’t have a pipe operator, so long cascaded operations on data can become hard to manage.
  2. Visualization: ggplot2 and its associated libraries in R help create highly useful plots such as histograms, geographical heat maps, and interactive and animated graphs. Python has the matplotlib library for creating graphs, but it doesn’t provide the enhancements that ggplot2 does.
  3. Data Set Repositories: R has a lot of data repositories, and users can load them from several libraries. This lets one play around with the data and gain understanding. One useful example is the AER library, which has useful econometric and census data. Python, in my experience, has no comparable collection.
  4. Linear Models: R and Python both have libraries that help in applying regression models. However, there is one aspect where R stands out as a clear winner: the treatment of categorical variables. N-1 encoding is taken care of automatically in R, whereas in Python it is left to the user.
  5. Hyperparameter Tuning: Both languages can extract optimal parameters through hyperparameter tuning. However, in Python one can tune more parameters than in R. For instance, for a Random Forest algorithm in R, one can only tune the number of trees, the number of nodes and the leaf size, whereas in Python one can also tune the sample split parameter. Tuning more parameters can lead to better accuracy.
  6. Natural Language Processing (NLP): Both R and Python have libraries to handle text. A lot of users will vouch for Python here, but having used both, I didn’t find any difference between the two.
  7. Web Scraping: Python has methods in the Beautiful Soup library to extract any element that has an HTML tag. Things are more clearly and precisely defined in Python. R, however, doesn’t offer a one-stop solution for extraction; a lot of libraries with no clear examples leave one searching for answers.
  8. Interfacing with Other Systems (like Outlook): Since Python is a general-purpose programming language, its system integration is quite mature. One can use Python to communicate between two different systems, such as Outlook and a Python terminal; the protocols that govern such communication already exist. R, on the other hand, doesn’t have well-defined functions to do this.
  9. Read JSON: Python takes less time than R to read and process a JSON file. Also, since the text inside a JSON file resembles a dictionary, using Python to read and parse it makes a lot of sense.
  10. Pickling: This can be done in both Python and R.
  11. Web App (Especially for Proof of Concept): This can be done in both Python and R; however, it takes less time to create an app in R.
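
To make the N-1 encoding point from item 4 concrete, here is a minimal sketch in plain Python of what R does for you behind the scenes when a categorical variable enters a regression: one 0/1 indicator per level, with one level dropped to act as the baseline. The helper name `dummy_encode` and the sample values are hypothetical, chosen only for illustration. In practice a Python user would more likely call `pandas.get_dummies` with `drop_first=True` rather than write this by hand.

```python
# Minimal sketch of N-1 (dummy) encoding using only the standard library.
# The helper name `dummy_encode` and the sample data are hypothetical.

def dummy_encode(values):
    """Return (kept_levels, rows): one 0/1 indicator per level except the first."""
    levels = sorted(set(values))   # all N distinct levels, in sorted order
    kept = levels[1:]              # drop the first level; it becomes the baseline
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows

colours = ["red", "green", "blue", "green"]
kept_levels, encoded = dummy_encode(colours)

print(kept_levels)                 # names of the N-1 indicator columns
for value, row in zip(colours, encoded):
    print(value, row)              # the baseline level is encoded as all zeros
```

With three levels, only two indicator columns are produced, and the dropped level is represented by a row of zeros. Forgetting this step in Python and keeping all N indicators alongside an intercept makes the columns perfectly collinear, which is why the step matters for linear models.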
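
Items 9 and 10 can be illustrated together with a short sketch that uses only Python's standard `json` and `pickle` modules. The JSON text below is made up for the example and stands in for the contents of a file; the R counterparts named in the table are rjson for reading and saveRDS/readRDS for saving objects.

```python
import json
import pickle

# Hypothetical JSON text standing in for the contents of a file.
raw = '{"model": "random_forest", "params": {"n_trees": 100, "min_leaf": 5}}'

config = json.loads(raw)                 # a JSON object becomes a Python dict
n_trees = config["params"]["n_trees"]    # nested keys are plain dict lookups

blob = pickle.dumps(config)              # serialise the object to bytes
restored = pickle.loads(blob)            # rebuild an equal object from the bytes

print(type(config).__name__, n_trees)
print(restored == config)
```

To work with files instead of in-memory strings and bytes, `json.load`, `pickle.dump` and `pickle.load` take an open file object. Note that unpickling data from an untrusted source is unsafe, so pickle is best kept for objects you saved yourself.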

2 comments:

  1. Thanks for the very neat explanation!

  2. Thanks for the feedback. You can check my other blogs as well.

