Tuesday, October 9, 2018

Machine Learning: What's That?

Machine Learning

Ever come across the term 'AI'? Seen the 2004 movie 'I, Robot', where robots evolve into a destructive force and constantly hound Will Smith? A more refined take on AI appears in the Marvel Cinematic Universe, where 'Jarvis' helps Tony Stark defeat evil for good. And who can forget the 'Matrix' series, where AI-powered machines hunt 'Neo' in a post-apocalyptic world? It is fascinating to see AI as the subject of so many sci-fi movies and to see it capture people's imagination, but what exactly is AI (Artificial Intelligence)? To get the hang of AI, it is imperative to first walk the bylanes of Machine Learning (ML).
ML is a sequence of steps in which patterns or trends are extracted from data and then applied to similar, unseen data to produce predictions or scores. The data can relate to sales, marketing, order history, tweets, financial transactions, expense reports, etc. The steps that make up ML include (but are not restricted to) the following:
  1. Data Collection and Storage
  2. Data Manipulation
  3. Identification of appropriate algorithm/method for the problem
  4. Extraction of patterns/summaries/trends/sensitivities from the data (Learning from data)
  5. Pickling: Storing the extracted learning for later retrieval
  6. Applying the learning on unseen data
  7. Publishing the result in the form of exception reports
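Steps 4 to 6 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the "learning" here is just the mean and spread of some made-up sales figures, and the names `learn`, `score`, `history` and `unseen` are hypothetical, not part of any library.

```python
import pickle
import statistics

def learn(history):
    """Step 4: extract a simple pattern (mean and spread) from historical values."""
    return {"mean": statistics.mean(history), "stdev": statistics.stdev(history)}

def score(model, value):
    """Step 6: score an unseen value as its distance from the mean, in standard deviations."""
    return (value - model["mean"]) / model["stdev"]

# Hypothetical historical daily sales figures (made up for illustration).
history = [100, 102, 98, 101, 99]
model = learn(history)

# Step 5: pickle the learned part. pickle.dumps keeps it in memory here;
# pickle.dump would write it to a file on the drive instead.
blob = pickle.dumps(model)

# Later: restore the learning and apply it to unseen data.
restored = pickle.loads(blob)
unseen = [100, 130]
scores = [score(restored, v) for v in unseen]
print(scores)  # the second value sits far from the mean, so it scores high
```

One caution when using `pickle` in practice: only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.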
A block diagram to represent the above steps is shown below

Different components of the diagram are explained below:


  • Historical Data: The entire analysis is done on this data. Based on the trends, patterns, summaries, etc. extracted from it, the appropriate use case and algorithm are identified.
  • Storage into an RDBMS structure: The data is loaded into a relational database so that an external system can query it. This allows quick retrieval of historical data.
  • Data manipulation: Raw input data is usually unfit for any useful analysis. It has to be manipulated to expose its hidden structure. Commonly used data manipulation techniques include (but are not limited to) dummy variable coding, level pruning, log transformation and normalization.
  • ML Application and Pattern Extraction: Based on the business use case, a suitable ML algorithm is identified and applied to the input data. The output of the algorithm is normally sensitivity scores, effects, densities, etc.
    • The extracted patterns, also known as the learned part of the model, are normally stored on disk for later retrieval. This process is commonly referred to as pickling, the Python term for serializing an object.
  • The learned part of the model is then applied to the live (unseen) data. This normally results in scores, predictions, etc.
    • Normally, some part of the output is fed back to the 'ML Application and Pattern Extraction' phase, where it is used for course correction. A typical example is a process where a Data Steward validates the output of the model and sends the findings to the model development team. This helps in tuning the model better.
  • Publish Results: The results of the entire exercise are published in the form of exception reports, which the end user normally consumes.
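To make the data manipulation step a little more concrete, here is a rough sketch of three of the techniques named above (dummy variable coding, log transformation and normalization) using only the Python standard library. The function names and sample values are hypothetical, and real projects would more likely lean on a data library; treat this as an illustration, not a recommended implementation.

```python
import math

def dummy_code(values):
    """Dummy variable coding: one 0/1 column per level, except a reference level.

    The first level in sorted order is used as the reference (an arbitrary choice).
    """
    levels = sorted(set(values))[1:]
    return [{level: int(v == level) for level in levels} for v in values]

def log_transform(values):
    """Log transformation: compress large non-negative values using log(1 + x)."""
    return [math.log1p(v) for v in values]

def min_max_normalize(values):
    """Normalization: rescale to [0, 1]. Assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical inputs, made up for illustration.
regions = ["east", "west", "east", "north"]
print(dummy_code(regions))                 # 'east' is the reference level
print(min_max_normalize([10, 20, 30]))     # [0.0, 0.5, 1.0]
print(log_transform([0, 9, 99]))
```

Dropping one reference level is the usual convention for dummy coding, because the dropped level is implied whenever all other columns are 0.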

Machine Learning by Itself: Far from Reality

In most places there is a lot of emphasis on ML doing the magic of learning automatically. By now we are pretty clear about what learning is: the extraction of trends, patterns and summaries from historical data so that we can reproduce its properties at some later point. The 'automatic' part, however, is grossly exaggerated. Every ML setup has to be prepared carefully in order to achieve a decent level of precision. For instance, if my goal is to analyse sales data, I have to decide what range of sales data I can use. That involves removing outliers, applying some transformations, binning, etc. The steps leading up to an ML are so application dependent that each time I handle sales data, I have to come up with a different sanity check. All these details make fully automatic learning a difficult proposition. In other words, an ML has to be assisted before it can work wonders.
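As a small illustration of that kind of hand-holding, the sketch below removes outliers from some made-up sales figures and then bins what is left. Every choice in it is an assumption made for the example: the interquartile-range rule with a factor of 1.5, the bin edges of 125 and 135, and the names `remove_outliers` and `bin_label`. A different sales dataset would likely need different choices, which is exactly the point.

```python
import bisect
import statistics

def remove_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]

def bin_label(value, edges, labels):
    """Binning: map a value to a label using ascending bin edges."""
    return labels[bisect.bisect_right(edges, value)]

# Hypothetical daily sales; 900 looks like a data entry error or a one-off event.
sales = [120, 135, 128, 140, 132, 125, 900]

cleaned = remove_outliers(sales)
print(cleaned)  # [120, 135, 128, 140, 132, 125]

# Bin edges chosen by hand for this dataset (an assumption, not a rule).
edges = [125, 135]
labels = [bin_label(v, edges, ["low", "medium", "high"]) for v in cleaned]
print(labels)
```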

Next we will discuss what most IT aficionados label ML (or AI) to be: a big bubble. Just like the dot-com bubble, it is supposedly destined to swell until its ultimate collapse. But is there a modicum of truth in this supposition, or is it pure overspeculation?

