Batman vs Superman: who's better? DC fans the world over rack their brains over it. One is a demigod; the other is a master strategist. In many comic and movie adaptations, Batman gets the better of the Man of Steel. Despite the Man of Steel's Kryptonian powers and Herculean machismo, the bat vigilante rarely misses a chance to pip him in combat. This is because adventure stories favor romance over raw power. Depicting someone overcoming hurdles and going the distance brings hope into an otherwise despairing world. There is something beautiful about how an underdog finds his way up the world order. It is indeed poetic to find someone so powerless yet so powerful.
This comparison raises an important point: all confrontations are unfair. Even though the Man of Steel and the Dark Knight are part of the same cinematic universe and work side by side, comparing them is grossly unjustified. When it comes to defending the city against goons, Batman's solutions are practical and seem to work. Superman is brute power and adrenaline, but his fights are mostly against extraterrestrials or his own kind. This makes his powers and methods irrelevant for run-of-the-mill problems.
Even though the excerpt above is modeled on comic book characters, it holds true for understanding which Machine Learning method is most relevant and sought after. In the course of this discussion, we will look at how Unsupervised ML fares against Supervised ML and what problems a typical Data Scientist faces in the workplace.
Course curricula the world over teach us to tackle structured problems, but real problems are seldom structured. That is why there is a huge gap between academia and industry. A person needs an altogether different set of skills to survive in an industrial setup. Especially in a continuously evolving field like Data Science, one needs the skill to identify THE PROBLEM and convert it into a structured one. Most of us are handed a very open-ended problem in which the stakeholder wants to "do something" with the data. This is depicted below in the form of a caricature.
In these situations, Supervised Machine Learning algorithms seldom help: to apply them, the problem has to be stated very precisely, with a set of independent features affecting one or more given targets. For almost all such problems, Unsupervised ML methods provide some respite. These algorithms are used for identifying patterns, segmenting customers, reducing dimensions, mining association rules, and so on. Their ease of application, however, also brings in subjectivity. There are very few diagnostic measures that can be used to ascertain the effectiveness of an Unsupervised ML method. In the absence of metrics like accuracy, MAPE, p-values, etc., the onus of ensuring that the method delivers the desired results lies with the Data Scientist.
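One of the few internal diagnostics that does exist for clustering is the silhouette coefficient, which compares how close each point is to its own cluster versus the nearest other cluster. The sketch below is a minimal pure-Python version and not a definitive implementation; the function name `silhouette_score`, the toy customer points, and the two candidate labelings are all assumptions made up for illustration.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (ranges from -1 to 1)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from the point to itself is 0, so it drops out)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Hypothetical customers described by two made-up, already-scaled features.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
             (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
sensible_segments = [0, 0, 0, 1, 1, 1]   # groups nearby customers together
arbitrary_segments = [0, 1, 0, 1, 0, 1]  # mixes the two groups

print("sensible :", round(silhouette_score(customers, sensible_segments), 3))
print("arbitrary:", round(silhouette_score(customers, arbitrary_segments), 3))
```

A score close to 1 suggests compact, well-separated segments, while a score near 0 or below suggests the segmentation is not supported by the data. Even so, the number only describes geometry; whether the segments are useful for the business remains a judgment call for the Data Scientist.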
I am listing some of the widely used Unsupervised ML methods below, along with the relevant industry and use case.
Algorithm/Method | Domain | Use Case
Clustering | Marketing | Identify natural groups within customers to customize marketing campaigns
Principal Component Analysis (PCA) | Marketing | Generally used on survey data to reduce the number of variables
Market Basket Analysis | Retail | Identify product-based rules and associations between items
Multi Collaborative Filtering | Retail | Identify product-based rules and associations between items, and the impact of demographics
Topic Modelling | Sales | Identify the category into which a particular purchase falls, based on the description of the item
Density Based Methods | Finance/HR | Identify fraudulent expense reports submitted by an employee
Histogram-based Outlier Scoring (HBOS) | Finance | Identify anomalous transactions
KNN | Retail | Recommend similar items based on user profile
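To make one row of the table concrete, here is a rough sketch of the idea behind Market Basket Analysis: count how often items are bought together and turn those counts into support, confidence, and lift. It only considers rules between pairs of items, and the function name `pairwise_rules`, the thresholds, and the toy baskets are assumptions for illustration rather than a production approach.

```python
from itertools import combinations


def pairwise_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence, lift) for item pairs."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def count(itemset):
        # Number of baskets that contain every item in the itemset.
        return sum(itemset <= basket for basket in baskets)

    rules = []
    for a, b in combinations(items, 2):
        pair_count = count({a, b})
        support = pair_count / n  # share of baskets with both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence: of the baskets with lhs, how many also have rhs
            confidence = pair_count / count({lhs})
            if confidence >= min_confidence:
                # lift > 1: rhs is more likely when lhs is in the basket
                lift = confidence / (count({rhs}) / n)
                rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Hypothetical transactions, made up for this example.
toy_baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]

for lhs, rhs, support, confidence, lift in pairwise_rules(toy_baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

Note that nothing here tells us whether a rule is "right"; the thresholds are chosen by the analyst, which is exactly the subjectivity discussed earlier.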