What’s in an Algorithm Name?

Data Science Willy, as mentioned in a previous blog, is the Data Science cousin of William Shakespeare, and he loves to use and abuse terms and concepts from the literature of his great cousin to drive home points related to Data Science. More often than not, he is annoying in being reverential. For instance, Data Science Willy would be at absolute loggerheads with literature’s The Great William Shakespeare over the question “What’s in a name?”, especially if we look at “What’s in an algorithm?” or “What’s in an algorithm’s name?”.

What’s in a name? Literature Will posed this very question through Juliet. Juliet, we know, is in love and is finding philosophical means to ignore the raging conflict of loving a member of a family with which her own has been locked in a feud drenched in blood and animosity.

Humans are named mostly in their infancy, when their major accomplishments include being true to the primal nature of a mammal: howling, crying, etc. The name bestowed on them is more aspirational and predictive, and the only scientific help comes from the astrological sciences. I have always held the view that horoscopes from various cultures were the first formal predictive models based on behavioral analysis (but this is a topic for another boring blog).

Romeo literally means a citizen of Rome, which makes sense since the tragedy was based in Verona, Italy - definitely a part of the Roman empire. Juliet means “youthful,” a beautiful but fairly short-sighted name. Imagine the ironic plight of an old woman named “Juliet.” Now, before die-hard fans of Literature Will pull out their daggers, I will switch to the Data Science Willy context.

Naming the Algorithm

We need to be clear that algorithms constitute predictive models, but the two terms cannot be used interchangeably. A model is much more than an algorithm but cannot do without an algorithm. If the predictive model is a car, then the algorithm is the engine, and data is the fuel.

Machine learning algorithms, unlike human babies, are carefully named based on established (not predictive) facts of what they do and how they work. For example, the decision tree is a supervised machine learning algorithm that can be used to create various models. As the name suggests, a decision tree is a tree-like structure organized into branches and terminal nodes (leaves) to represent decision paths and decisions (class labels or values).

Through a machine learning algorithm, this tree-like structure is derived from existing data that contains known decisions (supervised machine learning with labeled training data) and is then used to produce decisions for data records where a prediction is needed. When the decision is a class label, we have a classification tree. When the decision is a numeric value, we have a regression tree. The umbrella term that covers both approaches is CART (Classification and Regression Trees). The decision path that leads to a leaf node is the key.
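To make the classification versus regression distinction concrete, here is a minimal sketch of how a single split (the first branch of a tree) might be chosen from labeled data. It is an illustration only: the toy customer data, the variable names, and the choice of Gini impurity and variance as split criteria are assumptions for this example, not something prescribed above.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def variance(values):
    """Mean squared deviation from the mean (0.0 means all values are equal)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def best_split(xs, ys, impurity):
    """Return (threshold, weighted impurity) of the best 'x <= threshold' split."""
    best = (None, float("inf"))
    n = len(xs)
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: customer age and a known (labeled) decision.
ages = [22, 25, 31, 38, 45, 52]
renewed = ["no", "no", "no", "yes", "yes", "yes"]      # class labels: classification tree
premium = [100.0, 110.0, 105.0, 300.0, 310.0, 305.0]   # numeric values: regression tree

class_threshold, class_score = best_split(ages, renewed, gini)
value_threshold, value_score = best_split(ages, premium, variance)
print(class_threshold, class_score)
print(value_threshold, value_score)
```

The same search routine serves both kinds of tree; only the impurity measure changes. A full tree would repeat this search recursively on each side of the chosen split.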

In complicated systems, the number of possible decision paths is so large that a single tree is not sufficient, and a group of trees proves more effective. It is not too difficult to guess the name of the resultant approach; it has something to do with the woods, a synonym perhaps; yes, forest it is. Since the formation of each tree from the data is randomized in various ways (for example, by sampling the training records and the features each tree may use), the algorithm is named “Random Forest.”
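The "random" and the "forest" can both be seen in a few lines. The sketch below is a deliberately simplified illustration under stated assumptions: each tree is only a one-split stump, features are sampled once per tree rather than at every split as in the usual Random Forest formulation, and the data and function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_indices):
    """One-split tree: pick the best (feature, threshold) among the given features."""
    best = None  # (score, feature, threshold, left_label, right_label)
    for f in feature_indices:
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split in this sample: always predict the majority
        label = majority(labels)
        return lambda row: label
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    trees = []
    for _ in range(n_trees):
        # Randomization 1: bootstrap sample of the records (drawn with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Randomization 2: a random subset of features (per tree here, a simplification).
        feats = rng.sample(range(n_features), k=max(1, n_features // 2))
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return trees

def predict(trees, row):
    """The forest's decision is the majority vote of its trees."""
    return majority([tree(row) for tree in trees])

# Hypothetical toy data: (age, number of past claims) and a known decision.
rows = [(22, 1), (25, 2), (31, 1), (38, 6), (45, 7), (52, 8)]
labels = ["no", "no", "no", "yes", "yes", "yes"]
forest = fit_forest(rows, labels)
print(predict(forest, (20, 1)), predict(forest, (60, 9)))
```

No single stump sees all the records or all the features, yet the vote across the group is stable, which is the point of preferring a forest to one tree.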

Naming the Approach or Model

Sometimes the people who devised an approach, or who provided the sound mathematical base from which the approach eventually grew, are honored by having their name included in the algorithm’s name. For example, Thomas Bayes is revered in Naïve Bayes by calling him naïve. The bad joke being forgiven, the naivety refers to the assumption that the features are independent of one another given the class, and not to anything else.
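Where exactly the "naïve" step sits is easiest to see in code. The following is a minimal sketch of a Naïve Bayes classifier for categorical features; the add-one (Laplace) smoothing, the function names, and the toy data are assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies within each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value -> count
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict(model, row):
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        # log P(label) + sum over features of log P(x_i | label).
        # Adding one term per feature, each ignoring the others, is the "naive" step.
        log_p = math.log(n_label / total)
        for i, value in enumerate(row):
            # Add-one smoothing so an unseen value does not zero out the product.
            count = value_counts[(i, label)][value] + 1
            log_p += math.log(count / (n_label + len(vocab[i])))
        scores[label] = log_p
    return max(scores, key=scores.get)

# Hypothetical toy data: (sales channel, payment plan) and whether the policy was renewed.
rows = [("agent", "annual"), ("agent", "annual"), ("online", "monthly"),
        ("online", "monthly"), ("agent", "monthly"), ("online", "annual")]
labels = ["yes", "yes", "no", "no", "yes", "no"]
model = fit_naive_bayes(rows, labels)
print(predict(model, ("agent", "annual")), predict(model, ("online", "monthly")))
```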

The name Hidden Markov model does not imply that Andrey Markov is hidden somewhere in the model; it reflects the fact that the underlying system is assumed to be a Markov process whose states cannot be observed directly. Andrey Markov defined the Markov chain/process; therefore, every other approach based on Markov chains honors Andrey by using his name. Markov chains have a wide range of applications in machine learning: hidden Markov models have seen much success in sequence tasks such as speech recognition, and Markov decision processes underpin reinforcement learning.
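What "hidden" means in practice: we only see the observations, and must sum over every possible sequence of unobserved states. A minimal sketch of the standard forward algorithm does exactly that. The two states, the observations, and all probabilities below are made-up numbers for illustration.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Probability of an observation sequence under a hidden Markov model."""
    # alpha[s] = P(observations so far, hidden state is currently s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical model: a customer's hidden attitude drives the payments we observe.
states = ["engaged", "at_risk"]
start_p = {"engaged": 0.6, "at_risk": 0.4}
trans_p = {
    "engaged": {"engaged": 0.7, "at_risk": 0.3},
    "at_risk": {"engaged": 0.4, "at_risk": 0.6},
}
emit_p = {
    "engaged": {"pays": 0.9, "misses": 0.1},
    "at_risk": {"pays": 0.2, "misses": 0.8},
}

likelihood = forward(["pays", "misses"], states, start_p, trans_p, emit_p)
print(likelihood)
```

The transition table is the Markov chain; the emission table is what hides it from view.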

Data Science Willy, I am sure, is happy that machine learning algorithms are not named like human babies. Their names are factual rather than doting and aspirational. Sometimes they are so factual that the name completely describes the algorithm, at the cost of totally alienating a non-technical audience (defined as people who are alien to this science; so, it makes sense!). A few examples:

  1. Three hidden-layer fully connected feedforward artificial neural network – this was the name encompassing the whole sentence
  2. Long short-term memory (LSTM): should we forget this, or should we not remember this, because it is easy but confusing?
  3. Latent Dirichlet Allocation: This is an excellent topic to model for this naming conundrum
  4. VAEGAN (Variational Autoencoder Generative Adversarial Network): Healthy rivalry between adversarial networks; if you deem so.
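To show how literally the first name in the list can be read, here is a small sketch in which every word of "three hidden-layer fully connected feedforward artificial neural network" maps to a line of code. The layer sizes, activation functions, and random (untrained) weights are assumptions chosen purely for illustration.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """'Fully connected': every unit in the layer sees every input."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def init_network(layer_sizes, seed=0):
    """Random, untrained weights; one (weights, biases) pair per layer after the input."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]

def feedforward(network, inputs):
    """'Feedforward': values flow from input to output with no loops."""
    activations = inputs
    for i, (weights, biases) in enumerate(network):
        is_output = i == len(network) - 1
        activations = dense(activations, weights, biases, sigmoid if is_output else relu)
    return activations

# 4 inputs, 'three hidden layers' of 8 units each, 1 output.
layer_sizes = [4, 8, 8, 8, 1]
network = init_network(layer_sizes)
output = feedforward(network, [0.2, -0.5, 1.0, 0.3])
print(len(layer_sizes) - 2, "hidden layers ->", output)
```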

Conclusion

Coming to the leading protagonist of literature’s The Great Will’s tragedy Romeo and Juliet, yes Romeo, we are talking about him. His name should never be used in the same sentence in which the word learning appears (and I break the rule while stating it). Romeo never learned the lesson of love’s hopeless, helpless, but heartful “happinesslessness” (new word) from his unexpressed love for Rosaline (yup, if you read your Romeo and Juliet well, you will know that there was Rosaline before Juliet), and if he had outlived Juliet, then there would probably have been a Jasmine.

However, Data Science Willy learns a lot from The Great Will, so Willy restates:

“Algorithms may fall when there’s no strength in data.”

“Prediction is a smoke made with fumes of past patterns.”

“These analytical delights have predictive ends.”

Nitin Purohit
Nitin is CTO and co-founder at Aureus. With over 15 years of experience in leveraging technology to drive and achieve top-line and bottom-line numbers, Nitin has helped global organizations optimize value from their significant IT investments. Over the years, Nitin has been responsible for the creation of many product IPs. Prior to this role at Aureus, Nitin was the Global Practice Head for Application Services at Omnitech Infosolutions Ltd and was responsible for sales and profitability of offerings from application services across geographies.
