“Predictive analytics” is a commonly used term today. Wikipedia describes it as “**encompassing a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events**”. This is a fairly accurate description, and I believe the term is generally well understood. However, if you go a bit deeper and look at the process of building a predictive model, things are not so straightforward. So in this article I will talk about some of the basic principles used in building predictive models. To do so, I am going to pose some questions and answer them myself. Although the discussion is necessarily technical, it stays at a rather high level and can benefit even those who are not involved in building models hands-on.

**Predictive models are about predicting the future. Why do they need historical data?**

Predictive models use information from the past, i.e., historical data, to make an inference about the future. An implicit assumption is that historical patterns are going to repeat in the future. If this assumption is invalid for any reason, the prediction made by the model in question is unlikely to be reliable.

**Do predictive models always involve complicated math?**

Not necessarily. Even a simple correlation check can help make a predictive inference. Consider two time series, **x** and **y**. If **x**(t) is highly correlated with **y**(t+1), then knowing **x** at time t lets you predict **y** at time (t+1) with reasonable accuracy.
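The correlation check described above can be sketched in a few lines of Python. This is only an illustration: the data is synthetic, and the helper names `pearson` and `lagged_correlation` are made up for this example.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlation(x, y, lag=1):
    """Correlation between x(t) and y(t + lag); lag must be a positive integer."""
    return pearson(x[:-lag], y[lag:])

# Synthetic data (an assumption for illustration):
# y at time t+1 equals x at time t plus some noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0] + [x[t] + rng.gauss(0, 0.3) for t in range(len(x) - 1)]

same_time = pearson(x, y)
one_step_ahead = lagged_correlation(x, y, lag=1)
print(f"corr(x(t), y(t))   = {same_time:.2f}")
print(f"corr(x(t), y(t+1)) = {one_step_ahead:.2f}")
```

On data built this way, the lagged correlation should come out high while the same-time correlation stays near zero, which is exactly the situation where **x** is useful for predicting the next value of **y**.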

The key to building a good predictive model is not in using any fancy math but in ensuring that the dependent and independent variables are defined carefully, and that the fallacy of using future information to make an inference about the same future is avoided. Let me elaborate on this last point with an example. Assume that the available historical data covers customer behavior, including credit card payment history and the response to a quarterly loan offer, from Q1-2012 to Q1-2014, and the objective is to build a model that predicts the customer’s response to the loan offer based on past payment history. One possible way of building this model is to use the response to the loan offer in Q1-2014 as the dependent variable and the payment history from Q1-2012 to Q4-2013 to form the independent variables. It would be incorrect to use the payment history from Q1-2012 to Q1-2014 to form the independent variables, though, because in that case we would be using the payment behavior in Q1-2014 to “predict” the response to the loan offer in the same timeframe!
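One way to guard against this fallacy is to filter the history by the target quarter before deriving any variables. The sketch below assumes a hypothetical record layout of `(customer_id, (year, quarter), paid_on_time)` and a made-up feature, the share of on-time payments; neither comes from a real dataset.

```python
from collections import defaultdict

TARGET_QUARTER = (2014, 1)  # quarter whose loan-offer response we want to predict

# Hypothetical records: (customer_id, (year, quarter), paid_on_time)
payments = [
    ("c1", (2013, 3), True),
    ("c1", (2013, 4), True),
    ("c1", (2014, 1), False),  # same quarter as the target: must be excluded
    ("c2", (2013, 4), False),
    ("c2", (2014, 1), True),   # same quarter as the target: must be excluded
]

def on_time_rate_before(payments, target_quarter):
    """Share of on-time payments per customer, using only quarters before the target."""
    on_time = defaultdict(int)
    total = defaultdict(int)
    for customer, quarter, paid_on_time in payments:
        if quarter < target_quarter:  # tuples compare by year, then quarter
            total[customer] += 1
            on_time[customer] += int(paid_on_time)
    return {customer: on_time[customer] / total[customer] for customer in total}

features = on_time_rate_before(payments, TARGET_QUARTER)
print(features)
```

Because the cutoff is applied inside the feature function, the Q1-2014 payments never reach the independent variables, no matter how many derived variables are added later.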

**Which statistical techniques give better predictive models?**

There is no clear-cut answer to this question. Although certain business problems are more amenable to being modeled with certain kinds of statistical techniques, the efficacy of a model is typically determined by the data used to build it. A model that has access to a richer data source will generally be more effective. With a given data source, better results may be obtained by being creative about deriving new variables from the available data. As an example, consider the task of modeling the parabola y=x^{2} using OLS regression on (x,y) values. Since the technique is linear in its parameters, taking x as the independent variable fits a straight line to a curve, and the results won’t be great (if the x values are symmetric around zero, the fitted slope is essentially zero). But if you define x^{2} as a derived variable and use it as the independent variable for the regression, the model will be a perfect fit!
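The parabola example can be checked directly. The sketch below implements simple one-variable OLS by hand (the helper `fit_line` is made up for this illustration) and compares R² for the raw variable x against the derived variable x^{2}, on an assumed grid of x values symmetric around zero.

```python
def fit_line(u, y):
    """Ordinary least squares for y = a + b*u; returns (a, b, r_squared)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    b = s_uy / s_uu
    a = mean_y - b * mean_u
    ss_res = sum((yi - (a + b * ui)) ** 2 for ui, yi in zip(u, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Assumed sample: x from -3 to 3 in steps of 0.1, y exactly on the parabola.
x = [i / 10 for i in range(-30, 31)]
y = [xi ** 2 for xi in x]

# Regress y on the raw variable x.
_, slope_raw, r2_raw = fit_line(x, y)

# Regress y on the derived variable x^2.
x_squared = [xi ** 2 for xi in x]
intercept, slope, r2_derived = fit_line(x_squared, y)

print(f"R^2 using x:   {r2_raw:.4f}")
print(f"R^2 using x^2: {r2_derived:.4f}")
```

The regression technique is identical in both runs; only the choice of independent variable changes, which is the point of the example.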

**Can unsupervised learning be used to yield a predictive model?**

A predictive model is typically built using supervised learning (regression models, decision trees, etc.), but it is possible to use unsupervised learning to make a predictive inference. Clustering algorithms are an example of unsupervised techniques. Imagine a clustering solution obtained by clustering customer behavior data up to time ‘t’. If you overlay the customers’ response to a certain offer at time (t+1) on the clusters obtained previously and find that response rates vary substantially across clusters, then the clustering solution can be used to make a predictive inference about the response to that particular offer.
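The overlay step amounts to computing the response rate within each cluster. The sketch below assumes the cluster labels have already been produced by some clustering algorithm on data up to time t; the customer IDs, labels, and responses are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: cluster labels from data up to time t,
# and responses to the offer observed at time t+1 (1 = responded).
cluster_of = {"c1": 0, "c2": 0, "c3": 1, "c4": 1, "c5": 1, "c6": 2}
responded = {"c1": 1, "c2": 1, "c3": 0, "c4": 0, "c5": 1, "c6": 0}

def response_rate_by_cluster(cluster_of, responded):
    """Fraction of customers in each cluster who responded to the offer."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for customer, cluster in cluster_of.items():
        counts[cluster] += 1
        hits[cluster] += responded[customer]
    return {cluster: hits[cluster] / counts[cluster] for cluster in sorted(counts)}

rates = response_rate_by_cluster(cluster_of, responded)
for cluster, rate in rates.items():
    print(f"cluster {cluster}: response rate {rate:.2f}")
```

If the rates differ markedly between clusters, a new customer's cluster membership alone carries information about how likely they are to respond.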

I hope this was helpful. We will be covering more aspects of predictive models in later articles.
