Business Intelligence is what makes data meaningful, allowing you to make better decisions based on historical data and behaviours. The next level is getting your data to talk to you, to help you find out what could happen in the future, which is called predictive analytics. It’s the really clever stuff that answers questions such as:
What’s the probability of…?
Here we define and illustrate some of the main predictive data analytics methods and techniques, which are applied by our data analysts and data scientists.
Predictive Analytics & Machine Learning
Machine Learning gives computers the ability to learn without being explicitly programmed, meaning they can adapt and improve when exposed to new data. Machine learning uses patterns found in historical data to detect patterns in new data and adjust program actions accordingly.
Machine learning examines small or large amounts of data, possibly from many different sources, with statistical algorithms such as clustering, regression and classification (see descriptions below). The objective is to discover patterns and then make predictions based on those, often complex, patterns to answer business questions and solve problems.
Clustering – Is the task of separating a set of unlabelled objects into groups such that those in one group are more similar to each other than they are to objects in other groups. An example of a clustering problem is identifying groups of people with similar buying patterns. The input is a dataset where none of the samples is assigned to a specific group. The clustering method first identifies a set of groups and then assigns each sample to one of them (see Figure 1).
Figure 1. Example of clustering task. Input dataset of unlabelled samples (left) and outcome of the clustering with each sample associated to a group (right).
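To make the clustering idea concrete, here is a minimal sketch of k-means, one common clustering algorithm (the text above does not name a specific method, so this choice is an assumption). The customer data, the two features and the function name `kmeans` are all hypothetical, and the starting centroids are simply the first k points, which keeps the example repeatable but is naive.

```python
import math

def kmeans(points, k, iterations=20):
    """Minimal k-means sketch: returns (centroids, labels)."""
    # Assumption: use the first k points as starting centroids (naive but deterministic).
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point the label of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

# Hypothetical unlabelled customers: (purchases per month, average basket value)
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
centroids, labels = kmeans(customers, k=2)

for customer, label in zip(customers, labels):
    print(customer, "-> group", label)
```

No group labels are supplied as input; the algorithm derives the groups from the data alone, which is what makes this an unsupervised task.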
Regression – Is the task of predicting a numeric response from numeric or categorical input variables. An example would be: given the number of past purchases, what is the probability of a purchase of a specific product?
Figure 2. Example of regression task. The red dots are used to build a linear model (blue) that links the number of past purchases (i.e., input features) to the probability of purchase of a specific product (i.e., outcome of the regression).
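The linear model described in Figure 2 can be sketched with ordinary least squares on a single input feature. The data points below are invented for illustration, and `fit_line` and `predict` are hypothetical helper names. Because a straight line can produce values outside the range 0 to 1, the sketch clips its predictions, which is a simplification rather than a recommended way to model probabilities.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: number of past purchases and the
# observed share of customers who went on to buy the product.
past_purchases = [0, 1, 2, 3, 4, 5]
purchase_rate = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55]

slope, intercept = fit_line(past_purchases, purchase_rate)

def predict(x):
    """Predict the purchase probability, clipped to the range [0, 1]."""
    return min(1.0, max(0.0, slope * x + intercept))

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"predicted probability after 6 purchases: {predict(6):.2f}")
```

The fitted line plays the role of the blue model in the figure: once the slope and intercept are known, any new number of past purchases can be turned into a predicted value.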
Classification – Is the task of deciding which category a new object belongs to, based on a model constructed from relationships between collections of existing objects that are already labelled. A simple example of a classification problem is shown in Figure 3. The aim is to predict the gender of a customer given the length of their first name and surname (i.e., feature1, feature2).
Figure 3. Example of classification task. The black line represents the model learned to separate class 1 (males, in blue) from class 2 (females, in green). Every new data point falling above (below) the black line will be assigned to class 1 (class 2).
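As a rough illustration of classification with two features, the sketch below uses a nearest-centroid classifier. This is a stand-in chosen for brevity, not necessarily the method behind Figure 3, although for two classes it also produces a straight-line boundary. The training samples, the class names and the helpers `fit_centroids` and `classify` are all hypothetical.

```python
import math

def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each labelled class."""
    grouped = {}
    for sample, label in zip(samples, labels):
        grouped.setdefault(label, []).append(sample)
    return {
        label: tuple(sum(v) / len(points) for v in zip(*points))
        for label, points in grouped.items()
    }

def classify(centroids, sample):
    """Assign a new sample to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))

# Hypothetical labelled training data: (feature1, feature2)
training_samples = [(7, 9), (8, 10), (6, 9), (4, 5), (3, 6), (4, 4)]
training_labels = ["class 1", "class 1", "class 1",
                   "class 2", "class 2", "class 2"]

centroids = fit_centroids(training_samples, training_labels)

prediction_a = classify(centroids, (7, 8))
prediction_b = classify(centroids, (3, 5))
print((7, 8), "->", prediction_a)
print((3, 5), "->", prediction_b)
```

Unlike the clustering case, the labels are given up front: the model is built from examples that are already categorised and is then applied to new, unlabelled samples.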
Glossary of Machine Learning terms
Algorithm – A self-contained set of steps that are followed in order to solve a mathematical problem or complete a computer process.
Supervised Learning is a machine learning task that discovers rules to predict unknown values from labelled, known examples. For example, if you had a database of customers and their past sales records, you could predict which items new customers might buy, or how much they might spend, based on previous customers’ behaviour. Classification and regression algorithms are types of supervised learning algorithms.
Unsupervised Learning is a machine learning task that discovers hidden structure or groupings in unlabelled data. For example, if you had a pool of anonymous customer information, you could use unsupervised learning to determine whether the customers have hidden behaviours or traits in common that naturally segment them into groups for targeted marketing. Clustering algorithms are types of unsupervised learning algorithms.