Broadly speaking, we follow six steps, although these may vary depending on the business questions being asked and the data available.
The first step is to understand your business problem and the questions being asked. What problem are you trying to solve, and what is the business context? Once the problem is clearly defined, we express it as an analytics problem. This ensures that everyone has a full understanding of the problem and agrees on exactly what the model is going to predict.
There is no point in building a fabulous model, only to find out later that what it is predicting doesn’t match what the business needs.
Sometimes clients simply give us access to a whole load of data and ask us to do something with it. In that case we move directly to the data exploration stage described in step 2.
The second step is to explore the data, such as customer behaviour and transactions, to become familiar with it and gain a deeper understanding of the information gathered. This is especially important when dealing with a completely new data set.
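To give a flavour of what exploration involves, here is a minimal sketch in plain Python that profiles a small, made-up set of customer transactions: for each column it counts missing and distinct values and, for numeric columns, reports the minimum, maximum, mean and median. The records and the `profile` helper are hypothetical; in practice a library such as pandas would usually do this job.

```python
from statistics import mean, median

# Hypothetical customer transaction records; None marks a missing value.
transactions = [
    {"customer_id": 1, "amount": 25.0, "channel": "web"},
    {"customer_id": 2, "amount": None, "channel": "store"},
    {"customer_id": 1, "amount": 40.0, "channel": "web"},
    {"customer_id": 3, "amount": 310.0, "channel": None},
]

def profile(records):
    """Per-column summary: missing and distinct counts, plus basic stats for numeric columns."""
    columns = sorted({key for row in records for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present), "distinct": len(set(present))}
        # Only compute numeric statistics when every observed value is a number.
        if present and all(isinstance(v, (int, float)) for v in present):
            info.update(min=min(present), max=max(present),
                        mean=mean(present), median=median(present))
        summary[col] = info
    return summary

for column, stats in profile(transactions).items():
    print(column, stats)
```

Even on this toy data, the mean transaction amount sits well above the median, a hint that a single large value is skewing the distribution and deserves a closer look during preparation.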
The third step is preparing the data for modelling. How we do this depends on the modelling technique to be used in step 4. A big part of analytics relies on machine learning methods such as clustering, regression and classification, which are widely used in predictive analytics, and each of these expects the data in a particular form.
At this point we will also identify and treat missing values, detect outliers, transform variables and so on.
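The preparation tasks listed above can be sketched in plain Python as follows. The specific choices are illustrative assumptions rather than a fixed recipe: median imputation for missing values, Tukey's 1.5 × IQR fences for flagging outliers, a log(1 + x) transform for a skewed variable, and z-score standardisation, which matters for distance-based methods such as k-means clustering. All function names and data are hypothetical.

```python
import math
from statistics import mean, median, pstdev, quantiles

def impute_median(values):
    """Replace missing values (None) with the median of the observed values."""
    fill = median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the positions of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

def log1p_transform(values):
    """Apply log(1 + x) to compress a long right tail (assumes values >= 0)."""
    return [math.log1p(v) for v in values]

def standardise(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical transaction amounts; None marks a missing value.
amounts = [25.0, None, 40.0, 310.0, 32.0, 28.0, 35.0]
filled = impute_median(amounts)
print("imputed:", filled)
print("outlier positions:", iqr_outliers(filled))
print("standardised log amounts:", [round(z, 2) for z in standardise(log1p_transform(filled))])
```

Here the one very large amount is flagged rather than deleted; whether to cap, transform or drop such a value is a judgement that depends on the business question.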
Once the data is prepared, our data scientists can begin modelling. This is usually an iterative process: we run a model, evaluate the results, tweak our approach, run another model, evaluate the results, re-tweak, and so on. This continues, within agreed time limits, until the model produces satisfactory results or delivers the best possible result given the data and time available.
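Below is a minimal sketch of that loop in plain Python, assuming a deliberately simple model, k-nearest-neighbour regression on a single feature, where "tweaking" means trying a different number of neighbours. Each setting is scored by 5-fold cross-validation, the best so far is kept, and the search stops early if a time budget (a stand-in for the agreed time limits) runs out. The data, helper names and budget are all hypothetical.

```python
import random
import time

def knn_predict(train, x, n_neighbours):
    """Predict y at x as the mean target of the n_neighbours closest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:n_neighbours]
    return sum(y for _, y in nearest) / len(nearest)

def cv_mse(data, n_neighbours, folds=5):
    """Estimate mean squared error by k-fold cross-validation (fold f = every folds-th row)."""
    squared_errors = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        squared_errors += [(y - knn_predict(train, x, n_neighbours)) ** 2 for x, y in test]
    return sum(squared_errors) / len(squared_errors)

def tune(data, candidates, time_budget_s=5.0):
    """Run, evaluate and tweak: try each setting, keep the best, stop if time runs out."""
    start = time.monotonic()
    best, best_error = None, float("inf")
    for n_neighbours in candidates:
        if time.monotonic() - start > time_budget_s:
            break  # agreed time limit reached; keep the best result so far
        error = cv_mse(data, n_neighbours)
        if error < best_error:
            best, best_error = n_neighbours, error
    return best, best_error

# Hypothetical data: y = 2x plus Gaussian noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(100)]
data = [(x, 2 * x + random.gauss(0, 1)) for x in xs]
best, best_error = tune(data, candidates=[1, 3, 5, 10, 25, 50])
print(f"best number of neighbours: {best} (cross-validated MSE {best_error:.2f})")
```

In a real project the tweaks would also cover features, preprocessing and entirely different model families, but the shape of the loop stays the same.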
The final model (or perhaps the best two or three models) is then put through the validation process. Here we test the model on a completely new data set, i.e. data that was not used to build the model. This ensures that the model generalises to new data rather than only working well on the specific data used to build it. In technical terms, this is called ‘avoiding overfitting’.
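The idea behind validation can be shown with a deliberately extreme comparison. The sketch below, in plain Python with made-up data, holds back 30% of the rows, fits a straight line by least squares and a "memoriser" (a 1-nearest-neighbour model that simply looks up the closest training point), then compares each model's error on the data it was built from with its error on the held-back data. The memoriser reproduces its training data exactly, so its training error is zero, yet it still makes errors on the unseen rows; that gap between training and holdout performance is what overfitting looks like. The split ratio, models and data are illustrative assumptions.

```python
import random

def fit_line(train):
    """Ordinary least-squares straight line y = a + b*x."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    a = my - b * mx
    return lambda x: a + b * x

def fit_memoriser(train):
    """1-nearest-neighbour: return the target of the closest training point."""
    return lambda x: min(train, key=lambda point: abs(point[0] - x))[1]

def mse(model, rows):
    """Mean squared error of a model over (x, y) rows."""
    return sum((y - model(x)) ** 2 for x, y in rows) / len(rows)

# Hypothetical data: y = 2x plus Gaussian noise, shuffled then split 70/30.
random.seed(1)
xs = [random.uniform(0, 10) for _ in range(100)]
rows = [(x, 2 * x + random.gauss(0, 1)) for x in xs]
random.shuffle(rows)
train, holdout = rows[:70], rows[70:]

for name, fit in [("straight line", fit_line), ("memoriser", fit_memoriser)]:
    model = fit(train)
    print(f"{name:13s}  training MSE {mse(model, train):.2f}  holdout MSE {mse(model, holdout):.2f}")
```

On data like this, where the true relationship is a straight line, the simpler linear model will typically do better on the held-out rows even though it looks worse on the training data.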
Implementation and tracking
After the validation process, the final model is chosen. We then implement the model and track its results, monitoring its performance over time and helping you find patterns and trends that answer important business questions.
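Tracking can be as simple as recomputing a performance metric each period once actual outcomes are known, and comparing it with the level the model achieved during validation. The sketch below assumes a hypothetical log of predictions and later-observed outcomes for a yes/no prediction, computes accuracy per month, and flags any month that falls more than a chosen tolerance below the validation baseline. The log, the baseline of 0.8 and the 0.1 tolerance are made-up values.

```python
from collections import defaultdict

def monthly_accuracy(log):
    """Share of correct predictions per month, from (month, predicted, actual) records."""
    correct, total = defaultdict(int), defaultdict(int)
    for month, predicted, actual in log:
        total[month] += 1
        correct[month] += int(predicted == actual)
    return {month: correct[month] / total[month] for month in sorted(total)}

def flag_degradation(accuracy_by_month, baseline, tolerance=0.1):
    """Months whose accuracy falls more than `tolerance` below the validation baseline."""
    return [m for m, acc in accuracy_by_month.items() if acc < baseline - tolerance]

# Hypothetical log of (month, predicted label, actual outcome observed later).
log = [
    ("M1", 1, 1), ("M1", 0, 0), ("M1", 1, 1), ("M1", 0, 1),
    ("M2", 1, 1), ("M2", 0, 0), ("M2", 0, 0), ("M2", 1, 1),
    ("M3", 1, 0), ("M3", 0, 1), ("M3", 1, 1), ("M3", 0, 1),
]
accuracy = monthly_accuracy(log)
print(accuracy)
print("needs attention:", flag_degradation(accuracy, baseline=0.8))
```

A flagged month would prompt a closer look, for example at whether customer behaviour has shifted since the model was built, and possibly a retrain.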
For more information please get in touch!