FAQ 5: How much data is enough to get value from analytics?
Many clients ask this question. We start talking about predictive analytics and modeling when the application is complex enough that analysing it by "eye-balling" becomes difficult. This happens not only when there is too much data, but also, for example, when the process outcome is influenced by too many input variables, or when the inter-dependencies between the input variables are unclear.
In product development applications we often have too many input variables (the product composition and the properties of each component, plus the product's own properties and performance), and almost never enough data in which all input parameters are sufficiently varied.
A traditional drawback of machine learning algorithms compared to classical statistical methods was that they required many samples to work properly. At DataStories we developed algorithms that work robustly starting from 25-30 samples, which is roughly the minimum number of data points that allows distinguishing between random and non-random correlations.
This means that starting from 25 batches of a batch process, 25 experiments, or 25 days of data, we can start looking for meaningful insights and assessing the inter-relationships between the inputs and the outcomes.
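To get an intuition for why roughly 25 samples marks the boundary, consider how large a correlation can appear purely by chance between two unrelated variables. The sketch below is a stdlib-only Monte Carlo illustration of that idea; the function name `null_r_threshold` and the simulation approach are our own assumptions for this example, not DataStories' actual algorithm.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_r_threshold(n, trials=2000, alpha=0.05, seed=0):
    """Estimate the |r| level that two *independent* random series of
    length n exceed only alpha of the time, i.e. correlations below this
    threshold are indistinguishable from chance."""
    rng = random.Random(seed)
    rs = sorted(
        abs(pearson([rng.gauss(0, 1) for _ in range(n)],
                    [rng.gauss(0, 1) for _ in range(n)]))
        for _ in range(trials)
    )
    return rs[int((1 - alpha) * trials)]


for n in (10, 25, 100):
    print(f"n = {n:3d}: chance alone produces |r| up to ~{null_r_threshold(n):.2f}")
```

With only 10 samples, chance correlations above 0.6 are common, so almost any observed relationship could be noise; around 25-30 samples the chance-level threshold drops enough that moderately strong correlations become meaningful.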