Email data. It might not be big, but it is clever.


With estimates of anywhere between 100 and 200 billion emails sent every day, the data relating to email marketing can seem big. That’s potentially a lot of data to analyse. But email marketing is not Big Data, not for most of us anyway, and at least not in the sense in which the term is most often used. True Big Data is synonymous with hidden value, but that doesn’t mean that analysing email data for insight is not valuable. On the contrary.

Big Data

The origins of the term Big Data, at least in its current context, are uncertain, and there are several claimants to its coining. From the mid-1990s a number of computer companies, perhaps most notably Silicon Graphics (later rebranded as SGI and, in 2009, acquired out of bankruptcy by Rackable Systems), increasingly used the term Big Data not just to describe a large quantity of information but also to say something about the nature of that information. At the time, anybody serious about film making using CGI (Computer Generated Imagery) sat behind a powerful ‘Iris’ Silicon Graphics workstation. Processing the vast quantities of 3D image rendering data for films such as Walt Disney’s ‘Tron’ (1982) and Universal Pictures’ ‘The Last Starfighter’ (1984) required substantial computing power (at least for the day). If that was your work then Silicon Graphics was the brand of choice.

Tron 1982

Today the term Big Data has become increasingly linked with the data challenges of organisations involved with applications such as digital communication, the internet, weather prediction, medical science, archaeology, defence and security, to name just a few. Here, Big Data commonly refers to data sets that are not simply very large but that, for several key reasons, are beyond the capabilities of traditional computer storage and processing technologies.

So if Big Data is not just big, what else is it?

Originally, the characteristics of Big Data were classified using the three V’s – Volume, Variety and Velocity.

Perhaps the most obvious, Volume, refers to the quantity of data that is generated, stored and processed. Defining ‘very’ large is a moving target, but these days it’s firmly in the multi-petabyte arena. To put that in perspective, it’s estimated that around 800 human brains, each of around 1.25 terabytes capacity, would fit nicely in 1 petabyte of storage. For the leading supercomputing providers, exa- and zetta-scale technology is already a reality. Data sets of that size are still often referred to as Big Data purely by virtue of their scale, even if the other defining characteristics are not evident.
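The brain comparison is simple division, and it can be checked in a few lines. This is only a sketch: it assumes decimal (SI) units, where 1 petabyte is 1,000 terabytes, and it takes the 1.25 terabyte figure from the estimate quoted above. The function name is made up for illustration.

```python
# Assumption: decimal (SI) units, so 1 petabyte = 1,000 terabytes.
TB_PER_PB = 1000

# The per-brain capacity estimate quoted in the text (not a measured value).
BRAIN_CAPACITY_TB = 1.25


def brains_per_petabyte(brain_tb: float = BRAIN_CAPACITY_TB) -> float:
    """Return how many 'brains' of the given capacity fit in one petabyte."""
    return TB_PER_PB / brain_tb


if __name__ == "__main__":
    print(f"Brains per petabyte: {brains_per_petabyte():.0f}")
```

With the quoted figures the result comes out at 800, which matches the estimate in the text.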

The second Big Data ‘V’, Variety, introduces the characteristic of structure to the definition. Big Data typically comprises both structured data (that is, traditional database-type information) and unstructured data (that is, data from more generic sources such as sensors, mobile devices and social media). Data variety is a key reason why both storage and pattern processing require specialised computational technologies. You may have heard of Hadoop – it’s a collaboratively developed open source software framework for the distributed storage and processing of large and varied data sets across clusters of hardware. It’s designed to be incredibly robust and to scale to the largest of Big Data requirements. It’s used by security organisations, communications analysts and other hidden pattern seekers. By the way, Hadoop is named after the originator’s son’s toy elephant – really!

The third ‘V’, Velocity, refers to the speed at which data is generated or the rate at which it is refreshed. Many Big Data applications rely on real-time data capture, processing and re-application in order to realise their value. Volume and velocity commonly go hand in hand. Over 150 million sensors at the Large Hadron Collider at the CERN laboratories on the Switzerland/France border generate data 40 million times per second. That’s volume and velocity.

Velocity can also apply independently of volume. Even with ‘relatively’ small data sets, such as those used in the automotive and aerospace industries for three-dimensional Computational Fluid Dynamics (CFD) modelling, the real-time nature of the simulation calculations still brings them to the threshold of the Big Data arena.

CFD simulation

Since the original three ‘V’s, others have gone on to add further granularity to the definition of Big Data. ‘Variability’ refers to inconsistencies within either structured or unstructured data, ‘Veracity’ describes the quality of the captured data, and, unfortunately, not another V, ‘Complexity’ characterises otherwise unconnected data coming from multiple sources.

All very interesting, but unless you are a Big Data technologist it’s not the amount, type or the speed of data that’s important. It’s what you as a business can learn from and do with the data that matters. Ultimately the value is in understanding what the data is telling us and gaining insights that lead us to better management decisions and strategy.

So where does that leave us as email marketers?

Even for the industry as a whole the numbers are not that ‘big’. 100-200 billion sends per day and 4 billion email accounts are relatively small fry compared to the Big Data challenges of the Square Kilometre Array radio telescope, whose data output is anticipated to be larger than today’s entire internet traffic.

What about Volume?

Only the very largest of commercial consumer businesses reach Big Data territory with their transactional and CRM databases. Mid-size businesses dominate the email marketing environment. The scale varies, but the UK Government classifies mid-sized companies as those with a turnover of £25-500 million per year and fewer than 250 employees. Customer base size depends greatly on the nature of the business, but it’s nowhere near the ‘big’ of Big Data. For many SMEs, 500,000 active CRM contacts, each perhaps containing up to 100 searchable records, would be a significant database. Email marketing databases are typically smaller than the full set of stored CRM contacts, because email permission generally covers only a subset of them. To many, a genuinely opted-in email database of 100,000 is something of an arrival threshold. Those with a true permission-based email database in the 10s to multi-millions of subscribers are harder to find.
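To get a feel for how far a ‘significant’ SME database sits from Big Data volumes, the figures above can be turned into a rough storage estimate. This is a back-of-envelope sketch only. The contact and record counts come from the text, but the size of a record is not given anywhere, so the 1,000 bytes per record used here is purely a hypothetical assumption, as are the function names.

```python
# Figures from the text.
CONTACTS = 500_000
RECORDS_PER_CONTACT = 100

# Hypothetical assumption: the text gives no record size, so 1,000 bytes
# per record is used purely for illustration.
ASSUMED_BYTES_PER_RECORD = 1_000

# Decimal (SI) petabyte.
BYTES_PER_PETABYTE = 10**15


def total_records(contacts: int, records_per_contact: int) -> int:
    """Total number of searchable records across all contacts."""
    return contacts * records_per_contact


def fraction_of_petabyte(contacts: int, records_per_contact: int,
                         bytes_per_record: int) -> float:
    """Estimated database size as a fraction of one petabyte."""
    size_bytes = total_records(contacts, records_per_contact) * bytes_per_record
    return size_bytes / BYTES_PER_PETABYTE


if __name__ == "__main__":
    records = total_records(CONTACTS, RECORDS_PER_CONTACT)
    share = fraction_of_petabyte(CONTACTS, RECORDS_PER_CONTACT,
                                 ASSUMED_BYTES_PER_RECORD)
    print(f"Total records: {records:,}")
    print(f"Share of one petabyte: {share:.6%}")
```

Under that assumed record size, 50 million records amount to about 50 gigabytes, a tiny fraction of a single petabyte. A different record size changes the number but not the conclusion.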

As for Variety…

As email marketers our data is structured, in fact very structured. It’s also extremely limited in type. It’s alphanumeric, generally nicely formatted and neatly organised into recognisable information fields, each with clear definitions. All of this means that the value that we can gain from it is very close to the surface. Accessing, analysing and gaining intelligence from this data is therefore relatively easy. There’s no need for complicated analysis or pattern seeking algorithms in order to find and extract the information needed. Applying the insight gained to improve future process is also relatively straightforward.

Finally Velocity.

This is also decidedly small in Big Data terms. Although some email marketers deliver what would normally be considered very frequent campaigns, for example hourly updates of financial or news information, most have a regular but much less frequent communication schedule. It depends on the market and the needs of the subscribers, but weekly email campaign deliveries (although not necessarily to the same portion of the audience) would be considered more common, and to many the monthly newsletter is the standard. Many email marketers take steps to avoid high frequencies of contact, since over-communication can be detrimental to subscriber engagement. There’s an email equivalent of the Goldilocks syndrome – the ideal is not too much, not too little, just right. Data collection (adding new subscribers) can be quick, but it’s nowhere near the velocity of Big Data. Even with regular campaign delivery the rate of data gathering and decay is relatively slow. Estimates are that email addresses and other associated data points decay at around 3% per month, a velocity which doesn’t even register on the scale of Big Data handling techniques.
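A decay rate of 3% per month sounds small, so it helps to see what it does to a list over a year. The sketch below assumes the decay compounds at a constant rate each month and that no new subscribers are added. Both are simplifying assumptions rather than anything stated in the estimate, and the function names are made up for illustration.

```python
def remaining_fraction(monthly_decay: float, months: int) -> float:
    """Fraction of a list still valid after `months`, assuming a constant
    monthly decay rate that compounds (a simplifying assumption)."""
    return (1 - monthly_decay) ** months


def remaining_contacts(list_size: int, monthly_decay: float, months: int) -> int:
    """Estimated number of still-valid contacts, with no new sign-ups."""
    return round(list_size * remaining_fraction(monthly_decay, months))


if __name__ == "__main__":
    # 100,000 is the 'arrival threshold' list size mentioned in the text.
    for months in (1, 6, 12):
        left = remaining_contacts(100_000, 0.03, months)
        print(f"After {months:2d} months: about {left:,} valid contacts")
```

On these assumptions a little under 70% of a list is still valid after twelve months. That matters a great deal for list hygiene, but as a rate of change it is glacial by Big Data standards.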

So for most of us involved in day-to-day email marketing, our data is not ‘Big’. Nonetheless, with the right processes for collection, organisation and useful application in place, it’s still hugely valuable.

As I mentioned before, the significance of data doesn’t revolve around how much data you have, but what you do with it. For email marketers, intelligent use of data, even if it is ‘small’, is the step that separates the email ‘broadcasters’ from the ‘precision targeters’. There are lots of demonstrations of how intelligent email marketing is good for business results. According to business analysts Forbes, companies that put data at the centre of their email marketing improve their ROI by 15-20%.