So, what’s the big deal?
Do I really need big data processing for my business? Is it genuinely useful or just another industry buzzword? Logistics, supply chain management, new product development, irregular workflow with peaks and troughs, competitor and customer analysis - can Big Data deal with any of my issues in these areas?
To answer these questions, first, let’s explain what exactly we mean by Big Data.
What is Big Data?
Well, it’s just data that is big. Big Data describes the processing of data that is generally too big to handle on a single machine or device. In the past, such data was handled by various kinds of databases or data warehouse solutions running on distributed storage, such as disk arrays or network-attached systems. As data processing needs grew, the term Big Data broadened to cover various forms of distributed data processing, usually in real time (Fast Data) and in the cloud.
Simply put, Big Data processing enables the immediate collection of huge amounts of data, which can be integrated and cross-referenced to provide extremely useful information about issues as diverse as weather, road-traffic flow, footfall and customer buying behaviour. Furthermore, machine-learning algorithms can be used to create intelligent, self-adapting systems, or Artificial Intelligence (AI), that not only identify patterns but also respond accordingly, without the need for human judgement or intervention. Real-time data collection enables smart routing, whereby data flows can be redirected to bypass overloads and blockages, and nowcasting (aka now-forecasting), which responds to events as they unfold.
Companies like Google, Amazon, Facebook and LinkedIn are well known for tracking your online behaviour and pushing relevant offers in your direction, e.g. “you might like …” or “you might know …”. They do this by collecting your IP address and port, browser ID, browser cookies (as a form of user ID), mouse movements (on popular sites), clicks (not only ad clicks), browsing history (through analytics cookies) and keyboard events (the timing between key presses of the letters in a word, as a form of identification). From this data, they can identify a person and link together all the data mined from different systems. Amazon uses Big Data analysis to find similarities in user behaviour. It profiles users and then makes better suggestions and targets its marketing campaigns. Google uses Big Data analysis for advertisements. It builds your profile mostly through browser cookies on sites that use analytics, but also from the services it provides to you, such as Gmail, Apps, Picasa or Drive, which scan your content. It then builds a list of your areas of interest and makes you a target of its advertising campaigns.
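To make "finding similarities in user behaviour" a little more concrete, here is a minimal sketch in Python. It is not how Amazon or anyone else actually does it: the shoppers, the baskets and the choice of Jaccard similarity (the share of items two shoppers have in common) are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Share of items two users have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(target, baskets):
    """Suggest the items bought by the most similar other user."""
    others = {user: items for user, items in baskets.items() if user != target}
    if not others:
        return set()
    nearest = max(others, key=lambda user: jaccard(baskets[target], others[user]))
    # Only suggest what the target has not bought already.
    return others[nearest] - baskets[target]


# Hypothetical purchase histories, invented for this example.
baskets = {
    "alice": {"tent", "stove", "boots"},
    "bob": {"tent", "stove", "lantern"},
    "carol": {"novel", "bookmark"},
}

suggestions = suggest("alice", baskets)
print(suggestions)
```

Here Alice shares two of four distinct items with Bob and none with Carol, so Bob is her nearest neighbour and his lantern becomes the suggestion. A real system would do the same kind of comparison across millions of users, which is where the distributed tools come in.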
So is Big Data only for big companies? Certainly they reap the benefits but so can smaller organisations by following things such as customer behaviour, financial markets, trading indexes and website traffic flow. This allows a company to notice what is happening, minimise risks and grasp opportunities, making their business more cost-effective, responsive and agile. It could take literally hundreds of workers to achieve the same outcome manually, whereas an automated Big Data processing system does all the heavy lifting for you.
Can I afford it?
As a business owner or manager, you’ll naturally be concerned about data capacity and cost. Do you need massive computing power and a hefty budget?
How is Big Data processed?
It’s really just a sophisticated sausage machine along the lines of the diagram below.
To support this, the following tools are available:
Distributed data store - HDFS, Cassandra, HBase, Druid
Distributed computing engine - Apache Spark, Storm, Apex
Clustering platform - YARN, Mesos, Kubernetes
Data streaming platform - Kafka, Apache Flink, Amazon Kinesis
Machine learning libraries - TensorFlow, MLlib, Singa
Data analytics frontend - Tableau Public, Apache Zeppelin, Jupyter, DeepSense Seahorse
Some enterprise systems may also include ETL (Extract, Transform, Load), BI (Business Intelligence) or BPM (Business Process Management) tools, but these are not usually treated as part of the modern Big Data stack.
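To show how the stages fit together, the sketch below walks a handful of made-up page-view events through a toy version of the sausage machine in plain Python. Every function is only a stand-in for a category of tool listed above; none of them uses the real API of Kafka, Spark or any other product, and the events themselves are invented.

```python
from collections import Counter


def event_stream():
    """Stand-in for a streaming platform: yields page-view events one by one."""
    events = [
        {"user": "u1", "page": "/home"},
        {"user": "u2", "page": "/pricing"},
        {"user": "u1", "page": "/pricing"},
        {"user": "u3", "page": "/home"},
        {"user": "u2", "page": "/pricing"},
    ]
    yield from events


def count_views(events):
    """Stand-in for a computing engine: aggregates views per page."""
    counts = Counter()
    for event in events:
        counts[event["page"]] += 1
    return counts


def top_pages(store, n):
    """Stand-in for an analytics front end: reads the most viewed pages."""
    return store.most_common(n)


# The Counter doubles as the "data store" in this toy example.
store = count_views(event_stream())
report = top_pages(store, 1)
print(report)
```

In a real deployment each stand-in would be a separate distributed service and the events would arrive continuously, but the flow stays the same: collect, process, store, analyse.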
Are there any issues or negatives in using Big Data?
OK, so it all sounds very clever, but are there any pitfalls or things to be aware of?
Data capacity and cost are the obvious first worries. Do you need massive computing power and a hefty budget? Not at all. Big Data usually needs Big Data farms (computing clusters) to store and process the data, but you don’t have to own one, as cloud computing services are becoming more popular. You can have supercomputer power at your fingertips and pay only for the computation time you use, when you need it. Storage space and data transfer are also getting cheaper day by day.
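A rough back-of-the-envelope calculation shows why paying only for computation time matters. The Python sketch below compares renting a cluster for a few hours a day with keeping the same cluster running around the clock. The cluster size, the hourly rate and the usage pattern are all hypothetical figures chosen for illustration, not quotes from any provider.

```python
def monthly_cost(nodes, hours_per_day, rate_per_node_hour, days=30):
    """Cost of running `nodes` machines for `hours_per_day` over `days` days."""
    return nodes * hours_per_day * days * rate_per_node_hour


NODES = 10   # hypothetical cluster size
RATE = 0.50  # hypothetical price per node per hour

on_demand = monthly_cost(NODES, 4, RATE)    # cluster rented 4 hours a day
always_on = monthly_cost(NODES, 24, RATE)   # same cluster running 24/7
saving = always_on - on_demand

print(on_demand, always_on, saving)
```

With these invented numbers, the on-demand cluster costs a sixth of the always-on one. Your own figures will differ, but the shape of the calculation is the same.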
Are there any legal ramifications or laws to be aware of? As with all aspects of business, the answer is yes. If you analyse personal data, there might be issues related to privacy and the location of the data, which are governed by a multitude of different laws in different countries. For example, in Poland personal data records have to be registered at a specialised government institution and are required by law not to cross the borders of the country. You can specify in the contract with your data management provider which country or countries you want your data stored in.
The laws governing your data are determined by the country in which it is hosted and processed, but it might be stored in one country, processed in another and sent via a third. The EU Data Protection Directive deals with this to some extent by requiring data to be stored within the European Economic Area or in a country with recognised similar laws. But is there any way you can find out exactly where your Big Data is being held? Indeed there is. A number of companies offer network protocol analysis apps that not only show where your data goes to and comes from but can also identify malicious software and intrusive messaging. These include the free Wireshark (formerly known as Ethereal) and higher-level, paid-for brands such as LanHound and EtherPeek, so if you’re determined to know where your data is, you’ll certainly be able to find out.
Time to go large
To return to our original question, can Big Data make a categorical difference to your business? The answer is a resounding “yes!”, but how to get started and where best to apply Big Data might be unknown territory for you and a bit of a puzzle. That’s perfectly understandable, and it is why our experts are on hand to guide your company towards the massive benefits to be gained by harnessing the power of Big Data. Give us a call to discuss how you can become part of the winning trend.