He doesn't just need new buckets; he needs a completely new way of looking at the problem, simply because the volume and velocity of the water have grown. To keep the town from flooding, he may need his government to build a massive dam, which calls for enormous civil engineering expertise and an elaborate control system. To make things worse, water is gushing out of nowhere, everywhere, and everyone is alarmed by the variety.
Welcome to Big Data.
- Over 600 million tweets flow in every day, second by second. That is high volume and velocity.
- Next, you need to understand what each tweet means: where it is from, what kind of person is tweeting, and whether it is trustworthy. That is high variety.
- Then you need to identify the sentiment: is this person talking negatively or positively about the iPhone? That is high complexity.
- Finally, you need a way to quantify the sentiment and track it in real time. That is high variability.
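To make the last two points concrete, here is a minimal sketch of quantifying sentiment and tracking it over a sliding time window. Everything in it is an assumption made for illustration: the tiny word lists, the `score` function and the `RollingSentiment` class are hypothetical, and a real system would use a trained sentiment model and a distributed stream processor rather than a single Python process.

```python
from collections import deque

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "broken"}


def score(text):
    """Return +1, -1 or 0 for a tweet by counting lexicon hits."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos > neg) - (pos < neg)


class RollingSentiment:
    """Average sentiment of the tweets seen in the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, score) pairs, oldest first
        self.total = 0

    def add(self, timestamp, text):
        s = score(text)
        self.events.append((timestamp, s))
        self.total += s
        # Evict tweets that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events)


tracker = RollingSentiment(window_seconds=60)
first = tracker.add(0, "I love my iPhone")
second = tracker.add(10, "iPhone screen broken again")
third = tracker.add(65, "I hate this iPhone")
```

The running total is updated as tweets enter and leave the window, so each update costs only as much as the number of expired tweets. That is the property that matters once the stream is arriving faster than it can be rescanned.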
The traditional architecture of a Big Data solution looks something like this:
Data is collected from various sources, such as content management systems and software applications, and then transferred to relational database management systems such as MS SQL Server or PostgreSQL.
To analyse the collected data, the ETL step is usually run on a single machine, and the necessary data is transferred to an OLAP data warehouse for analysis. The data that finally gets analysed is archived data, not live, real-time data. This is sometimes referred to as the death of data. In the end, only 10% of the data is ever analysed.
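The single-machine ETL step described above can be sketched as follows. This is only an illustration under stated assumptions: it uses in-memory SQLite databases as stand-ins for the operational database and the warehouse, and the `orders` and `customer_totals` tables, their columns and the helper names are all hypothetical.

```python
import sqlite3


def make_sample_source():
    """Build a stand-in operational database (hypothetical schema)."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (customer TEXT, amount_cents INTEGER)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 1000), ("bob", 250), ("alice", 500)],
    )
    source.commit()
    return source


def run_etl(source, warehouse):
    """Extract orders, total them per customer, load into the warehouse."""
    # Extract: pull every row onto this one machine.
    rows = source.execute("SELECT customer, amount_cents FROM orders").fetchall()

    # Transform: aggregate in local memory.
    totals = {}
    for customer, amount_cents in rows:
        totals[customer] = totals.get(customer, 0) + amount_cents

    # Load: write the summary table used for analysis.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total_cents INTEGER)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)",
        list(totals.items()),
    )
    warehouse.commit()
    return len(totals)


source = make_sample_source()
warehouse = sqlite3.connect(":memory:")
loaded = run_etl(source, warehouse)
result = warehouse.execute(
    "SELECT customer, total_cents FROM customer_totals ORDER BY customer"
).fetchall()
```

Note that the extract step reads the whole table into one process and the warehouse only ever sees a periodic summary. Both are exactly the limits the traditional architecture runs into: the job cannot outgrow one machine, and the analysis always lags behind the live data.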