A Big Problem, Namely Big Data
Introduction
Nowadays the term big data is becoming more and more common, and many people think that big data is a technology that is changing our world day by day. That statement is wrong: big data is not a technology, it’s a problem. So the questions are: what is this problem, when did it start, who created it, and how do we deal with it?
Who created this problem
Before learning what this problem is and when it started, we first need to understand who created it.
The truth is that this problem was created by me, by you, and by every person who is using the internet right now.
So how did I create this problem? When I wrote this article, I generated data, which was stored in Medium’s database or on its servers, depending on where Medium wants to store its data.
And how did you create this problem? As you read this article, Medium uses its analytics to deliver the best content to your doorstep, and every person has their own interests.
Last but not least, Medium also uses analytics to track how many readers read a particular article, and on that basis it recommends the article to other people. And we have not even scratched the surface: Medium’s analytics process goes much deeper than we think.
What is this problem
To answer in simple terms, let’s consider an example. I take the latest movie from my friend on a pen drive and want to copy it to my laptop or external hard drive, but my hard disk (HDD) is full. So I now have some options:
- Delete that movie after watching it.
- Or delete something else from the HDD.
- Or buy a new HDD.
With options 1 and 2 I have to delete some data, which is acceptable in my case. But suppose Medium deleted a post that I published 5 or 6 years ago, and someone wants to read that post now. That would ruin Medium’s image, and the same holds for any company, not only Medium.
Take YouTube as an example. Is it not shocking that I can watch a video that was uploaded 12 years ago? That means I can watch any video as long as it has not been restricted or deleted by the person who uploaded it.
We can say that today’s world is a world of information (the Age of Information), which means data is irreplaceable not only for industry but also for ordinary people like you and me.
In this age we can easily find any information, such as study material, our favorite songs and movies, classic novels and much more, in a single click.
But all this data gives rise to a particular problem, which is simply known as Big Data:
Data that is too big to be handled by a particular person or entity.
When it started
So how much data counts as big data: 1GB, 1TB (terabyte = 1000GB) or 1PB (petabyte = 1000TB)? Well, it depends on who is handling the data. Consider 10 years back, in 2010: at that time mobile storage was counted in GB, so storing a movie (generally about 1GB in size) on a phone was a dream for nearly 80% of the population. Just 5 years later, in 2015, that limitation was becoming rare, and 3 years after that, in 2018, if someone said their mobile had 2GB of storage, anyone’s first response would be that it is very little.
But the problem has become far more severe in the 21st century.
2 V’s of Big data
Well, there are generally 3 V’s:
- Volume
- Velocity
- Variety
But variety is more of a concern for data processing or data analytics, which is not the subject of this blog.
Volume
Volume means the actual size of the data. In general terms, it is the amount of data we want to store, which may be 1GB, 1TB, 1PB and so on.
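To make these units concrete, here is a minimal Python sketch that prints a byte count in the decimal units used in this article (1TB = 1000GB). The function name `human_size` and the unit list are my own illustrative choices, not part of any library.

```python
# Decimal storage units, following the article's 1TB = 1000GB convention.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_size(num_bytes):
    """Format a byte count using decimal units (1KB = 1000B)."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value is below 1000,
        # or at the largest unit we know about.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:g}{unit}"
        size /= 1000


print(human_size(10**9))   # a typical movie
print(human_size(10**12))  # a 1TB hard disk
print(human_size(10**15))  # one petabyte
```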
Velocity
Let’s go back to our previous example. I chose option 3, bought a new HDD (a cheap 1TB one) and decided to store all my data on that new HDD. So let’s do some calculation:
Size of HDD = 1TB = 1,000GB = 1,000,000MB
Ideal copy speed of data from HDD to HDD = 40 - 50MB/s
Let’s fix it at 50MB/s
Time taken (in sec) = Size of HDD / Copy speed
= 1,000,000MB / 50MB/s
= 20,000 sec
Time in min = Time in sec / 60
= 20,000 / 60
= 333 min 20 sec
Time in hours = Time in min / 60
= 333 min 20 sec / 60
= 5 hours 33 min 20 sec
So it takes me slightly more than five and a half hours. I start the copy, go to sleep, and ta-da, my work is complete. So where is the problem? In my case this is acceptable, but in the real world it is not. Suppose I search Google for something, say big data, and Google doesn’t give me a result even after 2 minutes. I would wonder whether the problem is with my connection or with Google. In the real world it is impossible to wait even 2 minutes, so how can someone wait more than 5 hours? This problem is known as velocity.
Velocity is the time taken to move data from one medium to another.
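The copy-time arithmetic from this section can be checked with a few lines of Python. This is only a sketch of the same calculation: the function name `transfer_time` is hypothetical, and the 50MB/s speed is the assumed figure from the example, not a measurement.

```python
def transfer_time(size_mb, speed_mb_per_s):
    """Return the copy time as (hours, minutes, seconds)."""
    total_seconds = round(size_mb / speed_mb_per_s)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


# 1TB = 1,000,000MB copied at an assumed 50MB/s
hours, minutes, seconds = transfer_time(1_000_000, 50)
print(f"{hours} hours {minutes} min {seconds} sec")  # 5 hours 33 min 20 sec
```

Changing the speed to 40MB/s, the lower end of the range above, shows how quickly the wait grows as the disk gets slower.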
How to deal with this problem
The best way to deal with this problem is distributed storage technology, in which we split the data into small pieces, also known as blocks, and distribute them.
In distributed storage we generally use commodity servers, which are not reliable but are cheap.
A group of such commodity servers is generally known as a cluster.
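To give a rough feel for the idea of blocks, here is a toy Python sketch that cuts some data into fixed-size blocks, spreads them over a few servers in turn, and puts them back together. Everything in it is an assumption made for illustration: the function names, the node names and the round-robin placement are mine, and real distributed storage systems do far more than this.

```python
def split_into_blocks(data, block_size):
    """Cut the data into pieces of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def distribute(blocks, nodes):
    """Assign blocks to nodes in turn (round-robin), keeping each block's index."""
    placement = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        node = nodes[index % len(nodes)]
        placement[node].append((index, block))
    return placement


def reassemble(placement):
    """Collect the blocks from every node and join them in their original order."""
    indexed = [item for stored in placement.values() for item in stored]
    indexed.sort(key=lambda item: item[0])
    return b"".join(block for _, block in indexed)


data = b"abcdefghij"
blocks = split_into_blocks(data, 4)
placement = distribute(blocks, ["node-a", "node-b"])  # hypothetical server names
for node, stored in placement.items():
    print(node, stored)
print(reassemble(placement) == data)
```

In this sketch no single server has to hold the whole file, which is the point of splitting data into blocks in the first place.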
Conclusion
In this article I covered only the basics and avoided technical jargon, because unless we can relate a problem to real life, we can’t solve it.
In upcoming articles I am going to cover the 3 V’s of big data in more depth, along with much more about distributed storage. Please look forward to them.
If you find any mistake, please comment. Any comment, positive or negative, is more than welcome.