Big Data Notes
Big data is a term that describes the large volume of data, both structured and
unstructured, that inundates a business on a day-to-day basis.
OR
The term “big data” refers to data that is so large, fast or complex that it’s difficult
or impossible to process using traditional methods.
OR
“Big data is high-volume, high-velocity, and high-variety information assets that
demand cost-effective, innovative forms of information processing for enhanced
insight and decision making.”
It includes data mining, data storage, data analysis, data sharing, and data
visualization.
Big data can be analyzed for insights that lead to better decisions and strategic
business moves.
Examples of Big Data sources include stock exchanges, social media sites, jet
engines, etc.
STRUCTURED
Any data that can be stored, accessed, and processed in a fixed format is termed
'structured' data.
Example: Data stored in a relational database management system, such as an
employee table in a company database.
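To make the employee-table example concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The column names and rows are hypothetical
and only illustrate what "fixed format" means: every row has the same fields.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (id, name, department, salary) VALUES (?, ?, ?, ?)",
    [
        (1, "Ayesha", "Finance", 85000.0),
        (2, "Bilal", "IT", 92000.0),
        (3, "Sana", "IT", 78000.0),
    ],
)

# Because every row shares the same fixed format, it can be queried directly.
rows = conn.execute(
    "SELECT name FROM employee WHERE department = ? ORDER BY name", ("IT",)
).fetchall()
it_names = [row[0] for row in rows]
print(it_names)
conn.close()
```

The query works without any preprocessing because the schema is known in
advance, which is the defining property of structured data.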
UNSTRUCTURED
Any data with an unknown form or structure is classified as unstructured data.
Example: Email, or a heterogeneous data source containing a combination of simple
text files, images, videos, etc., such as the output returned by 'Google Search'.
SEMI-STRUCTURED
Semi-structured data contains elements of both forms mentioned above, that is,
structured and unstructured data.
Example: Personal data stored in an XML file.
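As a small sketch of the XML example, the snippet below parses personal records
with Python's standard xml.etree.ElementTree module. The tag names and values
are hypothetical. Note that the two records do not carry the same fields, which
is typical of semi-structured data: the tags give structure, but there is no
fixed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical personal records; tag names are illustrative only.
xml_text = """
<people>
  <person>
    <name>Ayesha</name>
    <age>31</age>
    <email>ayesha@example.com</email>
  </person>
  <person>
    <name>Bilal</name>
    <phone>555-0100</phone>
  </person>
</people>
""".strip()

root = ET.fromstring(xml_text)
records = []
for person in root.findall("person"):
    # Each record becomes a dict of whatever tags happen to be present.
    records.append({child.tag: child.text for child in person})

for record in records:
    print(record)
```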
VOLUME
We already know that Big Data implies huge 'volumes' of data that are generated
daily from various sources such as social media platforms, business processes,
machines, networks, human interactions, etc.
Such large amounts of data are stored in data warehouses.
VARIETY
Variety of Big Data refers to structured, unstructured, and semi-structured data
that is gathered from multiple sources.
Nowadays, unstructured data in the form of emails, photos, videos, monitoring
devices, PDFs, audio, etc. is also being considered in analysis applications.
VELOCITY
Velocity essentially refers to the speed at which data is being created in real
time.
Big Data Velocity deals with the speed at which data flows in from sources like
business processes, application logs, networks, social media sites, sensors,
mobile devices, etc.
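One simple way to put a number on velocity is to count how many events arrive
in each time window. The sketch below assumes a hypothetical stream of
(timestamp in seconds, source) pairs; the data and the function name are
illustrative only.

```python
from collections import Counter

# Hypothetical event stream: (timestamp_in_seconds, source) pairs.
events = [
    (0.2, "app_log"), (0.7, "sensor"), (0.9, "social"),
    (1.1, "sensor"), (1.5, "app_log"),
    (2.3, "sensor"),
]

def events_per_second(stream):
    """Count how many events arrive in each one-second window."""
    counts = Counter(int(ts) for ts, _source in stream)
    return dict(sorted(counts.items()))

rate = events_per_second(events)
print(rate)
```

A real system would compute this continuously over incoming data rather than
over a list held in memory.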
VARIABILITY
This refers to the inconsistency that the data can show at times.
The importance of big data lies not in how much data a company has but in how
the company uses the data it collects.
The company can take data from any source and analyze it to find answers
that enable:
Cost reductions,
Time reductions,
New product development and optimized offerings, and
Smart decision making.
When you combine big data with high-powered analytics, you can accomplish
business-related tasks such as:
Determining root causes of failures, issues and defects in near real time.
Generating coupons at the point of sale based on the customer’s buying
habits.
Recalculating entire risk portfolios in minutes.
Detecting fraudulent behavior before it affects your organization.
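As a toy illustration of the last task, the sketch below flags transaction
amounts that sit far from the average, using a basic z-score rule. This is only
a stand-in chosen for illustration, not a real fraud detection method; the
function name, the threshold, and the amounts are all assumptions.

```python
from statistics import mean, stdev

def flag_outliers(amounts, threshold=2.0):
    """Return amounts lying more than `threshold` sample standard
    deviations away from the mean (a simple z-score rule)."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical transaction history for one customer.
history = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0, 500.0]
flagged = flag_outliers(history)
print(flagged)
```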
REASONS WHY BIG DATA IS NOT ACCEPTED IN PAKISTAN
LACK OF DATA
Poor data quality and accuracy are a major obstacle to the success of a
company's analytics efforts.
Most analysts feel that the data provided to them is inaccurate or
incomplete.
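A basic completeness check can show how much of a dataset is usable before
analysis starts. The sketch below is a minimal example; the required field
names and the sample records are hypothetical.

```python
# Hypothetical list of fields every record is expected to carry.
REQUIRED_FIELDS = ("customer_id", "amount", "date")

def is_complete(record):
    """A record is complete if no required field is missing or empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)

def completeness_ratio(records):
    """Fraction of records that pass the completeness check."""
    if not records:
        return 0.0
    return sum(is_complete(r) for r in records) / len(records)

# Hypothetical sample data with two incomplete records.
sample = [
    {"customer_id": 1, "amount": 250.0, "date": "2024-01-05"},
    {"customer_id": 2, "amount": None, "date": "2024-01-06"},
    {"customer_id": 3, "amount": 90.0, "date": ""},
    {"customer_id": 4, "amount": 40.0, "date": "2024-01-07"},
]
ratio = completeness_ratio(sample)
print(ratio)
```

This only measures missing values. Checking accuracy would need a trusted
reference to compare against.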
MISSING TIMELINE
Results delivered long after they are needed are also a reason for failure among
smaller companies.
Discipline and better time management must be instilled (taught or
introduced) from the beginning.