Introduction to Data
Ajinkya Pol
RIMSR
What is Data?
• Data is any set of characters that has been gathered and
translated for some purpose, usually analysis.
• It can take any form, including text, numbers, pictures,
sound, or video.
What is Digital Data?
• Digital data are discrete, discontinuous representations of
information or work.
• Digital data is encoded in binary, as sequences of 0s and 1s.
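As a small illustration of binary representation, the sketch below encodes a string as bytes and prints each byte's bit pattern. It assumes UTF-8 encoding (other encodings produce different bit patterns), and the helper names `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit strings."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Decode space-separated 8-bit groups back into text (assumes UTF-8)."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = to_bits("Hi")
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi
```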
Types of Digital Data
1. Unstructured Data
2. Semi-Structured Data
3. Structured Data
Structured Data
• Refers to any data that resides in a fixed field within a record or file.
• Typically managed by systems that support ACID (Atomicity,
Consistency, Isolation, Durability) properties.
• Structured data has the advantage of being easily entered, stored,
queried and analyzed.
• Structured data represents only about 5 to 10% of all data.
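A minimal sketch of structured data, assuming Python's built-in `sqlite3` module and an in-memory database. The `students` table, its columns, and the sample rows are hypothetical. Every record has the same fixed fields, querying is a single declarative statement, and a failed transaction is rolled back as a whole (the "A" in ACID).

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, marks INTEGER NOT NULL)"
)

# Every record has the same fixed fields, so entry is straightforward.
with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany(
        "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
        [(1, "Asha", 82), (2, "Ravi", 67), (3, "Meera", 91)],
    )

# Atomicity: the second insert reuses id 1 and fails,
# so the first insert in the same transaction is undone too.
try:
    with conn:
        conn.execute("INSERT INTO students VALUES (4, 'Kiran', 74)")
        conn.execute("INSERT INTO students VALUES (1, 'Duplicate', 50)")
except sqlite3.IntegrityError:
    pass

# Querying fixed fields is a single declarative statement.
rows = conn.execute(
    "SELECT name FROM students WHERE marks > 70 ORDER BY name"
).fetchall()
count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(rows)   # [('Asha',), ('Meera',)]
print(count)  # 3
```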
Unstructured Data
• Unstructured data is everything that cannot readily be
classified and fit into a predefined structure.
• Unstructured data represents around 80% of all data.
• Techniques for analyzing it include data mining (e.g., association
rules), regression analysis, text mining, and natural language
processing (NLP).
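Text mining is one of the techniques listed above. The sketch below is a minimal term-frequency count over free text, one of the simplest text-mining steps. The sample reviews and the stop-word list are hypothetical, and real text mining or NLP would use richer tokenization and language resources.

```python
import re
from collections import Counter

# Hypothetical free-text reviews: no fixed fields, just prose.
reviews = [
    "The delivery was fast and the packaging was good.",
    "Good product, but delivery was slow.",
    "Fast delivery! Good value for money.",
]

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"the", "and", "was", "but", "for", "a"}


def term_frequencies(docs):
    """Count lowercase word tokens across documents, skipping stop words."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


freq = term_frequencies(reviews)
print(freq.most_common(3))
```

Even this crude count surfaces recurring themes ("delivery", "good", "fast") that no fixed field in the data exposed directly.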
Semi Structured Data
• Semi-structured data is a cross between structured and
unstructured data. It is a form of structured data but lacks a
strict data model.
• Semi-structured data is information that doesn’t reside in a
relational database but that does have some organizational
properties that make it easier to analyze.
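JSON is a common semi-structured format: records carry self-describing keys, but different records need not share the same fields. The sketch below uses Python's built-in `json` module on hypothetical records to show how such data can still be analyzed without a fixed relational schema.

```python
import json

# Hypothetical semi-structured records: all have keys (tags),
# but not all records have the same fields.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["98200", "98201"]},
  {"id": 3, "name": "Meera", "email": "meera@example.com",
   "address": {"city": "Pune"}}
]
"""

records = json.loads(raw)

# Keys act as labels, so missing fields simply fall back to None.
for rec in records:
    city = rec.get("address", {}).get("city")
    print(rec["id"], rec["name"], rec.get("email"), city)

# The union of top-level fields seen across all records.
all_fields = sorted({key for rec in records for key in rec})
print(all_fields)  # ['address', 'email', 'id', 'name', 'phones']
```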
Characteristics of Data
• Composition - What is the structure, type, and nature of the
data?
• Condition - Can the data be used as is, or does it need to be
cleansed?
• Context - Where was this data generated? Why? How sensitive
is it? What events are associated with it?
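The "Condition" question can be made concrete with a small data-quality check. The sketch below flags records that cannot be used as is and applies a few simple cleansing rules. The field names, sample records, and the rules themselves (trim whitespace, require a numeric age, title-case city names) are hypothetical assumptions for illustration.

```python
# Hypothetical raw records gathered from a form.
raw_records = [
    {"name": " Asha ", "age": "21", "city": "Pune"},
    {"name": "Ravi", "age": "", "city": "pune"},
    {"name": "Meera", "age": "abc", "city": None},
]


def find_problems(record):
    """Return a list of problems in one record (empty means usable as is)."""
    problems = []
    if record["name"] != record["name"].strip():
        problems.append("name has extra whitespace")
    if not record["age"].isdigit():
        problems.append("age missing or not numeric")
    if record["city"] is None:
        problems.append("city missing")
    return problems


def cleanse(record):
    """Apply assumed rules: trim name, convert age, normalize city case."""
    age = int(record["age"]) if record["age"].isdigit() else None
    city = record["city"].title() if record["city"] else None
    return {"name": record["name"].strip(), "age": age, "city": city}


for rec in raw_records:
    print(rec["name"].strip(), find_problems(rec))

cleaned = [cleanse(r) for r in raw_records]
print(cleaned)
```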
What is Big Data?
• A collection of data sets so large and complex that they become
difficult to process using on-hand database management tools
or traditional data processing applications.
• The data is too big, moves too fast, or doesn’t fit the structures
of existing database architectures.
• The scale, diversity, and complexity of the data require new
architectures, techniques, algorithms, and analytics to manage
it and extract value and hidden knowledge from it.
• Big data is the realization of greater business intelligence by
storing, processing, and analyzing data that was previously
ignored due to the limitations of traditional data management
technologies.
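One basic idea behind handling data that is too big or moves too fast is to process it as a stream instead of loading it all into memory at once. The sketch below is a minimal single-machine illustration of that idea, not a big data framework: `sensor_readings` is a hypothetical data source, and the running mean keeps only a count and a total no matter how many values arrive.

```python
from typing import Iterable, Iterator, Tuple


def sensor_readings(n: int) -> Iterator[float]:
    """Hypothetical source that yields readings one at a time
    instead of materializing them all in memory."""
    for i in range(n):
        yield float(i % 100)


def streaming_mean(stream: Iterable[float]) -> Tuple[int, float]:
    """Single pass, constant memory: only a running count and total are kept."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
    return count, (total / count if count else 0.0)


count, mean = streaming_mean(sensor_readings(1_000_000))
print(count, mean)  # 1000000 49.5
```

Real big data systems extend the same principle by splitting the stream across many machines and combining partial counts and totals.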
Why Big Data? What Makes Big Data?
Availability of data