Introduction_to_Big_Data_and_Data_Analysis.docx

Uploaded by

Mohammed Atta
Copyright
© © All Rights Reserved

Introduction to Big Data and Data Analysis

What is Big Data?


- Big data is a large-scale dataset that is distributed and diverse. It therefore requires
new technical architectures and analytics to enable insights that unlock new sources of
business value.
o It can be distributed across multiple locations
o It is diverse: it can include pictures and videos
 It requires new technical architectures
Why Data Analysis
- We discovered that businesses analyze only 1%-10% of the data they collect
o This means that businesses spend a huge amount of money on collecting data, but
they do not make good use of the collected data
- There is a gap between the data we collect and the data we analyze
o Data analysis is therefore a growing field
- WHY: to close the gap between what we collect and what we analyze
Sources of Big Data
- Mobile sensors
- Social media
- Video surveillance
o IoT (smart home, smart light bulbs)
- Video rendering
- Smart grids
- Medical imaging
- Gene sequencing
What is Data?
- Data: a piece of fact
- An attribute is a property or characteristic of an object
o Ex: eye color of a person, temperature, etc.
o Attribute is also known as variable, field, characteristic, or feature
This study source was downloaded by 100000851716698 from CourseHero.com on 12-11-2024 10:33:44 GMT -06:00

https://www.coursehero.com/file/141320028/Introduction-to-Big-Data-and-Data-Analysisdocx/
- A collection of attributes describes an object
o Object is also known as a record, point, case, sample, entity, or instance
o Entity: any living or non-living object
o An attribute is a characteristic of the entity

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

o Every row represents an object
o Each column represents an attribute
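
The row/column idea can be sketched in Python. This is only an illustration, assuming each object is stored as a dictionary whose keys are the attribute names; the helper name column is hypothetical, and only three rows of the table are copied for brevity.

```python
# Each dict is one object (a row); each key is one attribute (a column).
# Only three rows from the table are copied here to keep the sketch short.
records = [
    {"Tid": 1, "Refund": "Yes", "Marital Status": "Single",
     "Taxable Income": "125K", "Cheat": "No"},
    {"Tid": 2, "Refund": "No", "Marital Status": "Married",
     "Taxable Income": "100K", "Cheat": "No"},
    {"Tid": 5, "Refund": "No", "Marital Status": "Divorced",
     "Taxable Income": "95K", "Cheat": "Yes"},
]


def column(rows, attribute):
    """Hypothetical helper: collect one attribute's value from every object."""
    return [row[attribute] for row in rows]


first_object = records[0]                   # one row = one object
attribute_names = list(first_object)        # the column headers
marital_values = column(records, "Marital Status")

print(attribute_names)
print(marital_values)
```

Reading across a dictionary gives one object; reading the same key down every dictionary gives one attribute.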

Data Structures
- 2 types
o Structured data
o Unstructured data
- Structured data: data containing a defined data type, format, and structure
o Organized data

- Unstructured data: data that has no inherent structure, which may include text documents,
PDFs, images, and videos
o Unorganized data
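
A rough sketch of the difference, assuming the structured data arrives as CSV text and the unstructured data as a free-text note. Both sample strings are made up for illustration; csv and io are Python standard-library modules.

```python
import csv
import io

# Structured: every record follows the same defined fields (hypothetical sample).
structured_text = "Tid,Refund,Taxable Income\n1,Yes,125K\n2,No,100K\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
refunds = [row["Refund"] for row in rows]   # fields can be addressed by name

# Unstructured: no fields to address, only raw content (hypothetical sample).
unstructured_text = "Customer called about a refund and said the form was confusing."
words = unstructured_text.split()           # any structure has to be imposed by us

print(refunds)
print(len(words))
```

With the structured input, a question like "which records had a refund?" is a lookup by field name. With the free text, the same question first requires deciding how to interpret the content.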
Attribute Values
- Attribute values are numbers or symbols assigned to an attribute
o Symbols: categorical
- Distinction between attributes and attribute values
o Same attribute can be mapped to different attribute values
 Ex: height can be measured in feet or meters
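
The height example can be made concrete with a small sketch: one attribute (height), two attribute values depending on the unit. The function name and the sample height are hypothetical; the conversion factor 1 ft = 0.3048 m is the standard definition.

```python
METERS_PER_FOOT = 0.3048  # exact by the international definition of the foot


def feet_to_meters(feet):
    """Map a height value in feet to the equivalent value in meters."""
    return feet * METERS_PER_FOOT


height_feet = 6.0                            # hypothetical person, measured in feet
height_meters = feet_to_meters(height_feet)  # same attribute, different value

print(height_feet, "ft =", round(height_meters, 4), "m")
```

The attribute did not change, only the number assigned to it, which is why the unit has to be known before two values can be compared.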
Representation of Raw Data
- Numerical: includes real-valued or integer variables such as age, speed, or length
o 2 types:
 Discrete: whole numbers = integers
 Ex: number of patients
 Ex: number of customers
 Ex: number of students in a class
 Continuous: any value within a range is possible
 Infinitely many possible values
o Ex: 23.1, 23.01, 23.001
- Categorical: also called symbolic variables
o 2 types:
 Nominal
 The order does not have a meaning
o Ex: eye color
o Ex: zip code
 Ordinal
 The order/rank does have a meaning
o Ex: sizes – small, medium, large, extra large
o Ex: lengths – short, medium, long
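
A minimal sketch of how the four kinds of variables behave differently in code. The sample values and the SIZE_RANK mapping are assumptions made for illustration: ordinal values need an explicit rank to be sorted meaningfully, while nominal values can only be counted or compared for equality.

```python
from collections import Counter

# Numerical
patients = 12            # discrete: a whole-number count
temperature = 23.01      # continuous: any value in a range is possible

# Ordinal: the rank carries meaning, so it is spelled out explicitly.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2, "extra large": 3}
sizes = ["large", "small", "extra large", "medium"]
sorted_sizes = sorted(sizes, key=SIZE_RANK.__getitem__)

# Nominal: no meaningful order, so counting is the natural summary.
eye_colors = ["brown", "blue", "brown", "green"]
eye_color_counts = Counter(eye_colors)
most_common_eye_color = eye_color_counts.most_common(1)[0][0]

print(sorted_sizes)
print(most_common_eye_color)
```

Sorting the eye colors alphabetically would run, but the resulting order would say nothing about the data, which is the practical difference between nominal and ordinal.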
Data Quality

- What kinds of data quality problems?
- How can we detect problems with the data?
- What can we do about these problems?
- Garbage in, garbage out
o The data must be cleaned to obtain high-quality data
- Examples of data quality problems:
o Noise and outliers
o Missing values
o Duplicate data
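
The three example problems can be detected with a short sketch. It reuses the Tid and taxable income values from the table (in thousands) and adds one missing value and one repeated record, both hypothetical. The 1.5 x IQR rule used for outliers is one common convention chosen here as an assumption; the notes do not prescribe a method.

```python
import statistics

# (Tid, taxable income in thousands). The last two entries are hypothetical
# additions that introduce a missing value and a duplicate.
records = [
    (1, 125), (2, 100), (3, 70), (4, 120), (5, 95),
    (6, 60), (7, 220), (8, 85), (9, 75), (10, 90),
    (11, None),   # hypothetical record with a missing income
    (3, 70),      # hypothetical exact duplicate of record 3
]

# Missing values: records whose income was never filled in.
missing = [tid for tid, income in records if income is None]

# Duplicate data: records that have already been seen once.
seen = set()
duplicates = []
for record in records:
    if record in seen:
        duplicates.append(record)
    seen.add(record)

# Outliers: values outside the 1.5 * IQR fences (an assumed rule of thumb),
# computed after dropping duplicates and missing values.
unique_records = list(dict.fromkeys(records))
incomes = [income for _, income in unique_records if income is not None]
q1, _, q3 = statistics.quantiles(incomes, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in incomes if x < low or x > high]

print(missing, duplicates, outliers)
```

A flagged value is only a candidate for review. Whether it is noise or a genuine extreme observation still has to be decided before cleaning the data.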
