Introduction_to_Big_Data_and_Data_Analysis.docx

Uploaded by

Mohammed Atta
Copyright © All Rights Reserved
Introduction to Big Data and Data Analysis

What is Big Data?


- Big data is a large-scale dataset that is distributed and diverse. It therefore requires
new technical architectures and analytics to produce insights that unlock new sources of
business value
o It can be distributed across multiple locations
o It is diverse: it can include, for example, pictures and videos
▪ Requires new technical architectures
Why Data Analysis?
- Businesses analyze only 1%-10% of the data they collect
o This means that businesses spend a huge amount of money on collecting data but
do not make good use of it
- There is a gap between the data we collect and the data we analyze
o This makes data analysis a growing field
- WHY: to close the gap between what we collect and what we analyze
Sources of Big Data
- Mobile sensors
- Social media
- Video surveillance
o IoT (smart home, smart light bulbs)
- Video rendering
- Smart grids
- Medical imaging
- Gene sequencing
What is Data?
- Data: a piece of fact
- An attribute is a property or characteristic of an object
o Ex: eye color of a person, temperature, etc.
o Attribute is also known as variable, field, characteristic, or feature
This study source was downloaded by 100000851716698 from CourseHero.com on 12-11-2024 10:33:44 GMT -06:00

https://www.coursehero.com/file/141320028/Introduction-to-Big-Data-and-Data-Analysisdocx/
- A collection of attributes describes an object
o Object is also known as a record, point, case, sample, entity, or instance
o Entity: any living or non-living object
o An attribute is a characteristic of the entity

Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

o Every row represents an object


o Each column represents an attribute
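
The row/column view of the table can be sketched in Python. This is only an illustration: the values are copied from the table above, while the names ATTRIBUTES, ROWS, records, and column are assumptions made for this example.

```python
# Attribute names (the columns of the table above).
ATTRIBUTES = ["Tid", "Refund", "Marital Status", "Taxable Income", "Cheat"]

# Each tuple is one row of the table, i.e. one object (record).
ROWS = [
    (1, "Yes", "Single", "125K", "No"),
    (2, "No", "Married", "100K", "No"),
    (3, "No", "Single", "70K", "No"),
    (4, "Yes", "Married", "120K", "No"),
    (5, "No", "Divorced", "95K", "Yes"),
    (6, "No", "Married", "60K", "No"),
    (7, "Yes", "Divorced", "220K", "No"),
    (8, "No", "Single", "85K", "Yes"),
    (9, "No", "Married", "75K", "No"),
    (10, "No", "Single", "90K", "Yes"),
]

# Label every value with its attribute name: one dict per object.
records = [dict(zip(ATTRIBUTES, row)) for row in ROWS]


def column(name):
    """Return the values of one attribute (one column) for all objects."""
    return [record[name] for record in records]


print(records[4])        # the object with Tid 5
print(column("Cheat"))   # the Cheat attribute of every object
```

Reading a single dict gives one object with all of its attributes; calling column gives one attribute across all objects.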

Data Structures
- 2 types
o Structured data
o Unstructured data
- Structured data: data containing a defined data type, format, and structure
o Organized data

- Unstructured data: data that has no inherent structure, which may include text documents,
PDFs, images, and videos
o Unorganized data
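
A minimal sketch of the difference, assuming a hypothetical TaxRecord type and a made-up sentence (neither comes from the notes). The structured version has named fields with defined types; the unstructured version holds the same facts as free text with no fields to query.

```python
from dataclasses import dataclass


@dataclass
class TaxRecord:
    """Structured data: every field has a defined name and type."""
    tid: int
    refund: bool
    marital_status: str
    taxable_income_k: int  # income in thousands


# Structured: values can be read directly by field name.
structured = TaxRecord(tid=1, refund=True, marital_status="Single",
                       taxable_income_k=125)

# Unstructured: the same facts as free text (hypothetical sentence).
# There is no field to read; the value would have to be extracted first.
unstructured = ("Taxpayer 1 is single, received a refund, and reported "
                "about 125K in taxable income.")

print(structured.taxable_income_k)
print(hasattr(unstructured, "taxable_income_k"))
```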
Attribute Values
- Attribute values are numbers or symbols assigned to an attribute
o Symbols: categorical
- Distinction between attributes and attribute values
o The same attribute can be mapped to different attribute values
▪ Ex: height can be measured in feet or meters
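
The height example can be sketched as follows. The function names and the sample height are assumptions for illustration; the conversion uses the standard definition of 1 foot = 0.3048 meters.

```python
METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


def meters_to_feet(meters: float) -> float:
    """Express a length given in meters as a value in feet."""
    return meters / METERS_PER_FOOT


def feet_to_meters(feet: float) -> float:
    """Express a length given in feet as a value in meters."""
    return feet * METERS_PER_FOOT


height_m = 1.80                       # hypothetical person's height
height_ft = meters_to_feet(height_m)  # same attribute, different value

print(f"height = {height_m} m = {height_ft:.2f} ft")
```

The attribute (height) is the same in both cases; only the attribute value changes with the unit of measurement.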
Representation of Raw Data
- Numerical: real-valued or integer variables such as age, speed, or length
o 2 types:
▪ Discrete: whole numbers (integers)
▪ Ex: number of patients
▪ Ex: number of customers
▪ Ex: number of students in a class
▪ Continuous: any value within a range is possible
▪ There are infinitely many possible values
o Ex: 23.1, 23.01, 23.001
- Categorical: also called symbolic variables
o 2 types:
▪ Nominal
▪ The order has no meaning
o Ex: eye color
o Ex: zip code
▪ Ordinal
▪ The order/rank has a meaning
o Ex: sizes (small, medium, large, extra large)
o Ex: lengths (short, medium, long)
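
A short sketch of how the nominal/ordinal distinction shows up in practice. The size order comes from the example above; the eye color values and all names (SIZE_ORDER, SIZE_RANK, is_larger) are assumptions for illustration.

```python
from collections import Counter

# Ordinal: the categories have a meaningful rank.
SIZE_ORDER = ["small", "medium", "large", "extra large"]
SIZE_RANK = {size: rank for rank, size in enumerate(SIZE_ORDER)}


def is_larger(a: str, b: str) -> bool:
    """Compare two sizes by rank, not alphabetically."""
    return SIZE_RANK[a] > SIZE_RANK[b]


sizes = ["large", "small", "extra large", "medium"]
sorted_sizes = sorted(sizes, key=lambda size: SIZE_RANK[size])

# Nominal: only equality is meaningful, so we count rather than sort.
eye_colors = ["brown", "blue", "brown", "green"]  # hypothetical values
color_counts = Counter(eye_colors)

# Numerical: a discrete count versus a continuous measurement.
number_of_students = 30  # discrete: whole numbers only
temperature = 23.01      # continuous: 23.1, 23.01, 23.001 are all valid

print(sorted_sizes)
print(is_larger("large", "small"))
print(color_counts.most_common(1))
```

Sorting eye colors alphabetically would run, but the resulting order would carry no meaning, which is the point of the nominal category.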
Data Quality

- What kinds of data quality problems are there?
- How can we detect problems with the data?
- What can we do about these problems?
- Garbage in, garbage out
o The data must be cleaned to obtain high-quality data
- Examples of data quality problems:
o Noise and outliers
o Missing values
o Duplicate data
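
One possible way to detect the three problems listed above, sketched on a small hypothetical dataset. The notes do not prescribe a detection method; the interquartile-range (IQR) rule used for outliers here is an assumption, as are the function names and the sample values.

```python
import statistics

# Hypothetical records with one missing value, one duplicate, one outlier.
records = [
    {"id": 1, "income": 125.0},
    {"id": 2, "income": 100.0},
    {"id": 3, "income": None},    # missing value
    {"id": 4, "income": 120.0},
    {"id": 4, "income": 120.0},   # duplicate of the previous row
    {"id": 5, "income": 95.0},
    {"id": 6, "income": 60.0},
    {"id": 7, "income": 900.0},   # suspiciously large value
    {"id": 8, "income": 85.0},
]


def find_missing(rows, field):
    """Return the indices of rows where the field has no value."""
    return [i for i, row in enumerate(rows) if row[field] is None]


def find_duplicates(rows):
    """Return the indices of rows that repeat an earlier row exactly."""
    seen, duplicates = set(), []
    for index, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


def find_outliers(values):
    """Return values outside the 1.5 * IQR fences (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


incomes = [r["income"] for r in records if r["income"] is not None]

print("missing at rows:", find_missing(records, "income"))
print("duplicates at rows:", find_duplicates(records))
print("outliers:", find_outliers(incomes))
```

What to do with the flagged rows (drop, correct, or fill in) depends on the dataset and is left open here.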
