Introduction_to_Big_Data_and_Data_Analysis.docx
https://www.coursehero.com/file/141320028/Introduction-to-Big-Data-and-Data-Analysisdocx/
- A collection of attributes describes an object
o Object is also known as a record, point, case, sample, entity, or instance
o Entity: any living or non-living object
o An attribute is a characteristic (property) of an entity
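As a rough illustration of the object/attribute idea above, the sketch below models one record as a Python dataclass. The `Patient` entity, its field names, and the sample values are hypothetical and chosen only for illustration; they do not come from the notes.

```python
from dataclasses import dataclass, fields


@dataclass
class Patient:
    """Hypothetical entity; each field is one attribute of the entity."""
    name: str
    age: int
    height_m: float


# One object (also called a record, case, sample, or instance).
record = Patient(name="Ana", age=34, height_m=1.68)

# The collection of attributes that describes the object.
attribute_names = [f.name for f in fields(record)]
print(attribute_names)
```

Each instance of the class is one object, and the set of fields plays the role of the attribute collection.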
Data Structures
- 2 types
o Structured data
o Unstructured data
- Structured data: data containing a defined data type, format, and structure
o Organized data
This study source was downloaded by 100000851716698 from CourseHero.com on 12-11-2024 10:33:44 GMT -06:00
- Unstructured data: data that has no inherent structure, which may include text documents,
PDFs, images, and videos
o Unorganized data
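To make the structured/unstructured contrast concrete, here is a minimal sketch using only the Python standard library. The CSV text, the free-text note, the column names, and the "aged N" pattern are all made-up assumptions for illustration.

```python
import csv
import io
import re

# Structured data: defined columns and format, so fields can be read by name.
structured = "patient_id,age,height_m\n1,34,1.68\n2,51,1.75\n"
rows = list(csv.DictReader(io.StringIO(structured)))
ages = [int(row["age"]) for row in rows]

# Unstructured data: free text with no inherent structure. Pulling out the
# same information needs an ad hoc rule (here, a regular expression).
unstructured = "Patient seen today, aged 34. Second patient, aged 51, reports no pain."
ages_from_text = [int(m) for m in re.findall(r"aged (\d+)", unstructured)]

print(ages, ages_from_text)
```

The structured source can be queried directly by column, while the unstructured one depends on a pattern that may break as soon as the wording changes.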
Attribute Values
- Attribute values are numbers or symbols assigned to an attribute
o Symbols: categorical
- Distinction between attributes and attribute values
o Same attribute can be mapped to different attribute values
Ex: height can be measured in feet or meters
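The height example can be sketched as follows: one attribute (height) and two different attribute values, depending on the unit chosen. The function name and the sample height are assumptions; the conversion factor is the usual approximation of feet per meter.

```python
FEET_PER_METER = 3.28084  # approximate conversion factor


def meters_to_feet(meters: float) -> float:
    """Map a height value in meters to the equivalent value in feet."""
    return meters * FEET_PER_METER


# Same attribute (height of one person), two different attribute values.
height_m = 1.80
height_ft = meters_to_feet(height_m)
print(f"{height_m} m is about {height_ft:.2f} ft")
```

The person's height does not change; only the number assigned to it does.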
Representation of Raw Data
- Numerical: includes real-valued or integer variables, such as age, speed, or length
o 2 types:
Discrete: whole numbers = integers
Ex: number of patients
Ex: number of customers
Ex: number of students in a class
Continuous: any value within a range is possible
Infinitely many possible values
o Ex: 23.1, 23.01, 23.001
- Categorical: also called symbolic variables
o 2 types:
Nominal
The order does not have a meaning
o Ex: eye color
o Ex: zip code
Ordinal
The order/rank does have a meaning
o Ex: sizes – small, medium, large, extra large
o Ex: lengths – short, medium, long
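The four kinds of raw data above can be sketched in Python as shown below. The `Size` enum, its integer codes, and all sample values are assumptions made for illustration. The codes only encode rank; the gaps between them carry no meaning.

```python
from enum import IntEnum


class Size(IntEnum):
    """Ordinal categories: the integer codes encode rank only."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3
    EXTRA_LARGE = 4


# Numerical data
patient_count = 12   # discrete: whole numbers only
temperature = 23.01  # continuous: any value in a range is possible

# Nominal data: only equality and counting are meaningful, not order.
eye_colors = ["brown", "blue", "green", "blue"]
distinct_colors = set(eye_colors)

# Ordinal data: sorting and comparing are meaningful.
sizes = [Size.LARGE, Size.SMALL, Size.EXTRA_LARGE, Size.MEDIUM]
sorted_sizes = sorted(sizes)
largest = max(sizes)

print([s.name for s in sorted_sizes], largest.name, len(distinct_colors))
```

Sorting the eye colors would work mechanically (alphabetically) but the result would carry no meaning, which is the practical difference between nominal and ordinal data.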
Data Quality
- What kinds of data quality problems are there?
- How can we detect problems with the data?
- What can we do about these problems?
- Garbage in, garbage out
o The data must be cleaned to obtain high-quality data
- Examples of data quality problems:
o Noise and outliers
o Missing values
o Duplicate data
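One possible way to detect the three problems listed above is sketched below, using only the Python standard library. The records, the field names, and the choice of the 1.5 × IQR rule for outliers are assumptions; the notes do not prescribe a particular detection method.

```python
import statistics

# Hypothetical records containing one missing value, one duplicate,
# and one implausible age.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 51},
    {"id": 3, "age": None},   # missing value
    {"id": 4, "age": 29},
    {"id": 2, "age": 51},     # duplicate of an earlier record
    {"id": 5, "age": 420},    # likely outlier or data entry error
    {"id": 6, "age": 45},
    {"id": 7, "age": 38},
]

# Missing values: attributes with no recorded value.
missing_ids = [r["id"] for r in records if r["age"] is None]

# Duplicate data: records identical to one already seen.
seen = set()
duplicate_ids = []
for r in records:
    key = (r["id"], r["age"])
    if key in seen:
        duplicate_ids.append(r["id"])
    seen.add(key)

# Outliers: values outside 1.5 * IQR of the quartiles (one common rule of thumb).
known_ages = [r["age"] for r in records if r["age"] is not None]
q1, _, q3 = statistics.quantiles(known_ages, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_ids = [
    r["id"] for r in records
    if r["age"] is not None and not (lower <= r["age"] <= upper)
]

print(missing_ids, duplicate_ids, outlier_ids)
```

What to do after detection (drop, impute, correct, or keep the record) depends on the data and the analysis, so the sketch stops at flagging the problems.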