AI Chapter Notes-1
Data:
Data is a collection of raw facts that can be processed to produce information.
Data can take the form of numbers, words, symbols, or even pictures.
a) numbers: 10, 20, -20, 20.3
b) words: You, we
c) symbols: =, +, *, &
Data Science:
In simple words, data science is the study of data: it applies mathematical and
statistical principles to data. This data can be of 3 types:
a) audio
b) visual
c) textual
For Data Science, usually the data is collected in the form of tables. These tabular
datasets can be stored in different formats.
a) CSV - CSV stands for comma-separated values. It is a simple file format used
to store tabular data.
b) Spreadsheet - A computer program used for accounting and recording data in
rows and columns, into which information can be entered.
c) SQL - SQL stands for Structured Query Language. It is a domain-specific
language used in programming and designed for managing data held in a
relational DBMS (Database Management System). It is particularly useful for
handling structured data.
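The CSV and SQL formats above can be tried out with a short sketch. It is only an illustration: the student names, the scores, the table name `scores`, and the helper functions `load_rows` and `build_database` are all made up for this example. It uses Python's built-in `csv` and `sqlite3` modules.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content: a header row, then one row per student.
CSV_TEXT = """name,score
Asha,82
Ravi,74
Meena,91
"""


def load_rows(csv_text):
    """Parse CSV text into a list of (name, score) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["name"], int(row["score"])) for row in reader]


def build_database(rows):
    """Store the rows in an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO scores (name, score) VALUES (?, ?)", rows)
    conn.commit()
    return conn


rows = load_rows(CSV_TEXT)
conn = build_database(rows)

# SQL query: names of students scoring above 80, highest score first.
top = conn.execute(
    "SELECT name FROM scores WHERE score > 80 ORDER BY score DESC"
).fetchall()
print(top)  # [('Meena',), ('Asha',)]
```

The same tabular data appears here in two forms: as plain CSV text, and as a table that can be filtered and sorted with an SQL query.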
Data sources:
Offline data sources: physical records, documents, reports, interviews,
focus groups, observations, etc.
Online data sources: open government data portals, web scraping, sensors,
cameras, APIs, etc.
Data set: A dataset is a collection of data. Two types:
Training dataset: Data used to train a machine learning model with an algorithm
(commonly about 70% of the dataset).
Testing dataset: Data used to evaluate the trained model’s performance
(commonly the remaining 30%).
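The 70% / 30% division described above can be sketched as follows. This is a minimal illustration, not a fixed recipe: the function name `train_test_split`, the `seed` parameter, and the sample dataset of ten numbers are assumptions made for this example.

```python
import random


def train_test_split(records, train_percent=70, seed=0):
    """Shuffle the records and split them into training and testing parts."""
    shuffled = list(records)               # copy so the original is unchanged
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 10 records.
dataset = list(range(1, 11))
train, test = train_test_split(dataset)
print(len(train), len(test))  # 7 3
```

Shuffling before the split keeps the order in which the data was collected from influencing which records end up in the testing part.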
DATA                                   INFORMATION
Collection of raw facts.               Collection of organized data.
Data is independent.                   Information is dependent on data.
Measuring units: bits and bytes.       Measuring units: quantity, time, etc.
Example: test score of a student.      Example: average score of a class.
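The last row of the comparison can be shown with a small sketch: individual test scores are data, and the class average worked out from them is information. The names and scores below are made up for illustration.

```python
# Data: raw, individual test scores (hypothetical values).
scores = {"Asha": 82, "Ravi": 74, "Meena": 90}

# Information: a summary produced by processing that data.
class_average = sum(scores.values()) / len(scores)
print(class_average)  # 82.0
```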