Chapter-3 Data Sciences Study Materials Final-1
Sources of Data
Data of almost any required type can be collected from a variety of sources, and the data
collection process can be categorised in two ways: offline and online.
While accessing data from any of the data sources, the following points should be
kept in mind:
1. Only data which is available for public use should be taken up.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone’s privacy to collect data.
4. Data should only be taken from reliable sources, as data collected from
random sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of data which helps in proper
training of the AI model.
Types of Data
For Data Science, usually the data is collected in the form of tables. These tabular
datasets can be stored in different formats. Some of the commonly used formats
are:
1. CSV: CSV stands for comma-separated values. It is a simple file format used to
store tabular data. Each line of the file is a data record, and each record consists
of one or more fields separated by commas. Because the values in each record
are separated by commas, these files are known as CSV files.
2. Spreadsheet: A spreadsheet is a piece of paper or a computer program which
is used for accounting and recording data, using rows and columns into which
information can be entered. Microsoft Excel is a program which helps in creating
spreadsheets.
3. SQL: SQL stands for Structured Query Language.
It is a domain-specific language designed for managing data held in a
DBMS (Database Management System), most commonly a relational one. It
is particularly useful in handling structured data.
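The CSV and SQL formats described above can be tried out with a short Python sketch. This is only an illustration under stated assumptions: the sample data, the table name students and the helper functions read_csv_records and query_older_than are all made up for this example, and SQLite (through Python's built-in sqlite3 module) stands in for a DBMS.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: a header line followed by two data records.
CSV_TEXT = "name,age\nAsha,15\nRavi,16\n"


def read_csv_records(text):
    """Parse CSV text into a list of dictionaries, one per data record."""
    # Each line is a record; the fields in a record are separated by commas.
    return list(csv.DictReader(io.StringIO(text)))


def query_older_than(records, min_age):
    """Load the records into an in-memory SQLite table and query it with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO students (name, age) VALUES (?, ?)",
        [(r["name"], int(r["age"])) for r in records],
    )
    # SQL selects only the rows that match the condition.
    rows = conn.execute(
        "SELECT name FROM students WHERE age > ? ORDER BY name", (min_age,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


records = read_csv_records(CSV_TEXT)
print(records)
print(query_older_than(records, 15))
```

Note that the CSV reader returns every field as text, so the age values are converted to integers before they are stored in the table.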