Introduction To Data Science

This document provides an introduction to data science, including its uses, applications, and importance. Data science uses concepts from mathematics, statistics, computer science, and information science to discover patterns in raw data and make predictions. It allows analysis of large amounts of unstructured data that traditional methods cannot effectively process. Common applications of data science include customer services, risk modeling, medical diagnosis, and more. Data can be collected from both online and offline sources, and comes in formats like CSV, spreadsheets, and SQL databases. Python libraries like NumPy, Pandas, and Matplotlib can then be used to access and visualize the data.

Uploaded by

rohriraanju
Copyright
© All Rights Reserved

Introduction to Data Science

Data science is a field that brings together statistics, data analysis,
machine learning and deep learning. It helps us understand and analyse
real-world situations and make well-informed decisions.

Data science is not a single field. It uses concepts and principles from
Mathematics, Statistics, Computer Science and Information Science. It can
discover hidden patterns in raw data, and it can also be used to make
predictions. The following picture shows how data analysis and data science
differ.

Figure: Data Analysis vs Data Science

Why Data Science?


Earlier, data processing was fairly easy because data was limited and
structured, and structured data can be analysed easily and effectively.
Nowadays more than 80% of data is unstructured, and traditional methods do
not work well on unstructured data.

In addition, the number of internet users is increasing day by day, which
increases the amount of unstructured data. Organizations collect this
unstructured data through mobile apps, websites and other platforms, and can
use it to serve the specific requirements of customers and users. This
increases the demand for data science.

Applications of data science

S.No  Field                    Application Area
1     Banking                  Customer Services, Risk Modelling, Fraud Detection
2     Finance                  Decision-Making, Risk Analytics, Algorithmic Trading
3     Manufacturing            Maintenance Scheduling, Predicting Problems, Detecting Anomalies, Automated Processes
4     Transport                Optimizing Vehicle Performance, Self-Driving Cars, Vehicle Monitoring Systems
5     Healthcare               Medical Image Analysis, Drug Discovery, Diagnosis Prediction
6     E-Commerce               Consumer Identification, Analysis of Reviews, Recommendations
7     Artificial Intelligence  Speech Recognition Systems, Conversational Bots, Machine Learning Algorithms, Amazon’s Alexa and Apple’s Siri

Data Collection
Data collection is the process of gathering numeric and alphanumeric data,
and it is the first step of any data analysis. Collecting data gives a clear
idea of the dataset and allows deeper and clearer analysis of it. The
predictions and suggestions made by an AI system are possible only because
data has been collected.

Data collection is mainly used for record maintenance and other purposes.
Some commonly used datasets are:

Banks: Data on loans, accounts, lockers, payrolls, bank visitors, etc.

ATM machines: Data on daily transactions, visitor information, money withdrawn, etc.

Movie theatres: Movie details, tickets sold in online and offline modes, purchase of refreshments ...

Schools: Student fee collection, results, teachers' salary database, etc.

Sources for data collection


Data can be collected from many sources. They fall into two major kinds:

1. Online
2. Offline

Online sources                                    Offline sources
Open-source web portals run by the Government     Sensors
Reliable private websites such as Kaggle          Surveys
Open-source websites of world organizations       Interviews
                                                  Observations

Online Sources
Online sources provide data collection through various websites, portals and
apps. Users browse the web portal or download the app and follow the
instructions. This method is not yet as popular as offline sources, but it
is expected to become more popular in future.

Offline Sources
Offline sources are often more effective and useful for data collection
because they give a clear picture on which to base a decision. A few common
offline sources are:

1. Sensors: IoT-based devices that collect data from the physical world
and convert it into digital form. They are connected through gateways
that relay the data to the cloud or a server.
2. Surveys: Surveys are conducted using questionnaires and are the most
popular method when a large amount of data is needed. They are
inexpensive and easy to process, but they should be handled carefully.
Most surveys use forms, which can be online or offline.
3. Interviews: Interviews are one of the primary and most popular methods
of data collection. A list of questions is prepared in advance and used
to conduct the interview. It is the most expensive method. Interviews
can also be conducted over the phone or through a web chat interface.
4. Observations: Information is collected without asking questions. The
method relies on researchers or observers adding their own judgement to
the data. It can capture the dynamics of a situation that other data
collection techniques cannot measure, and it can be combined with
additional information such as video.

The following points should be remembered while accessing data from any
data source:

1. Only data that is available for public use should be taken.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone’s privacy to collect data.
4. Data should only be taken from reliable sources, because data collected
from random sources can be wrong or unusable.
5. Reliable sources ensure the authenticity of the data, which helps in
the proper training of the AI model.

Types of data
For data science models or projects, data is generally collected in tabular
form, in one of the following formats:

1. CSV (comma-separated values): A common and simple file format for
storing data in tabular form. It can be opened in any spreadsheet
software (MS Excel), word-processing software (MS Word) or text editor
(Notepad). Every line contains a record, each record has a number of
fields, and the fields are separated by commas.
2. Spreadsheet: A spreadsheet contains rows and columns that represent
data in tabular form. Spreadsheets are mostly used to calculate,
manipulate and analyse data and to maintain data records. MS Excel is
a well-known and popular spreadsheet program.
3. SQL: It stands for Structured Query Language. It is used to handle the
data stored in a DBMS (Database Management System). It provides basic
commands to create, alter and delete data and to manage transactions
in a database.
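
The CSV and SQL formats described above can be tried out with a short Python sketch. It uses only the standard library modules csv and sqlite3. The student names, the marks and the table name students are made-up sample values for illustration, not data from this document.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: every line is a record and the
# fields of each record are separated by commas.
CSV_TEXT = """name,marks
Asha,82
Ravi,74
Meera,91
"""


def read_records(text):
    """Parse CSV text into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def load_into_sql(records):
    """Copy the records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
    conn.executemany(
        "INSERT INTO students (name, marks) VALUES (?, ?)",
        [(r["name"], int(r["marks"])) for r in records],
    )
    conn.commit()
    return conn


records = read_records(CSV_TEXT)
conn = load_into_sql(records)

# An SQL query: students who scored more than 80, highest first.
top = conn.execute(
    "SELECT name, marks FROM students WHERE marks > 80 ORDER BY marks DESC"
).fetchall()
print(top)
```

Note that the csv module reads every field as text, which is why the marks are converted with int() before they are stored in the database.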

Data Access
Once data has been collected from different sources, it has to be used for
different purposes, so data access is a key factor. In this section of
Data Science Class 10 AI, you are going to learn about data access using
Python code.

A few Python modules and libraries are very useful for data access:

1. NumPy: Short for Numerical Python, it is one of the most popular
Python packages for data access and a fundamental package for
arithmetic and logical operations on arrays. It is widely used to
handle numeric data and has various functions, methods and properties
for working with numbers. Its arrays hold collections of homogeneous
data such as numbers, characters or booleans.
2. Pandas: The pandas topics are:
o Introduction to pandas and Series
o Introduction to DataFrame
o Data access from a DataFrame using iteration
o Selecting data from a DataFrame
o Deleting rows/columns from a DataFrame
o DataFrame functions
3. Matplotlib: The Matplotlib topics are:
o Introduction to Data Visualization
o Creating a Histogram
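
A few of the NumPy and pandas topics listed above can be sketched as follows. This is only a minimal illustration and assumes that the numpy and pandas packages are installed. The student names, subjects and marks are made-up sample values.

```python
import numpy as np
import pandas as pd

# Hypothetical marks of three students in two subjects.
marks = np.array([[82, 75], [74, 68], [91, 88]])

# All elements of a NumPy array share one data type (homogeneous data).
print(marks.dtype)

# Arithmetic works on the whole array at once.
subject_means = marks.mean(axis=0)   # average mark per subject
bonus_marks = marks + 5              # add 5 to every mark

# Build a DataFrame (a labelled table) from the array.
df = pd.DataFrame(
    marks,
    columns=["maths", "science"],
    index=["Asha", "Ravi", "Meera"],
)

# Data access using iteration: one row at a time.
for name, row in df.iterrows():
    print(name, row["maths"], row["science"])

# Selecting data: only the rows where the maths mark is above 80.
high = df[df["maths"] > 80]

# Deleting a column and a row. drop() returns a new DataFrame
# and leaves the original df unchanged.
without_science = df.drop(columns=["science"])
without_ravi = df.drop(index="Ravi")

print(high)
```

Because drop() returns a copy, df still contains all three students and both subjects after the last two steps.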
