Introduction To Data Science
Data Science is an important field that brings together statistics, data
analysis, machine learning and deep learning. Data science helps us
understand and analyse real-world situations and make well-informed
decisions.
Data Science is not just a single field; it uses concepts and principles from
Mathematics, Statistics, Computer Science and Information Science. It is also
capable of discovering hidden patterns in raw data, and it can be used for
predictions. Observe the following picture, which highlights the differences
between data analysis and data science.
In addition, the number of internet users is increasing day by day, which
increases the amount of unstructured data. This unstructured data, collected
by various organizations through mobile apps, websites and other platforms,
can be used to serve the specific requirements of customers and users. This,
in turn, increases the demand for data science.
Applications of data science
S.No  Field          Application Area
1     Banking        Customer Services, Risk Modelling, Fraud Detection
2     Finance        Decision-Making, Risk Analytics, Algorithmic Trading
3     Manufacturing  Maintenance Scheduling, Predicting Problems,
                     Detecting Anomalies, Automated Processes
6     E-Commerce     Consumer Identification, Analysis of Reviews,
                     Recommendations
Data collection is mainly used for record maintenance and other purposes.
Commonly used datasets include:
Dataset         Data it holds
Banks           Loans, accounts, lockers, payrolls, bank visitors, etc.
ATM Machines    Daily transactions, visitor information, money withdrawn, etc.
Movie Theaters  Movie details, tickets sold in online and offline modes,
                purchase of refreshments
School          Students' fee collection, results, teachers' salary database, etc.
Data sources are broadly of two types:
1. Online sources
2. Offline sources
Online Sources
Online sources collect data through various websites, portals and apps. Users
need to browse the web portal or download the app and follow the
instructions. This method is currently less popular than offline sources, but
it is expected to become more popular in the future.
Offline Sources
Offline sources are generally more effective and useful for data collection,
and they give a clearer picture for making decisions. Here are a few offline
methods:
1. Sensors: They are IoT-based devices that collect data from the
physical world and convert it into digital form. They are connected
through gateways that relay the data to the cloud or a server.
2. Surveys: Surveys are conducted using questionnaires. They are the
most popular method for collecting large amounts of data, but they
should be designed and handled carefully. Surveys are relatively
inexpensive and easy to process. They are mostly conducted using
forms, which can be online or offline.
3. Interviews: Interviews are one of the best and most popular ways of
collecting data. A list of questions is prepared in advance and used
to conduct the interview and collect data. It is a primary data
collection method, but also the most expensive one. Interviews can
also be conducted over the phone or through a web chat interface.
4. Observations: This method collects information without asking
questions. It requires researchers or observers to add their own
judgement to the data. It can capture the dynamics of a situation that
cannot be measured through other data collection techniques, and it
can be combined with additional information such as video.
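Surveys were described above as mostly form-based and easy to process. As a small illustration, the sketch below tallies answers to a single multiple-choice survey question using Python's standard `collections.Counter`. The question, the answers and the helper name `tally_responses` are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical answers collected from one survey form question:
# "Which device do you use most to access the internet?"
responses = ["Mobile", "Laptop", "Mobile", "Tablet", "Mobile", "Laptop"]

def tally_responses(answers):
    """Count how often each answer appears, ignoring case and extra spaces."""
    cleaned = [answer.strip().title() for answer in answers]
    return Counter(cleaned)

counts = tally_responses(responses)

# Print each answer with its count and percentage share, most common first.
for answer, count in counts.most_common():
    share = 100 * count / len(responses)
    print(f"{answer}: {count} ({share:.0f}%)")
```

With the sample answers above, this prints `Mobile: 3 (50%)`, `Laptop: 2 (33%)` and `Tablet: 1 (17%)`. Cleaning the text before counting matters because form entries such as " mobile " and "MOBILE" should be counted as the same answer.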
The following points should be remembered while accessing data from any
data source:
1. Only data that is available for public use should be taken.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone’s privacy to collect data.
4. Data should only be taken from reliable sources as the data collected
from random sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of data which helps in
the proper training of the AI model.
Types of data
For data science models or projects, data is generally collected in the form
of tables, stored in different formats:
Data Access
Once data has been collected from different sources, it needs to be used for
different purposes, so data access is a key factor. In this section of Data
Science (Class 10 AI), you are going to learn about data access using Python
code.
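As a first look at data access in Python, the sketch below reads a small table in CSV (comma-separated values) format using only the standard-library `csv` module. The student records, the column names and the helper `load_records` are hypothetical. The data is kept in a string and wrapped with `io.StringIO` so the example runs without a file; for a real file you would pass `open("results.csv", newline="")` to `csv.DictReader` instead (the file name is only an example).

```python
import csv
import io

# Hypothetical school results table in CSV format: a header row, then records.
CSV_TEXT = """name,class,marks
Asha,10A,88
Ravi,10B,72
Meera,10A,95
"""

def load_records(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        row["marks"] = int(row["marks"])  # CSV values are text; convert marks to int
        records.append(row)
    return records

records = load_records(CSV_TEXT)

# Access individual values by column name.
for record in records:
    print(record["name"], record["marks"])

# A simple query: average marks of the students in class 10A.
class_10a = [r["marks"] for r in records if r["class"] == "10A"]
print("Average for 10A:", sum(class_10a) / len(class_10a))
```

Each row becomes a dictionary keyed by the header names, so values are accessed by column name rather than by position. The standard-library module is used here so that the sketch needs no extra installation.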
There are a few Python modules and libraries that are very useful for data
access. They are: