FDS CH1
Data Science :
- Data science is a collection of techniques used to extract value from data.
- Data science is the in-depth study (analysis) of massive amounts of data, which involves extracting
meaningful insights from raw, structured, and unstructured data.
- Data Science (DS) refers to an area of work concerned with the collection, preparation, analysis,
visualization, management and preservation of large collections of information.
- The aim of data science is to discover and extract actionable knowledge from data.
- Data science helps to solve complex problems using an analytical approach.
Primary data : Primary data is data that has never been collected before. It can be gathered in a variety of
ways, such as observation, conducting interviews, and so on.
Secondary data : Secondary data is data that has already been gathered and can be easily accessed and used by
other users. Secondary data can come from studies, government reports, newspapers, journals, books, and
many popular dedicated websites.
Advantages of Data Science :
- Data science helps to extract meaningful information from raw data.
- Data science improves the quality of data.
- Data science improves the quality of services and products.
Disadvantages of Data Science:
- Data privacy concerns
- Expensive
- Difficulty in selecting tools
The 3 V’s (Volume, Velocity, Variety) :
- Data science deals with three specific aspects of data: Volume, Variety and Velocity (known as the 3 V’s).
- Volume refers to the increasing size of data, velocity refers to the speed of data, and variety refers
to the diverse types of data that are available.
1. Velocity: The speed at which data is accumulated.
2. Volume: The size and scope of the data.
3. Variety: The massive array of data and types (structured and unstructured)
APPLICATIONS OF DATA SCIENCE
Image Recognition and Speech Recognition : Image and speech recognition is a prominent
application of data science. When we upload an image to any social media site, it offers automatic
tagging suggestions.
Gaming World: In the gaming world, data science is increasingly used to improve the user experience.
Internet Search : When we want to search for something on
the internet, we use search engines such as Google, Yahoo, Bing,
Ask, etc. All these search engines use data science technology to make the search
experience better.
Transport: The transport industry is also using data science technology to create self-driving
cars.
Healthcare: In the healthcare sector, data science is providing lots of benefits. Data science
is being used for tumor detection, drug discovery, medical image analysis, virtual medical
bots, etc.
Recommendation Systems: Many companies, such as Amazon, Netflix, and Google Play, use
data science technology to create a better user experience through personalized
recommendations. For example, when we search for something on Amazon and then start
getting suggestions for similar products, this is because of data science technology.
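The "similar products" idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the baskets are made-up data, the function names are hypothetical, and counting which items appear together is just one basic technique. It is not a description of how Amazon or any other company actually builds its recommendations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: each inner list is one customer's basket.
baskets = [
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "mouse"],
    ["phone", "phone case"],
    ["laptop", "laptop bag"],
    ["phone", "phone case", "charger"],
]

def build_cooccurrence(baskets):
    """Count how often each pair of distinct items appears in the same basket."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def recommend(item, cooccurrence, k=2):
    """Return up to k items most often bought together with `item`."""
    neighbours = cooccurrence.get(item, Counter())
    # Sort by count (highest first), then by name so ties are deterministic.
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:k]]

cooccurrence = build_cooccurrence(baskets)
print(recommend("laptop", cooccurrence))
print(recommend("phone", cooccurrence, k=1))
```

An item that never appears in any basket simply gets an empty list of suggestions.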
DATA SCIENCE LIFE CYCLE :
1 : Setting Goal :
- The entire cycle revolves around the business or
research goal.
- In step 1 of the data science life cycle, we can
set a specific goal in line with the business objective
only after proper understanding.
2 : Data Understanding :
- Data understanding involves the collection of all the
available data.
- We need to understand what data is present and what
data could be used for the given problem.
3 : Data Preparation :
- The data preparation step includes selecting the relevant data, integrating the data, and cleaning it.
- This step is also used for constructing new, derived data.
- Data preparation is a time-consuming step in the data science life cycle.
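The four activities named in this step (selecting, integrating, cleaning, and constructing derived data) can be sketched with plain Python. The records, field names, and the `prepare` function below are hypothetical and exist only to make each activity visible. A real project would more likely use a data-handling library.

```python
# Hypothetical raw inputs: two sources that share a customer id.
customers = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ravi", "city": "Mumbai"},
    {"id": 3, "name": "Meera", "city": "Nashik"},
]
orders = [
    {"customer_id": 1, "amount": "250.0", "quantity": "5"},
    {"customer_id": 2, "amount": "", "quantity": "2"},      # missing amount
    {"customer_id": 3, "amount": "90.0", "quantity": "3"},
    {"customer_id": 3, "amount": "90.0", "quantity": "3"},  # exact duplicate
]

def prepare(customers, orders):
    # Selecting: keep only the customer name; the city is not needed here.
    names = {c["id"]: c["name"] for c in customers}
    seen = set()
    prepared = []
    for order in orders:
        key = (order["customer_id"], order["amount"], order["quantity"])
        # Cleaning: skip duplicate rows and rows with a missing amount.
        if key in seen or order["amount"] == "":
            continue
        seen.add(key)
        amount = float(order["amount"])     # convert text to numbers
        quantity = int(order["quantity"])
        prepared.append({
            # Integrating: attach the customer name from the other source.
            "name": names[order["customer_id"]],
            "amount": amount,
            "quantity": quantity,
            # Constructing derived data: a new column computed from two others.
            "unit_price": amount / quantity,
        })
    return prepared

rows = prepare(customers, orders)
for row in rows:
    print(row)
```

Dropping the row with the missing amount is only one possible cleaning choice. Filling it with an estimated value would be another.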
4 : Exploratory Data Analysis :
- This step involves getting some idea about the solution and the factors affecting it before building the actual
model.
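A first look at the data often means computing summary statistics and checking how two variables move together. The sketch below uses Python's standard `statistics` module. The sample values and the helper names `summarize` and `pearson` are assumptions made for illustration.

```python
import statistics

# Hypothetical sample: hours studied and exam score for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [35, 42, 50, 55, 61, 70, 74, 83]

def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print(summarize(scores))
print(round(pearson(hours, scores), 3))
```

A correlation close to 1 in this made-up sample suggests that hours studied would be a useful factor to carry into the modeling step.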
5 : Data Modeling :
- Data modeling is the heart of data analysis.
- The data modeling step helps in choosing the type of model, depending on whether the problem is a
classification problem or some other type of problem.
- We need to make sure there is a correct balance between performance and
generalizability.
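The balance between performance and generalizability can be seen by comparing a model's accuracy on the data it was trained on with its accuracy on data it has not seen. The sketch below is an illustration under stated assumptions: the data is made up (with one deliberately noisy label at x = 8), and the two models, a 1-nearest-neighbour rule and a single threshold, are chosen here for simplicity and are not prescribed by these notes.

```python
# Hypothetical 1-D data: the label is 1 when x is large; x = 8 has a noisy label.
train = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 1),
         (7, 1), (8, 0), (9, 1), (10, 1), (11, 1), (12, 1)]
test = [(2.4, 0), (4.4, 0), (6.6, 1), (7.8, 1), (8.3, 1), (10.6, 1)]

def accuracy(predict, data):
    """Fraction of points in `data` that `predict` labels correctly."""
    return sum(predict(x) == label for x, label in data) / len(data)

def fit_nearest_neighbour(data):
    """Flexible model: memorise every training point (1-nearest neighbour)."""
    def predict(x):
        nearest = min(data, key=lambda point: abs(point[0] - x))
        return nearest[1]
    return predict

def fit_threshold(data):
    """Simple model: a single cut-off t, predicting 1 when x >= t."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    # Pick the cut-off with the best accuracy on the training data.
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), data))
    return lambda x: int(x >= best)

nn_model = fit_nearest_neighbour(train)
threshold_model = fit_threshold(train)

for name, model in [("1-NN", nn_model), ("threshold", threshold_model)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

On this sample the nearest-neighbour model scores perfectly on the training data because it also memorises the noisy point, and that costs it accuracy on the unseen data. The simpler threshold model gives up one training point and generalizes better.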
6 : Model Evaluation:
- In this step, the model is evaluated to check whether it is ready to be deployed.
- We also need to make sure that the model conforms to reality.
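One common way to check whether a classification model is ready is to compare its predictions with known outcomes and compute a few standard metrics. The sketch below is a minimal example. The labels are made up, and `MIN_RECALL` is a hypothetical acceptance level that a real project would have to set from its own business goal.

```python
def evaluate(actual, predicted):
    """Accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)  # true positives
    tn = sum(a == 0 and p == 0 for a, p in pairs)  # true negatives
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # false positives
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical held-out labels and the model's predictions for them.
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

MIN_RECALL = 0.75  # hypothetical acceptance level, not taken from the notes

metrics = evaluate(actual, predicted)
ready_to_deploy = metrics["recall"] >= MIN_RECALL
print(metrics, ready_to_deploy)
```

Which metric matters most depends on the goal set in step 1. Recall is used here only as an example.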
** DATA SCIENTIST’S TOOLBOX :
1. Python Programming:
- Choosing the right programming language for data science is important.
- Python offers various libraries for data science operations.
- Python is an open-source tool and is an object-oriented
scripting language.
- Python is popular for implementing data
preprocessing, statistical analysis, machine learning and deep learning, which are the
core tasks in any data science project.
- Python runs on platforms such as UNIX, Windows,
and macOS.
2. R Programming:
- R is a popular programming language used in data science; it provides a software
environment for statistical analysis.
- R is also an open-source tool.
- R can run on platforms such as UNIX, Windows, and Mac operating
systems.
- R also has a rich collection of libraries (more than 11,556) that can be easily
installed as per requirements.
**Structured Data :
- Structured data is data that depends on a data model.
- Structured data is composed of clearly defined data types.
- Data may be human- or machine-generated as long as the data is created within an RDBMS structure.
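What "depends on a data model" and "clearly defined data types" look like in practice can be sketched with Python's built-in `sqlite3` module. The `student` table, its columns, and the rows are hypothetical, and SQLite stands in here for any RDBMS.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The data model: every row must have these columns with these declared types.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        marks   REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO student (roll_no, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82.5), (2, "Ravi", 67.0), (3, "Meera", 91.0)],
)

# Because the structure is known in advance, the data can be queried directly.
top = conn.execute(
    "SELECT name FROM student WHERE marks > ? ORDER BY marks DESC", (80,)
).fetchall()

# The declared column names and types can be read back from the database.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(top)
print(columns)
```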
**Unstructured Data :
- Unstructured data is data that is not organized in a pre-defined manner.
- Unstructured data is data that does not fit into a data model because its content is context-specific.
- Unstructured data has internal structure but is not structured via pre-defined data models.
- Unstructured data may be textual or non-textual, and human- or machine-generated.
- Natural language is a special type of unstructured data.
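Natural-language text has no columns or declared types, so some structure has to be imposed on it before it can be analysed. A minimal sketch, assuming a made-up product review and a deliberately simple tokenizer (lowercase letters only):

```python
import re
from collections import Counter

# Hypothetical free-text review: there are no fields or types to query directly.
review = ("The delivery was late, but the phone is great. "
          "Great camera, great battery. The box was damaged.")

def tokenize(text):
    """Split text into lowercase words, ignoring punctuation and digits."""
    return re.findall(r"[a-z]+", text.lower())

tokens = tokenize(review)
word_counts = Counter(tokens)  # impose a simple structure: word -> frequency

print(len(tokens))
print(word_counts["great"], word_counts["the"], word_counts["was"])
```

The word counts are one very basic structured view of the text. The same review could be structured in many other ways depending on the question being asked.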
**Typical Human-generated Unstructured Data:
** Typical Machine-generated Unstructured Data: