
FDS CH1

Uploaded by

sonuchaure548
Copyright
© © All Rights Reserved

FDS CH 1 : Introduction to data science

Data science :
- Data science is a collection of techniques for extracting value from data.
- Data science is the deep study (analysis) of massive amounts of data, which involves extracting
meaningful insights from raw, structured, and unstructured data.
- Data Science (DS) refers to an area of work concerned with the collection, preparation, analysis,
visualization, management and preservation of large collections of information.
- The aim of data science is to discover and extract actionable knowledge from data.
- Data science helps to solve complex problems using an analytical approach.
Primary data : Primary data is data that has never been collected before. It can be gathered in a variety of
ways, such as observation, conducting interviews, and so on.
Secondary data : Secondary data is data that has already been gathered and can be accessed and used by other users
easily. Secondary data can come from studies, government reports, newspapers, journals, books and
also from many popular dedicated websites.
Advantages of Data Science :
- Data science helps to extract meaningful information from raw data.
- Data science improves the quality of data.
- Data science improves the quality of services and products.
Disadvantages of Data Science :
- Data privacy concerns
- Expensive
- Difficulty in selecting tools
The 3 V’s (Volume, Velocity, Variety) :
- Data science has three specific areas: Volume, Variety and Velocity (known as the 3 V’s).
- Volume refers to the increasing size of data, velocity refers to the speed of data, and variety refers
to the diverse types of data that are available.
1. Velocity: The speed at which data is accumulated.
2. Volume: The size and scope of the data.
3. Variety: The massive array of data and types (structured and unstructured).
APPLICATIONS OF DATA SCIENCE
Image Recognition and Speech Recognition : Image and speech recognition is a prominent
application of data science. When we upload an image on any social media, it offers an automatic
tagging suggestion.
Gaming World: In the gaming world, the use of data science is increasing to improve the user experience.
Internet Search: When we want to search for something on
the internet, we use different types of search engines such as Google, Yahoo, Bing,
Ask, etc. All these search engines use data science technology to make the search
experience better.
Transport: Transport industries are also using data science technology to create self-driving
cars.
Healthcare: In the healthcare sector, data science is providing lots of benefits. Data science
is being used for tumor detection, drug discovery, medical image analysis, virtual medical
bots, etc.
Recommendation Systems: Many companies, such as Amazon, Netflix, Google Play, etc., are
using data science technology to create a better user experience with personalized
recommendations. For example, when we search for something on Amazon and start
getting suggestions for similar products, this is because of data science technology.
DATA SCIENCE LIFE CYCLE :

- The data science life cycle provides a framework for the best performance.
- The life cycle of data science contains 7 steps.

1 : Setting Goal :
- The entire cycle revolves around the business or
research goal.
- In step 1 of the data science life cycle, we can
set a specific goal with the business objective only
after proper understanding.
2 : Data Understanding :
- Data understanding involves the collection of all the
available data.
- We need to understand what data is present and what
data could be used for the given problem.
3 : Data Preparation :
- The data preparation step includes selecting the relevant data, integrating the data, and cleaning it.
- This step is also used for constructing new data derived from the existing data.
- Data preparation is the most time-consuming step in the entire data science life cycle.
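The data preparation activities named above (selecting, cleaning, integrating, and constructing derived data) can be sketched as follows. This is only a minimal illustration in plain Python; the records, field names, and values are hypothetical and not taken from any real data set.

```python
# Hypothetical raw records from two sources (names and values are invented).
customers = [
    {"id": 1, "name": "Asha", "age": "34"},
    {"id": 2, "name": "Ravi", "age": ""},       # missing age
    {"id": 3, "name": "Meena", "age": "29"},
]
orders = [
    {"customer_id": 1, "amount": 250.0},
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 3, "amount": 75.5},
]

# 1. Select the relevant data: keep only the fields needed for the analysis.
selected = [{"id": c["id"], "age": c["age"]} for c in customers]

# 2. Clean: drop records with a missing age and convert the age text to int.
cleaned = [{"id": c["id"], "age": int(c["age"])} for c in selected if c["age"]]

# 3. Integrate: attach each customer's order amounts from the second source.
for c in cleaned:
    c["amounts"] = [o["amount"] for o in orders if o["customer_id"] == c["id"]]

# 4. Construct new derived data: total spend per customer.
for c in cleaned:
    c["total_spend"] = sum(c["amounts"])

print(cleaned)
```

In a real project these steps would usually be done with a data-handling library, but the order of operations stays the same.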
4 : Exploratory Data Analysis :
- This step involves getting some idea about the solution and the factors affecting it before building the actual
model.
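As a rough sketch of what exploring the data before modeling can look like, the example below computes summary statistics and a correlation for a small hypothetical sample. The variables `hours` and `scores` and the helper `pearson` are assumptions made for illustration only.

```python
import statistics

# Hypothetical sample: hours studied and exam scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [35, 45, 50, 62, 70, 78]

# Summary statistics give a first impression of the data.
print("mean score:", statistics.mean(scores))
print("median score:", statistics.median(scores))
print("score std dev:", statistics.stdev(scores))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# A value close to 1 suggests that hours studied is a useful factor for a model.
r = pearson(hours, scores)
print("correlation between hours and scores:", round(r, 3))
```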
5 : Data Modeling :
- Data modeling is the heart of data analysis.
- The data modeling step helps in choosing the type of model, depending on whether the problem is a
classification problem or any other type of problem.
- We need to make sure there is a correct balance between performance and
generalizability.
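The balance between performance and generalizability can be illustrated with a tiny classification problem. The sketch below uses a k-nearest-neighbours classifier written by hand as a stand-in for whatever model a project actually chooses; the data points (including one deliberately noisy label) and the function names are hypothetical.

```python
def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, data, k):
    """Fraction of (x, label) pairs in data that the classifier predicts correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)


# Hypothetical 1-D data: the label is mostly 1 for larger x; (4, 1) is a noisy point.
train = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 1), (8, 1)]
test = [(3.6, 0), (4.4, 0), (5.6, 1), (6.5, 1)]

for k in (1, 3):
    print(
        f"k={k}: train accuracy={accuracy(train, train, k):.2f}, "
        f"test accuracy={accuracy(train, test, k):.2f}"
    )
```

With k=1 the model memorizes the training data, including the noisy point, and does worse on the unseen test points. With k=3 it fits the training data less exactly but generalizes better on this sample.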
6 : Model Evaluation :
- In this step, the model is evaluated to check whether it is ready to be deployed.
- We also need to make sure that the model conforms to reality.
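One common way to evaluate a classification model is to compare its predictions with the true labels and compute accuracy, precision, and recall. The sketch below assumes hypothetical label lists and a hypothetical deployment threshold of 0.7; neither comes from the notes.

```python
# Hypothetical true labels and model predictions for eight test cases.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count the four outcomes of a binary confusion matrix.
pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Assumed rule for illustration: deploy only if accuracy reaches the threshold.
DEPLOY_THRESHOLD = 0.7
ready_to_deploy = accuracy >= DEPLOY_THRESHOLD

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print("ready to deploy:", ready_to_deploy)
```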
** DATA SCIENTIST’S TOOLBOX :
1. Python Programming:
- Choosing the right programming language for data science is important.
- Python offers various libraries for data science operations.
- Python is an open-source tool and is an object-oriented
scripting language.
- Python is popular for the implementation of data
preprocessing, statistical analysis, machine learning and deep learning, which are the
core tasks in any data science project.
- Python runs on many platforms, such as UNIX, Windows,
and macOS.
2. R Programming:
- R is a popular language used in data science; it provides a software
environment for statistical analysis.
- R is also an open-source tool.
- R can run on many platforms, such as UNIX, Windows, and Mac operating
systems.
- R also has a rich collection of libraries (more than 11,556) that can be easily
installed as per requirements.
**Structured Data :
- Structured data is data that conforms to a data model.
- Structured data is composed of clearly defined data types.
- Data may be human- or machine-generated as long as the data is created within an RDBMS structure.
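To make the idea of data created within an RDBMS structure concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `students` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for an RDBMS.
conn = sqlite3.connect(":memory:")

# The data model: every column has a clearly defined data type.
conn.execute(
    "CREATE TABLE students ("
    "roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, marks REAL)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", 81.5), (2, "Ravi", 67.0), (3, "Meena", 92.25)],
)

# Because the structure is known in advance, the data is easy to query.
rows = conn.execute(
    "SELECT name, marks FROM students WHERE marks > 80 ORDER BY marks DESC"
).fetchall()
print(rows)

conn.close()
```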
**Unstructured Data :
- Unstructured data is data that is not organized in a pre-defined manner.
- Unstructured data does not fit into a data model because its content is context-specific or varying.
- Unstructured data has internal structure but is not structured via pre-defined data models.
- Unstructured data may be textual or non-textual, and human- or machine-generated.
- Natural language is a special type of unstructured data.
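Natural-language text has no pre-defined columns, so some structure has to be extracted before it can be analysed. The sketch below shows one simple way to do that, by counting words; the sample sentence is invented for illustration.

```python
import re
from collections import Counter

# A hypothetical piece of unstructured, human-generated text.
text = "Data science extracts insights from data. Raw data is often unstructured text."

# Impose a simple structure: lower-case the text and split it into words.
words = re.findall(r"[a-z]+", text.lower())

# Word frequencies are a structured summary of the unstructured input.
counts = Counter(words)
print(counts.most_common(3))
```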
**Typical Human-generated Unstructured Data:
** Typical Machine-generated Unstructured Data:

** Problems with Unstructured Data :
- Unstructured data keeps expanding.
- Processing it is time consuming.
- Not all unstructured data is of high quality.
- The data cannot be analysed with conventional systems.
DATA SOURCES :
- A data source in data science is the initial location of data.
- Types of data sources: open data sources and social media data sources.
DATA FORMATS :
• In data science, data comes in different sizes and shapes. It can be numerical data, text, audio,
video or another type of data.
• Integers: Integral data types may be of different sizes and may or may not be allowed to contain negative
values.
• Floats : A float contains a number with a decimal point, e.g. 1.11, 0.11, 88.00.
• Text Data : The text data type is known as a string in Python. A string can contain numbers or characters. Any
text written inside double quotation marks (" ") is treated as a string.
• Dense Numerical Arrays : Dense numerical arrays are used for storing large arrays of numbers.
• CSV Format : CSV stands for Comma Separated Values, a text-based file format that stores data in
tabular form. CSV files are a commonly used data format in data science.
• HTML Files : HTML stands for Hyper Text Markup Language. An HTML file is a text file
containing small markup tags. The markup tags tell the Web browser how to display the page. An HTML file
must have an .htm or .html file extension. An HTML file can be created using a simple text editor like
Notepad, or with a different editor.
• JSON Files : JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is not only easy
for humans to read and write, but also easy for machines to generate. JSON is built on two structures, i.e. a
collection of name–value pairs and an ordered list of values. When exchanging data between a browser
and a server, the data can be sent only as text.
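Several of the formats listed above can be tried out with Python's standard library alone. The sketch below is illustrative only: the sample values are hypothetical, and the built-in `array` module is used as a simple stand-in for a dense numerical array (data science projects often use a dedicated array library instead).

```python
import csv
import io
import json
from array import array

# Basic data types.
count = -42            # integer (Python's int is signed and has no fixed size)
price = 88.00          # float: a number with a decimal point
label = "Room 101"     # string: may contain letters and digits

# Dense numerical array: compact, typed storage ('d' means double-precision float).
readings = array("d", [1.11, 0.11, 88.00])

# CSV: text-based tabular data. io.StringIO stands in for a file on disk.
csv_text = "name,marks\nAsha,81.5\nRavi,67.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: name-value pairs (objects) and ordered lists of values (arrays).
json_text = '{"name": "Asha", "subjects": ["FDS", "Maths"]}'
record = json.loads(json_text)    # text -> Python dict
round_trip = json.dumps(record)   # Python dict -> text, e.g. to send to a server

print(type(count).__name__, type(price).__name__, type(label).__name__)
print(list(readings))
print(rows)
print(record["subjects"], round_trip)
```

Note that the CSV reader returns every field as text, so numeric columns such as `marks` still need to be converted before analysis.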