What Is Data Science

Data science uses machine learning algorithms to analyze vast amounts of data from different sources to find patterns and make business decisions. The data science lifecycle involves 5 stages: data acquisition, data preparation, data examination, data analysis, and communicating results. Key prerequisites for data science include machine learning, modeling, statistics, programming (especially Python and R), and understanding databases. Python is well-suited for machine learning due to its simplicity, extensive libraries for tasks like deep learning, platform independence, and large supportive community.

Uploaded by

chandana kiran

STUDY MATERIAL

DATA SCIENCE AND MACHINE LEARNING

What Is Data Science?

Data science is the domain of study that deals with vast volumes of data using modern tools and
techniques to find unseen patterns, derive meaningful information, and make business decisions.
Data science uses complex machine learning algorithms to build predictive models.

The data used for analysis can come from many different sources and be presented in various
formats.

The Data Science Lifecycle

Now that you know what data science is, let us look at the data science lifecycle. The lifecycle
consists of five distinct stages, each with its own tasks:

1. Capture: Data Acquisition, Data Entry, Signal Reception, Data Extraction. This stage involves
gathering raw structured and unstructured data.

2. Maintain: Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data
Architecture. This stage covers taking the raw data and putting it in a form that can be used.

3. Process: Data Mining, Clustering/Classification, Data Modeling, Data Summarization. Data
scientists take the prepared data and examine its patterns, ranges, and biases to determine how
useful it will be in predictive analysis.

4. Analyze: Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining,
Qualitative Analysis. Here is the real meat of the lifecycle. This stage involves performing the
various analyses on the data.

5. Communicate: Data Reporting, Data Visualization, Business Intelligence, Decision Making.
In this final step, analysts prepare the analyses in easily readable forms such as charts, graphs,
and reports.
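
The five stages above can be sketched as a tiny pipeline. This is only an illustration using
Python's standard library: the data set, the column names (ad_spend, sales), and the function
names are hypothetical and chosen for the example, and a real project would typically use
dedicated libraries and far more data.

```python
import csv
import io
import statistics

# Hypothetical raw data (assumption for illustration); one row is incomplete.
RAW_CSV = """ad_spend,sales
10,25
20,45
30,
40,85
50,105
"""


def capture(raw_text):
    """Capture: read raw rows from a source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def maintain(rows):
    """Maintain: drop incomplete rows and convert text to numbers."""
    return [
        (float(row["ad_spend"]), float(row["sales"]))
        for row in rows
        if row["ad_spend"] and row["sales"]
    ]


def process(pairs):
    """Process: summarize the size and range of the prepared data."""
    sales = [y for _, y in pairs]
    return {"rows": len(pairs), "min_sales": min(sales), "max_sales": max(sales)}


def analyze(pairs):
    """Analyze: fit sales = slope * ad_spend + intercept by least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def communicate(summary, slope, intercept):
    """Communicate: turn the results into a short readable report."""
    return (
        f"Rows used: {summary['rows']}\n"
        f"Sales range: {summary['min_sales']} to {summary['max_sales']}\n"
        f"Fitted line: sales = {slope:.2f} * ad_spend + {intercept:.2f}"
    )


pairs = maintain(capture(RAW_CSV))
summary = process(pairs)
slope, intercept = analyze(pairs)
report = communicate(summary, slope, intercept)
print(report)
```

Each function maps to one stage, so a stage can be swapped out (for example, reading from a
database instead of a CSV string) without touching the others.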

Prerequisites for Data Science

Here are some of the technical concepts you should know before you start learning data
science.

1. Machine Learning

Machine learning (ML) is the backbone of data science. Data scientists need a solid grasp of
ML in addition to basic knowledge of statistics.

2. Modeling

Mathematical models enable you to make quick calculations and predictions based on what you
already know about the data. Modeling is also a part of machine learning: it involves
identifying which algorithm is most suitable for a given problem and working out how to train
the resulting models.
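
One common way to decide which model suits a problem is to train each candidate on part of the
data and compare their errors on data they have not seen. The sketch below is a minimal
illustration, not a prescribed method: the data (hours studied versus exam score), the two
candidate models, and all function names are assumptions made for the example.

```python
import statistics

# Hypothetical data (assumption for illustration): (hours studied, exam score).
train = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70)]
test = [(6, 74), (7, 79)]


def fit_mean(data):
    """Baseline model: always predict the average score seen in training."""
    mean_y = statistics.mean([y for _, y in data])
    return lambda x: mean_y


def fit_line(data):
    """Linear model: fit score = slope * hours + intercept by least squares."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, data):
    """Average squared difference between predictions and true values."""
    return statistics.mean([(model(x) - y) ** 2 for x, y in data])


# Train every candidate on the training set, then score it on held-out data.
candidates = {"mean": fit_mean(train), "line": fit_line(train)}
errors = {name: mean_squared_error(model, test) for name, model in candidates.items()}
best_name = min(errors, key=errors.get)
print(f"Lowest held-out error: {best_name}")
```

With this data the linear model has the lower held-out error, because the scores rise steadily
with hours studied and a constant prediction cannot capture that trend.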

3. Statistics

Statistics is at the core of data science. A solid grasp of statistics helps you extract more
insight from data and obtain more meaningful results.

4. Programming

Some level of programming is required to execute a successful data science project. The most
common programming languages are Python and R. Python is especially popular because it is
easy to learn and has many libraries for data science and ML.

5. Databases

A capable data scientist needs to understand how databases work, how to manage them, and how
to extract data from them.
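
As a small illustration of extracting data from a database, the sketch below uses Python's
built-in sqlite3 module with an in-memory database. The orders table, its columns, and the
sample rows are hypothetical; a production database would have its own schema and connection
details.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows (assumptions for illustration).
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 35.5), ("alice", 80.0), ("carol", 42.0)],
)
conn.commit()

# Extract: total spend per customer, largest first.
totals = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, total in totals:
    print(f"{customer}: {total:.2f}")
```

Letting the database do the grouping and sorting means only the summarized rows are pulled into
Python, which matters once tables grow large.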

Python has several advantages over other languages for machine learning. For example, Python:

• is easy to learn
• is compatible across platforms
• has clear, readable code
• is fast to develop in
• has extensive libraries
• is object-oriented
• is open-source and free
• is a high-level language
• has built-in data structures

4 reasons why Python is the best language for Machine Learning

1. Simplicity and consistency

AI algorithms and machine learning models are complex predictive technologies, and Python
makes them simpler to work with. Its clear code and many machine learning-specific libraries
let you shift your focus from the language to the algorithms. Python is also easy to learn,
consistent, and intuitive. This helps explain why Python ranks third among the most popular
technologies, with 48.24% of developers voting for it.

2. Variety of libraries and frameworks

Python has a vast collection of libraries and frameworks for machine learning. For example:

• NumPy provides arrays and matrices, along with routines for linear algebra.
• Keras is a deep learning API that runs on top of TensorFlow and makes fast experimentation
possible.
• TensorFlow is a free, open-source library for ML and AI that focuses on the training and
inference of deep neural networks.
• Matplotlib is a library for creating static, animated, and interactive visualizations in
Python.
• Seaborn is a Python data visualization library based on Matplotlib for drawing attractive,
high-quality statistical graphics.
• PyTorch is an open-source ML library used to build computer vision and natural language
processing applications.

3. Platform independence

Software written in Python can be built and run on multiple operating systems, for instance
Linux, Windows, macOS, and Solaris. This makes machine learning development in Python much
more convenient, which is one reason developers enjoy using Python to build ML
applications.
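
As a rough illustration of this portability, the standard library lets the same script adapt
to whichever operating system it runs on. In the sketch below the directory layout and file
name (project, models, churn_model.pkl) and the helper names are hypothetical.

```python
import platform
from pathlib import Path


def describe_runtime():
    """Report the operating system and Python version at run time."""
    return {"os": platform.system(), "python": platform.python_version()}


def model_path(base_dir, name):
    """Build a file path without hard-coding '/' or '\\' separators."""
    return Path(base_dir) / "models" / f"{name}.pkl"


runtime = describe_runtime()
path = model_path("project", "churn_model")

print(f"Running on {runtime['os']} with Python {runtime['python']}")
print(f"Model would be stored at: {path}")
```

The printed path uses backslashes on Windows and forward slashes on Linux and macOS, while the
code itself stays unchanged.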

4. Great community

Just as JavaScript has its communities of enthusiasts, Python has one too, and it is huge. You
can find almost anything you need for development there, and when you ask a question you can
usually expect support and answers.
