Chiru Internship

DATA SCIENCE

Presented By:
K.Chiru Sai
23555A4404
CSE(DS)
GIER
Lecture Details:
DATA SCIENCE
Internship using skilldzire platform
Contents
• Introduction to Data Science
• Python
• NumPy
• Pandas
• Matplotlib
• SQL
• Machine Learning
• Linear Regression
• Logistic Regression
Data Science

Data science is an interdisciplinary field that employs scientific methods, procedures, algorithms, and systems to derive knowledge and insights from structured and unstructured information. It combines subject expertise, programming abilities, and mathematical and statistical understanding to interpret data for decision-making.
Python
• Python is a high-level, interpreted, object-oriented programming language with dynamic semantics.
• Python's concise, easy-to-learn syntax emphasises readability, which reduces the cost of program maintenance.
• Python supports modules and packages, which encourage program modularity and code reuse.
Python

Python is used in data science to analyse data, build statistical models, develop machine learning algorithms, visualise data for insights, and manage large datasets effectively. It is adaptable, simple to learn, and works well with other tools, making it suitable for everything from data cleansing to complex analytics.
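
As a small illustration of the data cleansing and analysis steps mentioned above, the sketch below uses only Python's standard library. The readings and the helper name `clean_readings` are invented for this example and are not part of the original slides.

```python
import statistics


def clean_readings(raw):
    """Convert raw strings to floats, skipping blanks and non-numeric entries.

    Hypothetical helper, written only for this illustration.
    """
    cleaned = []
    for item in raw:
        text = item.strip()
        if not text:
            continue  # skip empty entries
        try:
            cleaned.append(float(text))
        except ValueError:
            continue  # skip entries such as "n/a"
    return cleaned


# Made-up raw input with a blank and a non-numeric value
raw_readings = ["12.5", " 13.0", "", "n/a", "14.5"]

readings = clean_readings(raw_readings)
mean_reading = statistics.mean(readings)
print(readings, mean_reading)
```

The same pattern (clean first, then summarise) carries over to the NumPy and Pandas tools covered in the later slides.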
NumPy
NumPy, or Numerical Python, is a core Python package for creating and computing with large multidimensional arrays and matrices. It provides a powerful interface for array-oriented computing, with support for mathematical and logical operations, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and more.
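
A few of the operations listed above can be sketched as follows. The array values are made up for illustration, and only a handful of NumPy's features are shown.

```python
import numpy as np

# A 2x3 array of made-up values
a = np.array([[3, 1, 2],
              [6, 5, 4]])

doubled = a * 2                    # element-wise arithmetic
mask = a > 2                       # logical operation, gives a boolean array
reshaped = a.reshape(3, 2)         # shape manipulation
sorted_rows = np.sort(a, axis=1)   # sort the values within each row
selected = a[mask]                 # select the elements where mask is True
col_means = a.mean(axis=0)         # basic statistics, one mean per column

# Basic linear algebra: solve M @ x = b for x
M = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(M, b)

# Random simulation with a fixed seed so the run is repeatable
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)
```

Each operation works on the whole array at once, which is what "array-oriented computing" refers to: no explicit Python loops are needed.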
Pandas
• Pandas is an open-source Python library that offers high-performance, user-friendly data structures and data analysis tools.
• Pandas excels at handling tabular data, with extensive indexing, slicing, merging, reshaping, and aggregation tools.
• Pandas integrates smoothly with other libraries such as NumPy, which strengthens its numerical computing and data handling capabilities.
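
The indexing, slicing, merging, and aggregation tools mentioned above can be sketched on two small tables. The column names and values are invented for this example.

```python
import pandas as pd

# Two small, made-up tables that share a student_id column
students = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "branch": ["CSE", "CSE", "ECE", "ECE"],
    "marks": [80, 70, 95, 65],
})
cities = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "city": ["Hyderabad", "Vizag", "Chennai", "Pune"],
})

high_scorers = students[students["marks"] > 70]               # boolean indexing
first_two = students.iloc[:2]                                 # positional slicing
merged = students.merge(cities, on="student_id")              # merging on a key
mean_by_branch = students.groupby("branch")["marks"].mean()   # aggregation
marks_array = students["marks"].to_numpy()                    # hand-off to NumPy
```

The last line shows the NumPy link: a column can be converted to a NumPy array whenever array-level computation is needed.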


Matplotlib
Matplotlib is a comprehensive Python plotting library that produces high-quality graphs, figures, and plots. It provides a diverse set of tools for visualizing data, ranging from simple line charts to complex heat maps and 3D plots. Matplotlib offers fine-grained control over plot aesthetics and supports a variety of output formats.
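
A minimal sketch of a line chart with a few aesthetic settings, rendered to two output formats. The data is made up, and the chart is written to in-memory buffers here so that the example leaves no files behind; passing a file name such as "scores.png" to `savefig` works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Made-up data: hours studied and the score obtained
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 71, 80]

fig, ax = plt.subplots(figsize=(6, 4))
# Colour, marker, and line style are examples of fine-grained control
ax.plot(hours, scores, color="tab:blue", marker="o", linestyle="--", label="score")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Score")
ax.set_title("Score vs. hours studied")
ax.legend()

# Render the same figure in two different output formats
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=150)
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
plt.close(fig)
```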
Structured Query Language (SQL)

• SQL (Structured Query Language) is the standard language for managing and manipulating data in relational database management systems (RDBMS).
• It is declarative, focusing on specifying what data to retrieve or manipulate rather than the specifics of how to do it.
• Key commands include SELECT for retrieving data, INSERT for adding new records, UPDATE for modifying existing records, and DELETE for removing records.
Structured Query Language (SQL)

• SQL supports various data types such as INTEGER, VARCHAR, and DATE, and enables powerful operations such as joins to combine data from different tables based on related columns, aggregations to calculate values across rows, and constraints to ensure data integrity.
• Following best practices, such as using meaningful names and formatting queries for readability, makes SQL more effective in database management and application development.
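
The commands and features described in the SQL slides can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table names, columns, and rows are invented, and SQLite is only one RDBMS (it accepts the VARCHAR and DATE type names but applies its own loose typing rules, so other systems may behave differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
cur = conn.cursor()

# Data types and constraints (PRIMARY KEY, NOT NULL, UNIQUE, CHECK)
cur.execute("""
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,
        name    VARCHAR(50) NOT NULL UNIQUE
    )
""")
cur.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    VARCHAR(50) NOT NULL,
        hired   DATE,
        salary  INTEGER CHECK (salary > 0),
        dept_id INTEGER REFERENCES departments(dept_id)
    )
""")

# INSERT: add new records
cur.executemany("INSERT INTO departments VALUES (?, ?)",
                [(1, "Analytics"), (2, "Engineering")])
cur.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", [
    (1, "Asha", "2023-01-15", 50000, 1),
    (2, "Ravi", "2022-07-01", 60000, 1),
    (3, "Meera", "2021-03-20", 70000, 2),
])

# UPDATE and DELETE: modify and remove records
cur.execute("UPDATE employees SET salary = 65000 WHERE emp_id = 2")
cur.execute("DELETE FROM employees WHERE emp_id = 3")

# A constraint rejects invalid data (negative salary)
try:
    cur.execute("INSERT INTO employees VALUES (4, 'Kiran', '2024-02-01', -10, 1)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True

# SELECT with a join and aggregations
cur.execute("""
    SELECT d.name, COUNT(e.emp_id), AVG(e.salary)
    FROM departments AS d
    JOIN employees AS e ON e.dept_id = d.dept_id
    GROUP BY d.name
    ORDER BY d.name
""")
rows = cur.fetchall()
conn.close()
```

The final query states what is wanted (headcount and average salary per department) without saying how to compute it, which is the declarative style described above.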
Machine Learning
• Machine learning is a branch of artificial intelligence concerned with building algorithms that learn from data and make predictions or decisions based on it. It focuses on developing models that improve with experience rather than relying on explicitly programmed instructions. Machine learning applications range from predictive analytics to natural language processing, computer vision, and autonomous systems.
Linear Regression
• Linear regression is a statistical method for modelling the relationship between a dependent variable and one or more independent variables. It helps predict how the dependent variable will change when the independent variables change.
• Multiple linear regression is a form of linear regression that uses two or more independent variables to predict a dependent variable. It describes how the dependent variable changes when several independent variables change together.
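
A minimal sketch of multiple linear regression with two independent variables, fitted by ordinary least squares using NumPy. The data is synthetic and generated exactly from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients; real data would contain noise, and the slides do not say which fitting method or library was used in the internship.

```python
import numpy as np

# Synthetic data: two independent variables (columns x1 and x2)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
# Dependent variable, generated from y = 1 + 2*x1 + 3*x2 (no noise)
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the model can learn an intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise the sum of squared residuals
coeffs, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slope_x1, slope_x2 = coeffs

# Predict the dependent variable for a new point with x1 = 6, x2 = 2
prediction = coeffs @ np.array([1.0, 6.0, 2.0])
```

Each slope estimates how much the dependent variable changes when that independent variable increases by one unit while the other is held fixed.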
Logistic Regression
• Logistic regression is a statistical model that predicts the probability of a binary outcome (often coded as 0 or 1) from one or more predictor variables.
• Logistic regression is widely used in many sectors, including healthcare (predicting the presence of a disease), marketing (predicting customer behavior), and finance (predicting loan defaults), where understanding and predicting binary outcomes is crucial.
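
A minimal sketch of logistic regression with one predictor, fitted by plain gradient descent in NumPy. The pass/fail data, the learning rate, and the number of iterations are all assumptions chosen for illustration; in practice a library implementation would normally be used.

```python
import numpy as np


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


# Made-up data: hours studied and whether the student passed (1) or failed (0)
hours = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(hours), hours])

weights = np.zeros(2)   # [intercept, slope], starting from zero
learning_rate = 0.1     # assumed step size
for _ in range(5000):   # assumed number of iterations
    probs = sigmoid(X @ weights)
    # Gradient of the average log loss with respect to the weights
    gradient = X.T @ (probs - passed) / len(passed)
    weights -= learning_rate * gradient

# Predicted probability of passing, then a 0/1 label using a 0.5 threshold
probabilities = sigmoid(X @ weights)
predicted = (probabilities >= 0.5).astype(int)
```

The model outputs a probability for each student; the 0.5 threshold that turns it into a binary prediction is a common default, not a fixed rule.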
THANK YOU
