0% found this document useful (0 votes)
8 views

Week 1 Intro

Data science involves extracting insights from structured and unstructured data through statistical, mathematical and computational analysis. It includes processes like data collection, cleaning, analysis and interpretation to inform decisions and solve complex problems. Python is widely used in data science due to powerful libraries for data manipulation, analysis, machine learning and deep learning.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Week 1 Intro

Data science involves extracting insights from structured and unstructured data through statistical, mathematical and computational analysis. It includes processes like data collection, cleaning, analysis and interpretation to inform decisions and solve complex problems. Python is widely used in data science due to powerful libraries for data manipulation, analysis, machine learning and deep learning.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Name of the Instructor

Habibullah Nangraj
Copyright © by Institute Of Mathematics And Computer Science all rights reserved.
Data Science
Data science is an interdisciplinary field that involves
extracting meaningful insights and knowledge from
structured and unstructured data through the
application of statistical, mathematical, and
computational techniques. It encompasses various
processes, including data collection, cleaning,
analysis, and interpretation, to inform decision-
making and solve complex problems across diverse
domains.

Copyright © by Institute Of Mathematics And Computer Science all rights reserved.


Structured Data Unstructured Data
Structured data has a well-defined Unstructured data is heterogenous.
structure or adheres to a specified It comes from a broad range of
data model, can be stored in well- sources and has a variety of business
defined schemas such as databases, intelligence and analytics
and in many cases, can be applications. Analyzing unstructured
represented in multiple tables with data often requires artificial
rows and columns. intelligence to gain insights from it

Anything with an electronic footprint, Auto/Manual. Older records require


conversion to electronic formats for effective mining and processing.
Copyright © by Institute Of Mathematics And Computer Science all rights reserved.
Understanding Data
Comma-separated values Delimited text files where the delimiter is a comma. Used to store structured data.
File Formats
(CSVs)
Text files are used to store data where each line or row has values separated by a
delimiter. A delimiter is a sequence of one or more characters between values.
Delimited text file formats Common delimiters include comma, tab, colon, vertical bar, and space. File Formats

Designed to store and manage unstructured data and provide analysis tools for
NoSQL databases examining this type of data. Types of Data

Online Transaction Systems that focus on handling business transactions and storing structured data.
Types of Data
Processing (OLTP) Systems
Designed to store structured data with well-defined schemas and support standard
Relational databases data analysis methods and tools. Types of Data

Devices such as Global Positioning Systems (GPS) and Radio Frequency Identification
Sensors (RFID) tags generate structured data. Types of Data

Software applications like Excel and Google Spreadsheets are used for organizing and
Spreadsheets analyzing structured data. Types of Data

Databases that use Structured Query Language (SQL) for defining, manipulating, and
SQL Databases querying data in structured formats. Types of Data

Copyright © by Institute Of Mathematics And Computer Science all rights reserved.


Copyright © by Institute Of Mathematics And Computer Science all rights reserved.
Copyright © by Institute Of Mathematics And Computer Science all rights reserved.
Copyright © by Institute Of Mathematics And Computer Science all rights reserved.
Copyright © by Institute Of Mathematics And Computer Science all rights reserved.
Python for Data Science
the use of the Python in the field of data science.
Python is widely adopted in data science due to its
many libraries and tools. It provides powerful libraries
like NumPy, pandas, and scikit-learn for data
manipulation and analysis, along with frameworks like
TensorFlow and PyTorch for machine learning and
deep learning. Python's is also used for tasks such as
data cleaning, exploration, modeling, and
visualization in the context of data science.

Copyright © by Institute Of Mathematics And Computer Science all rights reserved.


Machine learning Deep learning
Machine learning is a subset of Deep learning is a subfield of machine
artificial intelligence that focuses on learning that involves the use of
developing algorithms and models artificial neural networks, particularly
that enable computers to learn from deep neural networks, to model and
data and make predictions or solve complex tasks. These networks,
decisions without being explicitly inspired by human brain, consist of
programmed. multiple layers ("deep") that learn
hierarchical representations of data.

Copyright © by Institute Of Mathematics And Computer Science all rights reserved.


Data Cleaning Data Exploration
data cleaning refers to the process of The preliminary analysis and
identifying and rectifying errors, examination of a dataset to gain
inconsistencies, and inaccuracies in insights, identify patterns, and
datasets to ensure data quality. It understand its characteristics. This
involves tasks such as handling missing process involves summarizing key
values, correcting typos, removing statistics, visualizing data
outliers, and addressing other issues distributions, and exploring
that might compromise the integrity of relationships between variables.
the data.

Copyright © by Institute Of Mathematics And Computer Science all rights reserved.


Data Modeling Data Visualization
It involves creating mathematical The Presentation of graphical
representations or structures that representations of data to facilitate
capture relationships, patterns, and understanding, exploration, and
trends within a dataset. It is the communication of patterns, trends,
process of building a model that can be and insights. Through charts, graphs,
used to make predictions, gain insights, and other visual elements, data
or support decision-making. scientists aim to convey complex
information in a more accessible and
understandable format.

Copyright © by Institute Of Mathematics And Computer Science all rights reserved.


Summary
Data Science: Extracting meaningful insights and knowledge from structured and
unstructured data through the application of statistical, mathematical, and
computational techniques.

Data Integration and Transformation: Streamline data pipelines and automate


data processing tasks.

Data Visualization: Provide graphical representation of data and assist with


communicating insights.

Modelling: Enable Building, Deployment, Monitoring and Assessment of Data and


Machine Learning models.
Copyright © by Institute Of Mathematics And Computer Science all rights reserved.
Question and Answer?

Habibullah Nangraj [email protected]

You might also like