Introduction To Data Science

This document provides an introduction to data science, including definitions of data science, different types of data, sources of data, necessary skills for data scientists, and tools used in data science. It defines data science as using data to find useful insights and make decisions. It discusses structured, unstructured, and semi-structured data and lists common sources like databases, social platforms, and sensors. Necessary skills include programming, math, data analysis, visualization, machine learning, and big data technologies. It outlines a roadmap for learning including statistics, Python libraries, R, distributed computing, and DevOps tools.

Uploaded by

Anish pandey


1.

IntroductionToDataScience

June 3, 2020

What is data science?

Data science is the study of data to find useful insights that help make decisions or solve a problem.

What is data?

Whatever we know or can explain is data, and data takes many forms.

In data science, however, we deal with digitally stored information kept in a structured or unstructured manner.

Types of data according to structure

Structured data - lists, Excel sheets, SQL databases

Unstructured data - raw data, logs, audio, video

Semi-structured data - data that has some structure but is not fully structured, such as JSON and XML
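The three categories above can be illustrated with a minimal sketch that uses only the Python standard library. The sample names, tags, and log line are made up for illustration and do not come from the notes.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like a sheet or SQL table).
csv_text = "name,age\nAsha,31\nRavi,27\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys describe the data, but records may differ in shape.
json_text = '[{"name": "Asha", "tags": ["admin"]}, {"name": "Ravi"}]'
records = json.loads(json_text)

# Unstructured: free text with no schema; any structure has to be extracted.
log_line = "2020-06-03 10:15:02 ERROR disk full on /dev/sda1"
words = log_line.split()

print(rows[0]["age"])          # CSV values are read as strings
print(records[1].get("tags"))  # the second record has no "tags" key
print(words[2])                # the log level, found only by position
```

Note how the structured rows can be addressed by column name, the semi-structured records need a check for missing keys, and the unstructured line needs ad hoc parsing.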

Where will I get data from?

Sources of data generation

databases - SQL & NoSQL

warehouses - streaming

social platforms - APIs

websites (reviews, product information) - web scraping

government

servers (log servers) - sockets

sensors (machine equipment) - sockets

surveys - manual or automated tasks
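As a small illustration of the first source in the list (a SQL database), here is a sketch that uses Python's built-in sqlite3 module with an in-memory database. The table name `reviews` and its rows are hypothetical; a real project would connect to an existing database instead.

```python
import sqlite3

# An in-memory database stands in for a real SQL source (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO reviews (product, rating) VALUES (?, ?)",
    [("phone", 4), ("phone", 5), ("laptop", 3)],
)
conn.commit()

# Fetch data back with a query: average rating per product.
avg_by_product = dict(
    conn.execute(
        "SELECT product, AVG(rating) FROM reviews GROUP BY product ORDER BY product"
    ).fetchall()
)
conn.close()

print(avg_by_product)
```

The other sources (APIs, web scraping, sockets) follow the same idea: connect, fetch, and convert the result into Python objects that can be analyzed.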

What skills should a person have to become a data scientist?

Curiosity - should be able to form relevant questions to answer from data

Communication - should be able to tell a story with the help of data

Programming - should be familiar with at least one programming language that has tools to process data

Databases - should know how to fetch data from and store data in a database

Maths - algebra, calculus, matrices & vectors, statistics, probability

Data Mining & Data Engineering - pre-processing of data to make it suitable for analysis

Data Analysis - exploring the data to find answers to questions

Data Visualization - graphs that show the data and reveal meaningful information hidden in the data

Machine Learning - supervised & unsupervised

Deep Learning - neural networks

Big Data Technologies - to process huge amounts of data
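The difference between supervised and unsupervised learning named in the list above can be sketched in plain Python. This is a toy illustration, not a production method: the data values, `predict_1nn`, and `two_means` are hypothetical names and numbers chosen for the example, and `two_means` assumes the input contains at least two distinct values.

```python
# Supervised: labelled examples (hours studied -> result) are given up front.
labelled = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (7.0, "pass")]


def predict_1nn(x):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    nearest = min(labelled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


# Unsupervised: only raw values, no labels; the algorithm finds the groups itself.
def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2, starting from the smallest and largest value."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to the nearer of the two centres.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


print(predict_1nn(5.5))
print(two_means([1.0, 1.5, 2.0, 8.0, 9.0]))
```

In practice a library such as scikit-learn would be used for both tasks; the point here is only that the supervised function needs labels while the unsupervised one does not.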

Tools: open source or commercial data science tools used in companies

1. Data Management
2. Data Integration & Transformation
3. Data Visualization
4. Model Deployment
5. Model Monitoring & Assessment
6. Code Asset Management tools
7. Development Environment tools
8. Execution Environment

report: which tools, both open source and commercial, are used in the data science world?

Stats

from time import sleep

from tqdm import tqdm

# Show a progress bar that advances once per second for 900 steps (a 15-minute timer).
for _ in tqdm(range(900)):
    sleep(1)

100%|██████████| 900/900 [15:01<00:00, 1.00s/it]
Our Road Map

1. Maths: stats, algebra, calculus, matrices & vectors, probability

2. Data Science using Python

   1. NumPy & SciPy modules - to process matrices and apply statistical knowledge to data

   2. pandas - to pre-process and analyze the data

   3. Matplotlib, Seaborn, Plotly - data visualization

   4. scikit-learn (sklearn), TensorFlow, Keras, OpenCV - machine learning & deep learning

   5. PySpark - distributed computing & big data processing

3. The above using R

4. Big data - Hadoop, databases

5. AWS, Linux

(Admin) DevOps -> go through Ansible, Docker, Kubernetes, Jenkins, OpenShift, OpenStack, cep
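As a taste of the first road map item (stats), here is a minimal sketch of basic descriptive statistics. It uses Python's built-in statistics module so that it runs without extra installs; NumPy and pandas, listed in the road map, offer equivalent functions. The `ratings` values are made up for illustration.

```python
import statistics

# Hypothetical sample: six product ratings on a 1-5 scale.
ratings = [3, 4, 4, 5, 2, 4]

mean = statistics.mean(ratings)      # arithmetic average
median = statistics.median(ratings)  # middle value of the sorted data
mode = statistics.mode(ratings)      # most frequent value
spread = statistics.pstdev(ratings)  # population standard deviation

print(mean, median, mode, round(spread, 3))
```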
Data Pipeline Creation

source -> storage -> processing -> modeling -> monitoring -> optimization
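The pipeline stages above can be sketched as a chain of small Python functions. Everything here is an assumption for illustration: the function names, the made-up sensor readings, and the choice of "predict the mean" as the model. The optimization stage is left out of the code; it would use the monitored error to adjust the earlier stages.

```python
def source():
    """Source: pretend sensor readings arrive as raw strings (made-up values)."""
    return ["21.5", "22.0", "bad", "23.5"]


def store(raw, storage):
    """Storage: keep the raw records unchanged so they can be reprocessed later."""
    storage.extend(raw)
    return storage


def process(storage):
    """Processing: drop records that cannot be parsed as numbers."""
    clean = []
    for item in storage:
        try:
            clean.append(float(item))
        except ValueError:
            continue  # skip malformed readings such as "bad"
    return clean


def model(clean):
    """Modeling: the simplest possible model, which predicts the mean."""
    return sum(clean) / len(clean)


def monitor(prediction, actual):
    """Monitoring: report the absolute error of a prediction."""
    return abs(prediction - actual)


storage = store(source(), [])
clean = process(storage)
prediction = model(clean)
error = monitor(prediction, actual=22.0)

print(clean, prediction, error)
```

Keeping each stage as its own function makes it easy to swap one out, for example replacing `source` with a database query or `model` with a trained estimator.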
report -> 1 hr

stats -> 3 hr