Data Science Unit 1
1.1 Introduction
1.2 Terminology
1.3 Data science process
1.4 Data science toolkit
1.5 Types of Data
1.6 Example Applications
1.7 Mathematical Foundations for Data Science
INTRODUCTION TO CORE CONCEPTS AND TECHNOLOGY
What is Data Science?
● Data science brings together ideas from data analysis, machine learning, and related strategies to
understand and dissect real-world phenomena with data.
● It is an extension of data analysis fields such as data mining, statistics, and predictive analytics.
● It is a broad field that draws on methods and concepts from other disciplines, including
information science, statistics, mathematics, and computer science.
● Techniques used in data science include machine learning, visualization, pattern recognition,
probability modeling, data engineering, signal processing, and more.
Data science is important because it enables organizations to make informed decisions based on data and
evidence, rather than intuition and guesswork. Here are some of the key reasons why it is important:
● Improved decision-making: DS provides organizations with a systematic way to analyze data and
make informed decisions. Decisions grounded in data, rather than intuition, are more likely to
drive growth and success.
● Increased efficiency: DS can be used to optimize operations, reduce waste, and improve
efficiency. By using data to identify inefficiencies and areas for improvement, organizations can
streamline their operations and increase their competitiveness.
● Better customer experiences: DS can be used to analyze customer data, such as purchase history
and behavior, to personalize marketing campaigns and improve customer experiences. This can
help organizations increase customer satisfaction and loyalty.
● Innovation: DS can be used to identify new opportunities for growth and innovation. By
analyzing data, organizations can identify new market trends, customer needs, and areas for
innovation, and develop new products and services to meet these needs.
● Improved risk management: DS can be used to analyze risk and uncertainty, such as fraud and
cyber threats, to improve risk management and minimize potential losses.
● A better understanding of complex systems: DS can be used to analyze complex systems, such as
the human body, the global economy, or the environment, to gain a deeper understanding of these
systems and identify ways to improve them.
● Improved decision-making in critical domains: In critical domains such as healthcare, criminal
justice, and finance, data science can help in decision-making by analyzing relevant data to
produce evidence-based outcomes, reducing bias, and improving fairness.
Types of Analysis
Analysis of data is a vital part of running a successful business. When data is used effectively, it leads to
better understanding of a business’s previous performance and better decision-making for its future
activities. There are many ways that data can be utilized, at all levels of a company’s operations.
There are four types of data analysis that are in use across all industries. While we separate these into
categories, they are all linked together and build upon each other. As you begin moving from the simplest
type of analytics to more complex, the degree of difficulty and resources required increases. At the same
time, the level of added insight and value also increases.
● Descriptive Analysis
● Diagnostic Analysis
● Predictive Analysis
● Prescriptive Analysis
Below, we will introduce each type and give examples of how they are utilized in business.
Descriptive Analysis
The first type of data analysis is descriptive analysis. It is at the foundation of all data insight. It is the
simplest and most common use of data in business today. Descriptive analysis answers the “what
happened” by summarizing past data, usually in the form of dashboards.
The biggest use of descriptive analysis in business is to track Key Performance Indicators (KPIs). KPIs
describe how a business is performing based on chosen benchmarks.
Business applications of descriptive analysis include:
● KPI dashboards
● Monthly revenue reports
● Sales leads overview
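As a rough illustration of descriptive analysis, the sketch below summarizes past data into a monthly revenue report with pandas; the dataset and its column names (order_date, revenue) are hypothetical.

```python
import pandas as pd

# Hypothetical sales records; in practice these would come from a database or CSV export.
sales = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-11", "2024-02-25"]),
    "revenue": [1200.0, 950.0, 1430.0, 1100.0],
})

# "What happened": summarize past data into a monthly revenue report.
monthly_revenue = sales.groupby(sales["order_date"].dt.to_period("M"))["revenue"].sum()
print(monthly_revenue)
```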
Diagnostic Analysis
Diagnostic analysis takes the insights found from descriptive analytics and drills down to find the causes
of those outcomes. Organizations make use of this type of analytics as it creates more connections
between data and identifies patterns of behavior.
A critical aspect of diagnostic analysis is creating detailed information. When a new problem arises, you
may already have collected data pertaining to the issue. Having that data at your disposal ends the need
to repeat work and makes related problems easier to connect.
Typical business applications of diagnostic analysis involve drilling down into the outcomes surfaced by
descriptive reports, such as a KPI dashboard, to explain why they occurred.
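In the same spirit as the descriptive sketch above, a diagnostic drill-down often amounts to slicing the same data along extra dimensions to locate a cause; here is a minimal pandas sketch with made-up columns (month, region, revenue).

```python
import pandas as pd

# Hypothetical records behind a KPI that dropped; all names and values are made up.
orders = pd.DataFrame({
    "month":   ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "region":  ["North", "South", "North", "South", "South"],
    "revenue": [1000, 1200, 400, 1150, 1180],
})

# Descriptive view says February revenue fell; drill down by region to ask "why".
by_region = orders.pivot_table(index="region", columns="month",
                               values="revenue", aggfunc="sum")
print(by_region)   # the North region accounts for the drop
```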
Predictive Analysis
This type of analysis is another step up from the descriptive and diagnostic analyses. Predictive analysis
uses the data we have summarized to make logical predictions of the outcomes of events. This analysis
relies on statistical modeling, which requires added technology and manpower to forecast. It is also
important to understand that a forecast is only an estimate: the accuracy of predictions relies on
high-quality, detailed data.
While descriptive and diagnostic analysis are common practices in business, predictive analysis is where
many organizations begin to show signs of difficulty. Some companies do not have the manpower to
implement predictive analysis everywhere they would like. Others are not yet willing to invest in analysis
teams across every department, or are not prepared to educate their current teams.
Business applications of predictive analysis include:
● Risk assessment
● Sales forecasting
● Using customer segmentation to determine which leads have the best chance of converting
● Predictive analytics in customer success teams
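As a minimal sketch of the idea (not a production forecasting system), the snippet below fits a simple linear regression with scikit-learn on hypothetical monthly sales and extrapolates one month ahead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical history: month index vs. units sold.
months = np.array([[1], [2], [3], [4], [5], [6]])
units  = np.array([110, 125, 139, 151, 167, 180])

model = LinearRegression().fit(months, units)

# Forecast the next month; remember this is only an estimate, and its
# accuracy depends on the quality and detail of the underlying data.
print(model.predict(np.array([[7]])))
```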
Prescriptive Analysis
The final type of data analysis is the most sought after, but few organizations are truly equipped to
perform it. Prescriptive analysis is the frontier of data analysis, combining the insight from all previous
analyses to determine the course of action to take in a current problem or decision.
Prescriptive analysis utilizes state-of-the-art technology and data practices. It is a huge organizational
commitment, and companies must be sure that they are ready and willing to put forth the effort and
resources.
Artificial Intelligence (AI) is a perfect example of prescriptive analytics. AI systems consume a large
amount of data to continuously learn and use this information to make informed decisions. Well-designed
AI systems are capable of communicating these decisions and even putting those decisions into action.
With artificial intelligence, business processes can be performed and optimized daily without human
intervention.
Currently, most of the big data-driven companies (Apple, Facebook, Netflix, etc.) are utilizing
prescriptive analytics and AI to improve decision-making. For other organizations, the jump to predictive
and prescriptive analytics can seem insurmountable. As technology continues to improve and more
professionals are educated in data, we will see more companies entering the data-driven realm.
Definitions
Data Lake:
A data lake is a centralized repository that allows organizations to store vast amounts of raw and
unstructured data, such as text, images, videos, and more. Unlike traditional databases, data lakes do not
enforce a specific structure on the data, making them highly flexible and suitable for big data storage and
analysis.
Data Warehouse:
A data warehouse is a structured and organized database designed for reporting and data analysis. It
consolidates data from various sources, transforms it into a consistent format, and stores it for querying
and reporting. Data warehouses are optimized for complex queries and are essential for business
intelligence and decision support.
Data Mart:
A data mart is a subset of a data warehouse. It contains a specific, often domain-focused, portion of data
that is tailored to the needs of a particular group or department within an organization. Data marts are
created to enhance data accessibility and relevance for a particular business area.
Chinese Walls:
In the context of information security and access control, Chinese walls refer to a mechanism or set of
rules that prevent the exchange of information between different departments or groups within an
organization. The purpose is to maintain confidentiality and prevent conflicts of interest, especially in
financial and legal settings.
Life Cycle of Data Science
The data science life cycle is a systematic approach to solving complex problems using data. It consists of
several key stages, including data collection, data storage, data processing, data description, data
modeling, data presentation, and automation. Here's an overview of each stage in the data science life
cycle:
1. Data Collection:
- In this initial stage, data scientists gather relevant data from various sources. This can include
structured data from databases, unstructured data from text, images, and other sources, and even data from
sensors and IoT devices.
- Data collection methods may include web scraping, surveys, data imports, or connecting to APIs to
retrieve data.
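For illustration, here is a minimal API-based collection sketch in Python using the third-party requests library; the endpoint URL is hypothetical, and a real API may require authentication and pagination.

```python
import requests

# Hypothetical endpoint; replace with a real API you are authorized to use.
URL = "https://api.example.com/v1/measurements"

response = requests.get(URL, params={"limit": 100}, timeout=10)
response.raise_for_status()     # fail loudly on HTTP errors
records = response.json()       # assumes the API returns a JSON list
print(f"collected {len(records)} records")
```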
2. Data Storage:
- Once the data is collected, it needs to be stored in a suitable environment. This typically involves a
data repository or database.
- Proper data storage ensures data accessibility, security, and scalability. Common database systems
include SQL databases, NoSQL databases, and data lakes.
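As a small-scale illustration, the sketch below stores collected readings in SQLite via Python's standard library; the table and column names are made up, and production systems would typically use a server database or a data lake instead.

```python
import sqlite3

# A small on-disk SQL store; names here are hypothetical.
conn = sqlite3.connect("measurements.db")
conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("s1", 20.5), ("s2", 21.3)])
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM readings").fetchone())
conn.close()
```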
3. Data Processing:
- Data often requires cleaning, transformation, and preprocessing to make it suitable for analysis. This
stage involves data cleaning, data normalization, handling missing values, and feature engineering.
- Data processing also includes exploratory data analysis (EDA), where data scientists gain insights and
understand the data's characteristics.
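A minimal pandas sketch of this stage, on a hypothetical table with a missing value: it fills the gap with the median and normalizes a feature.

```python
import pandas as pd

# Hypothetical raw data with a missing value and an unscaled feature.
raw = pd.DataFrame({"age": [25, None, 40, 31],
                    "income": [30000, 52000, 61000, 45000]})

clean = raw.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())   # handle missing values
clean["income_norm"] = (clean["income"] - clean["income"].mean()) / clean["income"].std()  # normalization
print(clean)
```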
4. Data Description:
- In this stage, data scientists describe the data using statistical and visual methods. This involves
summarizing key statistics, creating visualizations, and identifying patterns or anomalies in the data.
- Descriptive statistics and data visualization techniques are commonly used to convey insights.
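For instance, a quick description pass might combine summary statistics with a simple plot; the sample below is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A hypothetical sample of heights.
df = pd.DataFrame({"height_cm": [158, 172, 165, 180, 169, 175, 162]})

print(df.describe())   # count, mean, std, quartiles, min/max

df["height_cm"].plot(kind="hist", bins=5, title="Height distribution")
plt.xlabel("height (cm)")
plt.show()
```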
5. Data Modeling:
- Data modeling is the heart of the data science process. It involves building predictive and analytical
models to solve specific problems.
- Machine learning and statistical modeling techniques are applied to the prepared data to make
predictions, classifications, or to gain deeper insights.
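As one small example of this stage, the sketch below trains a decision-tree classifier with scikit-learn on the bundled iris dataset and reports held-out accuracy; a real project would involve its own data and model selection.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```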
6. Data Presentation:
- The results and insights obtained from data modeling need to be communicated effectively. Data
scientists use various visualization and reporting tools to present their findings.
- Data visualization, dashboards, and reports are created to make the results accessible and
understandable to stakeholders.
7. Automation:
- In many cases, data science solutions are not one-time endeavors but ongoing processes. To make
data-driven decision-making sustainable, automation is often implemented.
- This can involve setting up automated data pipelines, model retraining, and real-time data monitoring.
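A toy sketch of such automation: the pipeline stages below are stand-in stubs, and the commented loop hints at scheduling, which in practice is usually handled by cron or a workflow orchestrator.

```python
import time

def collect_data():
    return [1, 2, 3]                      # stand-in for the real collection step

def process(data):
    return [x * 2 for x in data]          # stand-in for cleaning/transformation

def retrain(data):
    return sum(data) / len(data)          # stand-in for model fitting

def publish(result):
    print("report:", result)              # stand-in for dashboards or alerts

def run_pipeline():
    publish(retrain(process(collect_data())))

if __name__ == "__main__":
    run_pipeline()
    # A crude daily schedule; real pipelines normally use cron or an orchestrator:
    # while True:
    #     run_pipeline()
    #     time.sleep(24 * 60 * 60)
```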
The data science life cycle is often iterative, as insights gained at any stage may necessitate revisiting
previous stages to refine the process. It's a continuous cycle of collecting, storing, processing, describing,
modeling, presenting, and automating data to solve real-world problems and make data-driven decisions.
Data Science Toolkit
A data science toolkit typically consists of a combination of software, programming languages, libraries,
and tools that data scientists use to collect, analyze, and visualize data, build machine learning models,
and perform various data-related tasks. Here is a list of some essential components of a data science
toolkit:
1. Programming Languages:
- Python: Widely used for data analysis, machine learning, and data visualization. Libraries like NumPy,
Pandas, Matplotlib, and Scikit-Learn are popular for data science tasks.
- R: Another popular language for data analysis and statistics, known for its extensive set of statistical
packages.
6. Machine Learning Libraries:
- Scikit-Learn: A Python library with a wide range of machine learning algorithms and tools for model
evaluation and selection.
- TensorFlow and PyTorch: Deep learning libraries for building neural networks.
7. Version Control:
- Git: A widely used version control system for tracking changes in code and collaborating with others.
8. Data Collection Tools:
- Libraries like Beautiful Soup and Scrapy for web scraping, and APIs for collecting data.
The specific tools and technologies you use in your data science toolkit can vary based on your projects,
preferences, and the nature of your data analysis tasks. Data scientists often adapt and expand their
toolkits as they gain experience and work on different projects.
Types of Data
Data is a collection of raw, unorganized facts that need to be processed. After the data is processed, we
can decide whether it can be used to prove or disprove a hypothesis.
Different kinds of data call for different treatment, so it is essential to know what type of data you are
working with. Once you understand the kind of data, you will be able to interpret and analyze it
effectively.
There are two main data types: numerical and categorical or, in other words, quantitative and qualitative.
a) Numerical data
Numerical, or quantitative, data is a type of data that represents numbers rather than natural language
descriptions, so it can only be collected in a numeric form.
Quantitative data supports arithmetic operations (addition, subtraction, multiplication, and division);
typical examples are measurements of a person's weight and height.
It is also divided into two subsets: discrete data and continuous data:
Discrete data:
The main feature of this data type is that it is countable, meaning that it can take only certain values,
such as 1, 2, 3, and so on; a discrete dataset can be either finite or infinite.
Examples of this type of data are age in whole years, the number of children you want to have (a
non-negative integer, because you can't have 1.5 or −2 kids), and the number of sugar cubes in a jar. All
of these examples are finite: they can be counted from beginning to end. But if you tried to count all the
sugar cubes in the world, the counting could never practically be completed; such data is treated as
countably infinite.
Continuous data:
Continuous data is a type of data with uncountable elements. It is represented as a set of intervals on a
number line. Just like discrete data, continuous data can also be either finite or infinite.
Examples of continuous data are the measure of weight, height, area, distance, time, etc. This type of data
can be further divided into interval data and ratio data.
Interval data:
Interval data is measured along a scale, in which each point is placed at an equal distance, or interval,
from one another.
Ratio data:
Ratio data is almost the same as the previous type, but the main difference is that it has a true zero
point. For instance, temperature measured in Kelvin has such a zero: 0 K is equal to −273.15 degrees
Celsius, or −459.67 degrees Fahrenheit.
b) Categorical data
Categorical, or qualitative, data is information divided into groups or categories using labels or names. In
such a dataset, each item is placed in a single category depending on its qualities. All categories are
mutually exclusive.
Numbers in this type of data carry no mathematical meaning, i.e., no arithmetic operations can be
performed on them.
A good example of categorical data arises when you fill out a job application form. You may be asked to
specify your level of education; for instance, you choose MSc from the available options because you fall
into that particular category.
Categorical data is further divided into nominal data and ordinal data.
Nominal data:
Nominal data, also known as naming data, is descriptive and has a function of labeling or naming
variables. Elements of this type of data do not have any order, or numerical value, and cannot be
measured. Nominal data is usually collected via questionnaires or surveys. E.g.: Person's name, eye color,
clothes brand.
Ordinal data:
This type of data represents elements that are ordered, ranked, or used on a rating scale. Generally
speaking, these are categories with an implied order. Though ordinal data can be counted and ranked,
like nominal data it cannot be measured.
Examples of ordinal data include customer satisfaction rating, Likert scale, and income level.
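These distinctions map directly onto tools: for example, pandas lets you mark a variable as unordered (nominal) or ordered (ordinal). A small sketch with made-up values:

```python
import pandas as pd

# Nominal: labels with no order (e.g. eye color).
eyes = pd.Categorical(["brown", "blue", "green"], ordered=False)

# Ordinal: labels with an implied order (e.g. a satisfaction rating).
rating = pd.Categorical(["low", "high", "medium"],
                        categories=["low", "medium", "high"], ordered=True)

print(rating.min(), "<", rating.max())   # order-aware operations work on ordinal data
# eyes.min() would raise a TypeError: nominal categories have no order
```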
Example Applications
Data science can be applied in a wide range of industries and fields, including:
● Healthcare: DS is used to analyze patient data and improve disease diagnosis, treatment, and
patient outcomes.
● Finance: DS is used to analyze financial data, such as stock prices and market trends, to make
informed investment decisions.
● Retail: DS is used to analyze customer data, such as purchase history and behavior, to personalize
marketing campaigns and improve customer experience.
● Transportation: DS is used to optimize routes, reduce fuel consumption, and improve overall
efficiency in the transportation industry.
● Manufacturing: DS is used to optimize production processes, reduce waste, and improve product
quality in the manufacturing industry.
● Energy: DS is used to optimize energy consumption, improve energy efficiency, and develop new
renewable energy sources.
● Marketing: DS is used to analyze customer data, such as demographics and behavior, to inform
targeted marketing campaigns and improve customer acquisition and retention.
● Sports: DS is used to analyze player performance, injury rates, and team strategy to inform
coaching decisions and improve player performance.
● Education: DS is used to analyze student performance data, teacher effectiveness, and school
programs to improve educational outcomes and student achievement.
● Government: DS is used to analyze various types of data, such as crime statistics and economic
indicators, to inform public policy and improve government services.
Mathematical Foundations for Data Science
1. Linear Algebra
Matrices and Vectors: Fundamental building blocks for data representation.
Matrix Operations: Addition, multiplication, and inversion.
Eigenvalues and Eigenvectors: Key for understanding data transformations.
Singular Value Decomposition (SVD): Used in dimensionality reduction techniques like PCA.
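For a concrete taste, NumPy computes eigenvalues and the SVD directly; the matrix below is an arbitrary example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # an arbitrary symmetric matrix

eigvals, eigvecs = np.linalg.eig(A)    # eigenvalues are 3 and 1 here
U, S, Vt = np.linalg.svd(A)            # A = U @ np.diag(S) @ Vt

print(eigvals)
print(S)   # for this symmetric positive-definite A, singular values match the eigenvalues
```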
2. Probability Theory
Random Variables: Discrete and continuous variables.
Probability Distributions: Normal, binomial, Poisson distributions.
Bayes’ Theorem: Foundation for Bayesian inference.
Expectation and Variance: Measures of central tendency and dispersion.
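A short worked example of Bayes' theorem in plain Python, using made-up numbers for a hypothetical diagnostic test:

```python
# Bayes' theorem: P(D | +) = P(+ | D) * P(D) / P(+)
# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate.
p_d = 0.01          # P(disease)
p_pos_d = 0.99      # P(positive | disease)
p_pos_nd = 0.05     # P(positive | no disease)

p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)   # total probability of a positive test
p_d_pos = p_pos_d * p_d / p_pos

print(round(p_d_pos, 3))   # ~0.167: a positive result is far from certain
```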
3. Statistics
Descriptive Statistics: Mean, median, mode, standard deviation.
Inferential Statistics: Hypothesis testing, confidence intervals.
Regression Analysis: Linear and logistic regression for predictive modeling.
Correlation and Causation: Understanding relationships between variables.
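As a small illustration combining descriptive and inferential statistics, the sketch below runs a two-sample t-test with SciPy on hypothetical measurements.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from two groups.
a = np.array([12.1, 11.8, 12.4, 12.0, 11.9])
b = np.array([12.8, 13.1, 12.9, 13.4, 12.7])

print("mean:", a.mean(), "std:", a.std(ddof=1))   # descriptive statistics

t, p = stats.ttest_ind(a, b)                      # inferential: two-sample t-test
print(f"t = {t:.2f}, p = {p:.4f}")                # a small p-value suggests the means differ
```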
4. Calculus
Differentiation: Understanding rates of change, used in optimization.
Integration: Area under curves, useful in probability and statistics.
Multivariable Calculus: Partial derivatives, gradients, used in machine learning algorithms.
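Both ideas can be approximated numerically in a few lines; the sketch below differentiates and integrates f(x) = x^2 with NumPy.

```python
import numpy as np

f = lambda x: x**2

# Differentiation via central finite differences: f'(3) should be close to 6.
h = 1e-6
print((f(3 + h) - f(3 - h)) / (2 * h))

# Integration via the trapezoidal rule: the area under x^2 on [0, 1] is 1/3.
x = np.linspace(0.0, 1.0, 1001)
y = f(x)
print(np.sum((y[:-1] + y[1:]) / 2 * np.diff(x)))
```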
5. Optimization
Gradient Descent: Iterative method for finding local minima.
Convex Optimization: Ensures global minima, used in many machine learning algorithms.
Constrained Optimization: Techniques like Lagrange multipliers for problems with constraints.
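A minimal gradient-descent sketch on a one-dimensional convex function, chosen so the true minimum (x = 3) is known:

```python
# Gradient descent on f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
def grad(x):
    return 2 * (x - 3)

x = 0.0                    # starting point
lr = 0.1                   # learning rate
for _ in range(100):
    x -= lr * grad(x)      # step against the gradient

print(round(x, 4))         # converges to the minimum at x = 3
```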
6. Discrete Mathematics
Graph Theory: Nodes and edges, used in network analysis.
Combinatorics: Counting methods, useful in probability.
Boolean Algebra: Logical operations, foundational for computer science.
7. Numerical Methods
Root Finding Algorithms: Newton-Raphson method.
Numerical Integration: Trapezoidal and Simpson’s rule.
Optimization Algorithms: Simplex method, used in linear programming.
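For instance, a bare-bones Newton-Raphson iteration for the square root of 2:

```python
# Newton-Raphson for the root of f(x) = x^2 - 2 (i.e. sqrt(2)).
def f(x):
    return x**2 - 2

def df(x):
    return 2 * x

x = 1.0                        # initial guess
for _ in range(6):
    x -= f(x) / df(x)          # x_{n+1} = x_n - f(x_n) / f'(x_n)

print(x)                       # ≈ 1.41421356
```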
Applications in Data Science
Machine Learning: Algorithms such as SVMs and neural networks rely heavily on these mathematical
foundations.
Data Analysis: Techniques like clustering, dimensionality reduction.
Natural Language Processing (NLP): Understanding and processing human language data.
Computer Vision: Image processing and analysis.