Computational Data Science - Unit 1

The document provides a comprehensive overview of data science, its importance, lifecycle, and the role of data scientists in various industries. It distinguishes between structured and unstructured data, outlines the prerequisites for data science, and highlights its applications across sectors such as healthcare, finance, and logistics. Additionally, it contrasts data science with business intelligence, emphasizing the advanced analytical capabilities of data science.

Uploaded by

brainx Magic


COMPUTATIONAL DATA SCIENCE (ITITE68)

UNIT – 1

What is Data Science, importance of data science, big data and data science, the current
scenario, industry perspective. Types of Data: Structured vs. Unstructured Data,
Quantitative vs. Categorical Data, Big Data vs. Little Data. Data science process, Role
of Data Scientist.

Data science is an essential part of many industries today, given the massive amounts of data
that are produced, and is one of the most debated topics in IT circles. Its popularity has grown
over the years, and companies have started implementing data science techniques to grow their
business and increase customer satisfaction.

What Is Data Science?

Data science is the domain of study that deals with vast volumes of data using modern tools
and techniques to find unseen patterns, derive meaningful information, and make business
decisions. Data science uses complex machine learning algorithms to build predictive models.

The data used for analysis can come from many different sources and be presented in various
formats.

The Data Science Lifecycle

Data science’s lifecycle consists of five distinct stages, each with its own tasks:

1. Capture: Data Acquisition, Data Entry, Signal Reception, Data Extraction. This
stage involves gathering raw structured and unstructured data.

2. Maintain: Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data
Architecture. This stage covers taking the raw data and putting it in a form that can
be used.

3. Process: Data Mining, Clustering/Classification, Data Modeling, Data
Summarization. Data scientists take the prepared data and examine its patterns,
ranges, and biases to determine how useful it will be in predictive analysis.

4. Analyze: Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining,
Qualitative Analysis. Here is the real meat of the lifecycle. This stage involves
performing the various analyses on the data.

5. Communicate: Data Reporting, Data Visualization, Business Intelligence, Decision
Making. In this final step, analysts prepare the analyses in easily readable forms such
as charts, graphs, and reports.
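The five stages above can be pictured as a chain of small functions, each handing its result to the next. The following is only a minimal sketch under stated assumptions: the sales rows, the function names, and the choice of an average as the "analysis" are all hypothetical and stand in for whatever a real project would use.

```python
from statistics import mean

# Hypothetical raw readings; None and "n/a" stand in for missing entries.
RAW_ROWS = [
    {"day": 1, "sales": "120"},
    {"day": 2, "sales": "n/a"},
    {"day": 3, "sales": "150"},
    {"day": 4, "sales": None},
    {"day": 5, "sales": "180"},
]


def capture():
    """Capture: gather the raw data (here, an in-memory list)."""
    return list(RAW_ROWS)


def maintain(rows):
    """Maintain: drop unusable rows and convert values to numbers."""
    cleaned = []
    for row in rows:
        value = row["sales"]
        if value is None or not value.isdigit():
            continue
        cleaned.append({"day": row["day"], "sales": int(value)})
    return cleaned


def process(rows):
    """Process: summarize the prepared data (count and range)."""
    sales = [row["sales"] for row in rows]
    return {"count": len(sales), "min": min(sales), "max": max(sales)}


def analyze(rows):
    """Analyze: compute a simple statistic, the average daily sales."""
    return mean(row["sales"] for row in rows)


def communicate(summary, average):
    """Communicate: format the findings as a short readable report."""
    return (
        f"{summary['count']} valid days, sales between {summary['min']} "
        f"and {summary['max']}, average {average:.1f}"
    )


rows = maintain(capture())
report = communicate(process(rows), analyze(rows))
print(report)
```

In practice each stage would be far larger (a warehouse instead of a list, a model instead of a mean), but the hand-off from one stage to the next has the same shape.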

Prerequisites for Data Science

Here are some of the technical concepts you should know about before starting to learn
data science.

1. Machine Learning

Machine learning is the backbone of data science. Data Scientists need to have a solid grasp of
ML in addition to basic knowledge of statistics.

2. Modeling

Mathematical models enable you to make quick calculations and predictions based on what
you already know about the data. Modeling is also a part of Machine Learning and involves
identifying which algorithm is the most suitable to solve a given problem and how to train these
models.

3. Statistics

Statistics are at the core of data science. A sturdy handle on statistics can help you extract more
intelligence and obtain more meaningful results.

4. Programming

Some level of programming is required to execute a successful data science project. The most
common programming languages are Python and R. Python is especially popular because it’s
easy to learn, and it supports multiple libraries for data science and ML.

5. Databases

A capable data scientist needs to understand how databases work, how to manage them, and
how to extract data from them.
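To make the "Modeling" prerequisite above more concrete, here is a minimal sketch of training a model and using it to predict. It fits a straight line by ordinary least squares in plain Python; the advertising-spend data and the `fit_line` helper are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0]
units = [12.0, 14.0, 16.0, 18.0]

# "Training" the model means estimating its two parameters from known data.
slope, intercept = fit_line(spend, units)

# The trained model can then predict a value that was not observed.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

A straight line is only one possible model; choosing which model suits a problem is the part of modeling that the prerequisite describes.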

Who Oversees the Data Science Process?

Business Managers

The business managers are the people in charge of overseeing the data science training method.
Their primary responsibility is to collaborate with the data science team to characterise the
problem and establish an analytical method. A business manager may oversee the marketing,
finance, or sales department, and report to an executive in charge of the department. Their goal
is to ensure projects are completed on time by collaborating closely with data scientists and IT
managers.

IT Managers

Next are the IT managers. The longer a member has been with the organisation, the more
important their responsibilities are likely to be. They are primarily responsible for
developing the infrastructure and architecture to enable data science activities. They
constantly monitor data science teams and resource them accordingly to ensure that they
operate efficiently and safely. They may also be in charge of creating and maintaining
IT environments for data science teams.

Data Science Managers

The data science managers make up the final section of the team. They primarily trace and
supervise the working procedures of all data science team members. They also manage and
keep track of the day-to-day activities of the three data science teams. They are team builders
who can blend project planning and monitoring with team growth.

What is a Data Scientist?

Data scientists are among the most recent analytical data professionals who have the technical
ability to handle complicated issues as well as the desire to investigate what questions need to
be answered. They're a mix of mathematicians, computer scientists, and trend forecasters.
They're also in high demand and well-paid because they work in both the business and IT
sectors.

On a daily basis, a data scientist may do the following tasks:

1. Discover patterns and trends in datasets to get insights.

2. Create forecasting algorithms and data models.

3. Improve the quality of data or product offerings by utilising machine learning
techniques.

4. Distribute suggestions to other teams and top management.

5. In data analysis, use data tools such as R, SAS, Python, or SQL.

6. Stay on top of innovations in the field of data science.

What Does a Data Scientist Do?

Now that you know what data science is, you may be wondering what exactly this job role is
like. A data scientist analyzes business data to extract meaningful insights. In
other words, a data scientist solves business problems through a series of steps, including:

• Before tackling the data collection and analysis, the data scientist determines the
problem by asking the right questions and gaining understanding.

• The data scientist then determines the correct set of variables and data sets.

• The data scientist gathers structured and unstructured data from many disparate
sources (enterprise data, public data, etc.).

• Once the data is collected, the data scientist processes the raw data and converts it
into a format suitable for analysis. This involves cleaning and validating the data to
guarantee uniformity, completeness, and accuracy.

• After the data has been rendered into a usable form, it’s fed into the analytic
system—ML algorithm or a statistical model. This is where the data scientists
analyze and identify patterns and trends.

• When the data has been completely rendered, the data scientist interprets the data to
find opportunities and solutions.

• The data scientists finish the task by preparing the results and insights to share with
the appropriate stakeholders and communicating the results.
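The cleaning and validation step in the list above (uniformity, completeness, accuracy) can be sketched as follows. This is an illustration under stated assumptions, not a standard procedure: the customer records, the field names, and the plausible-age range of 1 to 119 are all hypothetical.

```python
# Hypothetical raw customer records gathered from different sources.
raw_records = [
    {"name": " Alice ", "country": "in", "age": "34"},
    {"name": "Bob", "country": "IN", "age": ""},       # incomplete: no age
    {"name": "Carol", "country": "Us", "age": "210"},  # inaccurate: impossible age
    {"name": "Dave", "country": "us", "age": "41"},
]


def clean(record):
    """Uniformity: trim whitespace and normalize the country code."""
    return {
        "name": record["name"].strip(),
        "country": record["country"].strip().upper(),
        "age": record["age"].strip(),
    }


def is_valid(record):
    """Completeness and accuracy: every field present, age in a plausible range."""
    if not all(record.values()):
        return False
    if not record["age"].isdigit():
        return False
    return 0 < int(record["age"]) < 120  # assumed plausible range


cleaned = [clean(r) for r in raw_records]
valid = [r for r in cleaned if is_valid(r)]
rejected = [r["name"] for r in cleaned if not is_valid(r)]
print(valid)
print(rejected)
```

Only the records that pass all three checks would be fed into the analytic system; the rejected ones would typically be logged or sent back for correction.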

Use of Data Science

1. Data science can detect patterns in seemingly unstructured or unconnected data,
allowing conclusions and predictions to be made.

2. Tech businesses that acquire user data can utilise data science strategies to transform
that data into valuable or profitable information.

3. Data science has also made inroads into the transportation industry, for example with
driverless cars, which have the potential to lower the number of accidents. Training
data, such as the speed limit on the highway and the location of busy streets, is
supplied to the algorithm and examined using data science approaches.

4. Data science applications provide a better level of therapeutic customisation
through genetics and genomics research.

Difference Between Business Intelligence and Data Science

Business intelligence is a combination of the strategies and technologies used for the analysis
of business data/information. Like data science, it can provide historical, current, and predictive
views of business operations. However, there are some key differences.
| Business Intelligence | Data Science |
|---|---|
| Uses structured data | Uses both structured and unstructured data |
| Analytical in nature - provides a historical report of the data | Scientific in nature - performs an in-depth statistical analysis on the data |
| Use of basic statistics with emphasis on visualization (dashboards, reports) | Leverages more sophisticated statistical and predictive analysis and machine learning (ML) |
| Compares historical data to current data to identify trends | Combines historical and current data to predict future performance and outcomes |

Applications of Data Science

Data science has found its applications in almost every industry.

1. Healthcare

Healthcare companies are using data science to build sophisticated medical instruments to
detect and cure diseases.

2. Gaming

Video and computer games are now being created with the help of data science and that has
taken the gaming experience to the next level.

3. Image Recognition

Identifying patterns in images and detecting objects in an image is one of the most popular data
science applications.

4. Recommendation Systems

Netflix and Amazon give movie and product recommendations based on what you like to
watch, purchase, or browse on their platforms.

5. Logistics

Data Science is used by logistics companies to optimize routes to ensure faster delivery of
products and increase operational efficiency.

6. Fraud Detection

Banking and financial institutions use data science and related algorithms to detect fraudulent
transactions.

7. Internet Search

When we think of search, we immediately think of Google. Right? However, there are other
search engines, such as Yahoo, DuckDuckGo, Bing, AOL, Ask, and others, that employ data
science algorithms to offer the best results for our searched query in a matter of seconds. Given
that Google handles more than 20 petabytes of data per day, it would not be the 'Google'
we know today if data science did not exist.

8. Speech recognition

Speech recognition is dominated by data science techniques. We may see the excellent work
of these algorithms in our daily lives. Have you ever needed the help of a virtual speech
assistant like Google Assistant, Alexa, or Siri? Its voice recognition technology is
operating behind the scenes, attempting to interpret and evaluate your words and delivering
useful results in response. Image recognition may also be seen on social media platforms
such as Facebook, Instagram, and Twitter. When you upload a picture of yourself with someone
on your list, these applications will recognise them and tag them.

9. Targeted Advertising

If you thought Search was the most essential data science use, consider this: the whole digital
marketing spectrum. From display banners on various websites to digital billboards at airports,
data science algorithms are used to decide almost all of them. This is why digital
advertisements have a far higher CTR (Click-Through Rate) than traditional marketing. They
can be customised based on a user's prior behaviour. That is why you may see adverts for Data
Science Training Programs while another person sees an advertisement for clothes in the same
region at the same time.

10. Airline Route Planning

As a result of data science, it is easier to predict flight delays for the airline industry, which is
helping it grow. It also helps to determine whether to fly directly to the destination or to
make a stop in between, for example on a flight from Delhi to the United States of America.

11. Augmented Reality

Last but not least, the final data science application appears to be the most fascinating for
the future: augmented reality. There is a close relationship between data science and virtual
reality. A virtual reality headset incorporates computing expertise, algorithms, and data to
create the best possible viewing experience. The popular game Pokemon GO is a minor step in
that direction: it gives players the ability to wander about and look at Pokemon on walls,
streets, and other places where they do not really exist. The makers of this game chose the
locations of the Pokemon and gyms using data from Ingress, the previous app from the same
company.

Example of Data Science

Here are some brief overviews of a couple of use cases, showing data science’s versatility.

• Law Enforcement: In this scenario, data science is used to help police in Belgium to
better understand where and when to deploy personnel to prevent crime. With only
limited resources and a large area to cover, the police used dashboards and reports
to increase the officers’ situational awareness, allowing a police force that’s spread
thin to maintain order and anticipate criminal activity.

• Pandemic Fighting: The state of Rhode Island wanted to reopen schools, but was
naturally cautious, considering the ongoing COVID-19 pandemic. The state used
data science to expedite case investigations and contact tracing, enabling a small
staff to handle an overwhelming number of concerned calls from citizens. This
information helped the state set up a call center and coordinate preventative
measures.

• Driverless Vehicles: Lunewave, a sensor manufacturing company, was looking for
a way to make sensor technology more cost-effective and accurate. They turned to
data science and machine learning to train their sensors to be safer and more reliable,
as well as using data to improve their 3D-printed sensor manufacturing process.

• Entertainment: Data science enables streaming services to follow and evaluate what
consumers view, which aids in the creation of new TV series and films. Data-driven
algorithms are also utilised to provide tailored suggestions based on the watching
history of a user.

• Finance: Banks and credit card firms mine and analyse data in order to detect
fraudulent activities, manage financial risks on loans and credit lines, and assess
client portfolios in order to uncover upselling possibilities.

• Manufacturing: Data science applications in manufacturing include supply chain
management and distribution optimization, as well as predictive maintenance to
anticipate probable equipment faults in facilities before they occur.

• Healthcare: Machine learning models and other data science components are used
by hospitals and other healthcare providers to automate X-ray analysis and assist
doctors in diagnosing illnesses and planning treatments based on previous patient
outcomes.

• Retail: Retailers evaluate client behaviour and purchasing trends in order to provide
individualised product suggestions as well as targeted advertising, marketing, and
promotions. Data science also assists them in managing product inventories and
supply chains in order to keep items in stock.

What is data and computational science?

Data Science is the art of generating insight, knowledge and predictions by processing data
gathered about a system or a process. Computational Science is the art of developing validated
(simulation) models in order to gain a better understanding of a phenomenon (systems or
processes).

WHAT IS COMPUTATIONAL DATA SCIENCE?

Computational Data Science combines aspects of statistics, computer science, mathematics and
machine learning to identify trends, make predictions, and solve problems. Computational data
science uses algorithms and data structures to store, manipulate, visualize and learn from large
data sets.

WHAT DOES A COMPUTATIONAL DATA SCIENTIST DO?

Computational Data Scientists work in a wide variety of industries. Some common tasks
include:
• Collecting and categorizing large datasets
• Cleaning and validating data to ensure accuracy, completeness and uniformity
• Identifying patterns and trends in data sets
• Devising models and algorithms to uncover hidden meaning
• Forecasting future trends and results
• Training intelligent systems
• Producing summarizations and visualizations of datasets and communicating results to
stakeholders
• Discovering solutions and opportunities through an understanding of data sets

Difference between Structured data and Unstructured data

Structured Data

➢ Data that is to the point, factual, and highly organized is referred to as structured
data. It is typically quantitative in nature, that is, it contains measurable values such
as numbers, dates, and times.
➢ It is easy to search and analyze structured data. Structured data exists in a predefined
format. A relational database consisting of tables with rows and columns is one of the
best examples of structured data.
➢ Structured data generally exists in tables such as Excel files and Google Sheets
spreadsheets. SQL (Structured Query Language) is used for managing structured data.
SQL was developed at IBM in the 1970s and is mainly used to handle relational
databases and data warehouses.
➢ Structured data is highly organized and easy for machines to process. Common
applications of relational databases with structured data include sales transactions,
airline reservation systems, inventory control, and others.
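As a small illustration of structured data in a relational database, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `sales` tables, their columns, and the sample rows are hypothetical; the point is only that a predefined schema makes the data easy to query with SQL, including joins across tables.

```python
import sqlite3

# An in-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# The schema is defined up front: this is what makes the data "structured".
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO sales (customer_id, amount)
        VALUES (1, 250.0), (1, 100.0), (2, 75.5);
    """
)

# Join the two tables and total the sales per customer.
totals = conn.execute(
    """
    SELECT c.name, SUM(s.amount) AS total
    FROM customers AS c
    JOIN sales AS s ON s.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
    """
).fetchall()
conn.close()

print(totals)
```

The same kind of query would be much harder to express against free-form text, which is the contrast drawn in the next section.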

Unstructured Data

➢ Unstructured data includes free-form files such as log files, audio files, and image
files. Some organizations have a great deal of data available but do not know how to
derive value from it, since the data is raw.
➢ Unstructured data is data that lacks any predefined model or format. It requires a lot
of storage space, and it is hard to keep secure. It cannot be presented in a data
model or schema.
➢ That is why managing, analyzing, or searching unstructured data is hard. It resides
in various formats such as text, images, audio, and video files. It is qualitative
in nature and is sometimes stored in a non-relational (NoSQL) database.
➢ It is not stored in relational databases, so it is hard for computers and humans to interpret
it. The limitations of unstructured data include the need for data science experts
and specialized tools to manipulate the data.
➢ The amount of unstructured data is much more than the structured or semi-structured
data.
➢ Examples of human-generated unstructured data are Text files, Email, social media,
media, mobile data, business applications, and others.
➢ The machine-generated unstructured data includes satellite images, scientific data,
sensor data, digital surveillance, and many more.
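Before unstructured data can be analyzed, some structure usually has to be extracted from it. The sketch below shows one common approach for text: parsing free-form log lines with a regular expression. The log lines and their layout (date, time, level, message) are assumptions made for the example; real logs vary widely.

```python
import re

# Hypothetical free-form log lines; the layout is an assumption.
log_text = """\
2024-01-05 10:15:02 ERROR payment failed for order 1182
2024-01-05 10:15:09 INFO user logged in
2024-01-05 10:16:44 ERROR timeout contacting inventory service
"""

# Named groups turn each matching line into a small record.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.+)$"
)

records = []
for line in log_text.splitlines():
    match = LINE_PATTERN.match(line)
    if match:  # lines that do not fit the pattern are skipped
        records.append(match.groupdict())

# Once the text has structure, it can be filtered like a table.
errors = [r["message"] for r in records if r["level"] == "ERROR"]
print(len(records), errors)
```

Images, audio, and video need far more specialized tools than a regular expression, which is part of why unstructured data is harder to work with.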

Structured data v/s Unstructured data

| On the basis of | Structured data | Unstructured data |
|---|---|---|
| Technology | It is based on a relational database. | It is based on character and binary data. |
| Flexibility | Structured data is less flexible and schema-dependent. | There is an absence of schema, so it is more flexible. |
| Scalability | It is hard to scale database schema. | It is more scalable. |
| Robustness | It is very robust. | It is less robust. |
| Performance | Here, we can perform a structured query that allows complex joining, so the performance is higher. | While in unstructured data, textual queries are possible, the performance is lower than semi-structured and structured data. |
| Nature | Structured data is quantitative, i.e., it consists of hard numbers or things that can be counted. | It is qualitative, as it cannot be processed and analyzed using conventional tools. |
| Format | It has a predefined format. | It has a variety of formats, i.e., it comes in a variety of shapes and sizes. |
| Analysis | It is easy to search. | Searching for unstructured data is more difficult. |

Categorical Data vs. Quantitative Data: What’s the Difference?

What is categorical data

Categorical data refers to values that are divided into groups, or categories, such as gender,
country of origin, or eye color. It is often expressed in non-numeric forms such as words, letters,
or symbols instead of numerical values. Categorical data can provide insight into the
characteristics of different groups, or populations. For example, a survey may reveal that males
are more likely to purchase a certain product than females.

When to use categorical data

As data science has grown in prominence, categorical data has become increasingly important.
Categorical data is often used when trying to ascertain correlation between different variables,
such as whether certain behaviors or characteristics are associated with particular outcomes. It
can also be used to help understand trends and patterns that may exist within a population. For
example, a study may use it to determine whether certain demographic factors, such as age,
income level, or level of IT knowledge (for instance, knowing how to fix corrupted Windows
files), predict certain behaviors. Categorical data can also be used to segment customers into
different groups for targeted marketing campaigns.
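Segmenting customers by a categorical value, as described above, amounts to grouping records that share the same category. A minimal sketch follows; the customer names and the `plan` categories are hypothetical.

```python
from collections import defaultdict

# Hypothetical customers, each labeled with a categorical "plan" value.
customers = [
    {"name": "Asha", "plan": "premium"},
    {"name": "Ravi", "plan": "free"},
    {"name": "Meena", "plan": "premium"},
    {"name": "John", "plan": "trial"},
]

# Group customer names under the category they belong to.
grouped = defaultdict(list)
for customer in customers:
    grouped[customer["plan"]].append(customer["name"])

segments = dict(grouped)
print(segments)
```

Each resulting segment could then receive its own marketing campaign.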

Benefits of using categorical data

There are plenty of advantages to using categorical data. It is much easier to interpret and
analyze than quantitative data, which makes it an ideal choice for people without a strong
background in mathematics or statistics. Since it’s non-numeric, it allows researchers and
analysts to gain insight into the data without having to run complex and expensive quantitative
analyses.

What is quantitative data

Quantitative data, on the other hand, refers to numerical measurements - it does not involve
grouping values into categories. Quantitative data can be used to measure changes or trends
over time. For example, consumer behavior trends such as the number of people aged between
18-25 who use a smartphone app can be tracked to measure uptake and usage over different
periods of time.

When to use quantitative data

Analyzing quantitative data can allow us to see how variables are related. It can be used to
measure changes in behavior, attitudes, or preferences over time. Quantitative data is often
used in research studies and surveys that involve collecting numerical data from participants.
It is also found useful for setting objectives and targets as well as tracking performance.

Benefits of using quantitative data

The main advantage of using quantitative data is that it allows researchers and analysts to make
predictions based on patterns and trends they observe in the data. This can help companies
make better decisions when it comes to product development, marketing strategies, and
customer service. And quantitative data provides a level of accuracy that is often not achievable
with categorical data due to its numerical nature.

Differences between categorical and quantitative data

When it comes to differences between categorical and quantitative data, the most significant is
in the way each type of data is analyzed. Categorical data can be analyzed by counting the
frequency of each group, while quantitative data requires mathematical operations such as
summation or averaging to determine meaningful correlations. Furthermore, categorical data
has limited application in statistical analysis and modeling since it provides less information
than quantitative data.

Categorical data is used to describe characteristics of a population based on non-numeric values
while quantitative data is used to measure numerical values over time or to compare different
groups. Categorical data can provide insights into how different populations interact with each
other, which can be used for targeted marketing, while quantitative data can be used for
predictive analysis and setting objectives.
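The difference in how the two kinds of data are analyzed can be seen side by side in a short sketch: categorical values are counted, while quantitative values are summed or averaged. The eye colors and ages below are hypothetical sample data.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical survey answers.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]  # categorical
ages = [23, 35, 31, 42, 29, 38]                                    # quantitative

# Categorical data: count how often each category occurs.
color_counts = Counter(eye_colors)
most_common_color, top_count = color_counts.most_common(1)[0]

# Quantitative data: arithmetic such as averaging is meaningful.
average_age = mean(ages)
median_age = median(ages)

print(color_counts)
print(most_common_color, top_count, average_age, median_age)
```

Averaging the eye colors would be meaningless, which is why the type of a variable has to be identified before choosing an analysis.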

Difference Between Small Data and Big Data

Small Data: It can be defined as small datasets that are capable of impacting decisions in
the present: anything that is currently ongoing and whose data can be accumulated in an
Excel file. Small Data is also helpful in making decisions, but it is not intended to have a
large impact on the business; rather, it is used for a short span of time.
In a nutshell, data that is simple enough for human understanding, in a volume and structure
that makes it accessible, concise, and workable, is known as small data.

Big Data: It can be represented as large chunks of structured and unstructured data. The
amount of data stored is immense. It is therefore important for analysts to dig thoroughly
through the whole of it to make it relevant and useful for proper business decisions.
In short, datasets that are so huge and complex that conventional data processing
techniques cannot manage them are known as big data.

| Feature | Small Data | Big Data |
|---|---|---|
| Technology | Traditional | Modern |
| Collection | Generally, it is obtained in an organized manner and then inserted into the database | Big Data collection is done using pipelines with queues such as AWS Kinesis or Google Pub/Sub to balance high-speed data |
| Volume | Data in the range of tens or hundreds of gigabytes | Size of data is more than terabytes |
| Analysis Areas | Data marts (Analysts) | Clusters (Data Scientists), Data marts (Analysts) |
| Quality | Contains less noise, as the data is collected in a controlled manner | Usually, the quality of data is not guaranteed |
| Processing | It requires batch-oriented processing pipelines | It has both batch and stream processing pipelines |
| Database | SQL | NoSQL |
| Velocity | A regulated and constant flow of data; data aggregation is slow | Data arrives at extremely high speeds; large volumes of data are aggregated in a short time |
| Structure | Structured data in tabular format with fixed schema (relational) | Numerous varieties of data sets, including tabular data, text, audio, images, video, logs, JSON, etc. (non-relational) |
| Scalability | They are usually vertically scaled | They are mostly based on horizontally scaling architectures, which gives more versatility at a lower cost |
| Query Language | SQL only | Python, R, Java, SQL |
| Hardware | A single server is sufficient | Requires more than one server |
| Value | Business intelligence, analysis and reporting | Complex data mining techniques for pattern finding, recommendation, prediction, etc. |
| Optimization | Data can be optimized manually (human powered) | Requires machine learning techniques for data optimization |
| Storage | Storage within enterprises, local servers, etc. | Usually requires distributed storage systems on cloud or in external file systems |
| People | Data Analysts, Database Administrators and Data Engineers | Data Scientists, Data Analysts, Database Administrators and Data Engineers |
| Security | Security practices for Small Data include user privileges, data encryption, hashing, etc. | Securing Big Data systems is much more complicated. Best security practices include data encryption, cluster network isolation, strong access control protocols, etc. |
| Nomenclature | Database, Data Warehouse, Data Mart | Data Lake |
| Infrastructure | Predictable resource allocation, mostly vertically scalable hardware | More agile infrastructure with horizontally scalable hardware |
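The "Processing" row above distinguishes batch pipelines from stream pipelines. The sketch below illustrates the idea on a tiny scale by computing the same average in both styles; the sensor readings and the `RunningMean` class are hypothetical, and a real streaming system would receive values from a queue rather than a list.

```python
def batch_mean(values):
    """Batch processing: the whole dataset is available before computing."""
    return sum(values) / len(values)


class RunningMean:
    """Stream processing: update the mean one value at a time,
    without storing the values seen so far."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental update: shift the mean toward the new value.
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical sensor readings.
readings = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_mean(readings)

stream = RunningMean()
for reading in readings:  # values arrive one by one
    stream.update(reading)

print(batch_result, stream.mean)
```

Both approaches reach the same answer here; the streaming version matters when the data arrives too fast or is too large to hold in memory at once.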

28 pages
DataScientist v2
No ratings yet
DataScientist v2
14 pages
How Does Data Science Works in 2021
No ratings yet
How Does Data Science Works in 2021
9 pages
Lecture 1 What Is Data Science Prerequisites, Lifecycle and Applications Simplilearn
No ratings yet
Lecture 1 What Is Data Science Prerequisites, Lifecycle and Applications Simplilearn
5 pages
Magnetic Effect of Current - Level - 1 - DTS 2
No ratings yet
Magnetic Effect of Current - Level - 1 - DTS 2
2 pages
Quiz 2
No ratings yet
Quiz 2
11 pages
Iq Bot v6.0 en PDF
No ratings yet
Iq Bot v6.0 en PDF
146 pages
Data Science: How Do Data Scientists Mine Out Insights?
No ratings yet
Data Science: How Do Data Scientists Mine Out Insights?
7 pages
Unit 2b AI Project Cycle
No ratings yet
Unit 2b AI Project Cycle
26 pages
Watson White Paper1
No ratings yet
Watson White Paper1
15 pages
BA Case Studies
100% (1)
BA Case Studies
2 pages
Class Test-2 Chemistry: Vidyamandir Classes
No ratings yet
Class Test-2 Chemistry: Vidyamandir Classes
1 page
BD Unit1
No ratings yet
BD Unit1
45 pages
BDU1
No ratings yet
BDU1
39 pages
Data Science Article
No ratings yet
Data Science Article
2 pages
Why Databricks - Ali - Ghodsi DAIS
No ratings yet
Why Databricks - Ali - Ghodsi DAIS
30 pages
Chapter 5 Big Data
No ratings yet
Chapter 5 Big Data
37 pages
Principles of Natural Language Processing
No ratings yet
Principles of Natural Language Processing
264 pages
Electrostatics Workbook Solutions
No ratings yet
Electrostatics Workbook Solutions
17 pages
Screenshot 2023-09-24 at 3.49.58 PM
No ratings yet
Screenshot 2023-09-24 at 3.49.58 PM
38 pages
Os 2020UIT3063
No ratings yet
Os 2020UIT3063
42 pages
Green Orange Blue Creative Healthcare Facility Presentation
No ratings yet
Green Orange Blue Creative Healthcare Facility Presentation
18 pages
Q.1-Write A Program To Enter Three No's and Display This Sum Using Eval
No ratings yet
Q.1-Write A Program To Enter Three No's and Display This Sum Using Eval
12 pages
Unit - 1 Notes MC
No ratings yet
Unit - 1 Notes MC
17 pages
Big Data Seminar S
No ratings yet
Big Data Seminar S
2 pages
Zoecon EndUseDilutionTable
No ratings yet
Zoecon EndUseDilutionTable
2 pages
Magnetic Effect of Current - Level - 1 - DTS 1
No ratings yet
Magnetic Effect of Current - Level - 1 - DTS 1
3 pages
Autonomy IDOL Server Technical Brief 1204 Rev1
No ratings yet
Autonomy IDOL Server Technical Brief 1204 Rev1
6 pages
Complete Download Web Semantics: Cutting Edge and Future Directions in Healthcare 1st Edition Sarika Jain PDF All Chapters
100% (1)
Complete Download Web Semantics: Cutting Edge and Future Directions in Healthcare 1st Edition Sarika Jain PDF All Chapters
55 pages
Magnetic Effect of Current - Level - 2 - DTS 2 - Solution PDF
No ratings yet
Magnetic Effect of Current - Level - 2 - DTS 2 - Solution PDF
2 pages
M. Tech Disseratation and B. Tech Project
No ratings yet
M. Tech Disseratation and B. Tech Project
1 page
Sandaruwan WP
No ratings yet
Sandaruwan WP
4 pages
DataVolo Modern Data Integration Platform
No ratings yet
DataVolo Modern Data Integration Platform
13 pages
Matlab
No ratings yet
Matlab
3 pages
Itfm Assignment Group 5
No ratings yet
Itfm Assignment Group 5
14 pages
Spread Spectrum
No ratings yet
Spread Spectrum
3 pages
A Holistic Framework For Knowledge Discovery and Management: Contributed Articles
No ratings yet
A Holistic Framework For Knowledge Discovery and Management: Contributed Articles
6 pages
Turban Dss9e Ch07
No ratings yet
Turban Dss9e Ch07
45 pages
Prob 7-50 PDF
No ratings yet
Prob 7-50 PDF
1 page
HA 08.07.2020 XII Solutions NCERT EXERCISE Q.S 2.17, 2.18, 2.19 Solved Example 2.6 Intext Question 2.9
No ratings yet
HA 08.07.2020 XII Solutions NCERT EXERCISE Q.S 2.17, 2.18, 2.19 Solved Example 2.6 Intext Question 2.9
1 page
Additional Table 5-2021
No ratings yet
Additional Table 5-2021
3 pages
Iia Text Analytics Unlocking Value Unstructured Data 108443 (2508)
No ratings yet
Iia Text Analytics Unlocking Value Unstructured Data 108443 (2508)
7 pages
Spe-203378-A Global Drilling KPIs Analysis System Based On Modern Data Science SINOPEC
No ratings yet
Spe-203378-A Global Drilling KPIs Analysis System Based On Modern Data Science SINOPEC
12 pages
01 Unit-I Introduction To Big Data
No ratings yet
01 Unit-I Introduction To Big Data
11 pages
Rohtang Pass Permit MIS: User Manual
No ratings yet
Rohtang Pass Permit MIS: User Manual
8 pages
Big Data
No ratings yet
Big Data
3 pages


