BA Data Science Foundations

Uploaded by

farazkh1311

Define a multidisciplinary practice with multiple meanings

Key takeaways from the video "Define a multidisciplinary practice with multiple
meanings":

• Definition of Data Scientist: The term "data scientist" is broad and not yet fully
standardized. It covers professionals from fields such as statistics, data analysis,
mathematics, systems engineering, and even business and finance.
• Multidisciplinary Nature: Data science is still evolving as a discipline. It mixes
different fields and practices, much as early archaeology did before it became
formalized.
• Empirical Approach: A key aspect of data science is its empirical approach: asking
questions, conducting experiments, and making adjustments based on data to gain
insights.

These points emphasize the evolving nature of data science and the importance of the
scientific method in the field.

Use statistics and software

Let's break down the key concepts from the video "Use statistics and software":

Key Concepts:

1. Categories of Tools:
• Storing Data: Spreadsheets, databases, and key-value stores are used to store large
amounts of data. Examples of storage systems include PostgreSQL (a relational database),
Cassandra (a NoSQL database), and Hadoop (distributed file storage).
• Scrubbing Data: This involves cleaning and preparing data for analysis. Tools include
text editors, scripting tools, and programming languages like Python.
• Analyzing Data: Statistical packages such as R, SPSS, and Python's data libraries help
analyze data and create visualizations.
2. Big Data:
• Definition: Big data refers to data sets so large that they can't fit into traditional
database management systems.
• Hadoop: Open-source software that uses a distributed file system to store data across
multiple servers (a Hadoop cluster). The data is processed with tools such as MapReduce
(batch processing) and Apache Spark (faster in-memory processing, including
near-real-time streaming).
3. Data Scrubbing:
• Importance: Data scientists spend a significant amount of their time (up to 90%)
cleaning data to make it usable.
• Example: If collecting Tweets, you might write a script that separates text from
pictures so that each can be analyzed differently.
4. Statistical Analysis:
• R: A statistical programming language used to find connections and correlations in
data. It also has built-in data visualization features for creating reports with
diagrams.
• Example: Analyzing Twitter data to see if there's a connection between positive
feedback and the time of day.

Simplified Explanation:

Think of data science tools like the brushes and pickaxes of archaeologists. They help you
dig through data to find valuable insights. The focus should be on the scientific method
(asking questions, conducting experiments, and analyzing results) rather than just the
tools themselves.
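
The Tweet-scrubbing example above can be sketched in a few lines of Python. This is only an illustration, not a real Twitter client: the `tweets` records and their `text` and `media` fields are hypothetical stand-ins for whatever a collection script would actually return.

```python
# Hypothetical tweet records; a real collector would return a richer structure.
tweets = [
    {"id": 1, "text": "Love my new running shoes!", "media": []},
    {"id": 2, "text": "Morning run", "media": ["run.jpg"]},
    {"id": 3, "text": "  ", "media": ["shoes.png"]},
]


def scrub(records):
    """Split records into cleaned text entries and picture entries."""
    texts, pictures = [], []
    for record in records:
        cleaned = record["text"].strip()
        if cleaned:  # drop empty or whitespace-only text
            texts.append({"id": record["id"], "text": cleaned})
        for filename in record["media"]:
            pictures.append({"id": record["id"], "file": filename})
    return texts, pictures


texts, pictures = scrub(tweets)
print(len(texts), "text entries;", len(pictures), "picture entries")
```

Keeping text and pictures in separate lists lets each go to a different analysis step later.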

Uncover insights and create knowledge

Let's break down the key points from the video "Uncover insights and create
knowledge":

Key Concepts:

1. Exploratory Nature of Data Science:
• Unlike traditional business processes that focus on efficiency and achieving specific
objectives, data science is exploratory. It uses the scientific method to gain useful
business knowledge.
• Example: Instead of asking "How can we work faster?" data science asks "What do we
know about our customers?" or "How can we deliver a better product?"
2. Asking the Right Questions:
• Data science requires asking higher-level, often skeptical questions to gain deeper
insights. These questions might seem annoying in a typical business setting but are
crucial for building organizational knowledge.
• Example: Questions like "Why are we doing it this way?" or "What makes you think this
will work?" are essential for uncovering new opportunities and improving processes.
3. Operational vs. Scientific Focus:
• Many organizations initially focus on the technical side of data, such as collecting
and storing it. However, the real value comes from the scientific approach: asking
interesting questions and running experiments.
• Example: A website collecting data on customer interactions might start by gathering
data, but the real insights come from experiments like changing the color of car images
to see which gets more clicks.
4. Empirical Research:
• Data scientists should constantly run experiments, ask questions, and produce
well-designed reports to gain insights.
• Example: Running an experiment to see if showing fewer cars on a webpage increases the
likelihood of customer clicks, then analyzing the results to inform business decisions.

Simplified Explanation: Think of data science as a way to explore and discover new
knowledge about your business. It's like being a detective: asking questions, running
experiments, and analyzing data to uncover hidden insights. This approach helps
organizations make better decisions and stay competitive.
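
As a rough sketch of the car-image experiment described above, the snippet below compares click-through rates for two image colors. The variant names and all view and click counts are made up for illustration.

```python
# Hypothetical results from showing two car-image colors to visitors.
results = {
    "red": {"views": 1000, "clicks": 50},
    "blue": {"views": 1000, "clicks": 65},
}


def click_through_rate(variant):
    """Fraction of views that led to a click."""
    return variant["clicks"] / variant["views"]


rates = {name: click_through_rate(data) for name, data in results.items()}
best = max(rates, key=rates.get)
print(rates, "best:", best)
```

A real experiment would also check whether the difference is larger than chance alone would produce before acting on it.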

Make connections with relational databases

Let's break down the key points from the video "Make connections with relational
databases":

Key Concepts:

1. Origins of Modern Databases:
• Historical Context: Modern databases have roots in the Apollo space program of the
late 1960s. NASA and IBM developed the Information Management System (IMS) to handle the
massive amounts of data required for the mission.
• Early Databases: The early databases were like large spreadsheets with columns and
rows, but managing millions of rows was challenging.
2. Relational Databases:
• Development: In the mid-1970s, IBM developed Structured Query Language (SQL) to help
users pull data from these large systems. Around the same time, relational databases
were created.
• Structure: Relational databases divide data into smaller, related tables instead of
one massive table. For example, instead of one table with a million parts, you might
have 50 tables with 20,000 parts each.
3. Schemas and Design:
• Schemas: Engineers create schemas, or maps, to show how tables relate to each other.
Designing these schemas requires understanding the data and anticipating future changes.
• Challenges: Designing relational databases requires a lot of upfront planning. If the
initial design is wrong, it can be difficult to redesign the database later.
4. SQL and RDBMS:
• SQL: SQL is a powerful language that can pull data from multiple tables and present it
in a virtual table called a view. It's still one of the most widely used query languages
today.
• RDBMS: Relational Database Management Systems (RDBMS) like those from IBM, Microsoft,
and Oracle have added functionality over the years, making them robust tools for
managing relational databases.

Simplified Explanation: Think of a relational database like a library. Instead of having
one giant book with all the information, the library has many smaller books (tables)
organized by different topics. These books are related to each other through a catalog
(schema) that helps you find the information you need. SQL is like the librarian who helps
you pull information from different books and present it in a way that's easy to
understand.
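
To make related tables and views concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical; they are not taken from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        product TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES
        (10, 1, 'trail shoe'), (11, 2, 'road shoe'), (12, 1, 'socks');

    -- A view is a virtual table built from a query over the real tables.
    CREATE VIEW customer_orders AS
        SELECT c.name, o.product
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id;
""")

rows = conn.execute(
    "SELECT name, product FROM customer_orders ORDER BY name, product"
).fetchall()
print(rows)
conn.close()
```

The `customer_id` column is what relates the two tables; the view joins them so a reader sees one combined result.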

Get data into warehouses using ETL

Let's break down the key points from the video "Get data into warehouses using
ETL":

Key Concepts:

1. Relational Databases vs. Data Warehouses:
• Relational Databases (OLTP): These are optimized for real-time transactions. For
example, when a customer buys a shoe online, the database quickly joins their shipping
address with the shoe details to process the order.
• Data Warehouses (OLAP): These are optimized for analyzing historical data. For
instance, you might analyze past sales to see if there's a trend in shoe purchases based
on customer location.
2. ETL Process:
• Extract: Pulling data from various sources, like different websites or databases.
• Transform: Cleaning and converting the data into a format suitable for the data
warehouse. This might involve changing the data structure to match the warehouse's
schema.
• Load: Importing the transformed data into the data warehouse for analysis.
3. Practical Example:
• Imagine your website sells running shoes and is bought by a larger company that also
sells sports clothing. The company will use ETL to combine data from your website with
data from its other websites. This helps it analyze all its sales data together.
4. ETL in Data Science:
• Common Terminology: ETL is often used as a verb. To "ETL the data" means to transform
data so that it fits into a new system, like a Hadoop cluster.
• Hadoop vs. Data Warehouses: Some companies are moving from traditional data warehouses
to Hadoop clusters to save costs, as Hadoop can store data on cheaper hardware.

Simplified Explanation: Think of ETL as a process of moving and cleaning data to make
it useful for analysis. It's like gathering ingredients from different stores (Extract),
cleaning and preparing them in your kitchen (Transform), and then stocking them in one
pantry (Load) so they are ready whenever you want to cook, that is, to analyze.
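
The three ETL steps can be sketched with the Python standard library. Everything specific here is an assumption made for illustration: the two sources, their column names, the unified `sales` schema, and the in-memory SQLite database standing in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different column names and units.
shoe_site_csv = "order_id,zip,amount_usd\n1,30301,89.99\n2,94105,120.00\n"
clothing_site_rows = [{"id": "A7", "postal_code": "10001", "total_cents": 4500}]


def extract():
    """Pull raw rows from each source."""
    shoes = list(csv.DictReader(io.StringIO(shoe_site_csv)))
    return shoes, clothing_site_rows


def transform(shoes, clothing):
    """Convert both sources to (source, order_id, postal_code, amount_cents)."""
    unified = []
    for row in shoes:
        cents = round(float(row["amount_usd"]) * 100)
        unified.append(("shoes", row["order_id"], row["zip"], cents))
    for row in clothing:
        unified.append(
            ("clothing", row["id"], row["postal_code"], row["total_cents"])
        )
    return unified


def load(conn, rows):
    """Write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE sales "
        "(source TEXT, order_id TEXT, postal_code TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(warehouse, transform(*extract()))
total_cents = warehouse.execute(
    "SELECT SUM(amount_cents) FROM sales"
).fetchone()[0]
print(total_cents)
```

Once both sources share one schema, a single query can analyze all sales together, which is the point of the acquisition example.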

Let go of the past with NoSQL

Let's break down the key points from the video "Let go of the past with NoSQL":

Key Concepts:

1. Relational Databases:
• Structure: Relational databases use a schema, meaning you need to know the structure
of your data (like tables and relationships) before storing it.
• Example: For a website selling shoes, you might have separate tables for shoes,
customers, addresses, and shipping. Each transaction involves multiple tables, which can
slow down performance.
2. NoSQL Databases:
• Flexibility: NoSQL databases are non-relational and typically schemaless, meaning you
don't need to predefine the structure. This makes them more flexible and easier to
change.
• Example: Instead of splitting data into multiple tables, you store everything related
to a transaction (shoe, customer, address, shipping) in a single record.
3. Advantages of NoSQL:
• Performance: NoSQL databases can handle large amounts of data more efficiently,
especially for big websites and applications.
• Scalability: They are cluster-friendly, meaning you can distribute data across many
servers, making it easier to manage large datasets.
• Adaptability: Adding new fields or data types is simpler since there's no rigid
schema.
4. Real-World Application:
• Example: If your shoe website is bought by a larger company, integrating new features
like a frequent buyer program is easier with NoSQL. You can add new fields without
redesigning the entire database.

Simplified Explanation:
Think of a relational database like a well-organized library where you need to know
exactly where each book (data) goes. In contrast, a NoSQL database is like a flexible
storage room where you can quickly add new items without worrying about strict
organization.
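
The "single record" idea can be illustrated with a plain Python dictionary standing in for a document store. This is not a real NoSQL client; the `collection` dictionary and every field name below are hypothetical.

```python
import json

# One self-contained order document; all field names are hypothetical.
order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "address": "12 Elm St, Atlanta, GA"},
    "shoe": {"model": "Trail Runner", "size": 9},
    "shipping": {"carrier": "ExampleShip", "status": "packed"},
}

# A toy "collection": documents keyed by ID, with no schema enforced.
collection = {order["order_id"]: order}

# Adding a frequent-buyer field later requires no table redesign.
collection[1001]["frequent_buyer"] = {"points": 120}

print(json.dumps(collection[1001], indent=2))
```

Everything about the transaction lives in one record, so reading it needs no joins, and new fields can appear on some records without touching the others.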

Address big data problems

Let's break down the key points from the video "Address big data problems":

Key Concepts:

1. Big Data vs. Data Science:
• Big Data: Refers to data sets that are too large to be handled by traditional hardware
and software.
• Data Science: Uses the scientific method to analyze data, regardless of its size.
2. The Four Vs of Big Data:
• Volume: Do you have a very high amount of data? (e.g., petabytes of data)
• Variety: Is your data diverse? (e.g., text, images, videos)
• Velocity: Is your data coming in quickly? (e.g., real-time data like stock prices)
• Veracity: Is your data reliable and accurate?
3. Identifying Big Data Problems:
• Volume: If you're collecting petabytes of data daily, you likely have a big data
problem.
• Variety: Having different types of data (text, images, videos) indicates a big data
problem.
• Velocity: High-speed data inflow, like real-time updates, suggests a big data problem.
• Veracity: Ensuring data accuracy and reliability is crucial for meaningful insights.
4. Practical Example:
• Self-Driving Cars: They collect massive amounts of data (video, audio, GPS) in real
time to make decisions, which is a classic big data problem.

Simplified Explanation:
Big data is like having an overwhelming amount of information coming in from various
sources at high speeds. To determine if you have a big data problem, check if your data
meets the Four Vs: Volume, Variety, Velocity, and Veracity.

Keep things simple with structured data

Let's break down the key points from the video "Keep things simple with structured
data":

Key Concepts:

1. Structured Data:
• Definition: Structured data follows a specific format and order, like a spreadsheet
where each column has a defined type (e.g., dates, numbers).
• Example: Imagine a spreadsheet with a column for "Purchase Date." Each entry must
follow a specific format (e.g., MM/DD/YYYY).
2. Data Models and Schemas:
• Data Model: Defines the structure of individual fields (e.g., a field for dates,
another for text).
• Schema: Describes the entire structure of the database, including tables and
relationships.
3. Importance of Structure:
• Consistency: Ensures data is entered in a consistent format, making it easier to sort,
filter, and analyze.
• Error Prevention: Prevents invalid data entries (e.g., entering "Tuesday" in a date
field).
4. Relational Databases:
• Optimization: Relational databases are optimized for structured data, making them
efficient for tasks like generating reports from consistent data sets.

Simplified Explanation:
Think of structured data like a well-organized filing cabinet. Each drawer (column) is
labeled and contains specific types of documents (data). This organization makes it easy
to find and use the information later.
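
The error-prevention point can be shown with a small validator for the "Purchase Date" column. This is a sketch that assumes the MM/DD/YYYY format mentioned above; the function names are hypothetical.

```python
from datetime import datetime


def parse_purchase_date(value):
    """Return a date if value matches MM/DD/YYYY, otherwise raise ValueError."""
    return datetime.strptime(value, "%m/%d/%Y").date()


def is_valid_purchase_date(value):
    """True when the value can be stored in the date column."""
    try:
        parse_purchase_date(value)
        return True
    except ValueError:
        return False


print(is_valid_purchase_date("03/15/2024"))
print(is_valid_purchase_date("Tuesday"))
```

A database with a typed date column performs a comparable check for you, which is why "Tuesday" never makes it into a date field.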

Share semistructured data

Let's break down the key points from the video "Share semistructured data":

Key Concepts:

1. Structured Data:
• Definition: Data that fits neatly into a predefined schema, like a spreadsheet with
fixed columns and rows.
• Example: A customer table with a dedicated "ZIPCode" column.
2. Semistructured Data:
• Definition: Data that has some structure but doesn't fit neatly into a rigid schema. It
includes tags or markers to separate data elements.
• Example: Email data where you have consistent fields like sender and recipient, but
the content varies.
3. Challenges with Semistructured Data:
• Schema Differences: Different systems might use different names for the same data
fields (e.g., "ZIPCode" vs. "PostalCode").
• Integration: Combining semistructured data from different sources can be challenging
because of these schema differences.
4. Common Formats:
• XML: An older format used for exchanging semistructured data.
• JSON: A more modern format often used for web services, making it easier to exchange
data between different systems.
5. Practical Example:
• Scenario: Your shoe website needs to integrate shipping data from a carrier. Your
database uses "ZIPCode" while the carrier uses "PostalCode."
• Solution: You need to map these fields correctly to exchange data seamlessly.
6. Benefits of Semistructured Data:
• Flexibility: Easier to adapt and integrate with different systems.
• Richness: Allows for more detailed and varied data to be included, like customer
feedback from social media.

Simplified Explanation:
Think of semistructured data like a recipe book where each recipe has a consistent
structure (ingredients, steps) but the content varies. You can easily add new recipes
without needing a strict format.
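
The "ZIPCode" vs. "PostalCode" mapping from the practical example can be sketched as follows. The carrier payload, its other fields, and the `FIELD_MAP` name are hypothetical; only the two field names come from the notes above.

```python
import json

# Hypothetical payload from the carrier, which names the field "PostalCode".
carrier_json = (
    '{"TrackingId": "T-42", "PostalCode": "30301", "Status": "in transit"}'
)

# Map the carrier's field names to the names our own database uses.
FIELD_MAP = {"PostalCode": "ZIPCode"}


def to_internal(record):
    """Rename mapped fields; pass every other field through unchanged."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}


shipment = to_internal(json.loads(carrier_json))
print(shipment)
```

Keeping the mapping in one dictionary means a new naming mismatch is handled by adding one entry rather than changing the code.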

Collect unstructured data

Let's break down the key points from the video "Collect unstructured data":

Key Concepts:

1. Unstructured Data:
• Definition: Data that doesn't have a predefined format or structure. Examples include
emails, social media posts, videos, and images.
• Example: Think about the variety of content you see when you search for "cats" online:
videos, images, articles, etc. All of this is unstructured data.
2. Challenges:
• Schemaless: Unlike structured data, unstructured data doesn't follow a consistent
format. For instance, a Microsoft Word document and a PDF have different structures.
• Data Model: There's no consistent place to look for specific information (e.g.,
document title) across different file types.
3. Handling Unstructured Data:
• NoSQL Databases: These databases can store large files like audio, video, and text
without requiring a predefined schema.
• Big Data Tools: Technologies like Hadoop and Apache Spark help process and analyze
large volumes of unstructured data.
4. Practical Application:
• Customer Insights: For a business, unstructured data can provide a 360-degree view of
customers. For example, analyzing social media posts to understand customer preferences
and behaviors.

Simplified Explanation:
Think of unstructured data like a messy room where items are scattered everywhere.
Unlike a neatly organized room (structured data), you need special tools to find and
make sense of everything in the messy room.

Sift through big garbage

Let's highlight the key takeaways from the video "Sift through big garbage":

Key Takeaways:

• Data Retention Dilemma:
  • Keep Everything: Some argue it's cheaper and easier to store all data, as storage
  costs are low.
  • Delete Some Data: Others argue that too much data (or "data noise") makes it harder
  to find valuable insights.
• Team Decision:
  • It's crucial for your data science team to decide early on a data retention policy.
  Consistency in this policy helps avoid data corruption and ensures meaningful analysis.
• Practical Example:
  • A company dealing with car buyer data faced challenges with obsolete tags and data
  noise. They had to decide whether to keep all data or clean up the obsolete parts.

By understanding these points, you can better manage your data and make informed
decisions about what to keep and what to discard.

Start out with descriptive statistics

Let's break down the key points from the video "Start out with descriptive
statistics":

Key Concepts:

1. Descriptive Statistics:
• Definition: Tools used to summarize or describe a set of data. They help tell a story
about the data without going into every detail.
2. Mean (Average):
• Definition: The sum of all values divided by the number of values.
• Example: If you add up the incomes of all families and divide by the number of
families, you get the mean income.
3. Median:
• Definition: The middle value in a list of numbers sorted from smallest to largest.
• Example: If you list all family incomes from lowest to highest, the median is the
income of the family in the middle.
4. Storytelling with Statistics:
• Example: One politician might say the average salary has increased by $5,000, while
another might say the median salary has decreased by $10,000. Both can be true because
they are using different statistics to tell different stories.
5. Skewed Data:
• Definition: When there's a big difference between the mean and median, it indicates
that the data might be skewed by extreme values.
• Example: If a few families are extremely wealthy, their high incomes can raise the
mean but not affect the median much.

Simplified Explanation:
Think of descriptive statistics like different ways to summarize a story. The mean gives
you an overall average, while the median tells you what the middle looks like. Both are
useful, but they can tell different stories depending on the data.
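
A short sketch with Python's `statistics` module shows how one extreme value pulls the mean away from the median. The income figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical family incomes in dollars; the last family is extremely wealthy.
incomes = [40_000, 45_000, 50_000, 55_000, 60_000, 2_000_000]

mean_income = mean(incomes)
median_income = median(incomes)

print("mean:", mean_income)
print("median:", median_income)
```

With these numbers the mean is 375,000 while the median is 52,500, so a report quoting only the mean would badly overstate what a typical family earns.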

Understand probability

Let's break down the key points from the video "Understand probability":

Key Concepts:

1. Probability Basics:
• Definition: Probability measures the likelihood that a specific event will occur. It's
expressed as a percentage or a fraction.
• Example: Flipping a fair coin has a 50% probability of landing on heads.
2. Probability Distribution:
• Definition: A mathematical function that provides the probabilities of occurrence of
different possible outcomes.
• Example: Rolling a fair six-sided die has six possible outcomes, each with a
probability of 1/6 (or about 17%).
3. Sequence of Events:
• Definition: The probability of multiple independent events occurring in sequence is
the product of their individual probabilities.
• Example: The probability of rolling a specific number twice in a row on a die is
1/6 * 1/6 = 1/36 (or about 3%).
4. Practical Application:
• Example: A biotech company uses probability to predict participation in clinical
trials. Factors like fasting before the trial or fear of needles can decrease
participation likelihood.
5. Balancing Accuracy and Participation:
• Scenario: The company must decide between a more accurate blood test (with fewer
participants) and a less accurate saliva test (with more participants). They use
probability to weigh the trade-offs.
6. Unexpected Insights:
• Key Point: Probability can lead to surprising conclusions, such as preferring a less
accurate test to maximize participation and data points.

Simplified Explanation:
Think of probability like predicting the weather. If there's a 70% chance of rain, you
know it's more likely to rain than not. Similarly, in data science, probability helps predict
outcomes based on data.
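
The die examples above can be checked with exact fractions. This sketch assumes a fair die and independent rolls, which is what allows the probabilities to be multiplied.

```python
from fractions import Fraction

die_faces = [1, 2, 3, 4, 5, 6]

# Uniform distribution: each face of a fair die is equally likely.
distribution = {face: Fraction(1, len(die_faces)) for face in die_faces}

# Independent events multiply: the same specific face twice in a row.
p_twice = distribution[6] * distribution[6]

print(p_twice, float(p_twice))
```

Using `Fraction` keeps the result as exactly 1/36 rather than a rounded decimal, and the probabilities in the distribution add up to exactly 1.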

Find a correlation

Let's break down the key points from the video "Find a correlation":

Key Concepts:

1. Correlation:
• Definition: Correlation measures the relationship between two variables. It tells you
how one variable changes when the other one does.
• Scale: Correlation is measured on a scale from -1 to 1.
  • 1: Perfect positive correlation (as one variable increases, the other also
  increases).
  • 0: No correlation (no linear relationship between the variables).
  • -1: Perfect negative correlation (as one variable increases, the other decreases).
2. Positive Correlation:
• Example: Height and weight. Generally, taller people tend to weigh more. As height
increases, weight also tends to increase.
3. Negative Correlation:
• Example: Car weight and fuel efficiency. Heavier cars usually get fewer miles per
gallon. As car weight increases, fuel efficiency decreases.
4. Real-World Applications:
• Recommendation Systems: Companies like Netflix and Amazon use correlation to recommend
movies or products based on your past behavior.
• LinkedIn: The "People You May Know" feature uses correlation to suggest connections
based on shared jobs, schools, or interests.
5. Correlation Coefficient:
• Definition: A numerical value that represents the strength and direction of the
correlation.
• Example: A correlation coefficient of 0.5 indicates a moderate positive relationship,
while -0.75 indicates a strong negative relationship.

Simplified Explanation:
Think of correlation like a friendship. If two friends (variables) always do things together
(positive correlation), they have a strong positive relationship. If they always do the
opposite (negative correlation), they have a strong negative relationship. If they don't
influence each other at all, there's no correlation.
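
One common way to compute a correlation coefficient is Pearson's r. The notes above do not say which coefficient the video uses, so treat this as an assumption; the car weights and mileage figures are also made up to mirror the negative-correlation example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: heavier cars get fewer miles per gallon.
weights_lb = [2500, 3000, 3500, 4000, 4500]
mpg = [38, 33, 29, 24, 20]

r = pearson(weights_lb, mpg)
print(round(r, 3))
```

The result is close to -1, matching the strong negative relationship described for car weight and fuel efficiency.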

See how correlation does not imply causation

Let's break down the key points from the video "See how correlation does not imply
causation":

Key Concepts:

1. Correlation vs. Causation:
• Correlation: Indicates a relationship between two variables. For example, ice cream
sales and temperature are correlated because both tend to increase together.
• Causation: Indicates that one variable directly affects the other. For example,
flipping a light switch causes the light to turn on.
2. Correlation Doesn't Imply Causation:
• Just because two things are correlated doesn't mean one causes the other. There could
be a third factor influencing both.
• Example: Living in a retirement community is highly correlated with hospital visits.
This doesn't mean the community causes hospital visits; the true cause is the higher
median age of residents.
3. Spurious Correlation:
• Definition: A false relationship where two variables appear to be related but are
actually influenced by a third factor.
• Example: Increased sales of running shoes in January might be correlated with New
Year's resolutions rather than people having more money.
4. Scientific Method:
• To avoid false conclusions, follow the scientific method: ask good questions, form
hypotheses, and test them rigorously.
• Example: The data science team initially thought January shoe sales were due to people
having more money. After further analysis, they found it was due to New Year's
resolutions.

Simplified Explanation:
Think of correlation as two things that tend to happen together, like ice cream sales
rising on hot days. The pattern alone doesn't prove that one causes the other; you first
need to rule out other explanations, such as a third factor driving both.
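
The retirement-community example can be illustrated with a tiny, entirely made-up data set. The sketch compares average hospital visits for residents and non-residents overall, then again within the same age group, to show how a third factor (age) can account for the gap.

```python
from statistics import mean

# Hypothetical records:
# (age, lives_in_retirement_community, hospital_visits_per_year)
residents = [
    (30, False, 1), (35, False, 1), (40, False, 1),
    (75, False, 4), (80, False, 5),
    (75, True, 4), (78, True, 4), (80, True, 5), (82, True, 5),
]


def average_visits(records, in_community):
    """Mean yearly visits for people inside or outside the community."""
    return mean(v for _, c, v in records if c == in_community)


# Comparing everyone: community residents appear to visit far more often.
overall_gap = average_visits(residents, True) - average_visits(residents, False)

# Comparing only people aged 70 or older: the gap disappears.
seniors = [r for r in residents if r[0] >= 70]
senior_gap = average_visits(seniors, True) - average_visits(seniors, False)

print(overall_gap, senior_gap)
```

In this toy data, once age is held roughly constant the community makes no difference, which is the kind of check that separates a real cause from a spurious correlation.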

Combine techniques for predictive analytics

Let's break down the key points from the video "Combine techniques for predictive
analytics":

Key Concepts:

1. Predictive Analytics:
• Definition: Uses historical data to predict future events. It's a subset of data
science.
• Example: Weather forecasting uses past weather data to predict future conditions.
2. Difference from Data Science:
• Data Science: Applies the scientific method to data to uncover insights.
• Predictive Analytics: Takes these insights and makes actionable predictions.
3. Practical Example:
• Weather Forecasting: Meteorologists use historical data and correlations (like low
pressure leading to storms) to predict future weather.
• Business Application: Imagine your team analyzes millions of Tweets about running. By
identifying influential runners, you can send them promotions to boost your brand.
4. Importance of Data Quality:
• Key Point: The accuracy of predictions depends on the quality of the data and the
thoroughness of the analysis. Ensure your team understands the past data well to make
accurate future predictions.

Simplified Explanation:
Think of predictive analytics like using past experiences to make future decisions. For
example, if you know it usually rains when the sky is cloudy, you might predict rain and
carry an umbrella. In business, this means using past data to forecast trends and make
informed decisions.
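
As one simple illustration of predicting from historical data, the sketch below fits a straight trend line by least squares and extends it one step ahead. The notes above do not name this technique, so it is a stand-in chosen for brevity, and the monthly sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical history: monthly shoe sales for months 1 through 5.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(forecast_month_6)
```

Here sales grow by 10 each month, so the line forecasts 150 for month 6. Real data is noisier, and as the data-quality point above stresses, the forecast is only as good as the history behind it.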
