
Lesson - Statprob-1

Data science involves collecting, managing, processing, analyzing, visualizing, and interpreting vast amounts of heterogeneous data from different sources to discover patterns and make decisions. It lies at the intersection of statistics, computational science, and domain-specific knowledge. While related, data science goes beyond traditional data analysis by not just explaining existing data but predicting future outcomes and possibilities. Machine learning is a tool that can help automate and scale data science tasks.

Uploaded by

jk7103052
Copyright
© All Rights Reserved

LESSON - STATPROB

Data science (Donoho, 2017)

Here, data science is described as a process, which is why it is called a
scientific discovery and practice that involves the collection, management,
processing, analysis, visualization, and interpretation of vast amounts of
heterogeneous data. In other words, it is the process of collecting, managing,
processing, analyzing, visualizing, and interpreting data. Heterogeneous data
means that the data comes from different sources.
The scientific application refers to data science being used in physics or
other science-related fields. The translational application refers to data or
information being turned into actionable plans. The interdisciplinary
application refers to combining information from different fields or sources
in order to gather even more information.

Data science (datascience.harvard.edu, 2017)

Here, for example, this one is statistics, this one is computational science,
and the one at the bottom is domain-specific knowledge. Data science is said
to lie at the intersection because statistics provides the foundational
principles for analyzing and interpreting data, computational science equips
practitioners with the tools and techniques to process large volumes of data
efficiently, and domain-specific knowledge covers the information that comes
from different fields. So data science is the combination of these three
things. This is also why it is said to incorporate the availability and
diversity of quantitative information: it gathers information in different
ways and from different sources.

Data science (Leskovec, Rajaraman, & Ullman, 2014)

Here, it says that data mining and big data are the same concept. Why are
they said to be the same? Big data means a massive volume of data or
information, so it really comes in bulk, whether structured or unstructured.
Data mining, on the other hand, is about recognizing patterns in a large
dataset. For example, in healthcare there are many medical records about
patients and their diseases. Data mining helps data scientists recognize
patterns in those records, so that they can see what the diseases have in
common and work toward a medicine for them. The point to remember here is
that the two are described as the same concept.

Data science (Cao, 2017)


Here, it says that data science draws on statistics, informatics, computing,
communication, management, and sociology to gather more information. The main
point is the use of the DKW (data-to-knowledge-to-wisdom) method, which lets
the researcher transform data into a decision. Imagine you have a big box of
colorful building blocks (data). You start by playing with these blocks,
sorting them by color and size (analyzing the data). Then you start to notice
patterns, like how the blue blocks always come after the red ones
(knowledge). Finally, you use these patterns to build something really cool,
like a tall tower or a fancy castle (wisdom). In the same way, data
scientists play with lots of information, find patterns in it, and use those
patterns to make smart decisions and solve important problems. Another
example: you conduct research on possible trends in the fashion industry to
find out what product you should make, and you use the data you gathered to
decide on your next product.

Data science (Sharma, 2019)


Here, it says that data science is used to make decisions and predictions.
It names predictive causal analytics, so let's first differentiate the two.
Predictive analytics is about predicting or forecasting what might happen in
the future, while causal analytics is about the cause and effect of a
situation. Combined, they give both the prediction and the cause and effect
of a situation. Prescriptive analytics goes beyond predicting and also
suggests an alternative course of action. Machine learning is a branch of AI.
For example, the sales department predicts that sales will be low this month;
that is the predictive part. They then look for the cause, since the effect
(low sales) is already known, and discover that it was their sudden increase
in prices; that is the causal part. The marketing department then suggests
changing the marketing strategy to avoid the drop in sales; that is the
prescriptive part. Since they want to finish this work quickly, they turn to
machine learning or AI to make it more convenient. Remember that these
machines simply follow certain patterns in order to carry out a given task.
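To make the predictive part of the sales example concrete, here is a minimal Python sketch that fits a straight trend line to past sales and extends it one month ahead. The monthly figures and the helper name `fit_line` are made up for illustration, and a trend line only forecasts: it says nothing about the cause of the drop.

```python
# Hypothetical monthly sales (units) for six consecutive months.
months = [1, 2, 3, 4, 5, 6]
sales = [120, 115, 111, 104, 100, 95]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums used by ordinary least squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, sales)

# Predictive step: extend the fitted trend to month 7.
forecast_month_7 = slope * 7 + intercept

print(f"trend per month: {slope:.2f} units")
print(f"forecast for month 7: {forecast_month_7:.1f} units")
```

A negative slope is the "low sales ahead" warning. Finding out why (the causal part) and deciding what to do about it (the prescriptive part) still need extra information, such as the price change in the example.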

For Karl Broman, professor at the University of Wisconsin

Here, it says that data science and statistics are the same. Statistics is
the result of analyzed data, while data science is the study of data, so when
you analyze data in data science you are doing statistics, because the
endpoint is still a statistic. Take candy as an example. Data science is like
trying to understand all the different candies and what they taste like.
Statistics is counting how many candies of each color you have and figuring
out which ones are the most common and which are the rarest. So when someone
says "data science is statistics," it's like saying that understanding all
those candies and what they taste like is really just about counting and
figuring things out about them, like you do with statistics.
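The candy-counting idea can be tried out in a few lines of Python. This is only a sketch: the list of candy colors is invented, and `Counter` from the standard library does the counting.

```python
from collections import Counter

# Hypothetical bag of candies, recorded by color.
candies = ["red", "blue", "red", "green", "red", "blue", "yellow"]

# Count how many candies there are of each color.
counts = Counter(candies)

# most_common() lists (color, count) pairs from most to least frequent.
ordered = counts.most_common()
most_common_color, most_common_count = ordered[0]

# Several colors can tie for rarest, so collect all of them.
rarest_count = ordered[-1][1]
rarest_colors = sorted(color for color, n in counts.items() if n == rarest_count)

print("counts:", dict(counts))
print("most common:", most_common_color, most_common_count)
print("rarest:", rarest_colors)
```

Counting and ranking like this is the "statistics" side of the analogy. The "what do they taste like" side would need more information than the colors alone.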

For Applied Statistician Nate Silver


Here, the reason statisticians can also be called data scientists is that
statisticians are interested in similar things as data scientists.
Statisticians work with data to find patterns and understand the world
better, just like data scientists do. So when people say that "data
scientist" means much the same thing as "statistician," they mean that both
jobs are about working with data and trying to understand it. The two have
different names, but they are similar in many ways.

For Andrew Gelman

Here, he says that statistics is not the most important part of data science.
Earlier, I kept mentioning that data science and statistics are the same,
based on the statements of the other professors. He says it is not the most
important part because, when we hear the word "data," what comes to mind is
either statistics or information. But it is not just about statistics: to be
a great data scientist you must have other skills, such as coding or working
with databases, since those are part of data science too. For him, statistics
is one more tool for doing data science, but it is not the most important
tool or skill.

For Vasant Dhar


Here, it says that data science is different from data analysis. In what way,
when the two seem to be the same? Let me give an example using a puzzle.

Normally, when people analyze data (which is like looking at pieces of the
puzzle), they try to explain what they see in the puzzle. They might say, "This
piece is blue, so it goes with the sky." This helps them understand what's
already there in the puzzle.
Now, imagine you have a different kind of puzzle. Instead of just explaining
what's already there, you're trying to find pieces that fit together in a special
way. You're not just saying, "This piece is blue, so it goes with the sky." You're
saying, "If I put these pieces together, I can see a picture of a tree."

That's what this person means when they say data science is different from
traditional data analysis. Data science doesn't just explain what's already in the
data, it tries to find patterns that help predict what might happen in the future.
It's like trying to predict what the finished puzzle will look like before you even
put all the pieces together. It looks for patterns that can help people make
decisions or take actions based on what might happen next.

DATA SCIENCE

Statisticians and mathematicians were some of the first people to work on
figuring out how to analyze and make sense of large amounts of data.

However, as data science has grown, it's also borrowed ideas and methods
from other fields, like computer science, information theory, and even areas
like psychology and sociology.
So, when we say data science is interdisciplinary, we mean that it brings
together ideas and techniques from lots of different fields to help us
understand and work with data in new and powerful ways.

DATA SCIENTIST SKILL: 1


Data scientists need to gather information from different sources, which is
what we call interdisciplinary. That data can be messy and unorganized, so
they need to know how to arrange it properly so that it can be used.
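As a small illustration of arranging messy data, the Python sketch below tidies a few records that have stray spaces, inconsistent capitalization, a non-numeric age, a missing age, and a duplicate. The records and the helper name `clean` are invented for this example. Real projects usually rely on dedicated tools, but the idea is the same.

```python
# Hypothetical raw records gathered from different sources.
raw_records = [
    {"name": "  Ana ", "age": "17"},
    {"name": "BEN", "age": "sixteen"},  # age is not written as a number
    {"name": "ana", "age": "17"},       # same person as the first record
    {"name": "Carla", "age": ""},       # missing age
]


def clean(records):
    """Normalize names, parse ages, and drop duplicate records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Trim spaces and use consistent capitalization for names.
        name = rec["name"].strip().title()
        # Keep the age only if it is written in digits; otherwise mark it missing.
        age_text = rec["age"].strip()
        age = int(age_text) if age_text.isdigit() else None
        key = (name, age)
        if key in seen:
            continue  # skip records we have already kept
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


cleaned_records = clean(raw_records)
for record in cleaned_records:
    print(record)
```

Marking unusable values as missing (`None`) instead of guessing them is a deliberate choice here, so later analysis can decide how to handle them.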

DATA SCIENTIST SKILL: 2


As I said earlier, data scientists are sought after by business owners
because they can help them manage their business properly, especially by
identifying trends in the industry through patterns in the data, so that the
owners know what their next product should be.

DATA SCIENTIST SKILL: 3


Data scientists need to be proficient in programming languages such as SAS,
R, Python, and others. These languages are used for data manipulation,
analysis, and visualization.

DATA SCIENTIST SKILL: 4


As Karl Broman said earlier, data science is statistics, so you need
knowledge of statistics to analyze and interpret data accurately.

DATA SCIENTIST SKILL: 5

Data scientists use advanced analytical techniques like machine learning, deep
learning, and text analytics to derive insights from data and build predictive
models. These are mostly AI-related: data scientists use them to analyze data
more thoroughly, predict future outcomes, or find patterns in the data, which
is why they need these analytical techniques.

DATA SCIENTIST SKILL: 6


Data scientists are tasked with identifying patterns, trends, and anomalies in
data that can provide valuable insights for businesses. These insights can
inform strategic decisions and improve business outcomes. I mentioned this
earlier when describing how data scientists help businesses by spotting
trends and making predictions from them.

DATA SCIENTIST SKILL: 7


Communication skills are probably needed in every field, and especially in
data science: when you are conducting research, you need to collaborate with
people in other fields to gain more knowledge and information for your
research, especially with IT, since they know more about technology and other
technical matters.

R language
It is designed specifically for statistical analysis and data visualization.
It is also platform independent, so it can be used on any operating system,
and it is free software.

Python Language
We all know what Python is from MIL. Python is versatile, so programmers
often prefer it over other languages because it is simpler and easier to
understand.

SAS LANGUAGE

This is a prominent statistical analysis tool widely used in commercial
analytics.

Statistics
As I said earlier, statistics is related to data science. It is also about
learning information about something. Statistics helps us, especially
business owners, make decisions for their business with the help of
statisticians.

STATISTICIAN
Applied statistics here means applying gathered data to real-life situations.
Most statisticians have problem-solving skills, since a statistician's skills
should not be limited to technology: they are the ones who help businesses
read the gathered data properly and make decisions.

SAMPLE VS POPULATION
Let's pretend you have a huge bucket filled with your favorite toys. That big
bucket with all your toys inside is like a population: it is every single toy
you have.

Now, let's say you want to know which toys are the most popular among your
friends, but you can't ask everyone because there are so many friends. So you
decide to ask just a few of your friends which toys they like best. Here the
population is all of your friends, and the few friends you asked are your
sample.
So the population is the whole group you want to learn about, and the sample
is the small part of it that you actually look at. You use the sample to
understand what toys are popular among all your friends, even though you
can't ask every single friend about every single toy.
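The difference between a population and a sample can also be seen with numbers. In this Python sketch, the population is a made-up list of 1,000 ratings (for instance, how much each friend likes a toy, from 1 to 10), and the sample is 50 of them picked at random. The sizes, the rating scale, and the seed are all assumptions chosen for the illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: one rating (1-10) from each of 1,000 friends.
population = [random.randint(1, 10) for _ in range(1000)]

# Sample: 50 ratings picked at random, without asking anyone twice.
sample = random.sample(population, k=50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population size: {len(population)}, mean rating: {population_mean:.2f}")
print(f"sample size: {len(sample)}, mean rating: {sample_mean:.2f}")
```

The sample mean will usually land close to the population mean but not exactly on it. That gap is why conclusions drawn from a sample are estimates.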

Descriptive Statistics
Inferential Statistics

Alright, imagine you have a big jar of cookies, but you can't eat all of them. So,
you take a few cookies out to taste, and based on how those cookies taste, you
guess what all the cookies in the jar might taste like. That's kind of like
inferential statistics!

Inferential statistics help us understand things about a whole group, or
population, by looking at just a smaller part of it, called a sample. It's like trying
to guess what all the cookies taste like by trying just a few of them.

We use inferential statistics to figure out if what we find in our small group is
likely to be true for the whole group. For example, if we find that the cookies
we tried are sweet, we might guess that most of the cookies in the jar are
sweet too.

Scientists and researchers use fancy math methods, like the t-test or analysis
of variance, to help them make these guesses. It's like using special tools to
help figure out what all the cookies in the jar might be like, even though we
can't taste all of them.
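As a rough sketch of one of these tools, the Python code below computes a one-sample t statistic for the cookie example using only the standard library. The sweetness scores and the claimed jar average of 6.5 are invented, and the sketch stops at the t statistic and does not compute a p-value.

```python
import math
import statistics

# Hypothetical sweetness scores (0-10) for 8 cookies tasted from the jar.
sample = [7.1, 6.8, 7.4, 6.9, 7.3, 7.0, 6.7, 7.2]

# Claim to check (an assumption): the average cookie in the jar scores 6.5.
claimed_mean = 6.5

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
standard_error = sample_sd / math.sqrt(n)

# One-sample t statistic: how many standard errors the sample mean
# lies from the claimed population mean.
t_statistic = (sample_mean - claimed_mean) / standard_error
degrees_of_freedom = n - 1

print(f"sample mean: {sample_mean:.2f}")
print(f"t statistic: {t_statistic:.2f} with {degrees_of_freedom} degrees of freedom")
```

The t statistic is then compared with a critical value from a t table. With 7 degrees of freedom, the two-sided 5% critical value is about 2.365, so a t statistic well above that suggests the jar's true average differs from the claimed 6.5.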
