
Lecture 7: Myths of Data Science

The document discusses the concepts of modeling in data science, distinguishing between statistical and algorithmic modeling, with a focus on their applications and limitations. It also addresses common myths in data science, emphasizing the crucial role of programmers in data collection, storage, processing, and modeling, while clarifying that machines primarily assist in executing tasks. Overall, it highlights the importance of human expertise in the data science process despite the reliance on machines for certain functions.

Uploaded by

sahil.y.prince
Copyright © All Rights Reserved

Modelling in Data Science & Myths in Data Science

• Dr Vatan Sehrawat
• Asst. Professor, Computer Sc. & Engg. Department
• RBS-SIET Zainabad
• [email protected]
• 8059211113
Modelling Data:
• Modelling means finding a function (or distribution) that describes the relation between input and output. The function can be as simple as a linear equation or as complex as a quadratic, polynomial, sine, or tan function.

• Modelling can therefore be divided into two broad classes:


• Statistical Modelling
• Algorithmic Modelling
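To make the idea of modelling as "finding a function between input and output" concrete, here is a minimal Python sketch, assuming a single numeric input and ordinary least squares as the fitting criterion. It fits a linear and a quadratic polynomial to data generated by a quadratic and compares their sum of squared errors. The helpers `solve`, `fit_polynomial`, `predict`, and `sse` are hypothetical names chosen for illustration, not part of the lecture.

```python
# Minimal sketch: least-squares polynomial fitting via the normal equations
# (A^T A) c = A^T y, solved with Gaussian elimination (standard library only).

def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix [M | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Return coefficients [c0, c1, ..., c_degree] minimising the squared error."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]  # rows [1, x, x^2, ...]
    m = degree + 1
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(m)] for i in range(m)]
    aty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(m)]
    return solve(ata, aty)


def predict(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def sse(coeffs, xs, ys):
    """Sum of squared errors of the fitted polynomial on the data."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5]
    ys = [1 + 2 * x + 3 * x ** 2 for x in xs]  # data generated by a quadratic
    for degree in (1, 2):
        coeffs = fit_polynomial(xs, ys, degree)
        print(degree, coeffs, sse(coeffs, xs, ys))
```

On this data the quadratic fit recovers coefficients close to 1, 2, 3 with near-zero error, while the straight line leaves a large residual: choosing the family of functions is itself part of modelling.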
Statistical Modelling:
• Modelling underlying data distribution
• Modelling underlying relations in data
• Formulate and test hypotheses
• Give statistical guarantees (p-values, goodness-of-fit tests)
• Statistical models are simple, intuitive models suited to low-dimensional data, and they support robust statistical analysis.
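As one illustration of a statistical guarantee, the sketch below computes a p-value for the hypothesis "x and y have no linear relation" using a permutation test. The permutation test is a resampling technique chosen here only because it needs nothing beyond the standard library; the lecture does not prescribe it. The function names, the 2000 permutations, and the fixed seed are assumptions.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value for H0: no linear association between xs and ys."""
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [0.1, 2.3, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0]
    print("r =", pearson_r(xs, ys), "p =", permutation_p_value(xs, ys))
```

A small p-value means that random re-pairings of the data rarely produce a correlation as strong as the observed one, which is evidence against the "no relation" hypothesis.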
Algorithmic Modelling:
• Finding the relation between input and output, i.e. Y = f(x)
• f(x) can be any function; in real-world data it can be very complex. The ultimate goal is to estimate the function f using data and optimization techniques.
• Complex, flexible models
• Can work with high-dimensional data
• Not suitable for robust statistical analysis
• Focus is on prediction
• Data-hungry models
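As an example of a flexible, prediction-focused model, the sketch below uses k-nearest-neighbour regression, a technique swapped in here for illustration since the lecture names no specific algorithm. It estimates f(x) directly from stored examples without assuming a formula, accepts inputs of any dimension, and needs plenty of data to predict well. `knn_predict` and its default `k=3` are hypothetical.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict y at `query` as the mean target of the k nearest training points.

    train_X: sequence of points (tuples of floats, any dimension)
    train_y: target value for each point
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by Euclidean distance to the query point
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    return sum(y for _, y in ranked[:k]) / k


if __name__ == "__main__":
    X = [(0, 0), (1, 0), (0, 1), (5, 5)]
    y = [1.0, 2.0, 3.0, 10.0]
    print(knn_predict(X, y, (0, 0), k=3))
    print(knn_predict(X, y, (5, 4), k=1))
```

Unlike the statistical models above, this gives predictions but no p-values or parameters to interpret, which matches the trade-off listed for algorithmic modelling.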
Myths of Data Science:
Myth: "The machine does everything." (Let's debunk this myth.)
Collecting Data:
• What to collect? -> Programmer's job
• Where to collect it? -> Programmer's job
• How to collect data? -> Programmer's job (by experimenting, etc.)
• Labelling data? -> Programmer's job
• Executing scripts? -> Machine's job (processing long, complex jobs)
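The split of responsibilities can be sketched in a few lines. In this hypothetical example, the programmer decides the labelling rule (the 38.0 °C threshold and the label names are assumptions chosen for illustration), and the machine's only role is to apply that rule to every collected record.

```python
def label_reading(temperature_c):
    """Programmer's decision (hypothetical rule): readings above 38.0 C are 'fever'."""
    return "fever" if temperature_c > 38.0 else "normal"


def label_all(readings):
    """Machine's job: apply the programmer's rule to every collected reading."""
    return [(t, label_reading(t)) for t in readings]


if __name__ == "__main__":
    print(label_all([36.6, 37.1, 38.5]))
```

Changing what counts as "fever" means editing `label_reading`; the machine cannot make that decision on its own.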
Storing Data:
• What schema? -> Programmer designs it
• Which file system? -> Programmer decides (but the machine provides the system resources, such as storage)
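A minimal sketch of the storage split, assuming SQLite (through Python's built-in `sqlite3` module) as the storage system the programmer picked. The `readings` table, its columns, and `store_readings` are hypothetical. The programmer writes the schema; the machine persists the rows and manages the storage.

```python
import sqlite3

# Programmer's decision: the schema (table and column names are hypothetical)
SCHEMA = """
CREATE TABLE IF NOT EXISTS readings (
    id INTEGER PRIMARY KEY,
    sensor TEXT NOT NULL,
    temperature_c REAL NOT NULL,
    label TEXT
)
"""


def store_readings(conn, rows):
    """Create the table if needed and insert (sensor, temperature_c, label) rows."""
    conn.execute(SCHEMA)
    # Machine's job: write the rows to storage
    conn.executemany(
        "INSERT INTO readings (sensor, temperature_c, label) VALUES (?, ?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, just for the example
    store_readings(conn, [("s1", 36.6, "normal"), ("s2", 38.5, "fever")])
    print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])
    conn.close()
```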
Processing Data:
• Domain knowledge is required in wrangling and munging data -> Programmer's job
• What data to clean? Programmer decides
• How to clean? Programmer decides, using statistics to identify what needs cleaning
• Study and integrate: Programmer's job
• Multiple formats: Programmer decides which format to work with
• The machine helps by executing scripts that process large amounts of data.
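As an example of using statistics to decide what to clean, the sketch below drops missing values and then removes outliers with the 1.5 × IQR rule, a common convention assumed here rather than taken from the lecture. The function name `clean` and the parameter `k` are hypothetical; choosing the rule and its threshold remains the programmer's decision.

```python
import statistics


def clean(values, k=1.5):
    """Drop missing values, then drop outliers outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumes at least two non-missing values remain.
    """
    present = [v for v in values if v is not None]  # remove missing entries
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if low <= v <= high]


if __name__ == "__main__":
    print(clean([10, 11, 12, None, 13, 12, 11, 95]))
```

The machine can run this over millions of rows, but whether 95 is an error or a genuine extreme value is a domain question the programmer has to answer.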
Describing Data:
• Which columns? Programmer decides which columns contain usable data
• Which plots? Programmer decides, choosing a human-readable format
• Study trends? Programmer decides which trends to study, using the machine
• The machine executes scripts to process large amounts of data
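A minimal sketch of the describing step, assuming the data is held as a list of dictionaries: the programmer passes in the columns judged usable, and the machine computes summary statistics for them. `describe` and the particular statistics it reports are hypothetical choices for illustration.

```python
import statistics


def describe(table, columns):
    """Summarise programmer-chosen numeric columns of a list-of-dicts table.

    Missing (None) values are skipped; assumes each chosen column has at least one value.
    """
    summary = {}
    for col in columns:
        values = [row[col] for row in table if row.get(col) is not None]
        summary[col] = {
            "count": len(values),
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    table = [{"age": 20, "name": "a"}, {"age": 30, "name": "b"}, {"age": 40, "name": "c"}]
    print(describe(table, ["age"]))  # "name" is not numeric, so the programmer omits it
```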
Modelling Data:
• Hypothesising, proposing models, and overseeing training: all done by the programmer
• Estimation: parameters are learnt by the machine, which optimises them using a learning algorithm.
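The division of labour in modelling can be sketched with batch gradient descent, used here as one example of a learning algorithm. The programmer proposes the model (a straight line y = w·x + b) and picks the learning rate and number of epochs; the defaults below are assumptions. The machine then estimates w and b by repeatedly stepping against the gradient of the mean squared error.

```python
def fit_line_gd(xs, ys, lr=0.01, epochs=5000):
    """Learn w, b in y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # initial parameter guesses
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 9]  # generated by y = 2x + 1
    print(fit_line_gd(xs, ys))
```

A learning rate that is too large can make the updates diverge, which is one reason the programmer still has to oversee training.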
