What Is Data Science
Mathematics/Statistics
Databases/Programming
Business understanding
Component: Mathematics/statistics
At its core, data literacy rests on a foundation in mathematics and statistics. This proficiency can be
broken down into three levels:
Awareness of Techniques:
A data scientist cannot apply a method without knowing that it exists, so awareness of the available
techniques comes first. For example, to group similar customers, a data scientist must first know that
statistical methods such as clustering can be used.
Application Proficiency:
Beyond awareness, proficiency means knowing how to apply a technique in practice. This goes beyond
coding skills to include configuration. For instance, running k-means clustering in a language like R or
Python requires not only implementing it in code but also tuning its parameters, such as the number of
groups to form.
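As a rough illustration of the configuration step, the sketch below implements a plain k-means in standard-library Python and compares the within-cluster sum of squares (inertia) for several values of k. The customer data, the feature choice (annual spend and visits per month), and the helper names are hypothetical, and the features are left unscaled to keep the example short. In practice a library implementation in R or Python would normally be used.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)          # fixed seed so reruns give the same result
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    labels = [nearest(p, centroids) for p in points]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical customers: (annual spend, visits per month).
customers = [
    (200, 1), (220, 2), (210, 1),
    (900, 6), (950, 7), (920, 6),
    (2500, 15), (2600, 14), (2550, 16),
]

# Try several values of k and compare how tightly the groups fit.
inertia_by_k = {k: kmeans(customers, k)[2] for k in range(1, 5)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 1))
```

Inertia generally falls as k grows, so the lowest value is not automatically the best choice. A common heuristic is to look for the point where adding another group stops giving a large improvement.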
Mathematics/statistics continuation
Choosing Which Techniques to Try:
Data science offers so many techniques that choosing among them
is a skill in itself. Data scientists must be able to judge quickly
whether a technique is likely to work. In the customer grouping
scenario, even after narrowing the problem to clustering, many
methods and algorithms remain. Rather than trying each one, the
data scientist needs to rule out unsuitable methods early and
concentrate on a select few.
A Mathematical Perspective on E-commerce Analysis
Excel, Tableau, and other business intelligence tools with graphical interfaces can handle substantial
data work without coding.
Although they require no code, these tools claim functionality comparable to R or Python, and data
scientists do use them occasionally.
However, these tools are not considered a complete data science toolkit.
In practice, few companies run data science teams that avoid programming entirely.
Programming offers distinct advantages over graphical interface tools, regardless of team structure.
Advantages of Programming in Data Science
Reproducibility:
Code can be rerun whenever the data changes, giving consistent results over time.
Code also works with version control, which keeps a single file with its full history and removes the need to keep renaming files.
Flexibility:
Essential skill: Translating business situations into data questions, finding data answers, and delivering actionable insights.
Example: Answering questions like "Why are customers leaving?" requires deducing solutions without predefined tools.
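To make these points concrete, the sketch below turns the question "Why are customers leaving?" into one possible data question: does the churn rate differ by customer segment? The column names (`segment`, `churned`), the function name, and the data are hypothetical. Because the analysis is a function, it can be rerun unchanged when new data arrives.

```python
import csv
import io
from collections import defaultdict


def churn_rate_by_segment(csv_text):
    """Return {segment: share of customers in that segment who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["segment"]] += 1
        churned[row["segment"]] += int(row["churned"])  # 1 = left, 0 = stayed
    return {segment: churned[segment] / totals[segment] for segment in totals}


# Hypothetical snapshot of customer data.
january = """customer_id,segment,churned
1,basic,1
2,basic,0
3,premium,0
4,premium,0
"""

# The data changes; the same code is simply run again.
february = january + "5,basic,1\n6,premium,1\n"

january_rates = churn_rate_by_segment(january)
february_rates = churn_rate_by_segment(february)
print(january_rates)
print(february_rates)
```

A difference in churn rate between segments does not explain why customers leave, but it narrows where to look next.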
Email: [email protected]
LinkedIn: Jackson Marube
Hope the insights were valuable to you.