Data Science in Business
Big data is the large, fast-generated volume of data that inundates a business on a
day-to-day basis and is hard to manage with conventional IT infrastructure. There is
no fixed threshold that defines the volume of big data, which varies over time.
According to Appendix 4, 80% of big data is unstructured; nevertheless, a data
scientist can use either structured or unstructured data to accomplish a data science
task and solve a business problem. Businesses use big data analytics to keep track of
primary transactions and to run more efficiently by making better decisions
(Kenneth C. & Jane P., 2020). Multinational corporations generate large volumes every
day: Facebook (0.5 petabytes), Walmart (40 petabytes), and Google (24 petabytes), where
1 petabyte (PB) = 1,000 terabytes (TB) = 10^6 gigabytes (GB). Businesses may face
challenges at this scale. For example, a company that plans to store 40 PB needs 4,000
drives (10 TB each, at $200 per drive), costing $0.8 million plus tax and the budget
for other accessories. It must also take into consideration the hardware supply lead
time, the time needed to transfer the data onto the drives (about 12.7 years to write
40 PB at 100 MB/sec), and the space to store the drives.
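The figures in the 40 PB example can be checked with a short calculation. The sketch below is only a back-of-the-envelope check and rests on assumptions that are not stated in the text: decimal units (1 PB = 10^15 bytes), a single transfer stream at a constant 100 MB/sec, and a 365-day year. The variable names are illustrative.

```python
# Back-of-the-envelope check of the 40 PB storage example.
# Assumptions: decimal units, one constant-speed transfer stream, 365-day year.
PB = 10**15  # bytes in one petabyte
TB = 10**12  # bytes in one terabyte
MB = 10**6   # bytes in one megabyte

data_bytes = 40 * PB
drive_count = 4_000
drive_price_usd = 200
transfer_rate = 100 * MB  # bytes per second

# Capacity each drive must hold, total hardware cost, and transfer time.
capacity_per_drive_tb = data_bytes / drive_count / TB
hardware_cost_usd = drive_count * drive_price_usd
transfer_seconds = data_bytes / transfer_rate
transfer_years = transfer_seconds / (365 * 24 * 3600)

print(f"Capacity per drive: {capacity_per_drive_tb:.0f} TB")
print(f"Hardware cost: ${hardware_cost_usd:,}")
print(f"Transfer time: {transfer_years:.1f} years")
```

Under these assumptions, each drive must hold 10 TB, the drives alone cost $800,000, and the transfer takes between 12 and 13 years. Writing to many drives in parallel would shorten the transfer time proportionally.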
How can we reframe a business problem as a data science question and approach the
analytics problem in more depth? First, we must select the framework that we want to
work with, either the “Data Analytic Lifecycle” or the “Microsoft Team Data Science
Process (TDSP)” (see Appendix 5 for reference). We will discuss the Data Analytic
Lifecycle further (see Appendix 6), which contains six steps:
1) Discovery
2) Data Preparation
Establish the analytic sandbox and perform ETLT (extract, transform, load, and
transform) on the data, followed by data exploration and conditioning (removing
outliers and handling missing data), and then summarize and visualize the data. First,
access the data and understand each data code, then proceed to visualization, which is
vital before analysis because it makes the data easier to understand and communicate
(see Appendix 7A & 7B). Next, analyze the validated data, impute missing values,
summarize the findings (root cause), and prescribe a course of action.
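The conditioning part of this step (imputing missing values, removing outliers, and summarizing) can be sketched as follows. This is a minimal illustration, not a prescribed method: median imputation and the 1.5 x IQR outlier rule are assumptions chosen for simplicity, and the `daily_sales` figures and function names are hypothetical.

```python
from statistics import median, quantiles

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def remove_outliers(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical daily sales with two missing entries and one extreme value.
daily_sales = [120, 135, None, 128, 131, 2500, 126, None, 133, 129]

imputed = impute_missing(daily_sales)
cleaned = remove_outliers(imputed)

# A simple summary that could feed a chart or a report.
summary = {
    "count": len(cleaned),
    "min": min(cleaned),
    "max": max(cleaned),
    "median": median(cleaned),
}
print(summary)
```

The median is used for imputation because it is not pulled upward by the extreme value, so the two steps can be run in this order without the outlier distorting the filled-in entries.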
3) Model Planning
Select a suitable model after data analysis to solve the specific business problem.
There are various categories of techniques to choose from in model selection (see
Appendix 8).
4) Model Building
Build training and test datasets, where 80% of the labeled data is used for training,
while the remaining 20% is held out for testing only and is not shown to the model
during training. After setting up the model, we train it on the training set, evaluate
the fitted model on the test set, and adjust it accordingly to get accurate results.
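The 80/20 split can be sketched in a few lines. This is a minimal illustration under stated assumptions: the records are shuffled with a fixed seed so the split is reproducible, and the `records` data (monthly spend paired with a churn flag) and the `train_test_split` helper are hypothetical, written here with the standard library only.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test rows are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: (monthly_spend, churned).
records = [(20 + i, i % 3 == 0) for i in range(50)]

train, test = train_test_split(records)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set made up only of the most recent or last-listed records. The test rows are kept apart until the fitted model is evaluated.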
5) Communicate Results
Data scientists prepare different presentations (showing their findings, predictions,
and recommendations to solve the business problem) for top management and analysts,
and gather their responses.
6) Operationalize
Operationalize the model by providing the code and technical documentation to the
respective department for deployment once the communicated results are approved. The
lifecycle is then considered complete, but the work does not end there: we must keep
monitoring the model for refinement, since new attributes can be considered and the
delivery mechanism can be simplified with self-service reporting.
Conclusion
Data science combines the scientific method, math and statistics, other techniques,
and even storytelling to uncover and explain the business insights buried in data
(What Is Data Science?, 2020). Big data can be sourced from communications, media,
entertainment, financial services, the internet of things (IoT), etc., so nearly every
business is exposed to it nowadays, and it has four dimensions that can be analyzed
for insights that lead to better decisions and business strategy. It adds value to the
company and creates valuable information for the customer. Thus, data science matters
in business: by leveraging the data analytic lifecycle, a company can analyze and
predict whether it will grow or decline in the near term and figure out how to reduce
customer churn and acquire new customers.
Appendix:
Appendix 2: Evidence of attending webinar.
(My zoom-in ID: Vivian Hoo as shown at the top right participants screen).
Appendix 4: Hierarchy of Big Data.
Appendix 6: Data Analytic Lifecycle.
Appendix 7A: Data accessed directly from the IT department (can only be understood by
authorized parties).
Appendix 7B: Visualization (data decryption) by the data scientist (easy for every
party to understand).
Laudon, K. C., & Laudon, J. P. (2020). Management Information Systems: Managing the
Digital Firm (16th ed.). Pearson.
What is Data Science? (2020, June 3).
https://www.ibm.com/cloud/learn/data-science-introduction
Woods, D. (2011, October 7). Amazon’s John Rauser on “What Is a Data Scientist?”
Forbes.
https://www.forbes.com/sites/danwoods/2011/10/07/amazons-john-rauser-on-what-is-a-data-scientist/