Assignment 2 - Frontsheet - Business Process Support
Student declaration
I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I
understand that making a false declaration is a form of malpractice.
Grading grid
P5 P6 P7 M3 M4 D2 D1
❒ Summative Feedback: ❒ Resubmission Feedback:
Furthermore, ABC Manufacturing utilizes real-time data from sensors and IoT devices installed in its
production facilities and logistics network. These devices collect data on equipment performance,
energy consumption, and transportation routes. By monitoring and analyzing this data, the
organization can identify bottlenecks, optimize production processes, and streamline logistics
operations. For example, if a particular machine shows signs of malfunction through real-time data
analysis, proactive maintenance can be scheduled to prevent costly breakdowns and production
delays.
Data and information also play a crucial role in ensuring product quality and compliance with
industry standards. ABC Manufacturing collects data at various stages of the production process,
including quality control checkpoints and post-sales customer feedback. By analyzing this data, the
organization can identify potential quality issues, implement corrective actions, and continuously
improve its manufacturing processes. This not only enhances product quality but also reduces the
risk of product recalls and associated costs.
Another implication of using data and information in ABC Manufacturing's supply chain is the ability
to collaborate effectively with suppliers and distributors. Through the integration of data systems
with key partners, the organization gains real-time visibility into inventory levels, production
schedules, and transportation status. This enables proactive coordination and timely decision-
making, resulting in improved order fulfillment, reduced lead times, and enhanced overall supply
chain performance.
II. Contents
1. Tools and technologies that support business processes and inform decision making.
1.1. Exploration of tools and technologies associated with Data Science
1.1.1. Programming language
• Python
• R
• SQL
• Java
• Julia
• Scala
• C/C++
• JavaScript
• Swift
• Go
• MATLAB
• SAS
This initial phase involves collecting data from various sources, such as databases, Excel files, text files, APIs,
web scraping, or even real-time data streams. The type and volume of data collected largely depend on the
problem you’re addressing.
Once collected, this data is stored in an appropriate format ready for further processing. Storing the data
securely and efficiently is important to allow quick retrieval and processing.
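As a minimal sketch of this collect-and-store step (the CSV layout, table name, and figures are illustrative assumptions), records can be parsed from a source and persisted in a queryable store:

```python
import csv
import io
import sqlite3

# Simulated source: a CSV export; in practice this could be a file, an API, or a stream
raw = io.StringIO("order_id,amount\n1001,250.0\n1002,99.5\n")

# Collect: parse typed records from the source
records = [(int(r["order_id"]), float(r["amount"])) for r in csv.DictReader(raw)]

# Store: persist in a format that allows quick retrieval (here an in-memory SQLite database)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", records)

# Retrieve efficiently via SQL
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

The same pattern scales up: swap the CSV stream for an API client and SQLite for a warehouse, and the collect/store boundary stays the same.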
Data governance promotes the availability, quality and security of an organization’s data through
different policies and standards. These processes determine data owners, data security measures
and intended uses for the data.
Drive scale and data literacy – Limited data access across an organization can limit innovation
and create dependencies on subject matter experts (SMEs) within business processes. Data
governance practices create a pathway for cross-functional teams to come together to create
a shared understanding of data across systems (e.g., reconciling differences in domain-agnostic
data). This shared understanding can then manifest itself through data standards, where data
definitions and metadata are documented in a centralized place, such as a data catalog. This
documentation, in turn, becomes the foundation for self-service solutions, such as APIs,
which enable consistent data and federated access to it across an organization.
Ensure security, data privacy and compliance – Data governance policies provide a way to
meet the demands of government regulations regarding sensitive data and privacy, such as
the EU General Data Protection Regulation (GDPR), the US Health Insurance Portability and
Accountability Act (HIPAA), and industry requirements such as Payment Card Industry Data
Security Standards (PCI DSS). Violations of these regulatory requirements can result in costly
government fines and public backlash. To avoid this, companies adopt data governance tools
to set guardrails, which prevent data breaches and the misuse of data.
High-quality data – Data governance ensures data integrity, data accuracy, completeness and
consistency. Good data allows companies to better understand their workflows and
customers as well as how to optimize their overall business performance. Errors in performance
metrics can steer an organization in the wrong direction, but data
governance tools can address potential inaccuracies. For example, data lineage tools can help
data owners trace data through its lifecycle; this includes any source information or data
transformations that have been applied during any ETL or ELT processes. This enables close
inspection of the root cause of any data errors.
Promote data analytics – Quality data lays the foundation for more advanced data analytics
and data science initiatives; this can include business intelligence reporting or more complex
predictive machine learning projects. These can only be prioritized when their main
stakeholders trust the underlying data; otherwise, they may not be adopted.
- Challenges of Data Governance
Although the benefits of data governance are clear, data governance initiatives have a number of
hurdles to overcome to achieve success. Some of these challenges include:
Organizational alignment: At the onset of a data governance program, one of the largest
challenges will be to align stakeholders across the organization around what the key data
assets are and what their respective definitions and formats should be. Regulatory policies
can put some structure around conversations about customer data, but it may be more
difficult to agree on other datasets that fall under master data management (MDM), such as
more product-specific data.
Lack of appropriate sponsorship: Good data governance programs generally require
sponsorship at two levels—the executive level and the individual contributor level. Chief Data
Officers (CDOs) and data stewards are critical in the communication and prioritization of data
governance within an organization. The Chief Data Officer can provide oversight and enforce
accountability across data teams to ensure that data governance policies are adopted. Data
stewards can help promote awareness of these policies to data producers and data
consumers to encourage compliance across the organization.
Relevant data architecture and processes – Without the right tools and data architecture,
companies will struggle in their deployment of an effective data governance program. As an
example, teams may discover redundant data across different functions, but data architects
will need to develop appropriate data models and data architectures to merge and integrate
data across storage systems. Teams may also need to adopt a data catalog to create an
inventory of data assets across an organization, or if they already have one, they may need to
set up a process for metadata management, which ensures that the underlying data are
relevant and up-to-date.
1.1.6. Deployment and Monitoring
- What is monitoring?
Monitoring is the practice of collecting, processing, aggregating, and visualizing real-time quantitative
data about a system. In the context of an application, the measured data might include request
counts, error counts, request latency, database query latency, and resource utilization.
For example, suppose you were developing new search functionality for an application and
introduced a new API endpoint that queries the database. You might be interested in measuring the amount of
time taken to serve such search requests and track how it performs when the concurrent load on
that endpoint increases. You might then discover that the latency increases when users search
specific fields due to a missing index. Monitoring can help you detect such anomalies or performance
bottlenecks.
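A minimal sketch of such latency monitoring, assuming a simple in-process decorator rather than a real monitoring backend (the `search` function and its simulated delay are stand-ins for the endpoint described above):

```python
import time
from statistics import mean

# Collected measurements; a real system would ship these to a monitoring backend
latencies = []

def monitored(fn):
    """Record the wall-clock latency of each call to fn."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            latencies.append(time.perf_counter() - start)
    return wrapper

@monitored
def search(query):
    # Stand-in for the search endpoint's database query
    time.sleep(0.01)
    return []

for q in ["pump", "valve", "motor"]:
    search(q)

avg_latency = mean(latencies)
```

Aggregates such as `avg_latency` (or percentiles) are exactly the kind of signal that surfaces a missing index when concurrent load grows.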
There are several reasons why monitoring is important – understanding the reasons informs your
choices regarding implementation and choice of tools. From a high level, monitoring helps you to
ensure the reliable operation of your application.
Web applications tend to grow in complexity over time. Even supposedly simple apps can be
cumbersome to understand once deployed when considering how they'll function under load.
Moreover, layers of abstraction and the use of external libraries obscure the app's underlying mechanics
and failure modes. Monitoring provides you with x-ray-like vision into the health and operation of
your application.
1.2. Discuss how tools and technologies support the business process and inform decision making
1.2.1. Data Collection and Integration
- What is data integration?
Data integration refers to the process of combining and harmonizing data from multiple sources into
a unified, coherent format that can be put to use for various analytical, operational and decision-
making purposes.
Data integration involves a series of steps and processes that brings together data from disparate
sources and transforms it into a unified and usable format. Here's an overview of how a typical data
integration process works:
Data source identification: The first step is identifying the various data sources that need to
be integrated, such as databases, spreadsheets, cloud services, APIs, legacy systems and
others.
Data extraction: Next, data is extracted from the identified sources using extraction tools or
processes, which might involve querying databases, pulling files from remote locations or
retrieving data through APIs.
Data mapping: Different data sources may use different terminologies, codes or structures to
represent similar information. Creating a mapping schema that defines how data elements
from different systems correspond to each other ensures proper data alignment during
integration.
Data validation and quality assurance: Validation involves checking for errors, inconsistencies
and data integrity issues to ensure accuracy and quality. Quality assurance processes are
implemented to maintain data accuracy and reliability.
Data transformation: At this stage, the extracted data is converted and structured into a
common format to ensure consistency, accuracy and compatibility. This might include data
cleansing, data enrichment and data normalization.
Data loading: Data loading is where the transformed data is loaded into a data warehouse or
any other desired destination for further analysis or reporting. The loading process can be
performed by batch loading or real-time loading, depending on the requirements.
Data synchronization: Data synchronization helps ensure that the integrated data is kept up
to date over time, whether via periodic updates or real-time synchronization if immediate
integration of newly available data is required.
Data governance and security: When integrating sensitive or regulated data, data governance
practices ensure that data is handled in compliance with regulations and privacy
requirements. Additional security measures are implemented to safeguard data during
integration and storage.
Metadata management: Metadata, which provides information about the integrated data,
enhances its discoverability and usability so users can more easily understand the data’s
context, source and meaning.
Data access and analysis: Once integrated, the data sets can be accessed and analyzed using
various tools, such as BI software, reporting tools and analytics platforms. This analysis leads
to insights that drive decision making and business strategies.
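The mapping, validation, transformation, and loading steps above can be sketched as follows (the source systems, field names, and mapping schema are hypothetical):

```python
# Extraction is simulated: two sources using different terminologies for the same entity
crm_rows = [{"cust_id": "C1", "full_name": "Alice"}]
erp_rows = [{"customer_no": "C2", "name": "Bob"}]

# Data mapping: a schema defining how source fields correspond to the unified format
mappings = {
    "crm": {"cust_id": "customer_id", "full_name": "customer_name"},
    "erp": {"customer_no": "customer_id", "name": "customer_name"},
}

def transform(rows, mapping):
    """Rename fields into the unified format and validate that required keys exist."""
    unified = []
    for row in rows:
        record = {mapping[k]: v for k, v in row.items() if k in mapping}
        # Validation: every unified record must carry the required fields
        assert {"customer_id", "customer_name"} <= record.keys(), "validation failed"
        unified.append(record)
    return unified

# Loading: combine into one destination (a stand-in for a warehouse table)
warehouse = transform(crm_rows, mappings["crm"]) + transform(erp_rows, mappings["erp"])
```

Real pipelines add synchronization, governance, and metadata on top, but the mapping schema remains the core artifact that keeps disparate sources aligned.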
1.2.2. Data Exploration and Visualization
- What is data visualization?
Data visualization is the representation of information and data using charts, graphs, maps, and other
visual tools. These visualizations allow us to easily understand any patterns, trends, or outliers in a
data set.
Data visualization also presents data to the general public or specific audiences without technical
knowledge in an accessible manner. For example, the health agency in a government might provide a
map of vaccinated regions.
The purpose of data visualization is to help drive informed decision-making and to add colorful
meaning to an otherwise bland database.
Data visualization can be used in many contexts in nearly every field, like public policy, finance,
marketing, retail, education, sports, history, and more. Here are the benefits of data visualization:
Storytelling: People are drawn to colors and patterns in clothing, arts and culture,
architecture, and more. Data is no different—colors and patterns allow us to visualize the
story within the data.
Accessibility: Information is shared in an accessible, easy-to-understand manner for a variety
of audiences.
Visualize relationships: It’s easier to spot the relationships and patterns within a data set
when the information is presented in a graph or chart.
Exploration: More accessible data means more opportunities to explore, collaborate, and
inform actionable decisions.
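As a minimal illustration of visualizing relationships in data, here is a text-only bar chart over hypothetical monthly sales figures; in practice a tool such as Tableau, Power BI, or matplotlib would render the chart graphically:

```python
# Hypothetical monthly sales; patterns jump out of bars far faster than out of raw numbers
sales = {"Jan": 120, "Feb": 90, "Mar": 150, "Apr": 180}

def bar_chart(data, width=30):
    """Return one bar line per category, scaled so the largest value fills the width."""
    peak = max(data.values())
    return [f"{label} {'#' * round(value / peak * width)}" for label, value in data.items()]

chart = bar_chart(sales)
print("\n".join(chart))
```

Even this crude rendering shows the upward trend at a glance, which is the point of the "visualize relationships" benefit above.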
1.2.3. Process Optimization
- What is process optimization?
Process optimization means improving a business process so that it makes the best possible use of
time and resources. One common data-driven route to optimization is customer segmentation, which
simply means grouping your customers according to various characteristics (for example, grouping
customers by age).
It’s a way for organizations to understand their customers. When you know the differences between
customer groups, it’s easier to make strategic decisions regarding product growth and marketing.
The opportunities to segment are endless and depend mainly on how much customer data you have
at your disposal. Starting from basic criteria, like gender, hobby, or age, it goes all the way to things
like “time spent on website X” or “time since the user opened our app”.
There are different methodologies for customer segmentation, and they depend on four types of
parameters:
geographic,
demographic,
behavioral,
psychological.
Geographic customer segmentation is very simple: it’s all about the user’s location. This can be
implemented in various ways. You can group by country, state, city, or zip code.
Demographic segmentation is related to the structure, size, and movements of customers over space
and time. Many companies use gender differences to create and market products. Parental status is
another important feature. You can obtain data like this from customer surveys.
Behavioral customer segmentation is based on past observed behaviors of customers that can be
used to predict future actions. For example, brands that customers purchase, or moments when they
buy the most. The behavioral aspect of customer segmentation not only tries to understand reasons
for purchase but also how those reasons change throughout the year.
Psychological segmentation of customers generally deals with things like personality traits, attitudes,
or beliefs. This data is obtained using customer surveys, and it can be used to gauge customer
sentiment.
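A minimal sketch of demographic segmentation by age band (the customers and cut-off ages are hypothetical; real projects often use clustering on many features rather than fixed rules):

```python
# Hypothetical customers; grouping by age band is the simplest form of segmentation
customers = [
    {"name": "An", "age": 22},
    {"name": "Binh", "age": 37},
    {"name": "Chi", "age": 64},
]

def age_segment(age):
    """Assign a demographic segment from an age band."""
    if age < 30:
        return "young"
    if age < 55:
        return "middle"
    return "senior"

# Build segment -> list of customer names
segments = {}
for c in customers:
    segments.setdefault(age_segment(c["age"]), []).append(c["name"])
```

Each segment can then receive its own budget, offer, or marketing channel, which is exactly the list of opportunities discussed next.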
Implementing customer segmentation leads to plenty of new business opportunities. You can do a
lot of optimization in:
budgeting,
product design,
promotion,
marketing,
customer satisfaction.
Budgeting: Nobody likes to invest in campaigns that don’t generate any new customers. Most
companies don’t have huge marketing budgets, so that money has to be spent right. Segmentation
enables you to target customers with the highest potential value first, so you get the most out of
your marketing budget.
Product design: Customer segmentation helps you understand what your users need. You can
identify the most active users/customers, and optimize your application/offer towards their needs.
Promotion: Properly implemented customer segmentation helps you plan special offers and deals.
Frequent deals have become a staple of e-commerce and commercial software in the past few years.
If you reach a customer with just the right offer, at the right time, there’s a huge chance they’re
going to buy. Customer segmentation will help you tailor your special offers perfectly.
Marketing: The marketing strategy can be directly improved with segmentation because you can plan
personalized marketing campaigns for different customer segments, using the channels that they use
the most.
Customer satisfaction: By studying different customer groups, you learn what they value the most
about your company. This information will help you create personalized products and services that
perfectly fit your customers’ preferences.
Fraud detection is a process that detects and prevents fraudsters from obtaining money or property
through false means. It is a set of activities undertaken to detect and block the attempt of fraudsters
from obtaining money or property fraudulently. Fraud detection is prevalent across banking,
insurance, medical, government, and public sectors, as well as in law enforcement agencies.
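A minimal fraud-detection sketch using a simple statistical rule, flagging amounts more than two standard deviations from the mean (the transaction figures are hypothetical; production systems use far richer models and features):

```python
from statistics import mean, stdev

# Hypothetical transaction amounts; the last one is far outside the usual range
amounts = [120.0, 95.0, 130.0, 110.0, 105.0, 2500.0]

mu, sigma = mean(amounts), stdev(amounts)

# Flag any transaction more than two standard deviations from the mean
flagged = [a for a in amounts if abs(a - mu) > 2 * sigma]
```

Flagged items are not proof of fraud; as with any detection control, they are red flags that trigger a follow-up investigation.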
Figure 1
Tools and Technologies: Real-time analytics platforms, IoT devices, streaming data processing tools.
Support to Business Process: Real-time decision support tools provide immediate insights into
ongoing operations and market conditions. They allow businesses to react swiftly to changes and
opportunities.
Inform Decision: By having access to real-time data, businesses can make proactive and timely
decisions, enhancing their agility and responsiveness in a dynamic market environment.
1.3. Assess the benefits of using data science to solve problems in real-world scenarios
1.3.1. Data-Driven Decision Making
- What is data-driven decision making?
Data-driven decision-making (sometimes abbreviated as DDDM) is the process of using data to
inform your decision-making process and validate a course of action before committing to it. For
example, a business might:
Collect survey responses to identify products, services, and features their customers would
like
Conduct user testing to observe how customers are inclined to use their product or services
and to identify potential issues that should be resolved prior to a full release
Launch a new product or service in a test market in order to test the waters and understand
how a product might perform in the market
Analyze shifts in demographic data to determine business opportunities or threats
How exactly data can be incorporated into the decision-making process will depend on a number of
factors, such as your business goals and the types and quality of data you have access to.
The collection and analysis of data have long played an important role in enterprise-level
corporations and organizations. But as humanity generates more than 2.5 quintillion bytes of data
each day, it's never been easier for businesses of all sizes to collect, analyze, and interpret data into
real, actionable insights. Though data-driven decision-making has existed in business in one form or
another for centuries, the scale at which it can now be practiced is a truly modern phenomenon.
- Example
Leadership Development at Google
Real Estate Decisions at Starbucks
Driving Sales at Amazon
- Benefit of data-driven decision making
Once you begin collecting and analyzing data, you’re likely to find that it’s easier to reach a
confident decision about virtually any business challenge, whether you’re deciding to launch
or discontinue a product, adjust your marketing message, branch into a new market, or
something else entirely.
Data performs multiple roles. On the one hand, it serves to benchmark what currently exists,
which allows you to better understand the impact that any decision you make will have on
your business.
Beyond this, data is logical and concrete in a way that gut instinct and intuition simply aren’t.
By removing the subjective elements from your business decisions, you can instill confidence
in yourself and your company as a whole. This confidence allows your organization to commit
fully to a particular vision or strategy without being overly concerned that the wrong decision
has been made.
Just because a decision is based on data doesn’t mean it will always be correct. While the
data might show a particular pattern or suggest a certain outcome, if the data collection
process or interpretation is flawed, then any decision based on the data would be inaccurate.
This is why the impact of every business decision should be regularly measured and
monitored.
When you first implement a data-driven decision-making process, it’s likely to be reactionary
in nature. The data tells a story, which you and your organization must then react to.
While this is valuable in its own right, it’s not the only role that data and analysis can play
within your business. Given enough practice and the right types and quantities of data, it’s
possible to leverage it in a more proactive way—for example, by identifying business
opportunities before your competition does, or by detecting threats before they grow too
serious.
There are many reasons a business might choose to invest in a big data initiative and aim to
become more data-driven in its processes. According to a recent survey of Fortune 1000
executives conducted by NewVantage Partners for the Harvard Business Review, these
initiatives vary in their rates of success.
One of the most impactful initiatives, according to the survey, is using data to decrease
expenses. Of the organizations which began projects designed to decrease expenses, more
than 49 percent have seen value from their projects. Other initiatives have shown more
mixed results.
“Big data is already being used to improve operational efficiency,” said Randy Bean, CEO and
managing partner of consultancy firm NewVantage Partners, when announcing the results of
the survey. “And the ability to make informed decisions based on the very latest up-to-the-
moment information is rapidly becoming the mainstream norm.”
1.3.2. Improved Efficiency and Productivity
- Definition
Productivity measures rates of production. It refers to the number of goods (or output) you can
produce in a given time frame.
Efficiency measures how many resources you need to complete a given task. In many cases, you’ll
reference time when discussing efficiency, as you want to understand how much time it takes to
complete the task.
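These two definitions can be expressed as simple ratios (the production figures below are hypothetical):

```python
# Hypothetical figures for one shift
units_produced = 480
hours_worked = 8        # the time frame for productivity
machine_hours_used = 10  # the resources consumed, used here to gauge efficiency

# Productivity: output per unit of time
productivity = units_produced / hours_worked        # units per hour

# Efficiency: output relative to the resources the task consumed
efficiency = units_produced / machine_hours_used    # units per machine-hour
```

Tracking both ratios over time shows whether a process change raised output, reduced resource use, or both.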
One of the most important benefits of personalized CX is that it increases customer satisfaction and
loyalty. Making a customer feel important and valued is a great way to turn them into a repeat buyer,
and personalization is perfectly designed to do that. According to an Adobe Commerce survey, 67%
of consumers say they want personalized offers based on their individual spending habits, whether
they’re shopping online or in store. Nailing personalization is an opportunity to meet and exceed
expectations.
Personalization also builds trust by showing you care about the customer, and they can have
confidence in you to support their needs. Repeat on-target personalization is a quick and effective
way to build brand trust and loyalty.
In addition, personalization increases engagement. Tailoring any point of contact along the customer
journey directly to individuals makes them more likely to interact with it. For example, one
benchmark report found a 139% increase in click rate for personalized emails compared with static,
one-time sends.
The result of this engagement, along with the increased customer lifetime value that comes from
building trust and loyalty, is a higher return on investment. It also helps you gain new customers.
According to a McKinsey study, 76% of consumers are more likely to purchase from a company that
personalizes.
Competitive advantage refers to the ways that a company can produce goods or deliver services
better than its competitors. It allows a company to achieve superior margins and generate value for
the company and its shareholders.
A competitive advantage is something that cannot be easily replicated and is exclusive to a company
or business. This value is created internally and is what sets the business apart from its competition.
Value proposition: A company must clearly identify the features or services that make it attractive to
customers. It must offer real value in order to generate interest.
Target market: A company must establish its target market to further engrain best practices that will
maintain competitiveness.
Competitors: A company must define competitors in the marketplace, and research the value they
offer; this includes both traditional as well as non-traditional, emerging competition.
The first line of defense in minimizing fraud risk is fraud prevention. Prevention is typically the most
cost-effective component of a fraud risk management system because it poses barriers to fraud,
deters fraud, and can eliminate the need for costly investigations.
Fraud prevention is implemented through preventive controls. These controls may derive from a
standards-based information security management system or a framework such as the Internal
Control – Integrated Framework by the Committee of Sponsoring Organizations of the Treadway
Commission (COSO). Such controls function as treatments for identified risks. As with all controls,
they must continuously be monitored for optimal effectiveness.
Fraud preventive controls include human resource (HR) procedures (e.g., job applicant background
investigations, anti-fraud training, employee evaluation and compensation programs). They could
also include IT controls (e.g., limiting access rights based on employee level and/or job requirements)
and operational controls (e.g., segregation of duties, authority limits, and transaction level
procedures).
To be successful, a fraud prevention program should be carefully documented, integrated into the
organization’s fraud management effort, and continuously monitored and improved. Employees at all
levels of the organization should be aware of the relevant program policies and procedures, and
trained as needed.
- Fraud detection
Fraud can never be fully prevented; therefore, a highly effective fraud detection system must be in
place to detect frauds as they occur.
In the same way that the fraud prevention system requires preventive controls, the fraud detection
system requires detective controls.
Detective controls are generally matched with identified risks, and they tend to be clandestine. In
some cases, it may be more cost effective to implement controls to detect rather than prevent fraud.
Further, detective controls can have a preventive effect through deterrence.
One of the most important fraud detection controls is a whistle-blower hotline. Such hotlines are
mandated by the Sarbanes–Oxley Act for U.S. listed firms and are generally the most likely means of
detecting fraud. The Association of Certified Fraud Examiners (ACFE) 2018 Report to the Nations on
Occupational Fraud and Abuse found that “while tips were the most common detection method
regardless of whether a hotline was in place, schemes were detected by tip in 46% of cases at
organizations that had hotlines, but in only 30% of cases at organizations without them.”
To be effective, a whistle-blower hotline should:
be promoted.
provide for anonymity (or at least confidentiality) of the whistle-blower.
provide for reporting to senior management or the audit committee.
work under a single case management system.
be continually reviewed for effectiveness by an independent evaluator.
Fraud detection is also enhanced by process controls. Such controls are designed to detect both
fraud and errors, and include
reconciliations,
independent reviews,
physical counts and inspections,
analyses, and
audits.
Specific controls should be implemented, along with proactive fraud detection procedures that
include data analysis, continuous auditing, and other supporting technologies.
As with all other components of the fraud risk management system, fraud detection processes and
techniques must be carefully documented for optimal effectiveness. Documentation should generally
exist for all detection controls and processes and should specifically exist for monitoring processes
and results; for testing procedures used to assess controls; and for the roles and responsibilities that
support fraud detection.
Continuous monitoring of fraud detection is essential. The organization should develop ongoing
monitoring and measurements to evaluate, remedy, and improve the organization’s fraud prevention
and detection techniques.
All violations of the organization’s code of conduct should be reported and dealt with in a timely
manner. Appropriate punishment should be applied, even if senior management is involved.
Efficient fraud detection systems can detect not only fraud but also waste and inefficiency.
In many cases, the fraud detection system simply raises a red flag. It is then necessary to follow up
with an investigation to determine the underlying issue. For example, a fraud detection system might
flag irregular production orders on the basis of the excess amounts of materials being applied to
particular work in process jobs. A follow-up of these irregular orders might then lead to finding either
waste or fraud. In cases of waste or inefficiency, firm leaders have an opportunity to boost
profitability by fixing these areas.
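The irregular-order check described above can be sketched as a simple rule (the job data, standard usage rate, and tolerance are hypothetical):

```python
# Actual material usage on each work-in-process job vs. the engineering standard
jobs = [
    {"job": "WIP-101", "units": 100, "material_kg": 205.0},
    {"job": "WIP-102", "units": 100, "material_kg": 260.0},
]
STANDARD_KG_PER_UNIT = 2.0
TOLERANCE = 0.10  # allow 10% over standard before raising a red flag

# Raise a red flag when actual usage exceeds standard usage plus the tolerance
red_flags = [
    j["job"]
    for j in jobs
    if j["material_kg"] > j["units"] * STANDARD_KG_PER_UNIT * (1 + TOLERANCE)
]
```

As the text notes, a flag only starts the follow-up: investigation then determines whether the excess is fraud, waste, or inefficiency.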
Creating and implementing a sound planning, budgeting and forecasting process helps organizations
establish more accurate financial reports and analytics — potentially leading to more accurate
forecasting and ultimately revenue growth. It is even more important in today’s business
environment, where disruptive competitors are entering even the most tradition-bound industries.
When companies embrace data and analytics in conjunction with well-established planning and
forecasting best practices, they enhance strategic decision making and can be rewarded with more
accurate plans and more timely forecasts. Overall, these tools and practices can save time, reduce
errors, promote collaboration and foster a more disciplined management culture that delivers a true
competitive advantage.
Innovation and creativity are often used synonymously. While similar, they're not the same. Using creativity in
business is important because it fosters unique ideas. This novelty is a key component of innovation.
For an idea to be innovative, it must also be useful. Creative ideas don't always lead to innovations because
they don't necessarily produce viable solutions to problems.
- Types of innovation
Innovation in business can be grouped into two categories: sustaining and disruptive.
2. Design a data science solution to support decision making related to a real-world problem.
2.1. Problems encountered by ABC Manufacturing when collecting data
Data Fragmentation: Data is scattered across different systems and formats, making it difficult
to consolidate and analyze.
Data Quality Issues: Inconsistent data entry, missing values, and errors reduce data reliability.
Real-time Data Collection: Difficulty in collecting and processing data in real-time due to
outdated infrastructure.
Scalability: Existing systems struggle to handle the increasing volume and variety of data.
Integration with Legacy Systems: Challenges in integrating new data collection tools with
existing legacy systems.
Security and Compliance: Ensuring data security and compliance with industry regulations.
2.2. Data science solutions to support decision making
To address these challenges and support decision-making, ABC Manufacturing can implement the
following data science solutions:
Data Integration and ETL Processes: Develop ETL (Extract, Transform, Load) pipelines to
consolidate data from multiple sources into a centralized data warehouse.
Data Cleaning and Preprocessing: Use data cleaning techniques to handle missing values,
remove duplicates, and correct errors.
Real-time Data Processing: Implement stream processing frameworks like Apache Kafka and
Apache Flink for real-time data collection and processing.
Scalable Data Storage: Use scalable storage solutions such as cloud-based data warehouses
(e.g., Amazon Redshift, Google BigQuery).
Machine Learning Models: Apply machine learning algorithms for predictive maintenance,
quality control, and demand forecasting.
Data Visualization: Use tools like Tableau, Power BI, or custom dashboards to visualize data
and extract insights.
Integration with IoT Devices: Connect IoT devices to collect real-time data from
manufacturing equipment.
Data Governance and Security: Implement data governance policies and use encryption,
access controls, and compliance checks to ensure data security.
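To make the data cleaning step above concrete, here is a minimal pandas sketch; the column names and sensor readings are hypothetical, chosen only to illustrate handling missing values, duplicates, and errors:

```python
import pandas as pd
import numpy as np

# Hypothetical raw sensor readings with typical quality issues
raw = pd.DataFrame({
    "machine_id": ["M1", "M1", "M2", "M2", "M2"],
    "temperature": [70.2, np.nan, 68.9, 68.9, 250.0],  # a missing value and an outlier
})

cleaned = (
    raw.drop_duplicates()  # remove exact duplicate rows
       .assign(temperature=lambda d: d["temperature"]
               .fillna(d["temperature"].median()))  # impute missing values
)
# Flag implausible readings for review instead of silently dropping them
cleaned["suspect"] = cleaned["temperature"] > 120
```

In a production pipeline these steps would sit inside the ETL layer, running before data is loaded into the warehouse.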
2.3. Apply data science tools to solve problems encountered when collecting data for
ABC Manufacturing
The overall architecture of a data science solution for ABC Manufacturing typically includes the
following components:
Data Sources: ERP systems, IoT sensors, databases, external data sources.
Data Ingestion Layer: Tools for data extraction and loading (ETL tools).
Data Storage Layer: Scalable storage solutions (cloud-based data warehouses, data lakes).
Data Processing Layer: Batch and real-time data processing frameworks.
Machine Learning and Analytics Layer: Platforms and tools for developing and deploying
machine learning models.
Data Visualization and Reporting Layer: Tools for creating dashboards and reports.
Data Governance and Security Layer: Policies, tools, and frameworks for data governance and
security.
- Data Sources:
ERP Systems
IoT Sensors
External Data Sources
- Data Ingestion Layer:
ETL Tools: Apache Nifi, Talend
- Data Storage Layer:
Cloud Data Warehouse: Amazon Redshift, Google BigQuery
- Data Processing Layer:
Batch Processing: Apache Spark
Real-time Processing: Apache Kafka, Apache Flink
- Machine Learning and Analytics Layer:
Machine Learning Platforms: TensorFlow, Scikit-learn
- Data Visualization and Reporting Layer:
Visualization Tools: Tableau, Power BI
- Data Governance and Security Layer:
Data Governance Tools: Apache Ranger
Security Tools: AWS IAM, Azure Active Directory
Figure 2
3. Implement a data science solution to support decision making related to a real-world problem.
3.1. Data cleaning and preprocessing
3.1.1. The pandas library
Figure 2
Figure 3
Correcting Errors:
Figure 4
Transforming Data:
Figure 5
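The figures above originally showed notebook screenshots. A minimal pandas sketch of the correcting and transforming steps they depicted; the column names and values here are assumptions for illustration:

```python
import pandas as pd

# Hypothetical raw export with inconsistent entries and wrong types
sales = pd.DataFrame({
    "date": ["2023-01-05", "2023-01-06", "2023-01-07"],
    "units": ["10", "12", "11"],              # stored as strings in the raw export
    "region": ["north", "North ", "NORTH"],   # inconsistent categorical entries
})

# Correcting errors: normalise the inconsistent category labels
sales["region"] = sales["region"].str.strip().str.lower()

# Transforming data: convert columns to usable types for analysis
sales["date"] = pd.to_datetime(sales["date"])
sales["units"] = sales["units"].astype(int)
```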
3.2. Visualisation
- Import Matplotlib
- Visualize the cleaned data
Figure 6
Figure 7
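A minimal Matplotlib sketch of the visualisation step, using hypothetical monthly sales figures; the Agg backend renders off-screen so no display is required:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales totals from the cleaned dataset
monthly = pd.Series([120, 135, 128, 150],
                    index=["Jan", "Feb", "Mar", "Apr"], name="units_sold")

fig, ax = plt.subplots()
ax.plot(monthly.index, monthly.values, marker="o")
ax.set_title("Monthly sales (cleaned data)")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
fig.savefig("monthly_sales.png")
n_points = len(ax.lines[0].get_xdata())  # number of plotted points
plt.close(fig)
```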
3.3. Scikit-learn (sklearn)
- Data Preparation
Data Cleaning: Use pandas in conjunction with scikit-learn for handling missing values,
encoding categorical variables, and other preprocessing tasks.
Feature Selection: Utilize sklearn's feature selection techniques to choose the most relevant
features.
Scaling and Normalization: Normalize or standardize your data using StandardScaler,
MinMaxScaler, or other scalers.
Figure 9
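A short sketch of the scaling step with scikit-learn's StandardScaler; the feature values are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features: [temperature, vibration]
X = np.array([[70.0, 0.2],
              [72.0, 0.4],
              [68.0, 0.3]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has zero mean, unit variance
```

MinMaxScaler would instead rescale each column into the [0, 1] range, which suits models sensitive to absolute magnitudes.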
- Model selection
Choosing the Right Model: Depending on the problem (classification, regression, clustering), select
appropriate models from sklearn.
Cross-Validation: Use cross-validation to ensure your model generalizes well to unseen data.
Figure 10
Figure 11
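The model selection and cross-validation steps can be sketched as follows, using synthetic data in place of the real sales dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the sales dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# 5-fold cross-validation: each fold is held out once for evaluation
model = LinearRegression()
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
```

A consistently high score across folds suggests the model generalizes; a large spread between folds is a warning sign of overfitting or unstable features.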
- Feature Engineering
Create Time-Based Features: Extract year, month, day, and other time-based features from
the date.
Lag Features: Create lag features to capture previous sales values.
Figure 12
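A minimal sketch of the time-based and lag features described above; the dates and sales values are hypothetical:

```python
import pandas as pd

# Hypothetical daily sales series
sales = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=6, freq="D"),
    "units": [10, 12, 11, 13, 14, 12],
})

# Time-based features extracted from the date column
sales["year"] = sales["date"].dt.year
sales["month"] = sales["date"].dt.month
sales["day"] = sales["date"].dt.day

# Lag feature: the previous day's sales as a predictor for today's
sales["units_lag1"] = sales["units"].shift(1)
```

The first row of a lag feature is always missing, so it is typically dropped or imputed before model training.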
3.4. Make justified recommendations that support decision making related to a real-world
problem.
3.4.1. Linear Regression: an overview of linear regression and the libraries used to implement it.
Linear Regression is a statistical method to model and analyze the relationships between a
dependent variable and one or more independent variables. The sklearn library is commonly used for
implementing linear regression models in Python.
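A minimal example of fitting a linear regression with sklearn, using noiseless toy data so the learned coefficients are easy to verify:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 2x + 1, so the fitted line is known in advance
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

reg = LinearRegression().fit(X, y)
prediction = reg.predict(np.array([[5.0]]))[0]  # extrapolate to x = 5
```

On real sales data the same API applies unchanged; only the feature matrix and target vector differ.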
3.4.2. Machine Learning and Line Charts: an overview of machine learning and line charts, and the libraries used to create them.
Machine learning uses algorithms and statistical models to perform specific tasks without explicit
instructions, relying instead on patterns and inference. For visualization, line charts are useful for
showing trends over time. Libraries such as matplotlib and seaborn are used to create line charts in
Python.
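A short sketch of a line chart comparing actual values against a model's predictions over time; the values here are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt

# Hypothetical actual vs. predicted sales over five time steps
actual = [10, 12, 11, 13, 14]
predicted = [10.5, 11.5, 11.2, 12.8, 13.9]

fig, ax = plt.subplots()
ax.plot(actual, label="actual")
ax.plot(predicted, label="predicted", linestyle="--")
ax.set_xlabel("Time step")
ax.set_ylabel("Units sold")
ax.legend()
fig.savefig("trend_comparison.png")
n_series = len(ax.lines)  # two lines: actual and predicted
plt.close(fig)
```

Overlaying predictions on actuals like this is a quick visual check of how well the model tracks the trend.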
4. Evaluate the use of data science techniques against user and business requirements of an
identified organisation.
4.1. Evaluation Against User Requirements
Understanding User Requirements:
Accessibility: Users need intuitive interfaces and dashboards for interacting with data science
models.
Accuracy: High accuracy in predictions and recommendations is crucial to maintain user trust.
Timeliness: Users need real-time or near-real-time data analysis and results.
Interpretability: Users need clear explanations of the models and their outputs to make
informed decisions.
Customization: Flexibility to adjust and fine-tune models according to specific needs.
Evaluation:
Accessibility: Evaluate if the data science solution offers user-friendly tools like dashboards,
APIs, and visualization tools. Tools like Tableau, Power BI, and custom web applications can
be assessed for their ease of use.
Accuracy: Analyze model performance metrics (e.g., precision, recall, F1 score, RMSE) to
ensure high accuracy. Regularly validate model outputs against actual outcomes.
Timeliness: Check the data processing and prediction speed. Techniques like stream
processing and optimized algorithms (e.g., real-time ML models) should be evaluated.
Interpretability: Evaluate if the models are interpretable using methods like SHAP (SHapley
Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations).
Customization: Assess the flexibility of the models to be tailored to specific user needs
through parameters, configurations, and modular architecture.
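The accuracy metrics mentioned above can be computed with scikit-learn; the labels and predictions below are hypothetical:

```python
import numpy as np
from sklearn.metrics import (f1_score, mean_squared_error,
                             precision_score, recall_score)

# Hypothetical classifier outputs (e.g. defect / no-defect)
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

precision = precision_score(y_true, y_pred)  # of predicted positives, how many correct
recall = recall_score(y_true, y_pred)        # of actual positives, how many found
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# RMSE for a regression model (e.g. a demand forecast)
rmse = np.sqrt(mean_squared_error([10.0, 12.0], [11.0, 12.0]))
```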
4.2. Evaluation Against Business Requirements
Return on Investment (ROI): The solution should deliver measurable financial benefits.
Scalability: The ability to handle growing amounts of data and increased numbers of users.
Integration: Compatibility with existing systems and workflows.
Security and Privacy: Ensuring data protection and compliance with regulations.
Innovation: Enabling the business to stay competitive by leveraging advanced analytics and
insights.
Evaluation:
ROI: Calculate the financial benefits of implementing the data science solution, including cost
savings, revenue increase, and efficiency improvements. Tools like ROI calculators and
financial models can be used.
Scalability: Evaluate the architecture and technologies (e.g., cloud services, distributed
computing) to ensure they can scale with business growth.
Integration: Assess the compatibility of the solution with existing IT infrastructure, including
databases, ERP systems, and other enterprise software. APIs and middleware solutions should
be considered.
Security and Privacy: Review data security measures, encryption, access controls, and
compliance with regulations like GDPR, CCPA, and HIPAA.
Innovation: Measure the impact of data science on business innovation by tracking new
products, services, and market opportunities enabled by advanced analytics.
4.3. Example evaluation of the use of data science techniques in predictive maintenance for
equipment optimization
Scenario: a predictive maintenance system for manufacturing equipment, evaluated against the business requirements above.
ROI: Track and quantify the reduction in unplanned downtime and maintenance costs,
showing a clear ROI from the predictive maintenance system.
Scalability: Ensure the solution can handle data from a growing number of sensors and
equipment as the business expands.
Integration: Integrate the predictive maintenance system with existing CMMS (Computerized
Maintenance Management System) and ERP systems for seamless operations.
Security and Privacy: Protect sensor data and maintenance records with robust security
measures, ensuring compliance with industry standards.
Innovation: Enable the business to explore new maintenance strategies and technologies,
such as remote diagnostics and automated maintenance scheduling.
III. Conclusion
ABC Manufacturing, a leading multinational in consumer electronics, exemplifies how leveraging data
and information can transform supply chain management. Faced with the complexities of efficiently
managing a global supply chain, the organization has adeptly utilized data-driven approaches to
overcome these challenges and drive substantial operational improvements.
The ability to forecast demand with high precision stands out as a critical advantage gained from data
analysis. By examining historical sales patterns, market trends, and customer preferences, ABC
Manufacturing has optimized production and inventory management, reducing stockouts and excess
inventory while enhancing resource utilization. This data-driven foresight has led to cost savings and
heightened customer satisfaction.
Real-time data collection from sensors and IoT devices further amplifies the company's operational
efficiency. The insights derived from equipment performance and logistics data enable the
identification of bottlenecks and optimization of production processes. Proactive maintenance,
driven by real-time analytics, helps prevent costly breakdowns and minimizes production delays,
ensuring smoother operations.
Ensuring product quality and compliance through data is another significant achievement for ABC
Manufacturing. By monitoring data at every stage of the production process, the company can swiftly
address quality issues and continually refine manufacturing practices. This commitment to quality
not only enhances product reliability but also mitigates the risks and costs associated with product
recalls.
Moreover, the integration of data systems with suppliers and distributors has fostered effective
collaboration. Real-time visibility into inventory, production schedules, and transportation statuses
has improved order fulfillment and reduced lead times. This synergy among partners has
strengthened the overall supply chain performance, underscoring the value of data-driven decision-
making.
In conclusion, ABC Manufacturing’s strategic use of data and information has markedly enhanced its
supply chain efficiency. By embracing data analytics, real-time monitoring, and collaborative
technologies, the organization has navigated its supply chain challenges adeptly, achieving
operational excellence and setting a benchmark for industry practices.
IV. References
Luna, J.C. (2023). Learn R, Python & Data Science Online. [online] www.datacamp.com. Available at:
https://www.datacamp.com/blog/top-programming-languages-for-data-scientists-in-2022
Mathur, G. (2023). Data science vs. machine learning: What’s the difference? [online] IBM Blog.
Available at: https://www.ibm.com/blog/data-science-vs-machine-learning-whats-the-difference/
IBM (2023). What is data governance? | IBM. [online] www.ibm.com. Available at:
https://www.ibm.com/topics/data-governance
Prisma. (n.d.). What is monitoring, how to monitor, and tools for monitoring. [online] Available at:
https://www.prisma.io/blog/monitoring-best-practices-monitor5g08d0b
IBM (n.d.). What is Data Integration? | IBM. [online] www.ibm.com. Available at:
https://www.ibm.com/topics/data-integration
Coursera (2023). Data Visualization: Definition, Benefits, and Examples. [online] Coursera. Available
at: https://www.coursera.org/articles/data-visualization
Carter, T.J. (n.d.). How to Improve Daily Operations With Process Optimization. [online] learn.g2.com.
Available at: https://learn.g2.com/process-optimization
Kumar, D. (2021). Implementing Customer Segmentation Using Machine Learning [Beginners Guide].
[online] neptune.ai. Available at: https://neptune.ai/blog/customer-segmentation-using-machine-
learning
Kanade, V. (2021). What Is Fraud Detection? Definition, Types, Applications, and Best Practices.
[online] Spiceworks. Available at: https://www.spiceworks.com/it-security/vulnerability-
management/articles/what-is-fraud-detection/
blog.hubspot.com. (n.d.). Productivity vs. Efficiency: How To Improve Both at Work. [online] Available
at: https://blog.hubspot.com/sales/productivity-vs-efficiency
Lindecrantz, E., Gi, M.T.P. and Zerbi, S. (2020). Personalized experience for customers: Driving
differentiation in retail | McKinsey. [online] Mckinsey. Available at:
https://www.mckinsey.com/industries/retail/our-insights/personalizing-the-customer-experience-
driving-differentiation-in-retail
Peterdy, K. (2023). Competitive Advantage. [online] Corporate Finance Institute. Available at:
https://corporatefinanceinstitute.com/resources/management/competitive-advantage/
Kaufman Rossin Multisite Website. (n.d.). Mitigating Risk Through Fraud Prevention and Detection -
CPA & Advisory Professional Insights. [online] Available at:
https://kaufmanrossin.com/blog/mitigating-risk-fraud-prevention-detection/
IBM (2019). Planning, Budgeting and Forecasting. [online] IBM. Available at:
https://www.ibm.com/topics/planning-budgeting-and-forecasting
Boyles, M. (2022). Innovation in Business: What It Is & Why It’s so Important. [online] Harvard
Business School. Available at: https://online.hbs.edu/blog/post/importance-of-innovation-in-
business.
V. GitHub Link