Midterm Report 1
MIDTERM REPORT
NEW INFORMATION COMMUNICATION TECHNOLOGY (E)
3.3. Application of big data in business
REFERENCES
LIST OF PICTURES
Picture 1.1 Growth of and Digitization of Global Information Storage Capacity
Picture 1.2 The growth of big data's primary characteristics of volume, velocity, and variety
Picture 1.3 A brief history of big data
LIST OF FIGURES
Figure 2.1 Benefits of Big Data Analysis to Businesses (Wielki, 2013)
Figure 2.2 2024 Digital Trends in Operations Survey by PwC
Figure 3.1 Demand In ICT Professional Of EU Country
Figure 3.2 Top decision making based on Big Data by Bain research
CHAPTER I: INTRODUCTION TO BIG DATA
Volume relates to the amount of generated and stored data. The size of the
data determines its value and the analytical opportunities that can be
considered in further evaluation. Big data is usually explored in data sets
larger than terabytes or even petabytes in size. As data grows, organizations
gain the possibility of discovering new patterns and knowledge, but this comes
with the need for efficient methods to handle the data. [1]
Variety relates to the type and nature of the information. Structured data,
which is typical for traditional applications, was processed without problem by
earlier technologies such as Relational Database Management Systems (RDBMSs).
However, the growth of semi-structured and unstructured data became a major
problem for those existing tools. Big data technologies were developed to
collect, store and process semi-structured and unstructured data that is
generated at very high velocity and in very large quantities. These
technologies can process text, images, audio and video, and can complete
missing pieces with the help of data fusion. They can also handle structured
data, although for structured data they tend to be used for storage, while
processing can still be left to a conventional RDBMS. This variety of data
helps in investigating latent or potential relationships in sources such as
social media data, log data, sensor data, etc. [1]
Velocity refers to the rate at which data is created and processed to meet the
demands and opportunities of growth and development. Big data is often
available in real time, and it is generated more continuously than small data
sets. The term covers both the rate at which data is created and the rate at
which it is collected, processed and reported. The constant flow of data
requires fast processing so that the information can support real-time
decision-making and response. [1]
Picture 1.2 The growth of big data's primary characteristics of volume, velocity, and
variety
1.3. History of big data:
Big data repositories have existed in many forms, often built by corporations with
a special need. Commercial vendors historically offered parallel database management
systems for big data beginning in the 1990s. For many years, WinterCorp published the
largest database report.
In 2000, Seisint Inc. (now LexisNexis Risk Solutions) developed a C++-based
distributed platform for data processing and querying known as the HPCC
Systems platform. This system automatically partitions, distributes, stores and
delivers structured, semi-structured, and unstructured data across multiple
commodity servers. Users can write data processing pipelines and queries in a
declarative dataflow programming language called ECL. Data analysts working
in ECL are not required to define data schemas upfront and can rather focus on
the particular problem at hand, reshaping data in the best possible manner as they
develop the solution. In 2004, LexisNexis acquired Seisint Inc. and its
high-speed parallel processing platform, and successfully used this platform to
integrate the data systems of ChoicePoint Inc. when it acquired that company
in 2008. In 2011, the HPCC Systems platform was open-sourced under the
Apache v2.0 License.
CERN and other physics experiments have collected big data sets for many
decades, usually analyzed via high-throughput computing rather than the map-
reduce architectures usually meant by the current "big data" movement.
permutations of data sources, complexity in interrelationships, and difficulty in deleting
(or modifying) individual records.
Studies in 2012 showed that a multiple-layer architecture was one option to
address the issues that big data presents. A distributed parallel architecture distributes
data across multiple servers; these parallel execution environments can dramatically
improve data processing speeds. This type of architecture inserts data into a parallel
DBMS, which implements the use of MapReduce and Hadoop frameworks. This type
of framework looks to make the processing power transparent to the end-user by using
a front-end application server.
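The map and reduce steps behind such frameworks can be illustrated with a small Python sketch. This is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample text are assumptions made for this example, each text chunk stands in for the data held on one server, and a thread pool stands in for the parallel execution environment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=2):
    # Hypothetical setup: each chunk plays the role of one server's data,
    # and the worker threads play the role of the parallel servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


chunks = ["big data big value", "data velocity data variety"]
counts = word_count(chunks)
print(counts)
```

The caller only sees word_count, which loosely mirrors the idea of hiding the distributed processing behind a front-end layer.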
with metadata and referred to as a data lake could be too large, unstructured, or
diversified to fit into a data warehouse.
Storage: The storage layer is an essential part of any Big Data platform, where
data needs to be kept either before or after computational analysis or before or after
intake. Additional storage requirements may also be necessary for data during migration
and other scenarios, depending on your individual needs.
Processing and Analysis: This stage involves transforming data from its raw state
into a usable format - usually through sorting, aggregating, merging, and even applying
more advanced functions and algorithms. The resulting datasets are then stored for
further processing or prepared for consumption through data visualization tools and
intelligent business insights. [5]
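As a rough illustration of the sorting, aggregating and merging mentioned above, the following Python sketch turns a few raw sales records into a small report. The record layout, the store-to-region lookup and all function names are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical raw records and lookup table, invented for illustration.
raw_sales = [
    {"store": "B", "amount": 120.0},
    {"store": "A", "amount": 80.0},
    {"store": "B", "amount": 30.0},
]
store_regions = {"A": "North", "B": "South"}


def aggregate_by_store(rows):
    """Aggregate: total sales amount per store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


def merge_regions(totals, regions):
    """Merge: attach the region of each store to its total."""
    return [
        {"store": store, "region": regions.get(store, "unknown"), "total": total}
        for store, total in totals.items()
    ]


def process(rows, regions):
    """Sort: order the merged records from highest to lowest total."""
    merged = merge_regions(aggregate_by_store(rows), regions)
    return sorted(merged, key=lambda record: record["total"], reverse=True)


report = process(raw_sales, store_regions)
print(report)
```

The resulting list is the kind of prepared dataset that could then be stored or handed to a visualization tool.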
Big data refers to large volumes of data, yet in its raw form it requires thorough
processing and cleansing. Once the data is gathered, it can be subjected to sophisticated
big data analysis procedures that yield a comprehensive overview. These big data
analysis techniques include, for example:
Data mining uses techniques like anomaly detection and data clustering to
classify data and separate the pertinent information from the input data.
Predictive analytics applies statistical algorithms to a company's historical
data to forecast future outcomes and help identify opportunities and risks.
Deep learning mimics human learning: it builds a hierarchy of algorithms
that sift through the data to identify increasingly complex patterns.
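One simple form of the anomaly detection mentioned under data mining can be sketched in Python using z-scores. This is only one of many possible approaches; the function name find_anomalies, the threshold of 2.0 and the sample order counts are assumptions chosen for this example.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold.

    The z-score is the distance from the mean measured in standard
    deviations. The threshold of 2.0 is an assumption for this sketch.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily order counts with one unusual day.
daily_orders = [100, 102, 98, 101, 99, 103, 97, 250]
anomalies = find_anomalies(daily_orders)
print(anomalies)
```

With such a small sample a single extreme value also inflates the standard deviation, so a real system would likely need more data or a more robust method.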
Utilization and Visualization: At this stage, big data makes use of the organization's own
data to extract precise information from it. In a perfect world, customers would receive
their data as business information and have the choice to self-serve, allowing them to
evaluate it any way they see fit. When the analysis is prescriptive, end users receive the
system's findings in the form of suggested actions; when the analysis is predictive, they
receive them as statistical "predictions".
CHAPTER II: ADVANTAGE AND DISADVANTAGE OF BIG DATA
to big data. Consumer contact, feedback, and sentiment data are valuable insights that
can be utilized to assess the company's shortcomings, improve relationships with
customers, and handle future problems.
2.2. Disadvantage of big data:
Big Data is steadily proving its relevance in different spheres of life and
activity, including medicine, economics, marketing, and learning. Despite its
enormous potential, using and managing large quantities of data is not without
problems. A closer look at Big Data's disadvantages gives a more complete
understanding of it.
Privacy and security concerns: Multiple individuals generate massive volumes of
data, which, when collected and analyzed, have significant implications for the privacy
and security of the company's customers. Organizations must take proper measures to
ensure information confidentiality and adherence to the rules for protecting users'
private data set forth in the General Data Protection Regulation (GDPR) and the
California Consumer Privacy Act (CCPA), for example. According to PwC, 40% of
organizations perceive cyber attacks as a significant danger to their business. [6]
CHAPTER III: OPPORTUNITY, CHALLENGE AND APPLICATION OF BIG
DATA IN BUSINESS
Figure 3.1 Demand In ICT Professional Of EU Country
Enhanced Decision-Making: Big Data helps to transform vast amounts of raw
data into meaningful and easily absorbable insights that can be used to make better data-
driven decisions to improve overall performance in an organisation. Businesses can
utilise Big Data to predict future trends and patterns, streamline operations and services,
and help in strategic business planning. For instance, retail businesses can use Big Data
to predict the amount of stock that will be needed based on previous sales and current
market conditions with the help of advanced computer algorithms. With accurate data,
they can generate reports to determine what products to order, when to order them and
in what quantities. Such insights allow businesses to make better, more informed
decisions that increase overall efficiency and profitability.
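The retail stock example can be made concrete with a deliberately simple Python sketch that forecasts demand from previous sales using a moving average. It ignores current market conditions, and the function names, the three-week window, the safety stock and the sample figures are all assumptions for illustration, not a method taken from any cited source.

```python
import math


def forecast_next(sales, window=3):
    """Forecast the next period as the average of the last `window` periods."""
    if len(sales) < window:
        raise ValueError("not enough sales history for the chosen window")
    recent = sales[-window:]
    return sum(recent) / window


def reorder_quantity(sales, on_hand, safety_stock=0, window=3):
    """Units to order so that stock covers the forecast plus a safety margin."""
    expected = forecast_next(sales, window)
    return max(0, math.ceil(expected + safety_stock - on_hand))


# Hypothetical weekly unit sales for one product.
weekly_sales = [40, 50, 45, 55, 50]
qty = reorder_quantity(weekly_sales, on_hand=20, safety_stock=10)
print(qty)
```

A real forecasting system would typically also account for seasonality, promotions and market signals, which this sketch leaves out.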
Figure 3.2 Top decision making based on Big Data by Bain research
Efficient Resource Utilization: Another major advantage Big Data brings to the
table is efficient resource utilization. As resources become scarcer and more
expensive, organisations that integrate resource-planning solutions such as
Enterprise Resource Planning (ERP) systems can better manage and utilise their
resources for maximum productivity at minimal cost.
Using Big Data to analyse and monitor waste points, companies can distribute their
resources better and exert greater control over their costs. For example, manufacturing
companies can use Big Data to track energy consumption and production output to
eliminate wastage and lower operating expenses.
3.2. Challenge of big data:
Data sources: Data comes from many places and in many forms. There are now new
types of data, many of which are unstructured (e.g. pictures, sounds, videos, and
server logs). This data needs to be organized using complex database
systems. Further challenges lie ahead, such as analytics architecture,
evaluation, distributed mining, time-evolving data, compression, visualization,
and hidden big data. [1]
Security risks: Databases may include confidential information related to the
government and people, so they need high levels of security policies and
mechanisms to protect this data against unauthorized use and malicious attacks.
Big data technologies today, including Cassandra and Hadoop, suffer from a lack
of sufficient security. Data should be encrypted so that it is useless without
the encryption key. Identity and access authorization controls should be added
to all resources so that only the intended users can access them. Endpoint
protection software should be deployed so that malware cannot infect the system,
along with real-time monitoring to stop threats as soon as they are detected. [1]
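The access authorization control described above can be sketched as a simple role-based check in Python. The roles, users and permissions below are hypothetical, and the sketch covers only authorization: encryption, endpoint protection and monitoring would need dedicated tools and are not shown.

```python
# Hypothetical roles and users, invented for this example.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "admin": {"read", "write", "delete"},
}
USER_ROLES = {"alice": "admin", "bob": "analyst"}


def is_authorized(user, action):
    """Allow an action only if the user's role explicitly grants it."""
    role = USER_ROLES.get(user)
    if role is None:
        # Unknown users are denied by default.
        return False
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_authorized("alice", "delete"))
print(is_authorized("bob", "write"))
print(is_authorized("mallory", "read"))
```

Denying by default means a user or action that was never configured cannot access the resource by accident.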
Cost: The initial capital required to invest in ICT is very high and cannot
be afforded by all companies. Implementation involves several expenses:
computers, software, hardware, and servers must be purchased and maintained,
and employees must be properly trained, which is both costly and
time-consuming. Moreover, the constantly evolving technology and the need to
keep up with the latest security measures place additional pressure on
budgets. This is especially so where the business needs external financing,
which may be difficult or costly for Small and Medium-sized Enterprises to
secure and manage.
• The role of government: The government should promote research and
development in the field of big data so that the potential and risks of this
technology are understood. It should provide resources and infrastructure to
encourage and develop the use of big data in areas such as healthcare, education,
and business. The use of big data in government processes allows for increased
cost efficiency, productivity and innovation, but it is not without its flaws. Data
analytics often requires multiple government departments (central and local) to
collaborate and create new and innovative processes to deliver desired results.
For example, in 2012, President Obama's administration unveiled the Big Data
Research and Development Initiative, which aimed to investigate how big data
may be leveraged to tackle critical government challenges. The initiative
consists of 84 different big data programs spread across six departments.
3.3. Application of big data in business:
3.3.2 E-commerce:
Personalized Product Recommendations and Related Products: Many online retailers
recommend products based on a customer's past purchases. For instance, if a
consumer is using an online clothing retailer, the platform may suggest dresses
after they have bought pants. Likewise, when a consumer is interested in a
particular laptop on an electronics website, the site suggests accompanying
peripherals, for example a mouse, a carrying bag or headphones. This kind of
customisation increases sales by notifying buyers of other items that they
might be interested in. [1]
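A very basic version of such related-product suggestions can be sketched in Python by counting which items appear together in past orders. The order data and the function name recommend are hypothetical, and production recommender systems generally use far more elaborate models.

```python
from collections import Counter

# Hypothetical past orders, each represented as a set of purchased items.
orders = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "carrying bag"},
    {"laptop", "headphones"},
    {"pants", "dress"},
]


def recommend(item, past_orders, top_n=2):
    """Suggest the items most often bought together with `item`."""
    co_counts = Counter()
    for basket in past_orders:
        if item in basket:
            co_counts.update(basket - {item})
    # Sort by descending count, then by name, so ties are resolved consistently.
    ranked = sorted(co_counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]


suggestions = recommend("laptop", orders)
print(suggestions)
```

Here a customer looking at a laptop would be shown the mouse first, because it appears with laptops in the largest number of past orders.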
Supply Chain Optimization: E-commerce stores have applied data analytics to the
supply chain to find the most effective way of running it. For example, an
online shop that specializes in selling equipment for winter sports can use
customer addresses to determine that such equipment will be popular among
people in regions with snow. It can then stock more winter goods in
distribution centers nearer to these areas before winter begins, which speeds
up shipment and improves stock flow.
Promotion code: An e-commerce business examines data to identify clients who
often buy electrical devices. It then sends these customers a targeted email
with a discount voucher, promoting its latest electrical items.
3.3.3 Healthcare:
Big data analytics has also improved healthcare by integrating and customizing
treatments and diagnosis, reducing the cost and time patients spend on consultations,
providing centralized reporting methods, and archiving internal and external health
information and patient data. It enhances the accuracy and individualization of
healthcare management and delivery, promotes the overall efficacy of the medical
supply system, ensures appropriate task and resource application, and facilitates
interprofessional and cross-organizational data exchange. To a large degree, it
contributes to maintaining improved illness outcomes, shifting from a curative and
reactive approach to a preventive and preemptive one, and making the whole healthcare
system more responsive. [1]
REFERENCES
MEMBER EVALUATION FORM
Name             Student ID    Task                                              Completion Percentage
Lê Phương Uyên   K235022113    Creating a PowerPoint presentation                100%
                               Doing midterm report
Hồ Quang Vinh    K234111459    Preparing content for the how Big Data can be     100%
                               used, opportunity and challenge section of the
                               presentation
                               Doing midterm report
Hồ Gia Hân       K235022084    Creating a PowerPoint presentation                100%
                               Doing midterm report