
UNIT I UNDERSTANDING BIG DATA
What is big data – why big data – convergence of key trends – unstructured data – industry
examples of big data – web analytics – big data and marketing – fraud and big data – risk and big
data – credit risk management – big data and algorithmic trading – big data and healthcare – big
data in medicine – advertising and big data – big data technologies – introduction to Hadoop –
open source technologies – cloud and big data – mobile business intelligence – Crowd sourcing
analytics – inter and trans firewall analytics.

WHAT IS BIG DATA?

Big data is a combination of structured, semi-structured and unstructured data collected by
organizations that can be mined for information and used in machine learning projects, predictive
modeling and other advanced analytics applications.

The definition of big data is data that contains greater variety, arriving in increasing
volumes and with more velocity. This is also known as the three Vs.

Large amounts of different types of data produced from various types of sources, such as
people, machines or sensors.

Big data is often characterized by the three V's:

 The large volume of data in many environments;


 The wide variety of data types frequently stored in big data systems; and
 The velocity at which much of the data is generated, collected and processed.

Big data benefits:

 Big data makes it possible for you to gain more complete answers because you have more
information.
 More complete answers mean more confidence in the data—which means a completely
different approach to tackling problems.

How big data works

Big data gives you new insights that open up new opportunities and business models.
Getting started involves three key actions:

1. Integrate
Big data brings together data from many disparate sources and applications. Traditional data
integration mechanisms, such as extract, transform, and load (ETL), generally aren’t up to the
task. Analyzing big data sets at terabyte, or even petabyte, scale requires new strategies and
technologies.

During integration, you need to bring in the data, process it, and make sure it’s formatted and
available in a form that your business analysts can get started with.

2. Manage
Big data requires storage. Your storage solution can be in the cloud, on premises, or both. You
can store your data in any form you want and bring your desired processing requirements and
necessary process engines to those data sets on an on-demand basis. Many people choose their
storage solution according to where their data is currently residing. The cloud is gradually
gaining popularity because it supports your current compute requirements and enables you to
spin up resources as needed.

3. Analyze
Your investment in big data pays off when you analyze and act on your data. Get new clarity
with a visual analysis of your varied data sets. Explore the data further to make new discoveries.
Share your findings with others. Build data models with machine learning and artificial
intelligence. Put your data to work.

WHY IS BIG DATA IMPORTANT?

Companies use big data in their systems to improve operations, provide better customer
service, create personalized marketing campaigns and take other actions that, ultimately, can
increase revenue and profits. Businesses that use it effectively hold a potential competitive
advantage over those that don't because they're able to make faster and more informed business
decisions.

For example, big data provides valuable insights into customers that companies can use
to refine their marketing, advertising and promotions in order to increase customer engagement
and conversion rates. Both historical and real-time data can be analyzed to assess the evolving
preferences of consumers or corporate buyers, enabling businesses to become more responsive to
customer wants and needs.

Big data is also used by medical researchers to identify disease signs and risk factors and
by doctors to help diagnose illnesses and medical conditions in patients. In addition, a
combination of data from electronic health records, social media sites, the web and other sources
gives healthcare organizations and government agencies up-to-date information on infectious
disease threats or outbreaks.

Here are some more examples of how big data is used by organizations:

 In the energy industry, big data helps oil and gas companies identify potential drilling
locations and monitor pipeline operations; likewise, utilities use it to track electrical
grids.
 Financial services firms use big data systems for risk management and real-time analysis
of market data.
 Manufacturers and transportation companies rely on big data to manage their supply
chains and optimize delivery routes.
 Other government uses include emergency response, crime prevention and smart city
initiatives.

CONVERGENCE OF KEY TRENDS

The industry has an evolving definition around Big Data that is currently defined by three
dimensions:
1. Volume
2. Variety
3. Velocity
These are reasonable dimensions to quantify Big Data and take into account the typical measures
around volume and variety plus introduce the velocity dimension, which is a key compounding
factor. Let's explore each of these dimensions further.
Data volume can be measured by the sheer quantity of transactions, events, or amount of
history that creates the data volume, but the volume is often further exacerbated by the attributes,
dimensions, or predictive variables.

Data variety is the assortment of data. Traditionally data, especially operational data, is
"structured" as it is put into a database based on the type of data (i.e., character, numeric, floating
point, etc.). Over the past couple of decades, data has increasingly become "unstructured" as the
sources of data have proliferated beyond operational applications. Oftentimes, text, audio, video,
image, geospatial, and Internet data (including click streams and log files) are considered
unstructured data.

Data velocity is about the speed at which data is created, accumulated, ingested, and
processed. The increasing pace of the world has put demands on businesses to process
information in real time or with near real-time responses. This may mean that data is processed
on the fly or while "streaming" by to make quick, real-time decisions, or it may be that monthly
batch processes are run intraday to produce more timely decisions.

Types of Big Data

Big Data is essentially classified into three types:

 Structured Data
 Unstructured Data
 Semi-structured Data

Structured Data

Structured data is highly organized and thus is the easiest to work with. Its dimensions are
defined by set parameters. Every piece of information is grouped into rows and columns like
spreadsheets. Structured data includes quantitative data such as age, contact details, address,
billing, expenses, and debit or credit card numbers.

Unstructured Data
Unstructured data is information that is not arranged according to a preset data model or
schema, and therefore cannot be stored in a traditional relational database or RDBMS. Text and
multimedia are two common types of unstructured content.
Examples of unstructured data include text, video files, audio files, mobile activity, social media
posts, satellite imagery, and surveillance imagery.

Semi-structured Data
Semi-structured data (also known as partially structured data) is a type of data that doesn't follow
the tabular structure associated with relational databases or other forms of data tables but does
contain tags and metadata to separate semantic elements and establish hierarchies of records and
fields.
Emails are semi-structured by Sender, Recipient, Subject, Date, and so on, or are automatically
classified into folders such as Inbox, Spam, Promotions, and so on, using machine learning.
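To make the contrast concrete, here is a minimal Python sketch of how a single email might look as a semi-structured record: the named fields (tags) give it partial structure, while the body remains free text. The field names and values are hypothetical.

```python
# A hypothetical email as a semi-structured record: the named fields
# (Sender, Recipient, Subject, Date) act as tags/metadata, while the
# body is unstructured free text.
email_record = {
    "Sender": "alice@example.com",
    "Recipient": "bob@example.com",
    "Subject": "Quarterly sales report",
    "Date": "2023-04-01T09:30:00",
    "Folder": "Inbox",  # could be assigned automatically by a classifier
    "Body": "Hi Bob, please find the Q1 numbers attached ...",
}

# The structured parts can be queried like database columns ...
print(email_record["Sender"], email_record["Date"])

# ... while the free-text body still needs text-mining techniques.
print(len(email_record["Body"].split()), "words in the body")
```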

UNSTRUCTURED DATA
Unstructured data is information that is not arranged according to a preset data model or
schema, and therefore cannot be stored in a traditional relational database or RDBMS. Text and
multimedia are two common types of unstructured content.

From 80% to 90% of data generated and collected by organizations is unstructured, and its
volumes are growing rapidly — many times faster than the rate of growth for structured
databases.

Unstructured data stores contain a wealth of information that can be used to guide business
decisions. However, unstructured data has historically been very difficult to analyze. With the
help of AI and machine learning, new software tools are emerging that can search through vast
quantities of it to uncover beneficial and actionable business intelligence.

What are some examples of unstructured data?

Unstructured data can be created by people or generated by machines.

Here are some examples of the human-generated variety:

 Email: Email message fields are unstructured and cannot be parsed by traditional
analytics tools. That said, email metadata affords it some structure, and explains why
email is sometimes considered semi-structured data.
 Text files: This category includes word processing documents, spreadsheets,
presentations, email, and log files.
 Social media and websites: Data from social networks like Twitter, LinkedIn, and
Facebook, and websites such as Instagram, photo-sharing sites, and YouTube.
 Mobile and communications data: For this category, look no further than text messages,
phone recordings, collaboration software, chat, and instant messaging.
 Media: This data includes digital photos, audio, and video files.

Here are some examples of unstructured data generated by machines:

 Scientific data: This includes oil and gas surveys, space exploration, seismic imagery,
and atmospheric data.
 Digital surveillance: This category features data like reconnaissance photos and videos.
 Satellite imagery: This data includes weather data, land forms, and military movements.

INDUSTRY EXAMPLES OF BIG DATA

Big data is a term used to describe the large amounts of data flowing into an enterprise day
after day. However, it isn't the volume of data which is important; it's what organisations do
with the data which truly gives value to a business. Big data can be analysed for insights which
lead to better decisions and strategic business moves, and it is used across nearly all industries.
Below are six examples of big data being used across some of the main industries in the UK.

1.) Retail

Good customer service and building customer relationships are vital in the retail industry,
and one of the best ways to build and maintain them is through big data analysis.
Retail companies need to understand the best techniques to market their products to their
customers, the best process to manage transactions and the most efficient and strategic way to
bring back lapsed customers in such a competitive industry.

2.) Banking

Due to the amount of data streaming into banks from a wide variety of channels, the
banking sector needs new means to manage big data. Of course, like the retail industry and all
others, it is important to build relationships, but banks must also minimise fraud and risk whilst at
the same time maintaining compliance.

3.) Manufacturing

Manufacturers can use big data to boost their productivity whilst also minimising wastage
and costs - processes which are welcomed in all sectors but vital within manufacturing. There
has been a large cultural shift by many manufacturers to embrace analytics in order to make
speedier and more agile business decisions.

4.) Education

Schools and colleges which use big data analysis can make large positive differences to
the education system, its employees and students. By analysing big data, schools are supplied
with the intelligence needed to implement a better system for evaluating and supporting teachers,
to make sure students are progressing, and to identify at-risk pupils.

5.) Government

The Government has large scope to change the community we live in as a whole when
utilising big data, for example in dealing with traffic congestion, preventing crime, running
agencies and managing utilities. Governments, however, need to address the issues of privacy
and transparency.

6.) Health Care

Health Care is one industry where lives could be at stake if information isn’t quick,
accurate and in some cases, transparent enough to satisfy strict industry regulations. When big
data is analysed effectively, health care providers can uncover insights that can find new cures
and improve the lives of everyone.
WEB ANALYTICS

Web analytics is the process of analyzing the behavior of visitors to a website. This involves
tracking, reviewing and reporting data to measure web activity, including the use of a website
and its components, such as webpages, images and videos.

Data collected through web analytics may include traffic sources, referring sites, page views,
paths taken and conversion rates. The compiled data often forms a part of customer relationship
management analytics (CRM analytics) to facilitate and streamline better business decisions.

Web analytics enables a business to retain customers, attract more visitors and increase the dollar
volume each customer spends.

Analytics can help in the following ways:

 Determine the likelihood that a given customer will repurchase a product after purchasing
it in the past.
 Personalize the site to customers who visit it repeatedly.
 Monitor the amount of money individual customers or specific groups of customers
spend.
 Observe the geographic regions from which the most and the least customers visit the site
and purchase specific products.
 Predict which products customers are most and least likely to buy in the future.

The objective of web analytics is to serve as a business metric for promoting specific products to
the customers who are most likely to buy them and to determine which products a specific
customer is most likely to purchase. This can help improve the ratio of revenue to marketing
costs.

In addition to these features, web analytics may track the clickthrough and drilldown behavior of
customers within a website, determine the sites from which customers most often arrive, and
communicate with browsers to track and analyze online behavior. The results of web analytics
are provided in the form of tables, charts and graphs.

Web analytics process

The web analytics process involves the following steps:

1. Setting goals. The first step in the web analytics process is for businesses to determine
goals and the end results they are trying to achieve. These goals can include increased
sales, customer satisfaction and brand awareness. Business goals can be both quantitative
and qualitative.
2. Collecting data. The second step in web analytics is the collection and storage of data.
Businesses can collect data directly from a website or web analytics tool, such as Google
Analytics. The data mainly comes from Hypertext Transfer Protocol requests -- including
data at the network and application levels -- and can be combined with external data to
interpret web usage.
3. Processing data. The next stage of the web analytics funnel involves businesses
processing the collected data into actionable information.

4. Identifying key performance indicators (KPIs). In web analytics, a KPI is a
quantifiable measure to monitor and analyze user behavior on a website. Examples
include bounce rates, unique users, user sessions and on-site search queries.
5. Developing a strategy. This stage involves implementing insights to formulate strategies
that align with an organization's goals. For example, search queries conducted on-site can
help an organization develop a content strategy based on what users are searching for on
its website.
6. Experimenting and testing. Businesses need to experiment with different strategies in
order to find the one that yields the best results. For example, A/B testing is a simple
strategy to help learn how an audience responds to different content. The process
involves creating two or more versions of content and then displaying it to different
audience segments to reveal which version of the content performs better.
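As a rough illustration of the experimenting-and-testing step, the following Python sketch compares the conversion rates of two hypothetical content variants with a simple two-proportion z-test. The visitor and conversion counts are invented for the example.

```python
import math

# Hypothetical A/B test results: visitors shown each variant and
# how many of them converted (e.g. signed up or purchased).
visitors_a, conversions_a = 5000, 260   # variant A
visitors_b, conversions_b = 5000, 315   # variant B

rate_a = conversions_a / visitors_a
rate_b = conversions_b / visitors_b

# Pooled two-proportion z-test: is the difference likely to be
# more than random noise?
pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
z = (rate_b - rate_a) / se

print(f"Variant A conversion rate: {rate_a:.2%}")
print(f"Variant B conversion rate: {rate_b:.2%}")
print(f"z-score: {z:.2f}  (|z| > 1.96 is roughly significant at the 5% level)")
```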

BIG DATA AND MARKETING


Big Data in Marketing: Role, Applications, & Benefits
Data has always played an essential role in sales and marketing, so it is no surprise that Big Data
has quickly changed the landscape for modern marketers.

Whether it’s cloud-based data or real-time consumer behavior data, it can help organizations
create personalized offers, compete with other companies with unique marketing emails, or
employ a multi-cloud system to streamline all marketing calendars.

The role of Big Data in Marketing

Big Data relates to the huge amounts of structured and unstructured data generated daily. It has
the potential to change how marketers do business.

Big Data enables marketers to gain a fuller comprehension of their customer behavior,
preferences, and demographics by gathering data from various sources, such as social media,
customer feedback, and website analytics.

With the data collected, marketing teams can:

 Optimize pricing decisions


 Improve customer relationship management
 Reduce customer churn.

Five benefits of using Big Data in Marketing


Effective predictive modeling
By analyzing customer data, analysts can predict which customers are most likely to purchase in
the future.

The benefit: Marketers use this information to develop targeted marketing campaigns that are
more effective at driving sales.
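As a minimal sketch of such predictive modeling, the following Python example trains a logistic regression model on hypothetical customer features (visits, spend, email clicks) to score how likely each customer is to purchase again. The data and feature names are invented for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per customer.
# Features: [site visits last month, total spend, marketing emails clicked]
X_train = [
    [2, 20.0, 0],
    [15, 340.0, 4],
    [1, 10.0, 0],
    [9, 150.0, 2],
    [20, 600.0, 6],
    [3, 35.0, 1],
]
# Label: 1 = purchased again within 30 days, 0 = did not.
y_train = [0, 1, 0, 1, 1, 0]

model = LogisticRegression()
model.fit(X_train, y_train)

# Score new customers: probability of a repeat purchase.
new_customers = [[12, 220.0, 3], [2, 15.0, 0]]
for features, prob in zip(new_customers, model.predict_proba(new_customers)[:, 1]):
    print(features, f"-> repeat-purchase probability {prob:.2f}")
```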

Better personalization
By analyzing customer data, marketing teams can personalize messages and offers based on each
customer’s preferences and behavior.

The benefit: This personalization can lead to increased customer engagement and loyalty.

Optimizing marketing spend


By analyzing customer behavior and preferences, marketers can optimize their spending budget.

The benefit: Targeting only valuable customers allows companies to get the most out of their
marketing efforts.

Reducing customer churn


Analysts can identify which customers are at risk of leaving by analyzing customer behavior and
preferences.

The benefit: Marketers can reduce customer churn by targeting these customers with
personalized offers and messages.

Improving customer experience


By analyzing customer behavior and preferences, marketing teams can identify improvement
areas related to customer experience.

The benefit: Companies can improve customer experience and loyalty by making changes based
on this information.

How are marketers using Big Data in 2023?

Developing customer loyalty programs


By analyzing customer data, marketers can develop loyalty programs tailored to their customers'
specific needs and preferences.

Identifying new market opportunities


By analyzing customer data, marketers can identify new opportunities for growth and expansion.

A new way of doing marketing research


Big data is also helpful for marketing research. By analyzing behaviors and preferences,
marketing teams can gain insights into what their customers really want and need.

Many companies are doing Big Data market research to develop new products and services and
improve existing ones.

Creating effective marketing strategies


A complete market understanding is the best way to identify effective marketing tactics and
channels.

Through Big Data, creative teams develop targeted marketing campaigns that are more likely to
drive sales and revenue.

Identifying the right target audiences
Marketers are also using Big data to identify target audiences. Through an in-depth consumer
analysis, marketers can identify their ideal target audiences.

Having the right target audience is also vital for advertising. Targeted advertising campaigns are
more likely to drive sales and revenue.

FRAUD AND BIG DATA

Big Data for Fraud Prevention


Big data analytics is an effective solution for identifying behavioral patterns and establishing
strategies to help detect and prevent fraud in various business sectors.

Many companies are not aware of the information they have or how to leverage, analyze and
understand it. As a result, a large amount of potentially useful data is lost, while fraud and other
criminal activities become normalized in their processes and difficult to prevent and detect.

Fraud detection through big data analysis, data mining and machine learning models uses trends,
patterns and behaviors to detect and prevent suspicious activities in purchasing processes, credit
activities, accounts or transactions, and internal and external processes, among others. This makes
it possible to automatically detect fraud and allows organizations to consolidate, map and normalize
large amounts of data that can be effectively analyzed to design strategies that detect and
establish connections between anomalous trends, notice a cyber attack or flag a security
breach.

Here are some of the general benefits of the implementation of fraud analytics in businesses:

 Identifies irregular and unusual patterns, business problems or risk areas where activities or
processes may result in fraud.
 Saves costs and maximizes revenue.
 Detects anomalies across channels, comparing data from different information sources, such
as social networks, databases and call centers, to find discrepancies.
 Predicts suspicious activity before it causes damage to an organization's assets or goods.
 Provides an internal view of processes and identifies where there is more opportunity
for fraud by creating strategies that are better adapted to the operations of a
business.

The Importance of Big Data Analytics in Terms of Fraud Prevention:


As online purchase, payment and money transfer transactions increase, the risks of fraud that
may occur through these transactions also increase. It has been very difficult for companies to
process and analyze the huge amount of data that emerges from these transactions and use it in
fraud detection. At this point, we come across an indispensable facilitating tool: big data
analytics for fraud detection. Using big data analytics at several points of fraud detection
provides many advantages.

One of the most important points when detecting fraud is to take action quickly. It may take a
long time to identify the suspicious transactions among the large volume of irregular data
generated by transactions. As a result of these long analyses, some transactions may be
misinterpreted as suspicious. During this evaluation process there is still a need for people,
and therefore a manual workload, to analyze the data and check for suspicious transactions or
misinterpretations.

In order to protect the company and customers from harm, it is necessary to draw up rules
based on this data and on past fraudulent activity, so that systems can be established
to prevent possible damage and fraud.

All of this means more cost, time and manual work. Big data analytics plays the biggest helping
role in solving these issues. Data analyzed with big data analytics techniques can
provide:

 Low costs
 More accurate and precise detections
 Optimized workflows and efficiency of systems
 Better services to customers

RISK AND BIG DATA

What Is Big Data?

Big Data is used to represent a large amount of data, including structured, semi-structured, and
unstructured data. Big Data is high-volume, high-velocity, and contains various data from
different sources.

Major Risks & Threats Come With Big Data

Analyzing such a large amount of data can come with various risks and threats that can affect the
business heavily. So organizations need to understand these risks and threats and identify the
best possible ways to minimize the risks. Let’s find out!

1. Privacy And Data Protection

When companies collect big data, the first risk that comes with it is data privacy. This sensitive
data is the backbone of many big companies, and if it leaks into the wrong hands, such as
cybercriminals or hackers, it can badly affect the business and its reputation. In 2019, 4.1
billion records were exposed through data breaches, according to the Risk Based Security
Mid-Year Data Breach Report.

So businesses should mainly focus on protecting their data’s privacy and security from malicious
attacks.

2. Cost Management

Big data comes with big maintenance costs, and companies should calculate the costs of
collecting, storing, analyzing, and reporting big data. All companies need to budget
and plan well for maintaining big data.

If companies don't plan for this management, they may face unpredictable costs, which can affect
their finances. The best way to manage big data costs is by eliminating irrelevant data and
analyzing the big data to find meaningful insights and solutions to achieve their goals.

3. Unorganized Data

As we've discussed, Big Data is a combination of structured, semi-structured, and unstructured
data, and the major problem companies face while managing it is unorganized data.
It is a complex process to categorize the data and make it well-structured.

From small businesses to enterprise level, handling unorganized data becomes hectic. It requires a
well-planned strategy to collect, store, diversify, eliminate and optimize data to find meaningful
insights that help businesses make profitable decisions.

4. Data Storage And Retention

Big Data is not just information that can be stored on a single computer; it is a collection of
structured, semi-structured, and unstructured data from different sources that can run to
zettabytes in size. To store big data, companies need a large server area where all the big data is
stored, processed, and analyzed.

Companies should therefore be concerned about storage space for big data; otherwise, it can
become a complex issue. Nowadays, companies leverage the power of cloud-based services to
store data and make access easy and secure.

5. Incompetent Big Data Analysis

It is estimated that the amount of data generated by users each day will reach 463 exabytes
worldwide, according to the World Economic Forum. The main aim of big data is to analyze it
and find meaningful information that helps businesses make the right business decisions and
innovations. If an organization doesn't have a proper analysis process, big data is just
unnecessary clutter.

Analysis is what makes big data important, so companies should hire skilled data analysts and
use software that helps analyze the big data and find meaningful insights with the help of
professional analysts and technology.

Thus, before planning to work on big data, each business, from small to enterprise-level, should
hire professional analysts and use powerful technologies to analyze big data.

6. Poor Data Quality

One key risk of getting big data is that organizations may end up with poor-quality, irrelevant or
out-of-date databases that will not help their business find anything meaningful.

In Big Data, where everything is stored, whether structured, semi-structured, or unstructured,
it is a risk for organizations to collect and analyze the data because it may or may not be useful
for their business, depending on its relevancy.

Many challenges arise while analyzing big data, and organizations must be prepared for
these outcomes, try to eliminate irrelevant data, and focus on analyzing relevant data to get
meaningful insights.

7. Deployment Process

Deployment is a core process in which an organization collects and analyzes big data and deploys
meaningful insights within a given time period. Companies have two options for deployment.
The first is an in-house process where the big data is collected and analyzed to find meaningful
insights, but this process takes a good amount of time.

Instead of setting up their own server infrastructure, a cloud-based solution is more convenient,
easy, and beneficial because there is no internal infrastructure needed to store big data. The
amount of time it takes to deploy meaningful insights from big data is important.

Big Data Security Issues And Their Solutions

Big Data comes with several security issues and dangers that can heavily impact organizations.
So it’s important for businesses to understand and resolve these security issues.

Here are some common issues and dangers of big data along with their solution.

1. Data Storage

Problem: When businesses plan to store big data, the first problem they face is storage space.
Many companies are leveraging the power of cloud storage, but because the data is accessible
online, there is a chance of security issues. So some companies prefer to own physical server
storage for the database.

One major data storage issue was faced by Amazon in 2017, when its AWS cloud storage was
full and did not have space to run even basic operations; Amazon later resolved the issue and
maintained the storage to prevent this problem in the future.

Solution: To resolve the problem, companies should store their sensitive data in an on-premises
database, while less sensitive data can be stored in cloud storage. Remaining security issues
can be addressed by hiring cybersecurity experts.

This may increase costs for organizations, but the value of the database makes it worthwhile.

2. Fake Data

Problem: Another Big Data issue many organizations may face is fake data. When
collecting data, companies require a relevant database that can be analyzed and used to generate
meaningful insights. Having irrelevant or fake data can waste an organization's
efforts and costs in analyzing the data.

In 2016, Facebook faced the issue of fake data because its algorithms didn't recognise the
difference between real and fake news and ended up amplifying nonsense political issues,
according to Vox.

Solution: To validate the data source, organizations should carry out periodic assessments and
evaluations of their databases to find irrelevant data and eliminate it, so that only relevant data is
left to analyze and generate results.

3. Data Access Control

Problem: When users get access to control the data, such as viewing, editing or removing it,
business operations and privacy may be affected.

Here’s an example:

Netflix reported the loss of 200,000 subscribers in Q1 because users were sharing their login
details with friends and family to log in with the same account. Netflix later took charge and
limited data accessibility to a restricted number of users on a single account.

Solution: The solution is to work with Identity and Access Management (IAM) to simplify the
process of controlling data access via identification, authentication, and authorization. By following
ISO standards, organizations can protect the access managed through IAM.

4. Data Poisoning

Problem: Nowadays, almost every website has a chatbot, and hackers target the machine
learning models behind them. This leads to data poisoning, where an organization's databases
can be manipulated and injected with malicious data.

Solution: The best way to resolve this issue is through outlier detection. It helps to separate
injected elements from the existing data distribution.
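A minimal sketch of the kind of outlier detection described above, using scikit-learn's IsolationForest on hypothetical numeric feature vectors; in practice the features would be derived from the chatbot or training data being protected.

```python
from sklearn.ensemble import IsolationForest

# Hypothetical feature vectors extracted from incoming training records,
# e.g. [message length, novelty score]. The values are made up.
records = [
    [120, 0.02], [115, 0.03], [130, 0.01], [118, 0.04], [125, 0.02],
    [122, 0.03], [900, 0.95],   # the last record looks nothing like the rest
]

# IsolationForest isolates points that are easy to separate from the bulk
# of the data; those points are flagged as outliers (label -1).
detector = IsolationForest(contamination=0.15, random_state=42)
labels = detector.fit_predict(records)

for record, label in zip(records, labels):
    status = "OUTLIER - review before adding to training data" if label == -1 else "ok"
    print(record, status)
```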

CREDIT RISK MANAGEMENT

Credit risk management refers to managing the probability of a company's losses if its borrowers
default in repayment. The main purpose is to reduce the rising quantum of the non-performing
assets from the customers and to recover the same in due time with appropriate decisions.

1. Customer onboarding and Know Your Customer (KYC)

The KYC process is primarily a regulatory obligation imposed on banks and financial
service providers to prevent money laundering and terrorist financing. Beyond that, it offers the
opportunity to create a comprehensive customer profile that, if properly maintained, provides all
relevant information needed for regular sanction list and PEP screening, or for periodic updating
of the credit rating, for instance.

Especially in the KYC process, together with onboarding, the potential efficiency can be
raised considerably by means of digitisation and automation. Solutions like ACTICO KYC can
be integrated into the onboarding process via suitable interfaces and, for example, take over
automated comparison of customer data with sanction lists and PEP (politically exposed person)
lists, updating of risk classification, or documentation of a company’s beneficial owners.

2. Creditworthiness assessment

The basis for assessing a company’s creditworthiness is balance sheet analysis. Annual
financial statements and quarterly reports do provide extensive data on a company’s financial
situation, but the acquisition and analysis of this data is often quite an obstacle. Slow manual
processes delay credit decisions and increase costs.

The use of artificial intelligence (AI) can automate the incorporation and reading of
balance sheets. With automated spreading, financial data is captured from financial statements
and assigned to the appropriate categories. This means that all customers’ data is available in one
uniform format and can easily be processed further.

3. Risk quantification

Risk quantification comprises determining the probability of default (PD), loss given
default (LGD) and risk-adjusted return on capital (RAROC). It provides the basis for pricing and
other credit terms.
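The text lists PD and LGD as the key quantities; a standard related calculation (not spelled out above) is expected loss, EL = PD x LGD x EAD, where EAD (exposure at default) is the outstanding amount at risk. A tiny Python sketch with made-up figures:

```python
# Hypothetical loan: the figures below are invented for illustration.
pd_ = 0.02        # probability of default over one year (2%)
lgd = 0.45        # loss given default (45% of the exposure is lost)
ead = 1_000_000   # exposure at default, e.g. outstanding balance

expected_loss = pd_ * lgd * ead
print(f"Expected loss: {expected_loss:,.0f}")   # prints 9,000
```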

Commercial lending today is still largely based on a manual process involving in-depth analysis
of baseline data and evaluation of soft factors. In such cases, the decision depends, to no small
extent, on the loan officer’s experience, which has a relevant influence on the weighting of the
various risk items. This is not necessarily a disadvantage: The human factor can be a corrective
when it comes to data from manipulated books or unrealistic forecasts regarding sales and
growth. On the other hand, this can allow systemic errors to creep in, with a negative impact on
the lender’s margin.

4. Credit decision

It is true that banks can currently look forward to greater interest in asset financing, but in view
of the ever-shorter innovation cycles and volatile development of the German economy, it is also
the case that innovation decisions often have to be made at shorter notice today than ten years
ago. On top of that, queries are becoming increasingly individual and complex – in other words,
more time-consuming. Nevertheless, prospective borrowers are rarely willing to accept the
resulting longer processing times and higher costs.

5. Price calculation

When calculating credit terms, many banks still rely on a ‘one size fits all’ approach, which can
only be deviated from within narrow limits. As a result, creditworthy customers have to pay a
premium to subsidise riskier customers.

Meanwhile, machine learning has already become the tool of choice for pricing a wide range of
financial products. It can also be put to more use in the lending business. It allows the individual
probability of default and the general repayment performance of the borrower to be determined
very reliably. For banks and financiers, this presents an opportunity to depart from the old rigid
pricing scheme and switch to dynamic risk-based pricing.

6. Monitoring after payout

As long as the borrower pays their instalments on time, everything is fine. However, if problems
arise at some point, it may already be too late. It is therefore vital that banks also monitor the
borrower’s ongoing development, so as to be able to react to changes in a timely manner.

BIG DATA IN ALGORITHMIC TRADING

What is Algorithmic Trading?

The application of computer and communication techniques has stimulated the rise of
algorithmic trading. Algorithmic trading is the use of computer programs for entering trading
orders, in which the programs decide on almost every aspect of the order, including its timing,
price, and quantity.

In earlier days, investment research was done on day-to-day information and
patterns. Now market volatility is greater than ever, and the risk factor has
increased accordingly. Investment banks have increased risk evaluation from inter-day to
intra-day. RBI interest rates, key government policies, news from SEBI, quarterly results,
geo-political events and many other factors influence the market hugely within a couple of seconds.

Investment banks use algorithmic trading, which houses a complex mechanism to derive
business investment decisions from insightful data. Algorithmic trading involves using
complex mathematics to derive buy and sell orders for derivatives, equities, foreign exchange
rates and commodities at very high speed.
Algorithmic trading has been adopted by institutional and individual investors and has
made profits in practice. The soul of algorithmic trading is its trading strategies, which are built
upon technical analysis rules, statistical methods, and machine learning techniques. The big data
era is coming; although making use of big data in algorithmic trading is a challenging task, when
the treasures buried in the data are dug out and used, there is huge potential to take the lead
and make a great profit.


Role of Big Data in Algorithmic Trading


1. Technical Analysis: Technical analysis is the study of prices and price behavior, using charts
as the primary tool.

2. Real-Time Analysis: The automated process enables computers to execute financial trades at
speeds and frequencies that a human trader cannot.

3. Machine Learning: With machine learning, algorithms are constantly fed data and actually
get smarter over time by learning from past mistakes, logically deducing new conclusions based
on past results and creating new techniques that make sense based on thousands of unique factors.

How Big Data can be used for Algorithmic Trading

There are several standard modules in a proprietary algorithmic trading system, including trading
strategies, order execution, cash management and risk management. Trading strategies are the
core of an automated trading system. Complex algorithms are used to analyze data (price data and
news data) to capture anomalies in the market, to identify profitable patterns, or to detect the
strategies of rivals and take advantage of that information. Various techniques are used in trading
strategies to extract actionable information from the data, including rules, fuzzy rules, statistical
methods, time series analysis, machine learning, as well as text mining.

Algorithmic trading is the current trend in the financial world and machine learning helps
computers to analyze at rapid speed. The real-time picture that big data analytics provides gives
the potential to improve investment opportunities for individuals and trading firms.

⦁ Estimation of outcomes and returns.

Access to big data helps to mitigate probable risks in online trading and to make precise
predictions. Financial analytics helps to tie together the principles that affect trends, pricing and
price behavior.

⦁ Deliver accurate predictions


Big data can be used in combination with machine learning, and this helps in making decisions
based on logic rather than estimates and guesses. The data can be reviewed, and applications can
be developed to update information on a regular basis for making accurate predictions.

⦁ Backtesting Strategy

One of the features of algorithmic trading is the ability to backtest. It can be tough for traders to
know what parts of their trading system work and what doesn't work if they can't run their
system on past data. With algo trading, you can run the algorithms on past data to see whether
they would have worked in the past. This ability provides a huge advantage, as it lets the user
remove any flaws of a trading system before running it live.
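Below is a minimal Python sketch of the kind of backtest described above: a simple moving-average crossover strategy run over a made-up price series with pandas. The prices, window lengths, and strategy are illustrative assumptions, not a recommended trading system.

```python
import pandas as pd

# Hypothetical daily closing prices (in practice: years of historical data).
prices = pd.Series([100, 101, 103, 102, 105, 107, 106, 108, 110, 109,
                    111, 113, 112, 114, 116, 115, 117, 119, 118, 120], dtype=float)

# Simple moving-average crossover: hold the asset when the short-term
# average is above the long-term average, otherwise stay in cash.
short_ma = prices.rolling(window=3).mean()
long_ma = prices.rolling(window=8).mean()

signal = (short_ma > long_ma).astype(int)   # 1 = hold, 0 = cash
position = signal.shift(1).fillna(0)        # act on the signal the next day

daily_returns = prices.pct_change().fillna(0.0)
strategy_returns = daily_returns * position

print(f"Buy-and-hold return: {(1 + daily_returns).prod() - 1:.2%}")
print(f"Strategy return:     {(1 + strategy_returns).prod() - 1:.2%}")
```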

BIG DATA IN HEALTHCARE AND MEDICINE

Big data has become more influential in healthcare due to three major shifts in the healthcare
industry: the vast amount of data available, growing healthcare costs, and a focus on
consumerism. Big data enables health systems to turn these challenges into opportunities to
provide personalized patient journeys and quality care.

 Increasing Volume of Healthcare Data: When health records went digital, the amount
of virtual data health systems had to handle rose steeply. In addition to EHRs, vast
amounts of data are sourced in other ways – through wearable technology, mobile
applications, digital marketing efforts, social media, and more. All of this adds up to an
incredible amount of information, spurring health systems to adopt big data systems and
technologies to effectively collect, analyze, and take advantage of this information.
 Growing Healthcare Costs: In the past 20 years, the United States has seen rapid growth
in healthcare costs. Today, healthcare expenses account for around 18 percent of GDP,
totaling about $3.4 trillion. This is partially due to lifestyle factors, as well as government
regulations. Through the collection and analysis of large amounts of data, healthcare
organizations will find quantifiable ways to improve performance and efficiency. This
promotes both increased patient satisfaction and your ability to capture greater market
share.
 Desire for Personalized Care: Consumers in all industries expect exceptional,
convenient, personalized service – a phenomenon that retail industry executives have
dubbed “The Amazon Experience.” Healthcare is no different. Customers want
convenient, personalized care, a new standard to which health systems must rise. This
new model of care focuses on quality, engagement, and retention. Health systems are
turning to healthcare big data to provide the insights necessary to drive this level of
personalization.

What challenges arise with big data in healthcare?

A major challenge with healthcare big data is sorting and prioritizing information. Data
capacities are so vast that oftentimes it can be difficult to determine which data points and
insights are useful. As a result, many organizations use AI or machine learning to process this
data with exceptional agility.

Another challenge is ensuring that the right access to big data insights and analysis is given to
the right people so they can work intelligently. Even though healthcare data is pulled from many
different systems, organizations need to make sure critical personnel across the industry have
comprehensive access to the information.

There are also a number of data analysis challenges that result from heterogeneous or missing
claims data. The complexity of data is further compounded by each healthcare institution filing
claims with data from other Hospital Information Systems (HIS), or input from hospital
personnel at the time of the encounter. The data becomes even more complex when factoring in
all the ambulatory places or service types. As a result, there are five challenges to overcome in
order to obtain accurate claims data:

 Billing systems are fragmented and dated – Data is often very “noisy” – practices,
groups, and even service line specialties can be inconsistent. The key is to consider
directional data in combination with your local geographic market knowledge; in other
words, data should augment interactions and focused outreach to physicians, not replace
it.
 Patients do not have unique patient identifiers – If every patient had a unique
identifier, data matching would not be required. Until that happens, data matching
mechanisms are required to look for these data anomalies and put the right patient claims
together.
 Diagnosis and procedure codes can be unclear – Even industry-standard grouper tools
can obscure or mis-map physician activity. Perfect data and perfect insights are very hard
to achieve, so you have to advocate for, and learn to work with, directional data.
 Claims data is highly inconsistent – With claims data, any field that is not required
for payment has a low probability of being completed accurately. In fact, among the few
required fields for payment, along with patient, diagnosis, and procedure information, is
the “rendering physician” via the NPI for that provider.
 It’s difficult to identify the referring physician – The “referring physician” field on
available third-party claims is often inconsistent, incorrect, or not filled in at all. In fact,
some clearinghouses don’t even provide the “referring physician” field because of these
inconsistencies.

What is the future of big data in healthcare?

In the future, healthcare organizations will adopt big data in greater numbers as it becomes more
crucial for success. Healthcare big data will also continue to help make marketing touchpoints
smarter and more integrated. Additionally, the amount of data available will grow as wearable
technology and the Internet of Things (IoT) gain popularity. Constant patient monitoring via
wearable technology and the IoT will become standard and will add enormous amounts of
information to big data stores. With this information, healthcare marketers can integrate a large
volume of healthcare insights to find and retain patients with the highest propensity for services.

ADVERTISING AND BIG DATA

Marketing and advertising agencies are now embracing Big Data technology in full force for
personalized marketing campaigns. With the use of Big Data in delivering targeted ads and
purchase recommendations, marketing and advertising companies are planning out the future
of the technology in the marketing space. The bad news, however, is that even after companies
get their hands on the right data, they are often still unsure how to use it to their advantage, and
for the ultimate benefit of the buyer.

Designing successful digital advertising strategies and campaigns

To design efficient advertising campaigns, advertisers must collect and analyze a wide
variety of data. The data has to be gathered from both internal and external sources, including
unstructured data. Different forms of data, such as social media posts, images and videos, are
collected. Digital advertisers get a chance to gain valuable insights from the data, and these
meaningful insights are the core of powerful marketing decisions, campaigns and strategies.

Big Data's Big Role in Advertising and Marketing

 Detect target audience patterns

With highly genuine data captured from across social media channels and platforms,
companies are able to track user behavior and detect patterns in their sentiments
regarding a product or service. This, in turn, allows these companies to successfully
create and execute targeted campaigns that can achieve high sales and ROI for the
marketing companies.

 Find the interest of the customers using the digital data

A consumer's digital footprint is the asset that most companies want to use to get their
marketing right. A Google search, a Facebook like, or a simple tweet can help companies
learn about a particular person's interests, after which the enterprise can target
recommendations to help the user find the product or service they were interested in.

 Curate creative, customized and charismatic communications using insights

Within online communities, Big Data can steer the wheel of growth and insights for an
organization, as it yields highly informative patterns about each user's searches, their
opinions, their needs, etc. With the right Big Data strategy, companies can fetch the gold
out of waste. Based on the insights, the companies can design much more impactful
communications. The businesses would be able to strike a chord with the audience as
their messaging will contain personalized details. Therefore, the customers and audience
will be able to understand the communication more clearly. Hence, the chances of lead
generation and conversion of leads into customers are higher.

BIG DATA TECHNOLOGIES

Big data technologies can be categorized into four main types: data storage, data mining, data
analytics, and data visualization [2]. Each of these is associated with certain tools, and you’ll
want to choose the right tool for your business needs depending on the type of big data
technology required.

1. Data storage

Big data technology that deals with data storage has the capability to fetch, store, and manage big
data. It is made up of infrastructure that allows users to store the data so that it is convenient to
access. Most data storage platforms are compatible with other programs. Two commonly used
tools are Apache Hadoop and MongoDB.

 Apache Hadoop: Apache Hadoop is the most widely used big data tool. It is an open-source
software platform that stores and processes big data in a distributed computing
environment across hardware clusters. This distribution allows for faster data processing.

The framework is designed to reduce bugs or faults, be scalable, and process all data
formats.

 MongoDB: MongoDB is a NoSQL database that can be used to store large volumes of
data. Using key-value pairs (a basic unit of data), MongoDB categorizes documents into
collections. It is written in C, C++, and JavaScript, and is one of the most popular big
data databases because it can manage and store unstructured data with ease.
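As a small illustration of how documents are grouped into collections, here is a hedged Python sketch using the PyMongo driver. It assumes a MongoDB server is reachable at the default localhost address, and the database, collection, and document contents are made up.

```python
from pymongo import MongoClient

# Connect to a locally running MongoDB instance (assumed to exist).
client = MongoClient("mongodb://localhost:27017")
db = client["demo_db"]                 # hypothetical database
reviews = db["customer_reviews"]       # hypothetical collection

# Documents are flexible: fields can vary from one document to the next,
# which is why MongoDB handles semi- and unstructured data comfortably.
reviews.insert_one({
    "user": "alice",
    "rating": 4,
    "text": "Fast delivery, packaging could be better.",
    "tags": ["delivery", "packaging"],
})

# Query the collection for a single matching document.
doc = reviews.find_one({"user": "alice"})
print(doc["rating"], doc["text"])
```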

2. Data mining

Data mining extracts the useful patterns and trends from the raw data. Big data technologies such
as Rapidminer and Presto can turn unstructured and structured data into usable information.

 Rapidminer: Rapidminer is a data mining tool that can be used to build predictive
models. It draws on two strengths: processing and preparing data, and building machine
learning and deep learning models. This end-to-end model allows both functions to drive
impact across the organization.

 Presto: Presto is an open-source query engine that was originally developed by Facebook
to run analytic queries against their large datasets. Now, it is available widely. One query
on Presto can combine data from multiple sources within an organization and perform
analytics on them in a matter of minutes.

3. Data analytics

In big data analytics, technologies are used to clean and transform data into information that can
be used to drive business decisions. This next step (after data mining) is where users apply
algorithms, models, and more, using tools such as Apache Spark and Splunk.

 Apache Spark: Spark is a popular big data tool for data analysis because it is fast and
efficient at running applications. It is faster than Hadoop because it processes data in
random access memory (RAM) instead of storing and processing it in batches via MapReduce.
Spark supports a wide variety of data analytics tasks and queries (see the PySpark sketch after
this list).

 Splunk: Splunk is another popular big data analytics tool for deriving insights from large
datasets. It has the ability to generate graphs, charts, reports, and dashboards. Splunk also
enables users to incorporate artificial intelligence (AI) into data outcomes.
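The following is a small PySpark sketch of the in-memory style of analysis described above for Apache Spark. The file name and column names are assumptions made for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or reuse) a Spark session.
spark = SparkSession.builder.appName("sales-analysis").getOrCreate()

# Hypothetical CSV of sales events with columns: region, product, amount.
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# A typical analytics query: total and average revenue per region,
# computed in memory across the cluster (or locally when testing).
summary = (
    df.groupBy("region")
      .agg(F.sum("amount").alias("total_revenue"),
           F.avg("amount").alias("avg_order_value"))
      .orderBy(F.desc("total_revenue"))
)

summary.show()
spark.stop()
```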

4. Data visualization

Finally, big data technologies can be used to create stunning visualizations from the data. In
data-oriented roles, data visualization is a skill that is beneficial for presenting recommendations
to stakeholders for business profitability and operations—to tell an impactful story with a simple
graph.

 Tableau: Tableau is a very popular tool in data visualization because its drag-and-drop
interface makes it easy to create pie charts, bar charts, box plots, Gantt charts, and more.
It is a secure platform that allows users to share visualizations and dashboards in real
time.

 Looker: Looker is a business intelligence (BI) tool used to make sense of big data
analytics and then share those insights with other teams. Charts, graphs, and dashboards
can be configured with a query, such as monitoring weekly brand engagement through
social media analytics.

INTRODUCTION TO HADOOP

Hadoop is an open-source software framework that is used for storing and processing
large amounts of data in a distributed computing environment. It is designed to handle big data
and is based on the MapReduce programming model, which allows for the parallel processing of
large datasets.
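To give a feel for the MapReduce programming model, here is a minimal pure-Python sketch that mimics the map, shuffle, and reduce phases for a word count. A real Hadoop job would distribute these phases across the cluster rather than run them in a single process; the sample documents are made up.

```python
from collections import defaultdict

documents = [
    "big data needs distributed processing",
    "hadoop processes big data",
]

# Map phase: each input record is turned into (key, value) pairs.
def map_phase(doc):
    for word in doc.split():
        yield (word, 1)

# Shuffle phase: group all values by key (Hadoop does this between nodes).
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)

# Reduce phase: combine the values for each key into a final result.
def reduce_phase(word, counts):
    return word, sum(counts)

word_counts = dict(reduce_phase(w, c) for w, c in grouped.items())
print(word_counts)   # e.g. {'big': 2, 'data': 2, 'hadoop': 1, ...}
```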

Hadoop has two main components:

 HDFS (Hadoop Distributed File System): This is the storage component of Hadoop,
which allows for the storage of large amounts of data across multiple machines. It is
designed to work with commodity hardware, which makes it cost-effective.
 YARN (Yet Another Resource Negotiator): This is the resource management
component of Hadoop, which manages the allocation of resources (such as CPU and
memory) for processing the data stored in HDFS.
 Hadoop also includes several additional modules that provide additional functionality,
such as Hive (a SQL-like query language), Pig (a high-level platform for creating
MapReduce programs), and HBase (a non-relational, distributed database).
 Hadoop is commonly used in big data scenarios such as data warehousing, business
intelligence, and machine learning. It’s also used for data processing, data analysis, and
data mining. It enables the distributed processing of large data sets across clusters of
computers using a simple programming model.
History of Hadoop

Hadoop is an open-source software framework for storing and processing big data. It was created
under the Apache Software Foundation in 2006, based on papers published by Google that
described the Google File System (GFS, 2003) and the MapReduce programming model (2004). The Hadoop
framework allows for the distributed processing of large data sets across clusters of computers
using simple programming models. It is designed to scale up from single servers to thousands of
machines, each offering local computation and storage. It is used by many organizations,
including Yahoo, Facebook, and IBM, for a variety of purposes such as data warehousing, log
processing, and research. Hadoop has been widely adopted in the industry and has become a key
technology for big data processing.

Features of Hadoop:

1. It is fault tolerant.

2. It is highly available.

3. Its programming model is easy.

4. It offers huge, flexible storage.

5. It is low cost.

Hadoop has several key features that make it well-suited for big data processing:

 Distributed Storage: Hadoop stores large data sets across multiple machines, allowing
for the storage and processing of extremely large amounts of data.
 Scalability: Hadoop can scale from a single server to thousands of machines, making it
easy to add more capacity as needed.
 Fault-Tolerance: Hadoop is designed to be highly fault-tolerant, meaning it can continue
to operate even in the presence of hardware failures.
 Data locality: Hadoop provides a data locality feature, where data is stored on the
same node where it will be processed. This helps to reduce network traffic and
improve performance.
 High Availability: Hadoop provides a high availability feature, which helps to make sure
that the data is always available and is not lost.
 Flexible Data Processing: Hadoop’s MapReduce programming model allows for the
processing of data in a distributed fashion, making it easy to implement a wide variety of
data processing tasks.
 Data Replication: Hadoop provides a data replication feature, which replicates the
data across the cluster for fault tolerance.
 Data Compression: Hadoop provides a built-in data compression feature, which helps to
reduce storage space and improve performance.
 YARN: A resource management platform that allows multiple data processing engines
like real-time streaming, batch processing, and interactive SQL, to run and process data
stored in HDFS.

Hadoop Distributed File System

Hadoop has a distributed file system known as HDFS, which splits files into blocks and
distributes them across the various nodes of large clusters. In case of a node failure, the system
continues to operate, and data transfer between the nodes is facilitated by HDFS.

Hadoop framework is made up of the following modules:

1. Hadoop MapReduce - a MapReduce programming model for handling and processing large data (a minimal sketch follows this list).
2. Hadoop Distributed File System (HDFS) - distributes files in clusters among nodes.
3. Hadoop YARN - a platform which manages computing resources.
4. Hadoop Common - contains packages and libraries which are used by the other modules.
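As a minimal sketch of the MapReduce model named in item 1, here is a word-count example written in the style commonly used with Hadoop Streaming, which runs scripts that read from standard input and write tab-separated key/value pairs to standard output. The file name wordcount.py and the local simulation command are assumptions for illustration, not part of Hadoop itself:

```python
# wordcount.py - a MapReduce-style word count in the Hadoop Streaming style
import sys

def mapper():
    # map phase: emit "word<TAB>1" for every word read from standard input
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer():
    # reduce phase: sum counts per word; Hadoop sorts mapper output by key
    # before the reduce phase, so equal words arrive together
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, 0
        current_count += int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

if __name__ == "__main__":
    # simulate the pipeline locally with:
    #   cat input.txt | python wordcount.py map | sort | python wordcount.py reduce
    mode = sys.argv[1] if len(sys.argv) > 1 else "map"
    mapper() if mode == "map" else reducer()
```

In an actual Hadoop Streaming job, the map and reduce parts are normally shipped as two separate scripts, and the framework handles the sorting and shuffling of keys between them across the cluster.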

Advantages and Disadvantages of Hadoop

Advantages:

 Ability to store a large amount of data.


 High flexibility.
 Cost effective.
 High computational power.
 Tasks are independent.
 Linear scaling.

Hadoop has several advantages that make it a popular choice for big data processing:

 Scalability: Hadoop can easily scale to handle large amounts of data by adding more
nodes to the cluster.
 Cost-effective: Hadoop is designed to work with commodity hardware, which makes it a
cost-effective option for storing and processing large amounts of data.

 Fault-tolerance: Hadoop’s distributed architecture provides built-in fault-tolerance, which
means that if one node in the cluster goes down, the data can still be processed by the
other nodes.
 Flexibility: Hadoop can process structured, semi-structured, and unstructured data, which
makes it a versatile option for a wide range of big data scenarios.
 Open-source: Hadoop is open-source software, which means that it is free to use and
modify. This also allows developers to access the source code and make improvements or
add new features.
 Large community: Hadoop has a large and active community of developers and users
who contribute to the development of the software, provide support, and share best
practices.
 Integration: Hadoop is designed to work with other big data technologies such as Spark,
Storm, and Flink, which allows for integration with a wide range of data processing and
analysis tools.

Disadvantages:

 Not very effective for small data.


 Hard cluster management.
 Has stability issues.
 Security concerns.
 Complexity: Hadoop can be complex to set up and maintain, especially for organizations
without a dedicated team of experts.
 Latency: Hadoop is not well-suited for low-latency workloads and may not be the best
choice for real-time data processing.

 Limited Support for Real-time Processing: Hadoop’s batch-oriented nature makes it less
suited for real-time streaming or interactive data processing use cases.
 Limited Support for Structured Data: Hadoop is designed to work with unstructured and
semi-structured data, it is not well-suited for structured data processing
 Data Security: Hadoop provides only limited security features out of the box; capabilities such as data encryption and fine-grained access control typically require additional configuration or tools, which can make it difficult to secure sensitive data.
 Limited Support for Ad-hoc Queries: Hadoop’s MapReduce programming model is not
well-suited for ad-hoc queries, making it difficult to perform exploratory data analysis.
 Limited Support for Graph and Machine Learning: Hadoop’s core components, HDFS and MapReduce, are not well-suited for graph and machine learning workloads; specialized components like Apache Giraph and Mahout are available but have some limitations.
 Cost: Hadoop can be expensive to set up and maintain, especially for organizations with
large amounts of data.
 Data Loss: In the event of a hardware failure, the data stored in a single node may be lost
permanently.
 Data Governance: Data governance is a critical aspect of data management; Hadoop does not provide built-in features to manage data lineage, data quality, data cataloging, and data audit.

OPEN SOURCE TECHNOLOGIES
1. Hadoop

It is recognized as one of the most popular big data tools to analyze large data sets, as the platform can distribute data and processing across different servers. Another benefit of using Hadoop is that it can also run on cloud infrastructure.

This open-source software framework is used when the data volume exceeds the available
memory. This big data tool is also ideal for data exploration, filtration, sampling, and
summarization. It consists of four parts:

 Hadoop Distributed File System: This file system, commonly known as HDFS, is a distributed file system that provides very high aggregate bandwidth across the cluster.

 MapReduce: It refers to a programming model for processing big data.

 YARN: All Hadoop’s resources in its infrastructure are managed and scheduled using
this platform.

 Libraries: They allow other modules to work efficiently with Hadoop.

2. Apache Spark

The next big name in the industry among big data tools is Apache Spark. The reason is that this open-source big data tool fills the gaps of Hadoop when it comes to data processing. It is the most preferred tool for data analysis over many other programs due to its ability to keep large computations in memory.
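To illustrate how Spark keeps intermediate results in memory, here is a small PySpark sketch; it assumes the pyspark package and a local Spark runtime are installed, and the sample sentences are made up:

```python
# Minimal PySpark sketch showing distributed transformations and in-memory caching
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-demo").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["big data tools", "big data processing", "spark in memory"])
counts = (lines.flatMap(lambda line: line.split())   # split each line into words
               .map(lambda word: (word, 1))          # pair every word with a count of 1
               .reduceByKey(lambda a, b: a + b)      # sum counts per word across partitions
               .cache())                             # keep the result in memory for reuse

print(counts.collect())                              # e.g. [('big', 2), ('data', 2), ...]
print(counts.filter(lambda kv: kv[1] > 1).collect()) # reuses the cached result, no recompute

spark.stop()
```

The cache() call is the point of the example: subsequent actions reuse the in-memory result instead of re-reading and re-shuffling the data, which is the main way Spark fills Hadoop's data-processing gap.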

3. Cassandra

Apache Cassandra is one of the best big data tools to process structured data sets. First released as open source in 2008 and now maintained by the Apache Software Foundation, it is recognized as the best open-source big data tool for scalability. This big data tool has proven fault-tolerance on cloud infrastructure and commodity hardware, making it well suited for big data uses.
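A minimal sketch of talking to Cassandra from Python using the DataStax cassandra-driver package might look as follows; the contact point, keyspace, and table names are assumptions for illustration:

```python
# Minimal sketch using the cassandra-driver package (assumed installed);
# the contact point, keyspace, and table are illustrative only.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # contact point of a local Cassandra node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)
""")
session.execute("INSERT INTO demo.users (id, name) VALUES (%s, %s)", (1, "Alice"))

for row in session.execute("SELECT id, name FROM demo.users"):
    print(row.id, row.name)

cluster.shutdown()
```

The replication_factor in the keyspace definition is where Cassandra's fault-tolerance on commodity hardware shows up: a higher factor stores more copies of each row across the cluster.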

4. MongoDB

MongoDB is an ideal alternative to traditional databases. This document-oriented database is an ideal choice for businesses that need fast, real-time data for instant decisions. One thing that sets it apart from traditional databases is that it uses documents and collections instead of rows and columns.
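To illustrate the documents-and-collections model, here is a small pymongo sketch; the connection string, database, and collection names are assumptions for illustration:

```python
# Minimal pymongo sketch: documents and collections instead of rows and columns
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]                 # database "shop", collection "orders"

orders.insert_one({"customer": "Alice",           # a document: a flexible JSON-like record
                   "items": [{"sku": "A1", "qty": 2}],
                   "total": 19.98})

doc = orders.find_one({"customer": "Alice"})      # query by field; no fixed schema required
print(doc["total"])
```

Note how the nested "items" list sits inside a single document, where a relational design would need a separate table and a join.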

5. HPCC

High-Performance Computing Cluster, or HPCC, is a competitor of Hadoop in the big data market. It is one of the open-source big data tools under the Apache 2.0 license. Developed by LexisNexis Risk Solutions, its public release was announced in 2011. It delivers, on a single platform, a single architecture and a single programming language for data processing.

If you want to accomplish big data tasks with minimal code use, HPCC is your big data tool. It
automatically optimizes code for parallel processing and provides enhanced performance.

6. Apache Storm

It is a free, open-source big data computation system. It is one of the best big data tools, offering a distributed, real-time, fault-tolerant processing system. Benchmarked at processing one million 100-byte messages per second per node, it uses parallel calculations that run across a cluster of machines. Being open source, robust, and flexible, it is preferred by medium and large-scale organizations. It guarantees data processing even if messages are lost or nodes of the cluster die.
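Storm itself is programmed through a Java-based topology API of spouts (stream sources) and bolts (processors). The following plain-Python sketch is not Storm code; it only mimics that pattern conceptually, with a continuously emitting source feeding a worker that updates counts message by message:

```python
# Conceptual illustration of the stream-processing pattern behind Storm
# (NOT the Storm API): a source emits an endless stream of small messages,
# and a worker processes them one by one as they arrive.
import itertools
from collections import Counter

def sentence_source():
    """Acts like a spout: emits an endless stream of messages."""
    sentences = ["storm processes streams",
                 "streams of small messages",
                 "storm is fault tolerant"]
    for sentence in itertools.cycle(sentences):
        yield sentence

def word_counter(stream, limit=10):
    """Acts like a bolt: updates running counts as each message arrives."""
    counts = Counter()
    for sentence in itertools.islice(stream, limit):
        counts.update(sentence.split())
    return counts

print(word_counter(sentence_source()))
```

In real Storm, many copies of such workers run in parallel across the cluster, and the framework replays lost messages so processing is still guaranteed when nodes die.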

7. Apache SAMOA

Scalable Advanced Massive Online Analysis (SAMOA) is an open-source platform used for mining big data streams, with a special emphasis on machine learning enablement. It supports the Write Once Run Anywhere (WORA) architecture, which allows seamless integration of multiple distributed stream processing engines into the framework.

8. CouchDB

CouchDB stores information in JSON documents that can be browsed online or queried using JavaScript. It enables fault-tolerant storage and distributed scaling. Through the Couch Replication Protocol, it permits data access and synchronization across devices, and a single logical database can be run on any number of servers.
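Because CouchDB exposes its documents over an HTTP/JSON interface, a minimal sketch using Python's requests package could look like this; the local URL, credentials, and database name are assumptions for illustration:

```python
# Minimal sketch of CouchDB's HTTP/JSON interface via the requests package
import requests

BASE = "http://admin:password@localhost:5984"   # assumed local CouchDB with admin credentials

requests.put(f"{BASE}/articles")                              # create a database
resp = requests.post(f"{BASE}/articles",                      # store a JSON document
                     json={"title": "Big Data", "views": 10})
doc_id = resp.json()["id"]

doc = requests.get(f"{BASE}/articles/{doc_id}").json()        # read the document back
print(doc["title"], doc["views"])
```

The same HTTP interface is what the Couch Replication Protocol builds on, which is why documents can be synchronized between any number of servers or devices.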

BIG DATA AND CLOUD

Big data and Cloud Computing are two of the most widely used technologies in today's Information Technology world. With these two technologies, business, education, healthcare, research & development, etc., are growing rapidly, and they provide various advantages for expanding these areas with new tricks and techniques.

So, in this Big Data Vs Cloud Computing tutorial, we will study the major difference between
Big data and Cloud computing and gather important information.

So, are you excited to explore Big data Vs Cloud Computing?

What is Big Data?

Big data refers to extraordinarily massive and complicated datasets that typical data processing
tools are unable to manage effectively. These datasets stand out for their size, speed, and variety.
The phrase “big data” refers to data sets that are frequently so huge that they are difficult to
handle or analyse with conventional database administration tools or data processing methods.

Organisations use cutting-edge technologies like distributed computing, parallel processing,


cloud computing, and specialised big data platforms and tools like Hadoop, Spark, and NoSQL
databases to handle and analyse large data. Data-driven decision-making, business intelligence,
and predictive analytics are made possible across a variety of businesses and disciplines because
of these technologies’ assistance in extracting useful insights, patterns, and trends from massive
data.

What is Cloud Computing?

Through the use of a shared pool of computing resources such as servers, storage, databases,
networking, software, and more, cloud computing enables the delivery of computer services via

the internet. Users can remotely access these resources via cloud service providers through the
internet, eliminating the need to buy and maintain physical gear and software.

With solutions that are affordable, scalable, and easily available for a variety of computing
needs, from storage and processing to application hosting and data analytics, cloud computing
has emerged as a basic technology for both enterprises and consumers.

Big Data Vs Cloud Computing (Major Differences)

Let’s see 8 major differences between Big Data and Cloud Computing:

i. Concept

In cloud computing, we can store and retrieve data from anywhere at any time, whereas big data is a large set of data that is processed to extract the necessary information.

ii. Characteristics

Cloud Computing provides the service over the internet which can be:

 Software as a Service (SaaS)


 Platform as a Service (PaaS)
 Infrastructure as a Service (IaaS)

Whereas, there are some important characteristics of Big data which can lead to strategic
business moves and they are Velocity, Variety, and Volume.

iii. Accessibility

Cloud Computing provides universal access to services over the internet, whereas Big Data is about analyzing very large data sets to solve specific technical problems and produce better results.

iv. When to use

A customer can shift to Cloud Computing when they need rapid deployment and scaling of applications. If an application deals with highly sensitive data and requires strict compliance, the organization should evaluate carefully before keeping it on the cloud.

Whereas, we can use Big Data when traditional methods and frameworks are ineffective. Big data is not a replacement for a relational database system; it solves specific problem statements related to very large data sets, and big data techniques generally do not pay off on small data.

v. Cost

Cloud Computing is economical, as it offers low maintenance costs, a centralized platform, no upfront cost, and disaster-safe implementation. Whereas, Big Data is highly scalable, has a robust ecosystem, and is cost-effective.

vi. Job roles and responsibility

The users of the cloud are typically developers or office workers in an organization. Whereas, in big data there are big data analysts, who are responsible for analyzing the data, finding interesting insights, and identifying possible future trends.

vii. Types and trends

Cloud Computing includes the following deployment types:

 Public Cloud
 Private Cloud
 Hybrid Cloud
 Community Cloud

Whereas, some important trends in Big Data technology are Hadoop, MapReduce, and HDFS.

viii. Vendors

Some of the vendors and solution providers of Cloud Computing are

 Google
 Amazon Web Service
 Microsoft
 Dell
 Apple
 IBM

Whereas, some of the vendors and solution providers of big data are

 Cloudera
 Hortonworks
 Apache
 MapR

MOBILE BUSINESS INTELLIGENCE

BI delivers relevant and trustworthy information to the right person at the right time.
Mobile business intelligence is the transfer of business intelligence from the desktop to mobile
devices such as the BlackBerry, iPad, and iPhone.

The ability to access analytics and data on mobile devices or tablets rather than desktop
computers is referred to as mobile business intelligence. The business metric dashboard and key
performance indicators (KPIs) are more clearly displayed.

With the rising use of mobile devices, the technologies we all utilise in our daily lives to make them easier, including in business, have grown as well. Many businesses have benefited from mobile business intelligence. Essentially, this section is a guide for business owners and others on the benefits and pitfalls of Mobile BI.

Need for mobile BI?

Mobile phones' data storage capacity has grown in tandem with their use. In this fast-paced environment, you are expected to make decisions and act quickly, and the number of businesses seeking assistance in such situations is growing by the day.

To expand your business or boost your business productivity, mobile BI can help, and it works for both small and large businesses. Mobile BI can help you whether you are a salesperson or a CEO. There is high demand for mobile BI in order to reduce the time to information and use that time for quick decision making.

As a result, timely decision-making can boost customer satisfaction and improve an
enterprise's reputation among its customers. It also aids in making quick decisions in the face of
emerging risks.

Data analytics and visualisation techniques are essential skills for any team that wants to
organise work, develop new project proposals, or wow clients with impressive presentations.

Advantages of mobile BI

1. Simple access

Mobile BI is not restricted to a single mobile device or a certain place. You can view
your data at any time and from any location. Having real-time visibility into a firm improves
production and the daily efficiency of the business. Obtaining a company's perspective with a
single click simplifies the process.

2. Competitive advantage

Many firms are seeking better and more responsive methods to do business in order to
stay ahead of the competition. Easy access to real-time data improves company opportunities and
raises sales and capital. This also aids in making the necessary decisions as market conditions
change.

3. Simple decision-making

As previously stated, Mobile BI provides access to real-time data at any time and from any location. Mobile BI offers the information on demand, which helps users obtain what they require at that moment. As a result, decisions are made quickly.

4. Increase Productivity

By extending BI to mobile, the organization's teams can access critical company data
when they need it. Obtaining all of the corporate data with a single click frees up a significant
amount of time to focus on the smooth and efficient operation of the firm. Increased productivity
results in a smooth and quick-running firm.

Disadvantages of mobile BI

1. Stack of data

The primary function of Mobile BI is to store data in a systematic manner and then present it to the user as required. As a result, Mobile BI stores all of the information and ends up with heaps of older data. The corporation may only need a small portion of that previous data, but it has to store the entire set, which piles up in the stack.

2. Expensive

Mobile BI can be quite costly at times. Large corporations can continue to pay for expensive services, but small businesses cannot. And the licence cost of Mobile BI is not the whole picture: we must additionally consider the cost of the IT staff needed for the smooth operation of BI, as well as the hardware costs involved.

However, larger corporations do not settle for just one Mobile BI provider for their
organisations; they require multiple. Even when doing basic commercial transactions, mobile BI
is costly.

3. Time consuming

Businesses prefer Mobile BI because it promises a quick procedure; companies are not patient enough to wait long for data before acting on it. In today's fast-paced environment, anything that can produce results quickly is valuable. However, because the system has to be built on data from the warehouse, the implementation of BI in an enterprise can take more than 18 months.

4. Data breach

The biggest issue of the user when providing data to Mobile BI is data leakage. If you
handle sensitive data through Mobile BI, a single error can destroy your data as well as make it
public, which can be detrimental to your business.

Many Mobile BI providers are working to make the service fully secure to protect their users' data. It is not only something that Mobile BI providers must consider; it is also something that we, as users, must consider when granting data access authorization.

5. Poor quality data

Because we work online in every aspect, we have a lot of data stored in Mobile BI, which can be a significant problem. A large portion of the data analysed by Mobile BI is irrelevant or completely useless, and this can slow down the entire procedure. This requires you to select the data that is important and may be required in the future.

Best Mobile BI tools

1. Sisense

Sisense is a flexible business intelligence (BI) solution that includes powerful analytics,
visualisations, and reporting capabilities for managing and supporting corporate data.

Businesses can use the solution to evaluate large, diverse databases and generate relevant business insights. You may easily view enormous volumes of complex data with Sisense's code-first, low-code, and even no-code technologies. Sisense was established in 2004 with its headquarters in New York.

Since then, the team has steadily built on its research; once the company received $4 million in funding from investors, it picked up the pace of that research.

2. SAP Roambi analytics

Roambi analytics is a BI tool that offers a solution that allows you to fundamentally rethink your
data analysis, making it easier and faster while also increasing your data interaction.

You can consolidate all of your company's data in a single tool using SAP Roambi Analytics, which integrates all ongoing systems and data. Using SAP Roambi Analytics is a simple three-step technique: first, upload your HTML or spreadsheet files; the information is then transformed into informative charts and visualisable data; finally, once the data is prepared, you can easily share it with your preferred device.

Roambi Analytics was founded in 2008 by a team based in California.

3. Microsoft Power BI pro

Microsoft's Power BI is an easy-to-use tool for non-technical business owners who are unfamiliar with BI tools but wish to aggregate, analyse, visualise, and share data. You only need a basic understanding of Excel and other Microsoft tools, and if you are familiar with these, Power BI can be used as a self-service tool.

Microsoft Power BI has a unique feature that allows users to create subsets of data and then automatically apply analytics to that information.

4. IBM Cognos Analytics

Cognos Analytics is IBM's registered web-based business intelligence tool. Cognos Analytics is now merging with Watson, and the benefits for users are extremely exciting. Watson-powered Cognos Analytics will assist in connecting and cleaning the users' data, resulting in properly visualised data.

That way, the business owner will know where they stand in comparison to their competitors and where they can grow in the future. It combines reporting, modelling, analysis, and dashboards to help you understand your organization's data and make sound business decisions.

5. Amazon QuickSight

Amazon QuickSight assists in the creation and distribution of interactive BI dashboards to users, as well as retrieving answers to natural-language queries in seconds. QuickSight can be accessed through any device and embedded in any website, portal, or app.

Amazon QuickSight allows you to quickly and easily create interactive dashboards and reports for your users. Anyone in your organisation can securely access those dashboards via browsers or mobile devices.

CROWD SOURCING ANALYTICS

Crowdsourcing is a sourcing model in which an individual or an organization gets support from a large, relatively open, and rapidly evolving group of people in the form of ideas, micro-tasks, finances, etc. Crowdsourcing typically involves the use of the internet to attract a large group of people to divide tasks or to achieve a target. The term was coined in 2005 by Jeff Howe and Mark Robinson. Crowdsourcing can help different types of organizations get new ideas and solutions, deeper consumer engagement, optimization of tasks, and several other things.

Where Can We Use Crowdsourcing?

Crowdsourcing is touching almost all sectors, from education to health. It is not only accelerating innovation but also democratizing problem-solving methods. Some fields where crowdsourcing can be used are:

1. Enterprise
2. IT
3. Marketing
4. Education
5. Finance
6. Science and Health

How To Crowdsource?

1. For scientific problem solving, a broadcast search is used, where an organization mobilizes a crowd to come up with a solution to a problem.
2. For information management problems, knowledge discovery and management is used to find and assemble information.
3. For processing large datasets, distributed human intelligence is used: the organization mobilizes a crowd to process and analyze the information.

Examples Of Crowdsourcing

1. Doritos: One of the companies that has taken advantage of crowdsourcing for a long time for an advertising initiative. It uses consumer-created ads for one of its 30-second Super Bowl spots (the championship game of American football).
2. Starbucks: Another big venture which used crowdsourcing as a medium for idea
generation. Their white cup contest is a famous contest in which customers need to
decorate their Starbucks cup with an original design and then take a photo and submit it
on social media.
3. Lays: The “Do Us a Flavor” contest by Lays used crowdsourcing as an idea-generating medium. Customers were asked to submit their opinion about the next chip flavor they wanted.
4. Airbnb: A very famous travel website that offers people to rent their houses or apartments
by listing them on the website. All the listings are crowdsourced by people.

There are several examples of businesses being set up with the help of crowdsourcing.

Crowdsourced Marketing

As discussed already, crowdsourcing helps businesses grow a lot. Be it a business idea or just a logo design, crowdsourcing engages people directly and, in turn, saves money and energy. In the upcoming years, crowdsourced marketing will surely get a boost as the world adopts technology faster.

Crowdsourcing Sites

Here is the list of some famous crowdsourcing and crowdfunding sites.

1. Kickstarter
2. GoFundMe
3. Patreon
4. RocketHub

Advantages Of Crowdsourcing

1. Evolving Innovation: Innovation is required everywhere, and in this advancing world it has a big role to play. Crowdsourcing helps in getting innovative ideas from people belonging to different fields, thus helping businesses grow in every field.
2. Saves Costs: It eliminates the time wasted on meeting people and convincing them. The business idea only has to be proposed on the internet, and you will be flooded with suggestions from the crowd.
3. Increased Efficiency: Crowdsourcing has increased the efficiency of business models, as ideas from several experts can also be surfaced and funded.

Disadvantages Of Crowdsourcing

1. Lack of confidentiality: Asking for suggestions from a large group of people can bring
the threat of idea stealing by other organizations.
2. Repeated ideas: Often contestants in crowdsourcing competitions submit repeated, plagiarized ideas, which leads to time wastage, as reviewing the same ideas is not worthwhile.

INTER AND TRANS FIREWALL ANALYTICS


What is Firewall?

A firewall is a sort of network security hardware or software application that monitors and filters
incoming and outgoing network traffic according to a set of security rules. It serves as a barrier
between internal private networks and public networks (such as the public Internet).

A firewall’s principal goal is to allow non-threatening communication while blocking dangerous


or undesirable data transmission in order to protect the computer from viruses and attacks. A
firewall is a cybersecurity solution that filters network traffic and assists users in preventing
harmful malware from gaining access to the Internet on compromised machines.

Firewall is Hardware or Software?

Whether a firewall is hardware or software is one of the most common questions. A firewall, as previously noted, can be a network security device or a computer software programme. This means that a firewall exists at both the hardware and software levels, although it is preferable to have both.

The functionality of each format (a firewall implemented as hardware or software) varies, but the
goal remains the same. A hardware firewall is a piece of hardware that connects a computer
network to a gateway. Consider a broadband router. A software firewall, on the other hand, is a
basic programme that works with port numbers and other installed software to protect a
computer.

Aside from that, cloud-based firewalls are available; Firewall-as-a-Service (FWaaS) is a typical moniker for them. The ability to administer cloud-based firewalls from a central location is a major benefit. Like hardware firewalls, cloud-based firewalls are best known for perimeter security.

Need of Firewall:

Firewalls are used to protect against malware and network-based threats. They can also aid in the prevention of application-layer attacks. These firewalls serve as a barrier or gatekeeper. They keep track of every connection our machine makes to another network, and they only let data packets pass through if the data is coming from or going to a trusted source designated by the user.

The following are some of the major dangers of not having a firewall:

Open to the public:

If a computer is not protected by a firewall, it is open to unrestricted access from other networks. This means it accepts any type of connection made by another party. In this circumstance, it is impossible to detect threats or attacks travelling over our network. Without a firewall, we render our devices vulnerable to malicious users and other undesired sources.

Data that has been lost or mutilated:

We are leaving our devices open to everyone if we don't use a firewall. This means that anyone on the network can gain access to our device and have complete control over it. In this situation, cybercriminals can easily destroy our data or utilise our personal information for their own gain.

Network Crashes:

Anyone could gain access to our network and shut it down if we didn’t have a firewall. It may
prompt us to devote precious time and resources to restoring our network’s functionality.

As a result, it is critical to employ firewalls to protect our network, computer, and data from
unauthorized access.

What exactly is the work of a firewall?

A firewall system examines network traffic according to pre-set rules. The traffic is then filtered, and any traffic coming from untrustworthy or suspect sources is blocked; only traffic that the firewall has been configured to accept is allowed through. Firewalls typically intercept network traffic at a computer's port, or entry point. According to pre-defined security criteria, firewalls allow or block particular data packets (units of communication carried over a digital network), and only trusted IP addresses or sources are allowed to send traffic in.
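The rule-matching idea can be illustrated with a small, simplified Python sketch (not a real firewall): each incoming "packet" is checked against an ordered rule list by source address and port, and anything not explicitly allowed falls through to a default block:

```python
# Simplified illustration of rule-based packet filtering (not a real firewall)
import ipaddress

RULES = [
    {"action": "allow", "src": "192.168.1.0/24", "port": 443},  # trusted LAN, HTTPS
    {"action": "block", "src": "0.0.0.0/0",      "port": 23},   # block Telnet from anywhere
]

def decide(packet, rules, default="block"):
    # walk the rules in order; the first match decides the packet's fate
    for rule in rules:
        in_net = ipaddress.ip_address(packet["src"]) in ipaddress.ip_network(rule["src"])
        if in_net and packet["port"] == rule["port"]:
            return rule["action"]
    return default          # anything not explicitly allowed is blocked

print(decide({"src": "192.168.1.10", "port": 443}, RULES))  # allow
print(decide({"src": "203.0.113.5",  "port": 23},  RULES))  # block
print(decide({"src": "203.0.113.5",  "port": 80},  RULES))  # block (default rule)
```

Real firewalls add much more (stateful connection tracking, application awareness, logging), but the allow/block decision against an ordered rule set is the core mechanism described above.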

Firewall’s Functions

As previously established, the firewall acts as a gatekeeper. It examines all attempts to obtain
access to our operating system and blocks traffic from unidentified or unknown sources.

We can think of the firewall as a traffic controller since it operates as a barrier or filter between
the computer system and external networks (such as the public Internet). As a result, the major
function of a firewall is to protect our network and information by managing network traffic,
prohibiting unwanted incoming network traffic, and validating access by scanning network
traffic for dangerous things like hackers and viruses. Most operating systems (for example,
Windows OS) and security applications provide firewall capability by default. As a result, it’s a
good idea to make sure those options are enabled. We can also adjust the system’s security
settings to update automatically whenever new information becomes available.

Firewalls have grown in power, and now encompass a number of built-in functions and
capabilities:

 Preventing Network Threats


 Control based on the application and the user’s identity.
 Support for Hybrid Cloud.
 Performance that scales.
 Control and management of network traffic.
 Validation of access.
 Keep track of what happens and report on it.

Types of Firewalls:

Different types of firewalls exist, each with its own structure and functionality. The following is a list of some of the most prevalent firewall types:

 Proxy firewall
 Packet-filtering firewall
 Stateful multi-layer inspection (SMLI) firewall
 Unified threat management (UTM) firewall
